Milan's civic digital archive, managed under the Comune di Milano's cultural heritage directorate, is sitting on a backlog problem that has quietly grown for years: thousands of duplicate images spread across databases serving the city's museums, design institutes and urban planning offices. The issue came into sharper focus this spring, when a cross-departmental audit of assets linked to Milan-Cortina 2026 Olympic planning materials revealed that multiple agencies had independently scanned and stored identical photographs of venues, including redundant files of Piazza Santa Giulia and the Santa Giulia district development zone.
The timing matters. With the Winter Olympics now months away and the city's cultural and design economy under intense international scrutiny, the integrity of visual records — from promotional assets for fashion week to documentation of the Porta Nuova skyline — is no longer a back-office concern. Municipal IT procurement records show that in April 2026 the city approved a framework contract worth up to €2.1 million over three years for a new digital asset management platform intended to address precisely this class of problem, though implementation is still in early stages.
What Milan Is Actually Doing
The Biblioteca Braidense on Via Brera, one of Italy's national libraries and a repository for significant photographic and printed collections, began piloting AI-assisted perceptual hashing tools in late 2025 to flag near-duplicate digitised images. Staff there are working alongside the Politecnico di Milano's DASTU department — the department of architecture and urban studies — which has contributed research on metadata standardisation. The goal is to tag and collapse duplicate entries without destroying provenance records, a distinction archivists regard as non-trivial.
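For readers curious what "tag and collapse without destroying provenance" can look like in practice, the following is a minimal sketch, not a description of the tooling the Braidense or DASTU actually use. It assumes each image has already been decoded and downsampled to a small grayscale grid, uses a simple average hash as a stand-in for whichever perceptual hash the pilot relies on, and keeps every record's identifier and source inside its group instead of deleting anything. The function names, record fields and distance threshold are all hypothetical.

```python
from typing import Dict, List, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_near_duplicates(records: List[Dict], max_distance: int = 5) -> List[Dict]:
    """Greedily group records whose hashes are within max_distance bits.

    Each record is assumed to be a dict with 'id', 'source' and 'pixels'.
    Nothing is deleted: every member keeps its id and source (provenance).
    """
    groups: List[Dict] = []
    for record in records:
        digest = average_hash(record["pixels"])
        member = {"id": record["id"], "source": record["source"]}
        for group in groups:
            if hamming_distance(digest, group["hash"]) <= max_distance:
                group["members"].append(member)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append({"hash": digest, "members": [member]})
    return groups
```

A curator would still need to review each group before any entry is merged; the threshold only decides which candidates are put in front of a human.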
The Fondazione Prada's digital team, operating independently from civic structures, has moved further ahead. The foundation completed a deduplication sweep of roughly 340,000 digitised image assets in its internal systems by March 2026, using perceptual hashing and manual curatorial review in tandem. Their approach has since been cited informally by other Milanese institutions as a practical benchmark, though the foundation operates with resources most civic bodies cannot match.
The problem is not unique to Milan. London's Victoria and Albert Museum launched a structured duplicate-removal programme across its 1.4-million-image digital collection in 2023, completing a first pass by mid-2025. The Rijksmuseum in Amsterdam, whose open-access collection exceeds 700,000 items online, has embedded deduplication into its continuous ingestion workflow since 2022. Paris's Bibliothèque nationale de France reported in its 2024 annual report that approximately 8 percent of assets in the Gallica digitisation programme were classified as duplicates or near-duplicates — a figure that caught the attention of archivists across Europe.
Why Milan Lags, and What Comes Next
The gap is partly structural. Unlike Amsterdam or Paris, Milan distributes curatorial responsibility across a patchwork of civic departments, private foundations and national-level institutions that do not share a single digital infrastructure. The tension between the centre-right Lombardy regional government and Beppe Sala's centre-left city administration has not helped streamline shared technology investment; several proposed joint digitisation projects have stalled in inter-institutional negotiation since 2023.
The €2.1 million framework contract is meant to change that, at least within the Comune's own perimeter. If the platform — which will initially serve cultural assets connected to the Museo del Novecento and the Palazzo Reale — goes live by the first quarter of 2027 as planned, Milan will have a centralised deduplication layer for the first time. Whether private foundations and Lombardy-administered collections will eventually plug in remains an open question.
For institutions and professionals working with Milan's image libraries now, the practical advice is straightforward: assume duplicates exist, build manual checks into any archival ingestion workflow, and do not rely on filenames or basic metadata alone to identify unique assets. The tools to do this properly — perceptual hashing, machine-learning classifiers, human curatorial review — all exist. The harder work, in Milan as elsewhere, is organisational rather than technical.
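As an illustration of why filenames alone are a poor guide, the sketch below groups files by a SHA-256 digest of their contents, so two byte-identical scans are matched even when different agencies named them differently. It is a simplified example under stated assumptions: the file contents are passed in as an in-memory mapping, the function names are hypothetical, and a byte-level digest only catches exact copies, so resized or re-encoded images would still need perceptual hashing and human review.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group filenames whose contents are byte-for-byte identical.

    'files' maps a filename to its raw bytes (an assumption made for this
    sketch; a real workflow would read the bytes from storage).
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[content_digest(data)].append(name)
    # Only digests shared by more than one filename indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

A check of this kind is cheap enough to run at ingestion, before any of the more expensive near-duplicate comparison takes place.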