Milan's War on Duplicate Images: How the City Stacks Up Against London, Paris and Tokyo
From Porta Nuova to the Navigli, Milan's institutions are racing to clean up digital archives bloated with duplicate imagery — and the results are uneven.

Milan's major cultural institutions and city agencies collectively manage millions of digitised assets, and a growing share of that archive is redundant. Archivists call the problem duplicate image proliferation, and the Comune di Milano's digital services directorate flagged it as a priority area in its 2025-2026 digital transformation roadmap. The question now is whether the city's approach is working, and how it measures up against what peer cities are actually doing.
The issue matters more in mid-2026 than it did even two years ago because of the Milan-Cortina Winter Olympics. Dozens of municipal bodies, tourism boards, and private sponsors are publishing event imagery simultaneously across platforms. Without coordinated deduplication protocols, the same photograph can sit in eight separate databases, inflating storage costs, muddying licensing records, and creating legal exposure when copyright attribution is applied inconsistently to near-identical files.
Two institutions stand out in the local effort. The Fondazione Giangiacomo Feltrinelli, based on Viale Pasubio in the Isola district, began a structured image-deduplication audit of its digital library in the autumn of 2025, applying perceptual hashing — a technique that identifies visually similar images even when file names or metadata differ — across roughly 340,000 catalogued assets. The foundation has not published a completion date. Separately, the Triennale di Milano in the Parco Sempione has integrated deduplication checkpoints into its Digital Collections Policy, updated in January 2026, requiring that any newly ingested image batch be screened against existing holdings before accession.
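For readers who want to see what perceptual hashing looks like in practice, the following is a minimal sketch of one of the simplest variants, an "average hash". It is not a description of the Feltrinelli foundation's or the Triennale's actual tooling, which neither has published. The function names are illustrative, and the sketch assumes each image has already been reduced to a small grid of greyscale values by an image library.

```python
def average_hash(pixels):
    """Return an integer fingerprint for a small greyscale grid.

    `pixels` is a list of rows of brightness values (0-255). The grid is
    assumed to be already downscaled (for example to 8x8) by an image
    library; that step is outside this sketch.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the grid's mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 4x4 test grids: an image, a uniformly brightened copy,
# and an inverted (visually different) version.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brightened = [[v + 10 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

same = hamming_distance(average_hash(original), average_hash(brightened))
different = hamming_distance(average_hash(original), average_hash(inverted))
```

The brightened copy produces the same fingerprint as the original even though every pixel value, and therefore any byte-level checksum, differs. A small Hamming distance between two fingerprints is what marks a pair of files as likely duplicates; the cut-off is a tuning choice each institution would have to make for itself.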
The city's broader infrastructure lags those two examples. The Archivio Storico Civico, housed in the Castello Sforzesco, still relies primarily on manual tagging and cataloguer review to catch duplicates — a process that archival management specialists have long described as inadequate for collections of that scale, though the archive has not publicly quantified the backlog. The Milan Tourism Office, which manages the visual asset library supporting Milano è Unica and related campaigns, told The Daily Milan it was evaluating automated solutions but declined to confirm a procurement timeline.
Peer cities have moved faster. London's Victoria and Albert Museum completed a system-wide deduplication project for its 1.2 million digitised objects in 2024, using AI-assisted clustering tools developed in partnership with a UCL research team. Paris's Bibliothèque nationale de France has run automated duplicate screening on all incoming digital acquisitions since 2022 under its Plan de transformation numérique. Tokyo's National Museum consortium, which covers four institutions including the Tokyo National Museum in Ueno, adopted a shared image registry in March 2025 that flags near-duplicates at the point of upload across all member collections.
By those benchmarks, Milan is operating one to two years behind the curve. The gap is partly structural: London and Paris benefit from centralised national funding streams for digital infrastructure that Italian municipal institutions do not access directly. Tokyo's consortium model required years of inter-institutional negotiation that Milan has not replicated. But the gap is also a product of procurement pace. A competitive tender for a unified digital asset management platform covering twelve Comune di Milano departments was issued in October 2025 with a closing date of February 2026; as of this week, no contract award had been published in the city's official procurement register.
Storage costs provide a concrete illustration of the stakes. Commercial cloud storage for cultural institutions in Europe is running at roughly €18 to €22 per terabyte per month for archival-grade tiers, according to published pricing from major providers. A collection holding even 50,000 unnecessary duplicate image files at high resolution can represent several terabytes of redundant data — a modest but recurring expense that compounds annually and grows faster as event-driven content volumes spike.
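The arithmetic can be sketched in a few lines. The price range and the 50,000-file figure come from the paragraph above; the average file size of 60 MB is an assumption chosen purely for illustration, as real high-resolution masters vary widely, and the function name is hypothetical.

```python
def redundant_storage_cost(n_files, avg_file_mb, eur_per_tb_month):
    """Estimate the cost of storing redundant files.

    Returns (terabytes, monthly cost in euros, annual cost in euros),
    using decimal units (1 TB = 1,000,000 MB).
    """
    terabytes = n_files * avg_file_mb / 1_000_000
    monthly = terabytes * eur_per_tb_month
    return terabytes, monthly, monthly * 12


# 50,000 duplicates at an ASSUMED 60 MB each, priced at both ends
# of the quoted EUR 18-22 per terabyte per month range.
low = redundant_storage_cost(50_000, 60, 18)
high = redundant_storage_cost(50_000, 60, 22)
```

Under that assumption the duplicates occupy 3 TB and cost between €54 and €66 a month, or €648 to €792 a year, before any growth in volume. Doubling the assumed file size doubles every figure.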
Institutions managing digital collections in Milan should not wait for the Comune's procurement process to resolve. The Triennale's approach — building deduplication into intake policy rather than treating it as a retrospective cleanup exercise — is replicable at relatively low cost using open-source perceptual hashing libraries. For smaller organisations along the Via Tortona design corridor or the Brera museum cluster, a shared intake protocol, modelled loosely on Tokyo's registry concept, would cost less and move faster than any single-institution tender. The Olympics deadline is not abstract. Content volumes will spike before the end of this year, and duplicates accumulate precisely when organisations are moving fastest.
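What a shared intake protocol might look like can be sketched briefly. The example below is an illustration only: the class name, the asset identifiers and the distance threshold are all hypothetical, it does not describe the Triennale's policy or Tokyo's registry in any technical detail, and it takes each image's perceptual hash as an integer that has already been computed elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class IntakeRegistry:
    """A shared record of image fingerprints, checked before accession."""

    max_distance: int = 4  # assumed threshold; would need tuning in practice
    holdings: dict = field(default_factory=dict)  # asset_id -> hash

    def find_match(self, image_hash):
        """Return the id of an existing near-duplicate, or None."""
        for asset_id, known_hash in self.holdings.items():
            differing_bits = bin(known_hash ^ image_hash).count("1")
            if differing_bits <= self.max_distance:
                return asset_id
        return None

    def screen_batch(self, batch):
        """Screen (asset_id, hash) pairs against the registry.

        New images are added to the holdings; likely duplicates are
        returned for human review along with the asset they resemble.
        """
        accepted, flagged = [], []
        for asset_id, image_hash in batch:
            match = self.find_match(image_hash)
            if match is None:
                self.holdings[asset_id] = image_hash
                accepted.append(asset_id)
            else:
                flagged.append((asset_id, match))
        return accepted, flagged


# Two hypothetical institutions submitting batches to one registry.
registry = IntakeRegistry(max_distance=2)
first = registry.screen_batch(
    [("museum-a/001", 0x3333), ("museum-a/002", 0x3331), ("museum-a/003", 0xCCCC)]
)
second = registry.screen_batch([("studio-b/001", 0xCCCC)])
```

In this run the second image from the first institution is flagged as a near-duplicate of the first, and the other institution's upload is flagged against an asset it never held itself, which is the point of sharing the registry. The linear scan is adequate for a sketch; a registry serving many collections would need an index built for near-neighbour lookups.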
Published by The Daily Milan