A coordinated push to clean up duplicate and mislabelled images across Milan's major public digital archives took a concrete step forward this week, when three of the city's leading cultural institutions agreed to align their cataloguing systems under a shared technical protocol. The agreement, reached in meetings at the Palazzo delle Stelline in Corso Magenta on Wednesday, July 1, brings together the Pinacoteca di Brera, the Triennale di Milano, and the Archivio Civico Fotografico — institutions that together hold tens of thousands of digitised records accessible to the public online.
The problem of duplicate images in digital archives is not new, but it has grown sharply as institutions scrambled to digitise collections during the 2020-2022 period and then pushed those assets onto public-facing platforms with limited cross-checking. A single artwork or historical photograph can appear under multiple catalogue entries, sometimes with conflicting dates, different titles, or incompatible rights declarations. For researchers, educators, and the design and fashion industry professionals who regularly mine these collections for reference material, the errors create real practical problems: wrong attributions end up in publications, licensing disputes surface, and searches return cluttered, redundant results.
Why This Week's Agreement Matters
The timing is not accidental. Milan-Cortina 2026, the Winter Olympics opening in February, has placed enormous pressure on the city's cultural and communications infrastructure to present a coherent, high-quality digital face to international audiences. Tourism bodies, media rights holders, and accredited press organisations routinely pull imagery from civic archives, and duplicate or misidentified photographs create legal and reputational exposure. The Comune di Milano's digital services directorate flagged the issue formally in a report circulated to department heads in late May, according to public procurement notices posted on the city's transparency portal.
The Archivio Civico Fotografico alone holds more than 400,000 digitised images, a collection that spans from late-nineteenth-century views of the Navigli canal network to mid-century industrial reportage from the Bicocca manufacturing district. Staff there have been working since early 2025 to implement a deduplication workflow using perceptual hashing — a technique that generates a numerical fingerprint for each image and flags near-identical files for human review. The backlog as of June 30 stood at roughly 18,000 flagged image pairs awaiting manual verification, according to figures published on the archive's own project tracking page.
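To make the fingerprinting idea concrete, here is a minimal Python sketch of one simple perceptual hash, the "average hash". It is an illustration only: the archive has not said which algorithm or threshold it uses, and the function names, the tiny 4x4 sample grids, and the `max_distance` value are all assumptions. A real workflow would first decode each image file and shrink it to a small grayscale grid with an imaging library; that step is left out here.

```python
from itertools import combinations

def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")

def flag_near_duplicates(images, max_distance=2):
    """Return name pairs whose fingerprints differ by at most max_distance bits."""
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming_distance(hashes[a], hashes[b]) <= max_distance
    ]

# Hypothetical sample data: two scans of the same picture (the second slightly
# brighter) and one unrelated image, each already reduced to a 4x4 grid.
scan_a = [[10, 10, 200, 200], [10, 10, 200, 200],
          [200, 200, 10, 10], [200, 200, 10, 10]]
scan_b = [[value + 5 for value in row] for row in scan_a]
other = [[200, 10, 200, 10], [10, 200, 10, 200],
         [200, 10, 200, 10], [10, 200, 10, 200]]

images = {"scan_a": scan_a, "scan_b": scan_b, "other": other}
print(flag_near_duplicates(images))  # prints [('scan_a', 'scan_b')]
```

Because the fingerprint depends on each pixel's brightness relative to the image's own mean, a uniform brightness shift leaves it unchanged, which is why the two scans match while the unrelated image does not. The flagged pairs are only candidates, in line with the human review step described above.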
Brera's digital team faces a different but related challenge. Its online catalogue, which is integrated with SigecWeb, the national cataloguing system run by the Ministero della Cultura (MIC), contains entries where the same photographic reproduction of a painting appears under both the original artwork record and a separate photograph record, effectively doubling the search noise. The Triennale, meanwhile, discovered last autumn that several hundred images from its historical design exhibitions had been ingested twice during a 2023 server migration, with metadata timestamps that made automated deduplication unreliable.
What the Protocol Requires — and What Comes Next
The shared protocol agreed this week is built around three requirements: a common persistent identifier format for every image file, a standardised rights statement using Creative Commons vocabulary, and a quarterly cross-institutional reconciliation check. The Politecnico di Milano's Department of Design, which has a working relationship with both the Triennale and the Comune, is providing technical consultancy on the identifier scheme. The first reconciliation run is scheduled for October 2026, timed to precede the Olympic media surge.
For anyone who uses these archives, whether designers sourcing historical reference imagery along Via Tortona, researchers at the Biblioteca Nazionale Braidense on Via Brera, or journalists pulling archival photographs, the practical advice for now is straightforward. Cross-reference any image pulled from a civic digital catalogue against at least one secondary source before publishing or licensing. Check the catalogue entry date: records created or updated before January 2025 carry the highest duplication risk. And if you find a clear duplicate during your own searches, the Archivio Civico Fotografico has an active feedback form on its portal; staff say they are processing submissions within ten working days. The cleanup is real, it is moving, and by early 2027, the collections should finally start to look like the world-class resources they were always meant to be.