Milan's major cultural and commercial institutions are sitting on a problem that has been building quietly since roughly 2003: thousands of duplicate images locked inside digital archives that were never properly organised, costing staff time, inflating storage bills, and in some cases sending the wrong photograph to print. The reckoning is now underway.
The issue matters today because the pressure to fix it has converged from several directions at once. The Milan-Cortina 2026 Winter Olympics, whose opening ceremony is scheduled for February 6, has pushed the city's communications and tourism infrastructure into an unusually public spotlight. Institutions from the Comune di Milano to the Fondazione Fiera Milano in the Portello district have been auditing their image libraries ahead of a global audience. What they found, in many cases, was a sprawling mess.
The Origins: Digitisation Without a Plan
The root of the problem traces back to a wave of digitisation projects that swept through Italian public bodies after the Codice dei Beni Culturali was adopted in 2004. Museums, design schools, and municipal press offices scanned physical archives at speed, often using multiple contractors working to different file-naming conventions and resolution standards. A single photograph of, say, the Galleria Vittorio Emanuele II might exist in the same database at 72 dpi, 150 dpi, and 300 dpi, each saved under a different filename, none of them flagged as related to the others.
The problem compounded through the 2010s as social media departments spun up inside the same organisations. Images were downloaded, re-exported, cropped for Instagram, and then re-uploaded into the master archive, creating what archivists describe as phantom duplicates: files that look identical, or nearly so, but differ slightly in metadata or pixel dimensions. By 2020, one estimate from the European Commission's digital cultural heritage working group put the proportion of duplicate or near-duplicate files in large institutional image libraries at between 18 and 35 percent, though the precise figure varies sharply depending on the institution and how duplicates are defined.
In Milan's fashion and design economy, where image rights carry direct commercial value, the stakes are higher than in most cities. A misidentified archive file sent to a luxury client can mean not just embarrassment but legal exposure. The Camera Nazionale della Moda Italiana, headquartered near Via Montenapoleone, began a systematic deduplication review in late 2024. The Politecnico di Milano's design faculty, which maintains an image archive of student and faculty work stretching back to the 1980s, launched a parallel project in January 2025 using open-source perceptual hashing tools to flag near-identical files for human review.
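For readers wondering how perceptual hashing can match a 72 dpi scan to its 300 dpi sibling, the sketch below shows one simple variant, the average hash. It is an illustration only, not the tooling used by the Politecnico or any other institution named here. To stay dependency-free it assumes images are already decoded into grids of grayscale values, and the function names and test images are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), row-major


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging rectangular blocks.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit hash: 1 where a cell is brighter than the mean."""
    small = [v for row in shrink(pixels, size) for v in row]
    mean = sum(small) / len(small)
    bits = 0
    for v in small:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical test images: a horizontal gradient, the same gradient at
# half size, and an unrelated vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
half_size = [[x * 16 for x in range(16)] for _ in range(16)]
unrelated = [[y * 8 for _ in range(32)] for y in range(32)]

print(hamming(average_hash(original), average_hash(half_size)))
print(hamming(average_hash(original), average_hash(unrelated)))
```

On these toy images the resized copy comes out at a distance of 0 and the unrelated image at 32 of 64 bits. In practice a small threshold, not exact equality, decides which pairs are sent for human review, and the right threshold depends on the collection.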
What a Fix Actually Looks Like
Deduplication is not a single action. It is a process with several distinct stages: automated detection, human verification, metadata reconciliation, and finally deletion or consolidation. The automated stage has become significantly cheaper since 2022, when cloud-based tools capable of processing tens of thousands of images per hour became available at price points accessible to mid-sized organisations — some starting at under €200 per month. The human verification stage remains the expensive part, because software can flag near-duplicates but cannot always determine which version of an image is the canonical one an institution actually wants to keep.
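The four stages can be pictured as a small state machine in which nothing is deleted until a person has signed off. The sketch below is a minimal illustration under that assumption. The class names, the "largest file wins" suggestion, and the metadata merge rule are all hypothetical rather than drawn from any institution's actual workflow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    DETECTED = 1      # flagged by automated detection
    VERIFIED = 2      # a person has chosen which file to keep
    RECONCILED = 3    # metadata merged onto the kept file
    CONSOLIDATED = 4  # redundant copies released for archiving or deletion


@dataclass
class ImageFile:
    name: str
    width: int
    height: int
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DuplicateGroup:
    files: List[ImageFile]
    stage: Stage = Stage.DETECTED
    canonical: Optional[ImageFile] = None

    def suggest_canonical(self) -> ImageFile:
        """Heuristic only: propose the file with the most pixels."""
        return max(self.files, key=lambda f: f.width * f.height)

    def verify(self, chosen: ImageFile) -> None:
        """Record the reviewer's decision, which may override the suggestion."""
        if self.stage is not Stage.DETECTED:
            raise ValueError("group has already been verified")
        if not any(f is chosen for f in self.files):
            raise ValueError("chosen file is not in this group")
        self.canonical = chosen
        self.stage = Stage.VERIFIED

    def reconcile(self) -> None:
        """Copy metadata fields the kept file lacks from its duplicates."""
        if self.stage is not Stage.VERIFIED:
            raise ValueError("verify the group before reconciling metadata")
        for f in self.files:
            for key, value in f.metadata.items():
                self.canonical.metadata.setdefault(key, value)
        self.stage = Stage.RECONCILED

    def consolidate(self) -> List[ImageFile]:
        """Return the files that may now be archived or deleted."""
        if self.stage is not Stage.RECONCILED:
            raise ValueError("reconcile metadata before consolidating")
        self.stage = Stage.CONSOLIDATED
        return [f for f in self.files if f is not self.canonical]
```

The design choice worth noting is that `suggest_canonical` only proposes. The software's guess and the reviewer's decision are kept as separate steps, which mirrors why the human stage remains the costly one.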
The Porta Nuova business district, which has attracted a concentration of tech and creative firms over the past decade, has become something of an informal testing ground for newer approaches. Several agencies based in the Varesine towers have adopted workflows that embed deduplication checks into the upload process itself, preventing the pile-up from recurring. The principle is straightforward: it is far cheaper to stop duplicates forming than to excavate them years later.
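An upload-time check can be as simple as refusing a file whose contents are already in the archive. The sketch below assumes an in-memory index keyed by SHA-256 digest, and the `UploadIndex` class is hypothetical. It catches only byte-identical copies. Re-exported or cropped versions would need a perceptual comparison on top.

```python
import hashlib
from typing import Dict, Optional


class UploadIndex:
    """Tracks content digests of files already accepted into the archive."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def check_and_add(self, name: str, data: bytes) -> Optional[str]:
        """Return the name of an existing identical file, or None if accepted."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing  # duplicate: the caller can reject or link instead
        self._by_digest[digest] = name
        return None
```

A real system would persist the index in a database instead of a dictionary, but the gate itself stays this small.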
For institutions now beginning the process, archivists working on similar projects in other European cities recommend starting with a read-only audit before touching any files — a step that takes longer upfront but prevents accidental deletion of master copies. The Milan-Cortina deadline has concentrated minds, but the underlying work will extend well past the closing ceremony. The archives did not fill up overnight, and they will not be fixed overnight either.
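A read-only audit of the kind archivists recommend can be sketched in a few lines. The hypothetical function below, `audit_duplicates`, walks a folder, opens each file for reading only, and reports groups of byte-identical files without moving, renaming, or deleting anything. The folder layout and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def audit_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under `root` by content digest; never writes to disk."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        hasher = hashlib.sha256()
        with path.open("rb") as fh:  # opened for reading only
            # Read in 1 MiB chunks so large scans do not exhaust memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                hasher.update(chunk)
        groups[hasher.hexdigest()].append(path)
    # Keep only digests shared by more than one file.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

The output is a report, not an action: each entry lists files that are candidates for review, and the decision about which copy is the master stays with the archivist.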