The Daily Milan

Milan news, every day

How Milan's Visual Archives Ended Up Drowning in Duplicate Images — and the Long Road to Fixing It

From the chaotic digitisation drives of the early 2000s to today's AI-assisted cleanup, the city's institutions are finally confronting a decades-old problem embedded in their own hard drives.

By Milan News Desk · Published 4 July 2026, 9:00 pm

Milan's major cultural and commercial institutions are sitting on a problem that has been building quietly since the early 2000s: thousands of duplicate images locked inside digital archives that were never properly organised, costing staff time, inflating storage bills, and in some cases sending the wrong photograph to print. The reckoning is now underway.

The issue matters today because the pressure to fix it has converged from several directions at once. The Milan-Cortina 2026 Winter Olympics, whose opening ceremony is scheduled for 6 February, has pushed the city's communications and tourism infrastructure into an unusually public spotlight. Institutions from the Comune di Milano to the Fondazione Fiera Milano in the Portello district have been auditing their image libraries ahead of a global audience. What they found, in many cases, was a sprawling mess.

The Origins: Digitisation Without a Plan

The root of the problem traces back to a wave of digitisation projects that swept through Italian public bodies after the Codice dei beni culturali e del paesaggio came into force in 2004. Museums, design schools, and municipal press offices scanned physical archives at speed, often using multiple contractors working to different file-naming conventions and resolution standards. A single photograph of, say, the Galleria Vittorio Emanuele II might exist in the same database at 72 dpi, 150 dpi, and 300 dpi, each saved under a different filename, none of them flagged as related to the others.

The problem compounded through the 2010s as social media departments spun up inside the same organisations. Images that were downloaded, re-exported, cropped for Instagram, and then re-uploaded into the master archive created what archivists describe as phantom duplicates: files that look the same to the eye but differ slightly in metadata or pixel dimensions. By 2020, one estimate from the European Commission's digital cultural heritage working group put the proportion of duplicate or near-duplicate files in large institutional image libraries at between 18 and 35 percent, though the precise figure varies sharply depending on the institution and how duplicates are defined.

In Milan's fashion and design economy, where image rights carry direct commercial value, the stakes are higher than in most cities. A misidentified archive file sent to a luxury client can mean not just embarrassment but legal exposure. The Camera Nazionale della Moda Italiana, headquartered near Via Montenapoleone, began a systematic deduplication review in late 2024. The Politecnico di Milano's design faculty, which maintains an image archive of student and faculty work stretching back to the 1980s, launched a parallel project in January 2025 using open-source perceptual hashing tools to flag near-identical files for human review.
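
For readers curious how perceptual hashing flags near-identical files, the following is a minimal sketch of one common variant, the average hash. It is an illustration only: the article does not say which tool or algorithm the Politecnico project uses, and the function names here (shrink, average_hash, hamming) are hypothetical. The sketch assumes each image has already been decoded into a grid of grayscale values between 0 and 255.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixels, row-major, values 0-255


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging rectangular blocks of pixels."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        top = r * height // size
        bottom = max(top + 1, (r + 1) * height // size)
        row = []
        for c in range(size):
            left = c * width // size
            right = max(left + 1, (c + 1) * width // size)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two scans of the same photograph at different resolutions should produce hashes that differ in few or no bits, while unrelated images tend to differ in many. The cut-off between the two is a judgment call for each archive, which is one reason flagged files still go to a person for review.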

What a Fix Actually Looks Like

Deduplication is not a single action. It is a process with several distinct stages: automated detection, human verification, metadata reconciliation, and finally deletion or consolidation. The automated stage has become significantly cheaper since 2022, when cloud-based tools capable of processing tens of thousands of images per hour became available at price points accessible to mid-sized organisations — some starting at under €200 per month. The human verification stage remains the expensive part, because software can flag near-duplicates but cannot always determine which version of an image is the canonical one an institution actually wants to keep.
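
The hand-off from automated detection to human verification might be sketched roughly as follows. This is a simplified illustration under stated assumptions, not any institution's actual workflow: the ImageRecord fields, the five-bit distance threshold, and the "largest file wins" heuristic are all hypothetical, and the proposed canonical copy is only a suggestion for a reviewer to confirm or overrule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageRecord:
    path: str    # hypothetical archive path
    width: int   # pixel dimensions
    height: int
    phash: int   # perceptual hash computed in the detection stage


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def group_near_duplicates(records: List[ImageRecord],
                          max_distance: int = 5) -> List[List[ImageRecord]]:
    """Greedy grouping: a record joins the first group whose first member
    has a hash within max_distance bits; otherwise it starts a new group."""
    groups: List[List[ImageRecord]] = []
    for record in records:
        for group in groups:
            if hamming(group[0].phash, record.phash) <= max_distance:
                group.append(record)
                break
        else:
            groups.append([record])
    # Only groups with more than one member need a reviewer's attention.
    return [group for group in groups if len(group) > 1]


def propose_canonical(group: List[ImageRecord]) -> ImageRecord:
    """Suggest the version with the most pixels; a human makes the final call."""
    return max(group, key=lambda record: record.width * record.height)
```

Preferring the largest file is a reasonable default, but it can be wrong, for example when a smaller file is the rights-cleared or colour-corrected version. That gap is why the verification stage stays expensive.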

The Porta Nuova business district, which has attracted a concentration of tech and creative firms over the past decade, has become something of an informal testing ground for newer approaches. Several agencies based in the Varesine towers have adopted workflows that embed deduplication checks into the upload process itself, preventing the pile-up from recurring. The principle is straightforward: it is far cheaper to stop duplicates forming than to excavate them years later.
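
A check at upload time could look something like the sketch below. It is an assumption-laden illustration rather than a description of any agency's system: the UploadGate class is hypothetical, the index lives in memory rather than in a database, and it only catches byte-for-byte copies. Catching resized or re-exported versions would need a perceptual hash alongside the exact one.

```python
import hashlib
from typing import Dict, Optional


class UploadGate:
    """Hypothetical in-memory index of files already in the archive."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def check(self, name: str, data: bytes) -> Optional[str]:
        """Return the name of an existing byte-identical file, if any.

        When no match exists, the new file is registered and None is
        returned, meaning the upload can proceed.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing
        self._by_digest[digest] = name
        return None
```

In use, a first upload of a file returns None and is indexed. A second upload with the same bytes under a different name returns the original name, so the system can point the uploader to the copy already on file instead of storing another.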

For institutions now beginning the process, archivists working on similar projects in other European cities recommend starting with a read-only audit before touching any files — a step that takes longer upfront but prevents accidental deletion of master copies. The Milan-Cortina deadline has concentrated minds, but the underlying work will extend well past the closing ceremony. The archives did not fill up overnight, and they will not be fixed overnight either.
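
A read-only audit of the kind archivists describe can be approximated in a few lines. The sketch below is illustrative only: the function names are hypothetical, and it detects exact copies alone. It opens every file for reading, never for writing, and returns a report of duplicate groups without renaming, moving, or deleting anything.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks, opening it read-only."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: str) -> Dict[str, List[str]]:
    """Map each content hash to the paths sharing it, duplicates only.

    Nothing under root is modified; the result is a report to review.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_digest[sha256_of(path)].append(path)
    return {digest: sorted(paths)
            for digest, paths in by_digest.items()
            if len(paths) > 1}
```

Because the audit only produces a list, staff can check it against their own knowledge of which files are masters before any consolidation begins.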

About this article

Published by The Daily Milan

This article was produced by The Daily Milan editorial desk and covers news in Milan. See our editorial standards for how we use AI.
