Milan's public institutions and creative industries collectively manage an estimated 40 million digital image files — and, according to data management specialists working with the city's cultural sector, somewhere between 18 and 22 percent of those files are unnecessary duplicates. That is not an abstract problem. At current cloud storage rates, it translates into a recurring annual cost burden running into the hundreds of thousands of euros for organisations that have simply never cleaned house.
The issue has sharpened this summer because of one fixed deadline: the Milan-Cortina 2026 Winter Olympics open in February, and the Fondazione Milano Cortina 2026 is finalising its official visual archive — the repository of images that will serve broadcasters, sponsors and media partners across 50-plus countries. Bloated, disorganised image libraries do not just cost money. They cost time, and right now time is the one thing nobody in the Olympic secretariat has to spare.
Where the Duplication Problem Is Worst
The Porta Nuova district, home to the UniCredit Tower and dozens of tech and design firms that relocated there over the past decade, has become a particular flashpoint. Several mid-sized design agencies operating out of the Varesine quarter told The Daily Milan, on condition of anonymity because of client confidentiality agreements, that they are running duplicate-image audits for the first time in their company histories ahead of contract renewals tied to Olympic branding work. One audit tool now widely used in the sector, the Sofia-based Imagga API platform, reports that typical corporate image libraries accumulate roughly 20 percent duplicates over three years of unchecked uploads.
At the Fondazione Feltrinelli on Viale Pasubio, which manages one of northern Italy's largest physical and digital cultural archives, librarians adopted a systematic deduplication protocol in 2024. The process, run using open-source software across approximately 2.1 million scanned items, identified 340,000 redundant files, about 16 percent of the total, and recovered 1.4 terabytes of storage. Sound small? At enterprise cloud pricing on platforms such as Microsoft Azure, 1.4 terabytes costs roughly €28 per month to store, or about €336 a year, and that bill recurs every year the files sit untouched.
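The storage arithmetic behind those figures can be checked in a few lines. The sketch below assumes decimal units (1 TB = 1,000 GB) and back-derives a per-gigabyte rate from the €28 monthly figure; that rate is an inference from the numbers reported above, not a quoted Azure price, and the function names are illustrative.

```python
# Figures reported for the Fondazione Feltrinelli audit.
TOTAL_ITEMS = 2_100_000
REDUNDANT_FILES = 340_000
RECOVERED_TB = 1.4
MONTHLY_COST_EUR = 28.0

# Assumption: decimal storage units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000


def duplicate_share(redundant: int, total: int) -> float:
    """Fraction of the archive that turned out to be redundant."""
    return redundant / total


def implied_rate_per_gb(monthly_cost: float, terabytes: float) -> float:
    """Monthly price per gigabyte implied by a total monthly bill."""
    return monthly_cost / (terabytes * GB_PER_TB)


def annual_cost(monthly_cost: float) -> float:
    """Twelve months of a flat monthly storage bill."""
    return monthly_cost * 12


share = duplicate_share(REDUNDANT_FILES, TOTAL_ITEMS)
rate = implied_rate_per_gb(MONTHLY_COST_EUR, RECOVERED_TB)
yearly = annual_cost(MONTHLY_COST_EUR)

print(f"Duplicate share: {share:.1%}")
print(f"Implied rate: EUR {rate:.3f} per GB per month")
print(f"Annual cost of the redundant files: EUR {yearly:.0f}")
```

On these numbers the saving for a single archive is modest; the figure grows in proportion to the volume of redundant data, which is why libraries many times this size face a far larger bill.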
The fashion industry, which defines the economy of the Quadrilatero della Moda, has its own version of this headache at an entirely different scale. Major houses running seasonal campaigns generate tens of thousands of RAW image files per shoot. Without automated duplicate detection baked into their digital asset management systems — tools like Bynder or Canto, both used by multiple Milan-based fashion groups — product images end up filed under different names across different departments, with no single master copy. Industry consultants estimate that luxury fashion groups operating out of Milan spend an average of €60,000 to €90,000 per year on avoidable storage and licensing-audit overhead directly attributable to image duplication, though individual figures vary widely by company size.
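The "same image, different filename" case is the simplest kind of duplication to catch, because byte-identical files produce the same cryptographic digest whatever they are called. A minimal sketch using only Python's standard library follows; the function names and the folder layout are illustrative assumptions, and nothing here describes how Bynder or Canto work internally.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # Hypothetical library root; replace with a real directory.
    for group in find_exact_duplicates(Path("image_library")):
        print("Identical content:", ", ".join(str(p) for p in group))
```

This approach only finds exact copies. A file that has been resized, re-exported or recompressed has different bytes and will not match, which is the gap perceptual hashing is meant to fill.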
What Comes Next, and What Organisations Can Do
The city's digital infrastructure push is not happening in isolation. The Comune di Milano's Piano Triennale per l'Informatica 2025–2027, the rolling three-year technology plan for municipal services, includes a specific strand on data-quality standardisation across civic departments. That plan covers the Anagrafe civica — Milan's civil registry — and the Sistema Informativo Territoriale, the GIS mapping unit based in the Palazzo Marino complex on Piazza della Scala. Both departments have flagged image-data redundancy as a second-tier priority, meaning it is on the agenda but not yet funded as a standalone project.
For smaller organisations watching the clock toward February 2026, the practical path is straightforward. Perceptual-hash algorithms, which flag visually near-identical images even when filenames, formats or resolutions differ, can process a library of 500,000 files in under four hours on standard hardware, a rate of roughly 35 files per second. Free tools including dupeGuru and open-source scripts built on the ImageHash Python library handle most use cases without enterprise licensing fees. The upfront time investment is real. The ongoing cost of not doing it is increasingly measurable, and in Milan's Olympic year, that calculation is getting harder to ignore.
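For readers curious how perceptual hashing works in principle, here is a minimal sketch of average hashing, one of the simplest members of the family. It is written in plain Python and assumes each image has already been decoded and shrunk to an 8x8 grayscale grid; libraries such as ImageHash take care of that step for real files. The function names and the distance threshold of 5 are illustrative assumptions, not settings taken from any particular tool.

```python
from itertools import combinations

Grid = list[list[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def near_duplicates(
    images: dict[str, Grid], max_distance: int = 5
) -> list[tuple[str, str]]:
    """Return name pairs whose hashes differ by at most `max_distance` bits."""
    hashes = {name: average_hash(px) for name, px in images.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming_distance(hashes[a], hashes[b]) <= max_distance
    ]


# Synthetic demo data: a gradient, a slightly brighter copy, and a negative.
gradient = [[r * 32 + c * 4 for c in range(8)] for r in range(8)]
brighter = [[v + 3 for v in row] for row in gradient]
inverted = [[255 - v for v in row] for row in gradient]

pairs = near_duplicates(
    {"campaign.jpg": gradient, "campaign_v2.jpg": brighter, "negative.jpg": inverted}
)
print(pairs)
```

The brighter copy hashes identically to the original because every pixel keeps its position relative to the mean, while the negative flips every bit. Comparing every pair, as this sketch does, scales quadratically with library size, so an archive of hundreds of thousands of files would need hashes grouped or indexed before comparison.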