The Daily Milan

Milan's Image Duplication Problem: The Numbers Showing How Repeated Visuals Are Costing the City's Digital Projects

From the Porta Nuova smart-city rollout to Olympic venue archives, duplicate image files are quietly inflating storage bills and degrading search performance across Milan's public and commercial databases.

By Milan News Desk · Published 4 July 2026, 8:47 pm

Duplicate images are not a glamorous problem. But data emerging from audits of major Milan-based digital repositories suggests they are an expensive one. Across the city's fashion, tourism and public infrastructure sectors, redundant image files now account for an estimated 23 to 31 percent of the total digital storage footprint, a figure consistent with international benchmarks published by cloud-infrastructure analysts tracking European municipal datasets in 2025.

The timing matters because Milan is mid-sprint. The Milan-Cortina 2026 Winter Olympics, opening in February, has accelerated digitisation across every city-linked agency. The Comune di Milano's communications office, the Fondazione Milano Cortina 2026, and the tourism body Milano & Partners are all managing high-volume image libraries simultaneously. Duplicates accumulate fastest when multiple teams ingest the same press photography or venue footage independently, without a centralised deduplication protocol in place.

What the Numbers Actually Look Like

Storage is priced by the gigabyte. Enterprise cloud contracts typical among Italian public bodies run between €0.018 and €0.035 per GB per month under standard tiered agreements. A 10-terabyte archive carrying 30 percent redundancy therefore wastes roughly 3 TB, or about 3,000 GB, which translates to a recurring cost of between €54 and €105 every month for data that contributes nothing. Over a year that is roughly €650 to €1,260 per archive; multiply it across a city-scale operation running dozens of sub-archives of similar size and the annual figure climbs well into five figures.
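For readers who want to check the arithmetic, here is a minimal Python sketch. The prices, archive size and redundancy rate are the ones quoted above; the count of 24 sub-archives is purely an assumption standing in for "dozens", and the function name is illustrative.

```python
# Illustrative arithmetic only; prices and sizes come from the figures quoted above.
GB_PER_TB = 1000  # decimal terabytes (an assumption about how the contract bills)

def monthly_waste_eur(archive_tb, redundancy, price_per_gb):
    """Monthly cost of storing the redundant share of one archive."""
    wasted_gb = archive_tb * GB_PER_TB * redundancy
    return wasted_gb * price_per_gb

low = monthly_waste_eur(10, 0.30, 0.018)
high = monthly_waste_eur(10, 0.30, 0.035)
print(f"Per archive, per month: EUR {low:.0f} to EUR {high:.0f}")

# Hypothetical scale-up: 24 sub-archives of the same size and redundancy.
SUB_ARCHIVES = 24
yearly_low = low * 12 * SUB_ARCHIVES
yearly_high = high * 12 * SUB_ARCHIVES
print(f"Per year, {SUB_ARCHIVES} archives: EUR {yearly_low:,.0f} to EUR {yearly_high:,.0f}")
```

Under those assumptions the yearly range comes out at about €15,500 to €30,200, which is where the "five figures" estimate comes from.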

The problem shows up in search, too. Image search engines, whether internal content-management systems or public-facing platforms, score relevance partly on uniqueness signals. When the same JPEG of the Piazza Gae Aulenti fountains or the Bosco Verticale towers in Porta Nuova appears under 14 different filenames, algorithms can deprioritise the entire batch. The practical consequence is that press officers and designers spend longer finding the right asset, and publicly indexed pages can rank lower in Google Image Search, which in turn can reduce inbound tourism traffic.

Fondazione Fiera Milano, which manages the vast exhibition campus in Rho just northwest of the city centre, processes tens of thousands of event photographs per trade-show cycle. At events like Salone del Mobile — which in April 2026 drew more than 370,000 visitors across six days according to organiser figures — a single show generates multiple simultaneous photography commissions whose outputs frequently overlap. Without automated hash-matching on ingest, duplicate rates in such environments routinely exceed 25 percent by the end of an event week.
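What hash-matching on ingest might look like can be sketched in a few lines of Python. This is an illustration, not a description of any system Fondazione Fiera Milano runs: the commission names, filenames and byte strings are invented stand-ins, and the `ingest` function is hypothetical. It uses SHA-256 from the standard library, so it catches only byte-identical copies.

```python
import hashlib

def ingest(batches):
    """Keep the first copy of each distinct file; record later copies as duplicates.

    `batches` maps a (hypothetical) commission name to {filename: file_bytes}.
    """
    seen = {}        # SHA-256 digest -> (commission, filename) of the first copy
    duplicates = []  # (duplicate, original) pairs
    for commission, files in batches.items():
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                duplicates.append(((commission, name), seen[digest]))
            else:
                seen[digest] = (commission, name)
    return seen, duplicates

# Toy stand-ins for image bytes; real use would read the files from disk.
batches = {
    "agency_a": {
        "stand_01.jpg": b"\xff\xd8 hall 5 wide shot",
        "stand_02.jpg": b"\xff\xd8 keynote",
    },
    "agency_b": {
        "IMG_4471.jpg": b"\xff\xd8 hall 5 wide shot",  # same bytes as stand_01.jpg
        "IMG_4472.jpg": b"\xff\xd8 crowd",
    },
}
kept, dupes = ingest(batches)
total = len(kept) + len(dupes)
rate = len(dupes) / total
print(f"{len(dupes)} duplicate among {total} files ({rate:.0%})")
```

Because the check runs as each file arrives, the second copy is flagged before it is ever written to the archive. A copy that has been resized or re-encoded would have different bytes and would slip through this kind of check.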

Local Projects Now Tackling the Problem

Two Milan-linked initiatives are directly confronting this. The Politecnico di Milano's Design Department, based on Via Durando in the Bovisa district, has been developing an open-source perceptual-hashing toolkit as part of a broader digital preservation research programme that received European Regional Development Fund support in 2024. The tool compares images not just by identical file hash but by visual similarity, catching near-duplicates that differ only in compression or resolution.
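The difference between a file hash and a perceptual hash can be shown with a deliberately simplified example. The sketch below implements a basic "average hash" over tiny 4×4 grayscale grids; it is a swapped-in textbook technique chosen for illustration, not the Politecnico toolkit's method, which the article does not detail. All pixel values and function names are made up.

```python
import hashlib

def average_hash(pixels):
    """Simplified average hash: one bit per pixel, set when the pixel is
    brighter than the image mean. `pixels` is a small grayscale grid (0-255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

def file_sha256(pixels):
    """Exact hash of the raw pixel bytes, standing in for a file hash."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

original = [[200, 190,  30,  20],
            [210, 180,  40,  25],
            [ 35,  30, 220, 215],
            [ 20,  25, 205, 210]]
# The same picture after a lossy re-encode: every pixel value shifts slightly.
recompressed = [[p - 3 if p > 100 else p + 2 for p in row] for row in original]
# A genuinely different picture.
other = [[ 20, 200,  20, 200],
         [200,  20, 200,  20],
         [ 20, 200,  20, 200],
         [200,  20, 200,  20]]

print(file_sha256(original) == file_sha256(recompressed))          # exact hashes differ
print(hamming(average_hash(original), average_hash(recompressed)))  # perceptual distance
print(hamming(average_hash(original), average_hash(other)))
```

The re-encoded copy no longer matches on its exact hash, yet its perceptual hash is identical to the original's (distance 0), while the unrelated image differs in 8 of 16 bits. Production tools work on downscaled real images and longer hashes, but the principle is the same.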

Separately, the Biblioteca Nazionale Braidense on Via Brera — one of Italy's oldest national libraries — completed a deduplication audit of its digitised art and manuscript image holdings in late 2025. The audit identified roughly 18,000 redundant image files within a collection of approximately 240,000 assets, a duplication rate of about 7.5 percent for a curated institutional archive — far lower than commercial or governmental norms, but still representing recoverable storage and cataloguing overhead.

For smaller operators — the boutique design studios clustered around the Isola neighbourhood, or the fashion houses maintaining press libraries in the Quadrilatero della Moda — the practical fix does not require enterprise software. Free tools including digiKam and open-source scripts built around Python's ImageHash library can scan a local drive and flag duplicates by similarity threshold in under an hour for collections up to 50,000 files. The Politecnico toolkit, expected to release a public beta in the third quarter of 2026, is designed to extend that capacity to cloud-hosted archives at municipal scale.
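The "similarity threshold" step can be sketched independently of any particular tool. The Python below assumes each file already has a 64-bit perceptual hash; with ImageHash these would typically come from a call such as `imagehash.phash(Image.open(path))`, and subtracting two of its hash objects gives the same bit distance computed by hand here. The filenames, hash values, threshold of 5 bits and the greedy grouping strategy are all assumptions for illustration, not how digiKam or any specific script works.

```python
def bit_distance(a, b):
    """Hamming distance between two integer hashes."""
    return bin(a ^ b).count("1")

def group_near_duplicates(hashes, threshold=5):
    """Greedy grouping: each file joins the first group whose representative
    hash is within `threshold` bits; otherwise it starts a new group.
    Returns only groups with more than one member."""
    groups = []  # list of (representative_hash, [filenames])
    for name, h in hashes.items():
        for rep, members in groups:
            if bit_distance(rep, h) <= threshold:
                members.append(name)
                break
        else:
            groups.append((h, [name]))
    return [members for _, members in groups if len(members) > 1]

# Hypothetical precomputed 64-bit hashes for four files in a press library.
hashes = {
    "bosco_verticale.jpg":     0xF0F0F0F00F0F0F0F,
    "bosco_verticale_web.jpg": 0xF0F0F0F00F0F0F0E,  # 1 bit from the first
    "BV_final_v2.jpg":         0xF0F0F0F00F0F0F1F,  # 1 bit from the first
    "gae_aulenti.jpg":         0x00FF00FF00FF00FF,  # 32 bits from the first
}
for group in group_near_duplicates(hashes):
    print("Possible duplicates:", ", ".join(group))
```

A lower threshold flags only near-identical copies; a higher one catches crops and heavier edits at the cost of more false matches, so flagged groups are best reviewed by a person before anything is deleted.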

The Olympics deadline is the hard forcing function. By the time the torch arrives in Milan this winter, every agency feeding images to accredited media outlets will need clean, searchable, legally cleared libraries. Getting there means running deduplication audits now — not after the opening ceremony, when the backlog will be measured in terabytes rather than gigabytes.

About this article

Published by The Daily Milan

This article was produced by The Daily Milan editorial desk and covers news in Milan. See our editorial standards for how we use AI.
