The Daily Milan

Milan Leads Europe on Purging Duplicate Images From Public Databases, but Rivals Are Closing the Gap

As cities digitise vast cultural and civic archives, Milan's record on eliminating redundant visual data sets a standard — though Amsterdam and Barcelona are moving fast.

By Milan News Desk · Published 4 July 2026, 8:45 pm

Milan's civic archive office, housed near the Archivio di Stato on Via Senato, formally completed the first phase of its duplicate-image removal programme on 30 June, scrubbing more than 340,000 redundant photographs from the city's centralised digital repository. The clean-up, part of a broader digitisation push tied to infrastructure deadlines for the Milan-Cortina 2026 Winter Olympics, has drawn attention from municipal technology officers across Europe who are wrestling with the same bloated-database problem.

The issue matters now because public institutions across the continent are racing to make their digital image libraries both searchable and legally compliant under European Union rules on data minimisation. Duplicate records inflate storage costs, slow search tools, and — critically — can mean the same misidentified or unlicensed photograph circulates indefinitely inside government systems. For Milan, which has positioned its Porta Nuova district as a continental hub for design-tech start-ups, tolerating a chaotic back-end archive contradicts the city's outward ambition.

What Milan Has Done Differently

The Comune di Milano contracted the deduplication work through its existing partnership with the digital-services arm of Fondazione Cariplo, supplementing that with tooling developed at the Politecnico di Milano's Image and Sound Processing research group, known internally as the ISPG lab. The Politecnico team's software flags near-duplicate images, not just exact copies, using perceptual hashing: a technique that compares compact fingerprints of what an image looks like rather than its raw bytes, so copies that have been resized, recompressed, colour-adjusted or lightly cropped can still be matched. The city says the first phase covered civic event photographs, street-level infrastructure images, and planning department records stretching back to 2003. A second phase targeting heritage and tourism assets is scheduled to begin in September 2026.
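For readers curious how perceptual hashing works in principle, the sketch below shows one of the simplest variants, an "average hash". It is an illustration only and not the Politecnico team's software, whose internals the city has not described. The function names, the tiny 4x4 sample grids and the `max_distance` threshold are all assumptions made for this example; real tools would typically first shrink each photograph to a small grayscale grid before hashing.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (assumed already downscaled)


def average_hash(pixels: Grid) -> int:
    """Build a fingerprint with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Hypothetical rule: treat images as near-duplicates if few bits differ."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Illustrative data: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# The same picture with every pixel brightened by 30 (a simple colour adjustment).
brightened = [[min(255, value + 30) for value in row] for row in original]
# A genuinely different picture: the original mirrored left to right.
different = [list(reversed(row)) for row in original]

print(is_near_duplicate(original, brightened))  # brightness shift keeps the same pattern
print(is_near_duplicate(original, different))   # mirrored pattern flips every bit
```

Because the fingerprint records only whether each region is brighter or darker than the image's own average, a uniform brightness change leaves it untouched, while an exact byte-for-byte comparison would treat the two files as unrelated. Heavier edits such as aggressive cropping generally need more robust hashing schemes than this one.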

The archive held approximately 1.2 million image files before the clean-up. Removing 340,000 of them cuts the repository by roughly 28 percent and reduces storage overhead in a system that had been costing the municipality an estimated €180,000 per year in cloud infrastructure fees, according to the Comune's 2025 digital-budget summary. The September phase is expected to handle roughly 400,000 additional files drawn from the Musei Civici network, which includes the Museo del Novecento and the Castello Sforzesco collections.
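The percentage can be checked directly from the two file counts reported above. The short calculation below uses only those figures; the variable names are ours, and it deliberately makes no attempt to estimate cost savings, since the article's sources do not say how fees scale with file count.

```python
# Figures as reported: approximate pre-clean-up total and files removed in phase one.
files_before = 1_200_000
files_removed = 340_000

reduction_share = files_removed / files_before
files_remaining = files_before - files_removed

print(f"Reduction: {reduction_share:.1%}")
print(f"Files remaining after phase one: {files_remaining:,}")
```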

How Amsterdam and Barcelona Compare

Neither Amsterdam nor Barcelona has yet completed a comparable sweep at this scale, though both cities have active programmes. Amsterdam's Stadsarchief — the city's municipal archive — launched its own deduplication project in early 2025 but has publicly acknowledged, in documentation on the Stadsarchief website updated in March 2026, that work remains confined to pre-1945 photographic collections. The Barcelona city council's Smart City office began a pilot in the Eixample district's planning database in late 2025, focused narrowly on permit-application images. Neither city has publicly reported a removal figure approaching Milan's 340,000.

Paris presents a different model. The Bibliothèque nationale de France runs its own large-scale deduplication operation for the Gallica digital library, but that is a national institution managing cultural heritage, not a municipal government managing operational civic data — a meaningful distinction when assessing what Milan has actually accomplished at the local-authority level.

The gap matters commercially, too. Milan's fashion and design economy depends on image rights being clearly logged. Brands headquartered around Via Montenapoleone routinely submit image documentation to Comune planning and events offices. A duplicate-heavy civic database increases the risk that a brand's proprietary visual assets become entangled in public records — a reputational and legal exposure the city's private-sector stakeholders have lobbied against for years.

The September phase will be the harder test. Heritage images carry more complex licensing histories, and the Musei Civici collections include material whose rights sit with third-party photographers or estates. City digital officers will need to decide whether to remove, flag, or quarantine files where provenance is unclear rather than simply deleting duplicates outright. How they handle that distinction will determine whether Milan's model is genuinely exportable — or just a tidy solution to the easier half of the problem.

About this article

Published by The Daily Milan

This article was produced by The Daily Milan editorial desk and covers news in Milan. See our editorial standards for how we use AI.
