Milan's leading cultural and civic institutions are staring down a problem that has quietly ballooned alongside their digitisation drives: duplicate image files are clogging archives, inflating storage costs and threatening the accuracy of public-facing catalogues at a moment when the city's global visibility is at an all-time high. The issue has sharpened ahead of the Milan-Cortina 2026 Winter Olympics, which open in February, with institutions under pressure to present clean, reliable digital inventories to an international audience.
The timing matters because Milan spent the better part of the last three years accelerating digitisation. The push was partly funded through the Piano Nazionale di Ripresa e Resilienza — Italy's post-pandemic recovery plan — which earmarked resources for cultural heritage digitalisation across the country. Local institutions responded by scanning at scale. The unintended consequence: redundant files, inconsistent metadata tagging and, in some cases, conflicting version histories sitting inside the same database.
Where the Bottlenecks Are Worst
The Pinacoteca di Brera on Via Brera is among the institutions grappling most visibly with the backlog. Its digital catalogue, accessible through the national SAN — Sistema Archivistico Nazionale — network, contains works captured under multiple digitisation campaigns stretching back to 2009. Each campaign used different resolution standards and naming conventions, leaving archivists to reconcile files that technically represent the same physical painting but appear as separate records. The problem is compounded at the Museo del Novecento in Piazza del Duomo, where a 2024 expansion of gallery space triggered a new round of photography that was loaded onto servers already holding legacy scans.
Across town in Porta Nuova, the design and fashion sector faces a parallel crisis. The Camera Nazionale della Moda Italiana maintains image libraries tied to seasonal collections going back decades. Industry sources familiar with the organisation's internal logistics — though not authorised to speak on record — have described the deduplication challenge as one of the most labour-intensive tasks the sector's archivists now face, particularly as brands prepare high-resolution lookbooks for international licensing. No official figure on the scale of duplication has been published, but a 2023 study by the European Commission's Joint Research Centre found that cultural heritage databases across EU member states contained duplicate records at rates ranging from 12 to 31 percent, depending on institution size and digitisation history.
What Happens Next: The Decisions No One Can Avoid
The immediate fork in the road is technical. Institutions must choose between algorithmic deduplication tools, which can process thousands of files quickly but still need human review to catch near-duplicates (cases where the image content is identical but the file metadata differs), and manual audits, which are slower but more precise. Several Italian university departments, including the Politecnico di Milano's design faculty in Bovisa, have been developing AI-assisted matching tools tailored to archival image sets. Whether public institutions can afford to license or co-develop those tools under current budget constraints is the central question for the second half of 2025.
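For readers who want a sense of how algorithmic matching can flag a near-duplicate for human review, the sketch below shows one common approach, an "average hash" compared by Hamming distance. It is an illustration only, not the method used by any institution or university named here. The function names, the tiny 2x2 sample grids and the `max_distance` threshold are all assumptions made for the example, and a real pipeline would first decode each image and shrink it to a small grayscale grid.

```python
from typing import List

# Assumption: each image has already been decoded and downsampled to a
# small grayscale grid (values 0-255). Real tools do this step first.
Grid = List[List[int]]


def average_hash(pixels: Grid) -> int:
    """Build one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def needs_human_review(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Flag a pair as a likely near-duplicate for an archivist to check.

    max_distance is a hypothetical threshold chosen for this toy example.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical data: two captures of the same work with slightly
# different exposure, plus an unrelated image.
scan_2009 = [[10, 200], [220, 30]]
rescan_2024 = [[12, 198], [225, 28]]
other_work = [[200, 10], [30, 220]]

print(needs_human_review(scan_2009, rescan_2024))  # True
print(needs_human_review(scan_2009, other_work))   # False
```

The point of the threshold is that the tool only nominates candidates. Deciding which of two flagged records to keep remains a curatorial judgment.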
The city government's Assessorato alla Cultura has not yet announced a unified deduplication policy, though internal discussions are understood to be ongoing. Any framework it adopts would likely apply to institutions receiving municipal funding, which includes the Civiche Raccolte d'Arte — the umbrella body overseeing the Museo del Novecento and the Castello Sforzesco collections. A decision before the October deadline for Olympic-related infrastructure sign-offs would give institutions the maximum runway to clean up catalogues before global scrutiny arrives in earnest.
For smaller organisations — the independent design studios clustered around the Tortona district, the photography galleries along Corso di Porta Ticinese — the practical advice from archivists and IT consultants is to start with low-hanging fruit: purging exact-match duplicates using open-source tools before tackling near-duplicates that require contextual judgment. Storage costs are not trivial; commercial cloud pricing for large uncompressed image archives in Italy typically runs between €0.02 and €0.05 per gigabyte per month, meaning even a modest reduction in redundant files produces measurable savings over a 12-month cycle. The bigger prize, though, is reputational — a clean, trustworthy catalogue that holds up under international scrutiny when the Olympic spotlight swings toward Milan this winter.
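As a rough illustration of that first step, the Python sketch below groups byte-for-byte identical files by their SHA-256 digest and estimates what the redundant copies cost to store. It uses only the standard library. The function names and the 500 GB figure in the usage note are hypothetical, and the script only reports duplicates rather than deleting anything, on the assumption that an archivist decides which copy to keep.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of files under root whose bytes are identical."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


def redundant_bytes(groups: List[List[Path]]) -> int:
    """Total size of every copy beyond the first in each duplicate group."""
    return sum(p.stat().st_size for group in groups for p in group[1:])


def yearly_saving_eur(redundant_gb: float, eur_per_gb_month: float) -> float:
    """Storage cost avoided over 12 months by removing redundant copies."""
    return redundant_gb * eur_per_gb_month * 12
```

Taking a hypothetical 500 GB of redundant files, `yearly_saving_eur(500, 0.02)` and `yearly_saving_eur(500, 0.05)` work out to roughly €120 and €300 a year at the two ends of the price range quoted above.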