Milan's museum and archive sector confronted a concrete housekeeping problem this week: duplicate image files clogging digital catalogues, slowing public-facing databases, and in some cases serving the wrong artwork to online visitors. The issue, long treated as a low-priority technical chore, moved up the agenda after the Regione Lombardia's cultural digitisation office flagged the backlog in a working document circulated to city institutions on 30 June 2026.
The timing matters. With the Milan-Cortina 2026 Winter Olympics opening ceremony now fewer than six months away, the city's cultural bodies are under pressure to present polished digital storefronts to a global audience arriving in Lombardy this winter. An influx of international visitors browsing museum collections online before they travel is expected to stress-test platforms that have not been audited since major uploads during the 2020-2022 pandemic digitisation push.
Where the Problem Sits — and Who Is Fixing It
Three institutions in particular are deep in remediation work this week. The Pinacoteca di Brera on Via Brera has been running a batch-deduplication script across its public API since Monday, targeting an estimated 4,200 redundant image entries identified by an internal audit completed in May. The Museo del Novecento, overlooking Piazza del Duomo in the Arengario building, confirmed it is running a separate, parallel review of the digital records tied to its permanent collection of roughly 4,000 twentieth-century works. Further north in Porta Nuova, the design-focused Fondazione Giangiacomo Feltrinelli is reviewing its digital library holdings after staff discovered that duplicate scans of periodical covers had propagated through three different content management systems during a server migration last autumn.
The practical consequence for users has ranged from minor irritation — a search for a Boccioni canvas returning the same thumbnail twice — to more serious errors where metadata attached to a duplicate pointed to the wrong artist, wrong date, or wrong provenance note. For institutions whose scholarly credibility depends on catalogue precision, that kind of error carries reputational weight, particularly with international researchers.
Milan's tech sector, concentrated in the Isola and NoLo neighbourhoods north of the centre, has a commercial stake in the problem too. Several startups working on AI-assisted image recognition — including at least two incubated through PoliHub, the innovation hub attached to Politecnico di Milano on Piazza Leonardo da Vinci — have pitched deduplication tools to cultural clients over the past eighteen months. The Olympics deadline has sharpened those conversations.
The Broader Picture: Why Digital Catalogues Break This Way
Duplicate image problems in cultural archives are rarely caused by carelessness. They typically accumulate through legitimate migration events: a collection moves from one content management system to another, a vendor delivers a batch upload without cross-referencing existing records, or digitisation campaigns run in parallel across departments without a shared identifier standard. The result is a kind of digital sediment — multiple versions of the same file, sometimes in different resolutions or colour profiles, occupying server space and confusing search algorithms.
The Regione Lombardia working document, according to its stated scope, covers 47 publicly funded cultural institutions across the region. It sets a target of having primary catalogues free of confirmed duplicates before 1 October 2026 — giving institutions roughly thirteen weeks of runway. Failure to meet that internal benchmark does not carry a formal sanction, but institutions risk being deprioritised for the next round of regional digitisation funding, a pot that last year totalled approximately €3.2 million across Lombardy.
For institutions that have not yet started, the practical advice circulating among archivists this week is to begin with an automated hash audit rather than a manual review. Tools that compare file checksums can identify byte-for-byte duplicates in hours, even across collections of tens of thousands of files; copies saved at a different resolution or colour profile will not share a checksum, and catching those requires tools that compare pixel-level fingerprints of the images themselves. The harder work comes after: deciding which version of a duplicated image to keep, ensuring metadata is clean and correctly linked, and updating any external embeds or API calls that pointed to the now-deleted file. That last step, archivists note, is where the real labour hides, and where the October deadline may prove tighter than it looks.
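For readers wondering what such an audit looks like in practice, the following is a minimal sketch in Python, not a description of any institution's actual tooling. It assumes the images sit in an ordinary folder tree and uses SHA-256 checksums, so it finds only byte-identical copies; versions saved at a different resolution or colour profile would need a perceptual-hash tool instead. The function names, the list of file extensions, and the rule of keeping the first copy in sorted path order are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to the collection being audited.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, suffixes=IMAGE_SUFFIXES):
    """Group image files under root by checksum; return groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def redirect_map(duplicates):
    """Map each redundant file to the copy that is kept.

    Hypothetical rule: keep the first path in sorted order. A real
    catalogue would choose the keeper from its metadata instead.
    """
    mapping = {}
    for paths in duplicates.values():
        keeper, *extras = paths
        for extra in extras:
            mapping[extra] = keeper
    return mapping
```

Calling `find_exact_duplicates(Path("/path/to/images"))` on a placeholder folder returns the groups of identical files, and `redirect_map` turns those groups into a lookup from each redundant file to its surviving copy. That lookup is the starting point for the step archivists describe as the hardest: repointing embeds and API calls before anything is deleted.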