Construction information services · Italy

Text recognition for resolution PDFs — less review work, better data quality

The starting situation

An older OCR/NER pipeline reads thousands of Italian municipal resolutions per year. Its field-level quality is too low: too many incorrect master-data matches land in the database, date heuristics confuse format variants, and accented characters and layout breaks create manual review work.

How we solve it

We rebuild the pipeline on modern Document AI: LLM vision for OCR and information extraction, cross-checking of the results by two models, and predecessor-project search via a vector database. Master data never leaves the customer's infrastructure; only the top-K candidates go to the LLM reranker. A confidence heatmap in the UI routes only the fields below the threshold to reviewers.
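The routing logic described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `FieldResult`, `cross_check`, `needs_review` and `top_k`, the threshold of 0.85, and the rule that a disagreement between the two models forces a review are all assumptions made for illustration. The sketch shows three ideas: two model readings of the same field are compared, only fields below the confidence threshold are flagged for reviewers, and the master-data candidates are ranked locally so that only the top-K entries would need to be passed on to a reranker.

```python
from dataclasses import dataclass
from math import sqrt

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, not taken from the project


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0 .. 1.0, as reported by the extracting model


def cross_check(a: FieldResult, b: FieldResult) -> FieldResult:
    """Combine two models' readings of the same field (assumed rule)."""
    if a.value.strip().casefold() == b.value.strip().casefold():
        # Agreement: keep the value and the more cautious confidence.
        return FieldResult(a.name, a.value, min(a.confidence, b.confidence))
    # Disagreement: keep the more confident reading, but set the
    # confidence to 0.0 so the field always lands in review.
    best = a if a.confidence >= b.confidence else b
    return FieldResult(best.name, best.value, 0.0)


def needs_review(fields, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose confidence is below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, master_vecs, k=3):
    """Rank master-data keys locally; only these k keys leave for reranking."""
    ranked = sorted(
        master_vecs,
        key=lambda key: cosine(query_vec, master_vecs[key]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch a field on which both models agree keeps the lower of the two confidences, so a single overconfident model cannot push a field past the threshold on its own. The similarity search stands in for the vector database; in practice the embeddings and the index would come from whichever components the customer infrastructure provides.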

What you get

Data quality on the critical columns (contracting authority, resolution number, date and amount) improves significantly. False positives become rarer, manual review concentrates on the truly uncertain cases, and the database becomes more reliable as a calculation basis and search index.

Do you recognise a similar process in your company?

If this use case feels close to your daily work, we can assess data availability, effort and a realistic first MVP together.

This use case is based on a real client project. Sector and region are named; the company itself remains anonymous.