Construction information services · Italy
Text recognition for resolution PDFs — less review work, better data quality
The starting situation
Thousands of Italian municipal resolutions per year are read by an older OCR/NER pipeline. At its current field-level accuracy, too many incorrect master-data matches land in the database, date heuristics confuse format variants, and accented characters and layout breaks create manual review work.
How we solve it
We rebuild the pipeline on modern Document AI: LLM vision for OCR and information extraction, a cross-check between two models, and a vector-database search for predecessor projects. Master data never leaves the customer infrastructure; only the top-K candidates go to the LLM reranker. A confidence heatmap in the UI routes only fields below the confidence threshold to reviewers.
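To make the routing logic concrete, here is a minimal Python sketch of two of the steps described above: selecting top-K candidates by vector similarity, and flagging fields for review when the two models disagree or report low confidence. It is an illustration under assumptions, not the project's actual code. The names `FieldResult`, `top_k` and `fields_to_review`, the 0.9 threshold and the in-memory cosine search (a stand-in for the vector database) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldResult:
    """One extracted field. The confidence scale 0.0..1.0 is an assumption."""
    name: str
    value: str
    confidence: float


def top_k(query: list[float], candidates: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k candidates most similar to the query (cosine).

    Stand-in for the vector-database lookup; only these ids would be
    passed on to the reranker.
    """
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(candidates, key=lambda cid: cosine(query, candidates[cid]), reverse=True)
    return ranked[:k]


def fields_to_review(
    model_a: dict[str, FieldResult],
    model_b: dict[str, FieldResult],
    threshold: float = 0.9,  # hypothetical review threshold
) -> list[str]:
    """Names of fields a reviewer should see.

    A field is flagged when the two models disagree on its value, or when
    the lower of the two confidences falls below the threshold.
    """
    flagged = []
    for name, a in model_a.items():
        b = model_b.get(name)
        agree = b is not None and a.value.strip().casefold() == b.value.strip().casefold()
        confident = b is not None and min(a.confidence, b.confidence) >= threshold
        if not (agree and confident):
            flagged.append(name)
    return flagged
```

In this sketch, a date that both models read identically with high confidence passes straight through, while an amount on which they differ, or a contracting authority that one model is unsure about, is sent to a reviewer.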
What you get
Data quality on critical columns — contracting authority, resolution number, date and amount — improves significantly. False positives become rarer, manual review focuses on truly uncertain cases, and the database becomes more reliable as a calculation basis and search index.
Do you recognise a similar process in your company?
If this use case feels close to your daily work, we can assess data availability, effort and a realistic first MVP together.
This use case is based on a real client project. Sector and region are named, the company itself remains anonymous.