Forum Discussion
KabileshVijayakumar
Jun 10, 2025
Copper Contributor
Azure Form Recognizer Redaction Issue with Scanned PDFs and Page Size Variations
Hi all,
I’m working on a PDF redaction process using Azure Form Recognizer and Azure Functions. The flow works well in most cases — I extract the text and bounding box coordinates and apply redaction based on that.
However, I’m facing an issue with scanned PDFs or PDFs with slightly different page sizes. In these cases, the redaction boxes don’t align properly — they either miss the text or appear slightly off (above or below the intended area).
It seems like the coordinate mapping doesn't match accurately when the document isn't a standard A4 size or has DPI inconsistencies.
Has anyone else encountered this?
Any suggestions on:
- Adjusting for page size or DPI dynamically?
- Mapping normalized coordinates correctly for scanned PDFs?
Appreciate any help or suggestions!
1 Reply
How about this:
- Normalize DPI Before Processing – If the scanned PDFs have inconsistent DPI settings, try standardizing them using an image processing tool before passing them to Form Recognizer.
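- Adjust Bounding Box Scaling – The analysis result reports each page's width, height, and unit (inches for PDFs, pixels for images), with coordinates measured from the top-left corner. Scale each bounding box by the ratio of the actual PDF page size to that reported size instead of assuming A4, and flip the y-axis if your PDF library measures from the bottom-left.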
- Use a Fixed Template Approach – If possible, define a standard template for expected document sizes and adjust the redaction logic accordingly.
- Check for OCR Misalignment – OCR engines sometimes misinterpret scanned text positioning. Running a preprocessing step to enhance text clarity might help.
- Move to Azure AI Document Intelligence – Azure Form Recognizer has been renamed Azure AI Document Intelligence. If you are still on an older API version or SDK, the newer releases may handle scanned documents better.
Document Intellegence scanned PDF or PNG to editable pdf - Microsoft Q&A
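To make the scaling idea concrete, here is a minimal sketch in plain Python. It assumes the polygon arrives as a flat list `[x1, y1, ..., x4, y4]` measured from the top-left corner (the older Form Recognizer SDK returns point objects instead, so you may need to flatten them first), that the page is not rotated, and that the PDF page box starts at (0, 0). `OcrPage` and `polygon_to_pdf_rect` are hypothetical names for illustration, not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class OcrPage:
    """Page size as reported in the analysis result (hypothetical container)."""
    width: float   # expressed in `unit`
    height: float  # expressed in `unit`
    unit: str      # e.g. "inch" for PDFs, "pixel" for images; informational only


def polygon_to_pdf_rect(
    polygon: Sequence[float],
    ocr_page: OcrPage,
    pdf_width_pt: float,
    pdf_height_pt: float,
    padding_pt: float = 1.0,
    origin_bottom_left: bool = True,
) -> Tuple[float, float, float, float]:
    """Map a flat OCR polygon to an (x0, y0, x1, y1) rectangle in PDF points.

    Scaling by the ratio of the real page size to the reported page size
    makes the mapping independent of the unit and of the scan DPI.
    Assumes an unrotated page whose page box starts at (0, 0).
    """
    if len(polygon) < 8 or len(polygon) % 2 != 0:
        raise ValueError("expected a flat [x1, y1, ..., x4, y4] polygon")

    xs = polygon[0::2]
    ys = polygon[1::2]

    # Scale factors from OCR units to PDF points, derived from the page itself.
    sx = pdf_width_pt / ocr_page.width
    sy = pdf_height_pt / ocr_page.height

    # Axis-aligned bounds in top-left space, padded and clamped to the page.
    x0 = max(min(xs) * sx - padding_pt, 0.0)
    x1 = min(max(xs) * sx + padding_pt, pdf_width_pt)
    y_top = max(min(ys) * sy - padding_pt, 0.0)
    y_bottom = min(max(ys) * sy + padding_pt, pdf_height_pt)

    if origin_bottom_left:
        # Native PDF coordinates grow upward, so flip the y-axis.
        return (x0, pdf_height_pt - y_bottom, x1, pdf_height_pt - y_top)
    return (x0, y_top, x1, y_bottom)


# Example: a US Letter PDF (612 x 792 pt) reported as 8.5 x 11 inches.
letter = OcrPage(width=8.5, height=11.0, unit="inch")
word = [1.0, 1.0, 3.0, 1.0, 3.0, 1.5, 1.0, 1.5]
rect = polygon_to_pdf_rect(word, letter, 612.0, 792.0, padding_pt=0.0)
```

Read `pdf_width_pt` and `pdf_height_pt` from each page of the actual file rather than hard-coding A4, since that is where mixed page sizes go wrong. If your redaction library measures from the top-left (PyMuPDF does, as far as I know), pass `origin_bottom_left=False`. Rotated pages or a non-zero crop box offset would need an extra transform that this sketch does not cover.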