Azure Form Recognizer Redaction Issue with Scanned PDFs and Page Size Variations

Question

Hi all,

I’m working on a PDF redaction process using Azure Form Recognizer and Azure Functions. The flow works well in most cases: I extract the text and bounding box coordinates and apply redaction based on that.

However, I’m facing an issue with scanned PDFs or PDFs with slightly different page sizes. In these cases, the redaction boxes don’t align properly. They either miss the text or appear slightly off (above or below the intended area). It seems like the coordinate mapping doesn't match accurately when the document isn't a standard A4 size or has DPI inconsistencies.

Has anyone else encountered this? Any suggestions on:

Adjusting for page size or DPI dynamically?
Mapping normalized coordinates correctly for scanned PDFs?

Appreciate any help or suggestions!

kidd_ip · Answer

How about this:

Normalize DPI Before Processing – If the scanned PDFs have inconsistent DPI settings, try standardizing them using an image processing tool before passing them to Form Recognizer.
Adjust Bounding Box Scaling – Some users have resolved similar issues by dynamically adjusting bounding box coordinates based on the detected page size.
Use a Fixed Template Approach – If possible, define a standard template for expected document sizes and adjust the redaction logic accordingly.
Check for OCR Misalignment – OCR engines sometimes misinterpret scanned text positioning. Running a preprocessing step to enhance text clarity might help.
Explore Azure AI Document Intelligence – Azure Form Recognizer was renamed to Azure AI Document Intelligence, which may offer improved handling for scanned documents.
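
To make the bounding box scaling suggestion concrete, here is a minimal sketch in plain Python. It assumes the analysis result gives each page's width and height in the same unit as the polygon coordinates (typically inches for PDF input and pixels for image input), and that polygons arrive as a flat list [x1, y1, x2, y2, ...], as in recent versions of the Python SDK. The function name polygon_to_page_rect and its parameters are made up for illustration and are not part of any SDK. Page rotation (the angle reported per page) is not handled here.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]


def polygon_to_page_rect(
    polygon: Sequence[float],
    analyzed_width: float,
    analyzed_height: float,
    target_width: float,
    target_height: float,
    flip_y: bool = False,
    padding: float = 0.0,
) -> Rect:
    """Map an OCR polygon to an axis-aligned rectangle on the target page.

    polygon          flat list [x1, y1, x2, y2, ...] in the analysis unit
    analyzed_*       page size reported by the analysis, same unit as polygon
    target_*         page size in the redaction library's unit (e.g. points)
    flip_y           True if the target origin is bottom-left instead of top-left
    padding          extra margin in target units, to absorb small OCR offsets
    """
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold at least three (x, y) pairs")
    if analyzed_width <= 0 or analyzed_height <= 0:
        raise ValueError("analyzed page size must be positive")

    # Scale per page, from the sizes actually reported, rather than
    # assuming A4 or a fixed DPI.
    sx = target_width / analyzed_width
    sy = target_height / analyzed_height

    xs = [x * sx for x in polygon[0::2]]
    ys = [y * sy for y in polygon[1::2]]

    x0, x1 = min(xs) - padding, max(xs) + padding
    y0, y1 = min(ys) - padding, max(ys) + padding

    if flip_y:
        # Convert from a top-left origin to a bottom-left origin.
        y0, y1 = target_height - y1, target_height - y0

    return (x0, y0, x1, y1)
```

The idea is to compute the scale factor for every page separately: target_width and target_height would come from whatever PDF library applies the redaction (read from the actual page, not hard-coded), and analyzed_width and analyzed_height from the matching page in the analysis result. If the redaction library measures from the bottom-left corner, pass flip_y=True. A small padding value can help when boxes land a little above or below the text.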

Document Intellegence scanned PDF or PNG to editable pdf - Microsoft Q&A
