Apply OCR to images embedded in PDF files

dkanagala · April 11, 2025, 3:46am

What is a one sentence summary of your feature request?

Enable OCR processing for images embedded within PDF files to allow content inspection and policy enforcement.

Please describe your idea in detail. What is your problem, why do you feel this idea is the best solution, etc.

Currently, Endpoint Protector can scan and analyze text-based documents for sensitive content, but it lacks the ability to extract and analyze text within image-based PDFs—such as scanned documents or screenshots embedded in PDFs. This limitation creates a blind spot where data exfiltration can occur by simply embedding sensitive information as an image. Adding OCR capabilities to analyze images within PDFs would close this gap, ensuring the product delivers comprehensive data loss prevention across all document types. This would enhance compliance with data protection regulations and improve visibility into potential data leakage attempts.

How do you currently solve the challenges you have by not having this feature?

Today, we rely on manual review processes or separate OCR tools upstream in our document workflows to convert image-based PDFs into searchable text before EPP scans them. This is not only inefficient but also prone to human error and inconsistent enforcement. It also leaves real-time monitoring and blocking ineffective against this vector.

simona.lazsadi · April 11, 2025, 11:30am

Hi Dinesh,

Thank you for sharing this idea!

Good news! We are delighted to inform you that this item is currently being considered within our roadmap, with plans to include it in the forthcoming 5.9.4.3 release.

However, please be aware that we are still working to extend this functionality to the macOS platform. We encourage you to stay tuned for updates on that progress and on the final target date.

Kind Regards,
Simona

simona.lazsadi · June 6, 2025, 1:41pm

Hello Dinesh,

Exciting news here!
We’ve just launched OCR for images embedded in PDF files with the 5.9.4.3 EPP Client release. See our release notes for an in-depth look at this feature and the other enhancements: Version 5.9.4.3 Released

I hope this will bring value to your workflow!

All the best,
Simona

derek.putnam · October 24, 2025, 1:18pm

A post was split to a new topic: OCR not detecting Word files
