Sensitive Data Detection by File Extension in eDiscovery

What is a one sentence summary of your feature request?

I request the ability for eDiscovery to detect sensitive data only within specific file extensions, similar to how CAP allows targeted detection.

Please describe your idea in detail. What is your problem, why do you feel this idea is the best solution, etc.

In CAP, the Content Detection Summary feature enables administrators to detect sensitive data only within designated file extensions, making scans more precise and efficient. However, eDiscovery currently lacks this capability and scans all files indiscriminately, which generates unnecessary results and reduces efficiency. By adding an option for eDiscovery to limit sensitive data detection to specific file extensions, the solution would become more practical and significantly enhance usability in real-world scenarios.

How do you currently solve the challenges you have by not having this feature?

Currently, we cannot restrict eDiscovery to detect sensitive data by file extension. As a result, we must review a wide range of irrelevant scan results, which increases workload and slows down analysis.

Hi KimDongHyun,

Thank you for your query. This is an interesting use case, and I have a small suggestion. Could you try defining a “File Name” dictionary in the Denylists that specifies particular extensions, such as .docx? Then add it to your eDiscovery policy and check whether this meets your needs.

Best regards, Krzysiek

Hello Krzysiek,

As you mentioned, configuring it this way detects files with the specified extensions, but it does not narrow detection to files with those extensions that also contain sensitive data.

Best regards, Donghyun

Hi Donghyun,

Just to clarify, following your request, I configured it as you described:

“I request the ability for eDiscovery to detect sensitive data only within specific file extensions, similar to how CAP allows targeted detection.”

In the eDiscovery policy, I defined:

  • File Type (more accurate than configuring extensions alone, as we rely on true file type definitions), specifically Word and PDF documents in my case.

  • Checked the necessary predefined content, specifically PL SSN in my example.

As a result, I was able to search for sensitive data within the specified file types. Consequently, my report, as you requested, focused solely on these narrow criteria.

Isn’t this the use case you wanted to achieve?

Best regards,
Krzysiek

Hello Krzysiek,

As far as I understand, the items checked under File Type and those checked under Predefined Content or Custom Content are combined with an OR condition.
In the example you provided, this would mean that every file containing PL SSN would be detected, and every Word and PDF file as well, whether or not it contains PL SSN.

Am I mistaken in this understanding?

What I want is for the condition to work as an AND, so that only Word/PDF files containing PL SSN from the predefined Content would be detected.
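
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration of my understanding, not the product's actual logic: the `ScannedFile` class, the `matches_or` and `matches_and` functions, and the sample files are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedFile:
    """Hypothetical stand-in for one file seen by a scan."""
    name: str
    file_type: str  # true file type, e.g. "word", "pdf", "text"
    content_hits: set = field(default_factory=set)  # predefined content found


# Items ticked in the (hypothetical) policy.
SELECTED_TYPES = {"word", "pdf"}
SELECTED_CONTENT = {"PL SSN"}


def matches_or(f: ScannedFile) -> bool:
    # Behaviour as I understand it today: either criterion is enough.
    return f.file_type in SELECTED_TYPES or bool(f.content_hits & SELECTED_CONTENT)


def matches_and(f: ScannedFile) -> bool:
    # Behaviour I am requesting: both criteria must hold.
    return f.file_type in SELECTED_TYPES and bool(f.content_hits & SELECTED_CONTENT)


files = [
    ScannedFile("report.docx", "word", {"PL SSN"}),
    ScannedFile("manual.pdf", "pdf"),
    ScannedFile("notes.txt", "text", {"PL SSN"}),
    ScannedFile("image.png", "image"),
]

or_result = [f.name for f in files if matches_or(f)]
and_result = [f.name for f in files if matches_and(f)]

print(or_result)   # ['report.docx', 'manual.pdf', 'notes.txt']
print(and_result)  # ['report.docx']
```

With OR, the report also lists manual.pdf (no sensitive data) and notes.txt (not a selected file type). With AND, only report.docx remains, which is the result I am looking for.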

I apologize if I misunderstood.

Thank you.

Best regards, Donghyun

Hello, this is Donghyun.

I would like to kindly follow up on my previous question.

Thank you.

Hello @Donghyun_Kim,

Thank you for your patience, and my apologies for the delayed response — your message somehow slipped through my notifications.

Regarding the scenario you described (having the condition work as an AND so that only Word/PDF files containing PL SSN from the predefined Content are detected) — this will be achievable once the Content Detection Summary feature is extended to eDiscovery Policies.

At the moment, this functionality is available for Content Aware Policies, and we’re planning to introduce the same capability for eDiscovery in a future release.
We can consider your request approved and aligned with our roadmap, though please note that we currently don’t have a specific delivery timeline as it’s part of our mid/long-term plans.

Rest assured, we’ll keep you informed as soon as there’s progress on this feature.

Kind Regards,
Simona

Hello, this is Donghyun.

Thank you for your response and for approving the requested feature.

Thank you.