List All Term RegEx Matches

I am evaluating the taxonomies and terms that have had hits in our environment so I can tune out false positives. I find myself having to open individual pages to see what the RegEx matched before I can decide whether it is a false positive.

Is there a way to select a Taxonomy and Term, and maybe even a specific clue, and have NDC list all of the matches it found for all documents tagged with that term? Seeing the entire list in an export/report would save a ton of time.

Hi Marc,

That’s a good question! Let me check with the dev team and get back to you.

- Dan

I would even be fine with a T-SQL query that I can run manually to generate this data. I just don’t want to do all of that clicking :slight_smile: I know you don’t do Professional Services anymore, but if someone can confirm the tables involved, I can write something myself.

Marc,

Unfortunately, the short answer is that there is currently no mechanism to get the values found by all the Regexes, so the exact functionality you have requested would be a feature request. The product does have functionality to capture all of the metadata values it uses, which I had hoped to use for this. However, it only captures metadata that the indexer processes, not metadata added by the classifier.

With the product in its current state, the only way to capture the data you want would be to create a workflow plugin that stores the Regex values for each document in a repository of your choosing, then re-classify all of your content. If that interests you, you can download a sample plugin from within the tool.

OK. Thank you.

Where are the Clue “hits” stored? Or are they not stored and presented in the UI on-the-fly?

Marc, the information is stored in the SQL database in the PageText table. Unfortunately, the metadata is stored as a gzip-compressed, Base64-encoded value. If you want to try to unpick it, look for the rows where Type=7, as those are the regular expression matches.
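For anyone who wants to try unpicking those Type=7 rows, here is a minimal Python sketch of the decoding step. It assumes each value is gzip-compressed bytes that were then Base64-encoded, and that the decompressed payload is UTF-8 text; the format of the decoded payload itself is not described in this thread. The row layout `(row_id, type_code, encoded_value)` and both function names are hypothetical. You would feed it rows pulled from PageText with whatever SQL Server client you prefer.

```python
import base64
import gzip


def decode_pagetext_value(encoded: str) -> str:
    """Decode one PageText value.

    ASSUMPTION: the value is gzip bytes wrapped in Base64, and the
    decompressed payload is UTF-8 text.
    """
    compressed = base64.b64decode(encoded)  # undo the Base64 layer
    raw = gzip.decompress(compressed)       # undo the gzip layer
    return raw.decode("utf-8")


def decode_regex_rows(rows):
    """Yield (row_id, decoded_text) for rows whose Type is 7 (regex matches).

    `rows` is any iterable of (row_id, type_code, encoded_value) tuples.
    HYPOTHETICAL layout: map it to the real PageText columns yourself.
    """
    for row_id, type_code, encoded in rows:
        if type_code == 7:
            yield row_id, decode_pagetext_value(encoded)


if __name__ == "__main__":
    # Build a fake value the same way it is assumed to be stored.
    sample = base64.b64encode(gzip.compress("ABC-1234".encode("utf-8"))).decode("ascii")
    rows = [(1, 7, sample), (2, 3, sample)]
    for row_id, text in decode_regex_rows(rows):
        print(row_id, text)
```

If `b64decode` or `gzip.decompress` raises an error on real data, the value is probably encoded differently than assumed (for example, in the opposite order), so treat this as a starting point rather than a confirmed format.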

Thank you for the info. I might try to “data warehouse” that metadata. Storage is cheap. CPU and memory are not! Well, relative to one another. :slight_smile:

What about other matches, like basic text hits?

Unfortunately, all of the basic text hits are stored in the proprietary inverted index.

One possible option for you is the “Auto Classification Review” report, which shows for each document which clues matched, including Regex and text clues. You will need to enable “Classification Calculations” in the tracing and re-classify the documents you want this data for. The report will then return all of the documents matching a specified term, along with the clues that matched for each. It can be run either manually or scripted through the API.

Thank you.