IDOL PHI Package 12.13

IDOL Eduction Grammars

The following section describes the Eduction grammars available in the IDOL PHI Package.

You can use these grammars with IDOL Eduction through Eduction Server, the edktool command-line utility, or the Eduction SDK. For more information, refer to the IDOL Eduction User Guide and the Eduction SDK Programming Guide.

IMPORTANT: To use the Eduction grammars in the IDOL PHI Package, you must have a license that enables them. To obtain a license, contact Micro Focus Support.

The IDOL PHI Package includes a default configuration file that contains the basic settings required to use the PHI grammars.

NOTE: If you create your own configuration file, you must include certain settings from the default configuration file, such as the post-processing task and Eduction components (see Configure Post Processing).

Configure Post Processing

When you use the IDOL PHI Package Eduction grammars, it is essential to configure a Lua post-processing task to run the script phi_postprocessing.lua. This script performs post-processing that improves the results for various entities, for example stop-list filtering and checksum validation (see Validated ID Numbers).

IMPORTANT: If you do not run this script, you might encounter unexpected behavior.
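To make the idea of stop-list filtering concrete, the following Python sketch drops candidate matches whose text appears in a stop list. It is an illustration only and does not reproduce phi_postprocessing.lua; the function name filter_stop_list and the stop-list entries are hypothetical.

```python
# Hypothetical stop list; phi_postprocessing.lua applies its own rules and data.
STOP_LIST = {"n/a", "none", "unknown", "test"}


def filter_stop_list(matches, stop_list=STOP_LIST):
    """Drop matched values that appear in the stop list (case-insensitive)."""
    return [m for m in matches if m.strip().lower() not in stop_list]


if __name__ == "__main__":
    print(filter_stop_list(["John Smith", "Unknown", " N/A "]))
```

In this sketch, values are compared after trimming whitespace and lowercasing, so "Unknown" and " N/A " are removed and only "John Smith" remains.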

The default configuration file provided in the IDOL PHI Package includes a suitable post-processing task. If you use a different configuration, you must add the post-processing task to your Eduction configuration. For example:

[Eduction]
PostProcessingTask0=MyPostProcessingSection

[MyPostProcessingSection]
Type=Lua
Script=scripts/phi_postprocessing.lua
Entities=phi/*

IMPORTANT: The post-processing script requires Eduction components (see Components). The default PHI configuration file enables components. If you use a custom configuration file, you must set the EnableComponents parameter to True to return components.

For more information about configuring post-processing tasks, refer to the Eduction User and Programming Guide.

Configure Pre-Filtering

Pre-filtering allows the IDOL PHI Package to run a quick initial check to find potential matches in your input text. It then selects match windows around these potential matches, which reduces the amount of text that it must match against your grammars. This process can improve performance in certain cases.

Micro Focus recommends that you use the following pre-filtering configuration with the address.ecr grammar.

[Eduction]
PrefilterTask0=AddressPrefilter

[AddressPrefilter]
Regex=\d{1,7}
WindowCharsBeforeMatch=100
WindowCharsAfterMatch=100
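The following Python sketch illustrates the windowing idea behind the AddressPrefilter example: it finds each match for the regular expression \d{1,7} and keeps up to 100 characters on either side of it. It is a conceptual illustration, not how Eduction implements pre-filtering. The function name prefilter_windows is hypothetical, and merging overlapping windows is an assumption of this sketch.

```python
import re

# Values taken from the [AddressPrefilter] example.
PREFILTER_REGEX = re.compile(r"\d{1,7}")
WINDOW_BEFORE = 100  # WindowCharsBeforeMatch
WINDOW_AFTER = 100   # WindowCharsAfterMatch


def prefilter_windows(text, pattern=PREFILTER_REGEX,
                      before=WINDOW_BEFORE, after=WINDOW_AFTER):
    """Return (start, end) character windows around each pre-filter hit.

    Overlapping windows are merged (an assumption of this sketch), so each
    region of text would be passed to the full grammars only once.
    """
    windows = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - before)
        end = min(len(text), match.end() + after)
        if windows and start <= windows[-1][1]:
            # Hits arrive in text order, so only the last window can overlap.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    sample = "x" * 300 + "12 Main Street" + "y" * 300
    for start, end in prefilter_windows(sample):
        print(start, end)
```

Text outside every window is never examined by the full grammars, which is where the performance gain comes from when potential matches are sparse. When hits are dense, the windows cover most of the text and the initial check becomes extra work.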

NOTE: Pre-filter tasks run for all configured entities unless you restrict them. You must configure each pre-filter task for only the appropriate entities to ensure that it does not affect the results for other entities.

The IDOL PHI Package also includes sample pre-filter configuration files for the name, address, and medical grammars, including dictionary pre-filter (DPF) files where the sample configuration requires them.

IMPORTANT: To use the DPF files from the 12.13 package, you must use Eduction tools version 12.9 or later.

NOTE: The provided medical grammar pre-filter files can improve match performance when there is a low density of matches. However, they can reduce performance when there is a high density of matches.

For more information about pre-filtering, refer to the Eduction User and Programming Guide.

Entity Context

Some of the entities are available in two versions, with and without context. The context-based entities match the entity when it occurs in an easily identifiable location in text. For example, it might match a telephone number that occurs next to the prefix Phone:.

The entities that do not have context attempt to match the entity wherever it occurs. This version might over-match significantly (that is, it is likely to return values that are similar to the entity patterns, such as a number that is not a telephone number). However, it also reduces the number of false negatives (that is, it misses fewer matches).

You can configure Eduction to use both versions of an entity; matches located with context are given a higher score in the results.
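The following Python sketch shows one way to reconcile results when both versions of an entity match the same text: for each text span, it keeps the match with the highest score, so the context match wins. This is an illustration of the idea rather than Eduction behavior. The Match class, the entity names telephone_context and telephone_nocontext, and the score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    entity: str   # hypothetical entity name
    start: int    # character offset where the match starts
    end: int      # character offset just past the match
    text: str
    score: float  # hypothetical score; context matches score higher


def best_matches(matches):
    """Keep one match per (start, end) span, preferring the highest score."""
    best = {}
    for m in matches:
        key = (m.start, m.end)
        if key not in best or m.score > best[key].score:
            best[key] = m
    return sorted(best.values(), key=lambda m: m.start)


if __name__ == "__main__":
    text = "Phone: 555-0100. Ref 555-0199."
    results = [
        Match("telephone_context", 7, 15, text[7:15], 0.9),
        Match("telephone_nocontext", 7, 15, text[7:15], 0.5),
        Match("telephone_nocontext", 21, 29, text[21:29], 0.5),
    ]
    for m in best_matches(results):
        print(m.entity, m.text, m.score)
```

Here 555-0100 follows the prefix Phone:, so the context version is kept. The number after Ref is found only by the no-context version. It is still returned, but with the lower score, so downstream consumers can treat it with less confidence.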

When you have data in tables, the context for an entity might not occur next to the entity value. For example, you might have a table with columns titled name and date of birth, but the values themselves do not occur next to these headers.

In this case, you can use Eduction table extraction to extract entities according to the landmarks detected in the table headers. For example, you can configure Eduction so that if it finds a table heading that matches the landmark date of birth, it extracts dates from that column.
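The following Python sketch illustrates the landmark idea: a column is searched only if its header matches a landmark, and the pattern for that landmark is applied to the cells in that column. It is not the Eduction table extraction configuration or implementation. The function extract_by_landmark, the LANDMARKS mapping, the exact-match header comparison, and the simple dd/mm/yyyy-style date pattern are all hypothetical.

```python
import re

# Hypothetical landmark-to-pattern mapping for illustration only.
LANDMARKS = {
    "date of birth": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def extract_by_landmark(header, rows, landmarks=LANDMARKS):
    """Yield (heading, value) pairs from columns whose header is a landmark."""
    for col, heading in enumerate(header):
        pattern = landmarks.get(heading.strip().lower())
        if pattern is None:
            continue  # no landmark for this column, so it is not searched
        for row in rows:
            if col < len(row):
                for match in pattern.finditer(row[col]):
                    yield heading, match.group()


if __name__ == "__main__":
    header = ["Name", "Date of Birth"]
    rows = [["Jane Doe", "04/12/1980"], ["01/01/2000", "unknown"]]
    print(list(extract_by_landmark(header, rows)))
```

The second row shows the point of the landmark: the date in the Name column is ignored because that column's header is not a date landmark, even though the value looks like a date.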

For more information about how to configure table extraction, refer to the Eduction User and Programming Guide.