Combined Entities

In addition to the entities described in the PII Eduction Grammar Reference, the IDOL Eduction Grammars Package includes grammar files that contain "combined" entities. These files are named combined_*.ecr (or combined_*_cjkvt.ecr for CJKVT countries), and their entities match data from multiple countries or languages.

  • The entities that end in /all match data for any supported non-CJKVT country or language.
  • The entities that end in /all_cjkvt match data for any supported CJKVT country.

For example:

  • Using pii/name/all from combined_name.ecr matches a name from any non-CJKVT country. This is similar to using the name.ecr grammar file and extracting pii/name/??.

The combined (/all and /all_cjkvt) entities process text significantly faster than the individual country or language entities when you need to extract matches for all countries or languages.

The combined grammar files might produce fewer matches than the individual grammars because, by default, only a single match is returned when the same characters in the input text match more than one country or language.

TIP: If you need all matches, turn on the AllowMultipleResults configuration option. This option slows down matching, because processing does not stop after the first match, but it is generally still faster than using the individual grammars.
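The difference between the default single-match behavior and AllowMultipleResults can be pictured with a small Python model. This is a conceptual sketch only: the Match class, its country field, and the combine function are hypothetical illustrations, not part of the Eduction API. This section does not say which candidate Eduction keeps when several countries match the same characters, so the sketch simply keeps the first one it sees.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Match:
    start: int     # offset of the first matched character
    end: int       # offset one past the last matched character
    text: str
    country: str   # hypothetical: which country's rules produced the match


def combine(candidates: List[Match], allow_multiple: bool = False) -> List[Match]:
    """Model how a combined entity reports overlapping candidates.

    With allow_multiple=False (the default behavior described above), keep
    only one match per (start, end) span; this sketch keeps the first one.
    With allow_multiple=True (like turning on AllowMultipleResults), return
    every candidate.
    """
    if allow_multiple:
        return list(candidates)
    seen: Set[Tuple[int, int]] = set()
    result: List[Match] = []
    for m in candidates:
        span = (m.start, m.end)
        if span not in seen:
            seen.add(span)
            result.append(m)
    return result


# Hypothetical candidates: the same characters match two countries' name rules.
candidates = [
    Match(0, 11, "Maria Rossi", "it"),
    Match(0, 11, "Maria Rossi", "es"),
    Match(16, 25, "John Ward", "gb"),
]

print(len(combine(candidates)))                       # 2
print(len(combine(candidates, allow_multiple=True)))  # 3
```

The single-match path must track which spans it has already reported, while the multiple-results path returns everything it finds. In the real engine, the extra cost of AllowMultipleResults comes from continuing to search after the first match, which this model does not attempt to reproduce.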

File                       Entity
combined_name.ecr          pii/name/all
combined_name_cjkvt.ecr    pii/name/all_cjkvt
                           pii/name/latin/all_cjkvt
                           pii/name/cjkvt/all_cjkvt