Combined Entities
In addition to the entities described in the PCI Eduction Grammar Reference, the IDOL Eduction Grammars Package includes grammar files that contain "combined" entities. These files are named combined_*.ecr (or combined_*_cjkvt.ecr for CJKVT countries, such as Japan), and their entities match names from multiple countries.
- The entities that end in /all match data for any supported non-CJKVT country or language.
- The entities that end in /all_cjkvt match data for any supported CJKVT country.
For example:
- Using pii/name/all from combined_name.ecr matches a name from any non-CJKVT country. This is similar to using the name.ecr grammar file and extracting pii/name/??.
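The following Python sketch illustrates the difference between the two approaches. It assumes that the ?? in pii/name/?? is a wildcard in which each ? stands for exactly one character (a two-letter country code), which is how Python's fnmatchcase treats ?. The entity names in the list are hypothetical examples, not the actual contents of name.ecr.

```python
from fnmatch import fnmatchcase

# Hypothetical entity names, for illustration only; the real name.ecr
# defines its own set of per-country entities.
INDIVIDUAL_ENTITIES = [
    "pii/name/gb",
    "pii/name/fr",
    "pii/name/de",
    "pii/address/gb",  # hypothetical non-name entity, excluded by the pattern
]


def select_entities(pattern: str, entities: list[str]) -> list[str]:
    """Return the entity names that match a shell-style wildcard pattern.

    Assumes "?" matches exactly one character, as in fnmatch.
    """
    return [name for name in entities if fnmatchcase(name, pattern)]


# The wildcard expands to one entity per country, each evaluated separately.
per_country = select_entities("pii/name/??", INDIVIDUAL_ENTITIES)
print(per_country)

# The combined approach needs only a single entity name.
combined = ["pii/name/all"]
print(combined)
```

Under this assumption, the wildcard selects several per-country entities, whereas the combined grammar replaces them with the single entity pii/name/all.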
The combined (/all and /all_cjkvt) entities significantly improve processing speed, compared with using the individual grammars, when you extract matches for all countries or languages.
The combined grammar files might produce fewer matches because, by default, only a single match is returned when the same characters in the input text would match multiple countries or languages.
TIP: If you need all matches, turn on the AllowMultipleResults configuration option. This option slows down matching because the process does not stop after a single match, but it is generally still faster than using the individual grammars.
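The following Python sketch is a conceptual model of the behavior described above, not the Eduction implementation. It assumes each candidate match records its character span and a hypothetical label for the country grammar that produced it. By default it keeps only the first candidate for each span; the allow_multiple flag, analogous to AllowMultipleResults, keeps every candidate. Which of the competing matches Eduction actually returns is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    start: int    # offset of the first matched character
    end: int      # offset just past the last matched character
    source: str   # hypothetical label for the country grammar that matched
    text: str


def collect(candidates: list[Match], allow_multiple: bool = False) -> list[Match]:
    """Keep one match per character span unless allow_multiple is set."""
    if allow_multiple:
        # Analogous to AllowMultipleResults: keep every candidate.
        return list(candidates)
    seen_spans: set[tuple[int, int]] = set()
    kept: list[Match] = []
    for m in candidates:
        span = (m.start, m.end)
        if span not in seen_spans:  # first candidate for this span wins
            seen_spans.add(span)
            kept.append(m)
    return kept


# The same characters could be a valid name in two countries.
candidates = [
    Match(0, 12, "es", "Maria Garcia"),
    Match(0, 12, "mx", "Maria Garcia"),
]
print(len(collect(candidates)))                       # 1
print(len(collect(candidates, allow_multiple=True)))  # 2
```

In this model the default path returns fewer results for the same input, which mirrors why the combined grammar files might produce fewer matches than the individual grammars.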
File | Entity |
---|---|
combined_name.ecr | pii/name/all |
combined_name_cjkvt.ecr | pii/name/all_cjkvt |
combined_name_cjkvt.ecr | pii/name/latin/all_cjkvt |
combined_name_cjkvt.ecr | pii/name/cjkvt/all_cjkvt |