Combined Entities
In addition to the entities described in the Eduction Grammar Reference, the IDOL PCI Package includes grammar files that contain "combined" entities. These files are named `combined_*.ecr` (or `combined_*_cjkvt.ecr` for CJKVT) and the entities match names from multiple countries.
- The entities that end in `/all` match data for any supported non-CJKVT country or language.
- The entities that end in `/all_cjkvt` match data for any supported CJKVT country.
For example:
- Using `pci/name/all` from `combined_name.ecr` matches a name from any non-CJKVT country. This is similar to using the `name.ecr` grammar file and extracting `pci/name/??`.
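
As a rough sketch, the two approaches might look like this in an Eduction configuration. The `grammars/` directory is an assumed location for the grammar files, and the section layout is illustrative only; adjust both to match your deployment.

```ini
[Eduction]
// Combined grammar: a single entity covers every supported non-CJKVT country.
ResourceFiles=grammars/combined_name.ecr
Entities=pci/name/all

// Roughly equivalent setup using the individual entities instead
// (commented out; shown only for comparison):
// ResourceFiles=grammars/name.ecr
// Entities=pci/name/??
```

The entity and file names come from the table of combined entities; only the path prefix is hypothetical.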
The combined (/all and /all_cjkvt) entities provide a significant improvement in processing speed when you extract matches for all countries or languages.
The combined grammar files might produce fewer matches than the individual grammars. By default, when the same characters in the input text would match for more than one country or language, only a single match is returned.
TIP: If you need all matches, you can turn on the AllowMultipleResults configuration option. This option slows down the matching process because it does not stop after a single match, but is generally still faster than using the individual grammars.
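
A minimal configuration sketch for returning every match from a combined entity follows. The `grammars/` path is an assumed location, and the value `True` is an assumption: check the `AllowMultipleResults` entry in the configuration reference for your Eduction version to confirm which values it accepts.

```ini
[Eduction]
ResourceFiles=grammars/combined_name.ecr
Entities=pci/name/all
// Assumed value; return all matches rather than stopping at the first one.
AllowMultipleResults=True
```

With this option on, the same span of input text can be reported more than once (for example, once per matching country or language), so downstream processing may need to handle overlapping matches.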
| File | Entity |
|---|---|
| combined_name.ecr | pci/name/all |
| combined_name_cjkvt.ecr | pci/name/all_cjkvt |
| | pci/name/latin/all_cjkvt |
| | pci/name/cjkvt/all_cjkvt |