The ProperNames configuration parameter controls whether terms are created from pairs of consecutive words in index fields. The rationale behind this is to increase the relevance of results by matching pairs of associated words in a query with documents in which those words are also paired.
For example, when you search for George Washington, you want documents in which those words appear consecutively to score higher than a document containing the text I saw George Bush speak in Washington, D.C.
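The pairing idea can be sketched in a few lines of Python. This is an illustration only, not IDOL Server's tokenizer: the function name `pair_terms` is hypothetical, the first sample document is made up for the comparison, and the sketch ignores stemming, stop words, and capitalization.

```python
import re


def pair_terms(text):
    """Return uppercase terms built from each pair of consecutive words."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(a + b).upper() for a, b in zip(words, words[1:])]


query = pair_terms("George Washington")
doc_a = pair_terms("George Washington was the first president")
doc_b = pair_terms("I saw George Bush speak in Washington, D.C.")

# The paired term from the query occurs only in the first document.
print(query[0] in doc_a)  # True
print(query[0] in doc_b)  # False
```

Only the document in which the two words are adjacent shares the paired term with the query, which is what allows it to score higher.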
In most cases, HPE does not recommend using ProperNames. It can increase the number of terms and the size of the index considerably, whilst achieving only marginal gains in most queries.
When you turn on AdvancedSearch (or AdvancedPlus and AdvancedCaseSearch), IDOL Server implicitly uses WNEAR as the default query operator. This method ensures that a search for George Washington matches documents that contain those words consecutively with a higher score than documents in which they occur further apart.
HPE recommends that you use AdvancedSearch to achieve this functionality, rather than use ProperNames.
You can use ProperNames to match stop words in some situations, such as when they occur as part of a capitalized phrase. For example, you might want a query for The Queen to weight a document with those exact words higher than one containing only Queen (or indeed the queen), even though the word the is configured as a stop word.
With the appropriate setting (for example, ProperNames=7), IDOL Server indexes a term for THEQUEEN to allow this. The same is true for pairs of stop words (for example, The Who or Take That).
In situations such as plagiarism or near-duplicate detection, where you want to match documents that contain a significant amount of the same text rather than conceptually similar documents, you can use ProperNames to help. In fact, setting IDOL to index only proper name terms optimizes this process.
You can set the ProperNames configuration parameter in each language configuration section to one of the following values:
Value | Tokenization of And The Cats Dogs ran away |
---|---|
0 | CAT DOG RAN AWAY |
1 | CAT CATSDOG DOG RAN AWAY |
2 | CAT CATSDOG DOG DOGSRAN RAN RANAWAY AWAY |
3 | ANDTH CAT CATSDOG DOG RAN AWAY |
4 | ANDTH THECAT CAT CATSDOG DOG RAN AWAY |
5 | ANDTHE CAT CATSDOGS DOG RAN AWAY |
6 | ANDTHE THECATS CAT CATSDOGS DOG RAN AWAY |
7 | ANDTHE THECATS CAT DOG RAN AWAY |
For specialized usage, you can also set the ProperNames parameter by using bitwise values. You can combine any of the following values by adding them together.
Bit | Short name | Description |
---|---|---|
8 | stem | Stem any ProperNames term. |
16 | case | Return only capitalized ProperNames terms. |
32 | neither | Return ProperNames terms if neither is a stop word. |
64 | one | Return ProperNames terms if exactly one is a stop word. |
128 | both | Return ProperNames terms if both are stop words. |
256 | only | Only return ProperNames terms. |
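As a rough illustration of how the bits combine, the following Python sketch adds up bit values by short name and decodes a combined value back into names. The helper names `combine` and `decode` are hypothetical and are not part of IDOL Server. The bit values and short names are copied from the table above.

```python
# Bit values and short names copied from the table above.
PROPER_NAMES_BITS = {
    8: "stem",
    16: "case",
    32: "neither",
    64: "one",
    128: "both",
    256: "only",
}


def combine(*names):
    """Add up the bit values for the given short names."""
    by_name = {name: bit for bit, name in PROPER_NAMES_BITS.items()}
    return sum(by_name[name] for name in set(names))


def decode(value):
    """List the short names of the bits that are set in a value."""
    return [name for bit, name in PROPER_NAMES_BITS.items() if value & bit]


print(combine("case", "one", "both"))  # 208
print(decode(208))                     # ['case', 'one', 'both']
```

Because each option is a distinct power of two, adding the values is the same as a bitwise OR, and any sum can be split back into its options unambiguously.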
The standard configurable values of ProperNames then have the following meanings:
Value | Bitwise equivalent | Sum |
---|---|---|
0 | 0 | 0 |
1 | 56 | 8+16+32 |
2 | 40 | 8+32 |
3 | 184 | 8+16+32+128 |
4 | 248 | 8+16+32+64+128 |
5 | 176 | 16+32+128 |
6 | 240 | 16+32+64+128 |
7 | 208 | 16+64+128 |
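To show how the bits relate to the tokenization examples, here is a toy Python simulation. It is a sketch under stated assumptions, not IDOL Server's actual algorithm. The names `tokenize`, `toy_stem`, and `STOP_WORDS` are hypothetical. The stop list contains only AND and THE. The stand-in stemmer simply drops one trailing S or E, chosen only because it matches the sample terms. The case bit is assumed to require both words of a pair to be capitalized. The term order is inferred from the tokenization table.

```python
import re

STEM, CASE, NEITHER, ONE, BOTH, ONLY = 8, 16, 32, 64, 128, 256

# Standard values and their bitwise equivalents, from the table above.
PRESETS = {0: 0, 1: 56, 2: 40, 3: 184, 4: 248, 5: 176, 6: 240, 7: 208}

STOP_WORDS = {"AND", "THE"}  # assumed stop list for this example


def toy_stem(term):
    """Toy stand-in for stemming: drop one trailing S or E."""
    return term[:-1] if term[-1] in "SE" else term


def tokenize(text, value):
    """Emit single-word terms and the pair terms that the flags allow."""
    flags = PRESETS.get(value, value)
    words = re.findall(r"[A-Za-z]+", text)
    terms = []
    for i, word in enumerate(words):
        upper = word.upper()
        # Single-word terms: skip stop words, and skip all if "only" is set.
        if upper not in STOP_WORDS and not flags & ONLY:
            terms.append(toy_stem(upper))
        if i + 1 == len(words):
            continue
        nxt = words[i + 1]
        # Count the stop words in the pair to pick neither/one/both.
        stops = (upper in STOP_WORDS) + (nxt.upper() in STOP_WORDS)
        wanted = flags & (NEITHER, ONE, BOTH)[stops]
        # Assumption: "case" requires both words to be capitalized.
        if flags & CASE and not (word[0].isupper() and nxt[0].isupper()):
            wanted = 0
        if wanted:
            pair = upper + nxt.upper()
            terms.append(toy_stem(pair) if flags & STEM else pair)
    return " ".join(terms)


for value in range(8):
    print(value, "|", tokenize("And The Cats Dogs ran away", value))
```

Under these assumptions the loop prints the same terms as the tokenization table, and `tokenize("The Queen", 7)` yields `THEQUEEN QUEEN`, matching the stop-word example described earlier.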