The field name is the name of a field in your index. The field name might represent the name of an IDX field (#DREFIELD), or an XML tag or attribute. A field identifier is the full name and path of the field in your index, for example the full path to the XML tag or attribute. You can use the GetTagNames action to retrieve the identifiers for existing fields (for example, for use in Query FieldText); alternatively, you can monitor field types in the data index on the Field Types page in the Monitor section of IDOL Admin.
IDOL query syntax typically allows you to use either the field name or the field identifier. You might want to use the identifier to remove any ambiguity.
Field names are often referred to with the prefix */. This prefix automatically matches the parent nodes of the document, which are typically not of interest.
For example, */DRECONTENT matches all fields named DRECONTENT that are not at the root level. In the field processing section of the IDOL Server configuration file, a field listed as FieldName is processed as */FieldName.
Unlike structured information management systems, IDOL does not require you to know all the fields when you configure your system. If IDOL Server encounters a previously unknown field during the index process, it creates a field with that name. For example, suppose that you index the following document:
#DREREFERENCE testdoc
#DREFIELD MyNewField="value"
#DRECONTENT
Hello, world
#DREENDDOC
In this example, IDOL Server creates a field called MyNewField. You can immediately use the MyNewField field in a FieldText query, such as action=Query&FieldText=MATCH{value}:MyNewField.
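As a minimal sketch of preparing such a query from a script, the following Python builds the encoded parameter string with the standard library. The function name build_query_string is a hypothetical helper, not part of IDOL; append the result to the address of your own server.

```python
from urllib.parse import urlencode

def build_query_string(field: str, value: str) -> str:
    """Return encoded Query parameters that match `value` in `field` via FieldText."""
    params = {
        "action": "Query",
        # Doubled braces in the f-string produce the literal { and } of MATCH{...}.
        "FieldText": f"MATCH{{{value}}}:{field}",
    }
    # urlencode percent-encodes the braces and the colon.
    return urlencode(params)

query_string = build_query_string("MyNewField", "value")
print(query_string)
```

Encoding the FieldText value avoids problems when the value you match contains characters such as & or spaces.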
Depending on the configuration of the Content server that indexes the document, this field might have properties associated with it, which allow for enhanced search functionality, or optimization.
Field names can contain alphanumeric characters (a-z, 0-9), period (.), underscore (_) and hyphen (-). IDOL Server replaces all other characters with an underscore during indexing, and processes the new field name this creates as normal.
IDOL indexes and queries the following two documents in exactly the same way, because it replaces the # character in the second document with an underscore (_):
#DREREFERENCE testdoc
#DREFIELD my_field="value"
#DREENDDOC

#DREREFERENCE testdoc
#DREFIELD my#field="value"
#DREENDDOC
It is also best practice to use field names that conform to XML specifications.
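The replacement rule can be approximated in a few lines of Python, which may help when you want to predict the stored field name before indexing. This is a sketch of the rule as described here, not IDOL's implementation; it assumes that uppercase letters are allowed as well as lowercase, and normalize_field_name is a hypothetical helper name.

```python
import re

# Allowed characters per the text: letters, digits, period, underscore, hyphen.
# Assumption: uppercase letters are accepted as well as lowercase.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")

def normalize_field_name(name: str) -> str:
    """Replace every disallowed character with an underscore."""
    return _DISALLOWED.sub("_", name)

print(normalize_field_name("my#field"))
print(normalize_field_name("my_field"))
```

Both calls yield the same name, which is why two source fields that differ only in a disallowed character end up as one field in the index.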
When you index documents into IDOL Server by using connectors, many connectors standardize field names to a common string. This process ensures that common metadata values have the same field name in your IDOL index, regardless of the field name in the original repository.
Your connector installation includes a dictionary.xml file, which lists the standardized names for various fields in different connectors.
Main Topic: Field Properties
By default, a field has no special properties associated with it, which can result in sub-optimal query performance, depending on how you use the field in queries. You can improve query performance by applying properties to certain fields before you index them.
To assign properties to a field, you configure rules in the Field Processing section of the IDOL Server configuration file. These rules list the field names that you want to assign properties to. You can use wildcards to match several fields.
The following example configuration defines certain fields as index fields when IDOL creates them in the index. In this case, any field called DRECONTENT or DRETITLE, and any field whose name starts with PAGE, is an index field.
[FieldProcessing]
0=SetIndexFields

[SetIndexFields]
Property=IndexFields
PropertyFieldCSVs=*/DRECONTENT,*/DRETITLE,*/PAGE*
TrimSpaces=False

[IndexFields]
Index=True
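To get a feel for which field identifiers the PropertyFieldCSVs patterns in this configuration select, you can approximate the wildcard matching with Python's fnmatch module. This is only an illustration: IDOL's own matching rules may differ in detail (fnmatch also gives ? and [ ] special meaning), the comparison is made case-insensitive here as an assumption, and is_index_field is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Patterns copied from PropertyFieldCSVs in the example configuration.
PATTERNS = ["*/DRECONTENT", "*/DRETITLE", "*/PAGE*"]

def is_index_field(identifier: str) -> bool:
    """Return True if the field identifier matches any configured pattern."""
    # Assumption: compare in upper case to mimic case-insensitive field names.
    upper = identifier.upper()
    return any(fnmatchcase(upper, pattern.upper()) for pattern in PATTERNS)

for identifier in ["DOCUMENT/DRETITLE", "DOCUMENT/PAGE1", "DOCUMENT/PRICE"]:
    print(identifier, is_index_field(identifier))
```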
The GetTagNames action returns the list of currently known field names. You can set the TypeDetails parameter to True to also return the associated properties for each field name. For example:
<autn:name code="4" types="index,highlight,title,sourcefield,textparseindex">DOCUMENT/DRETITLE</autn:name>
<autn:name code="5" types="numeric">DOCUMENT/PRICE</autn:name>
<autn:name code="6" types="numericdate">DOCUMENT/MYDATE</autn:name>
<autn:name code="7" types="index,highlight,sourcefield,textparseindex">DOCUMENT/DRECONTENT</autn:name>
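If you process this output in a script, a lookup table keyed by field code is often convenient. The sketch below parses the example elements with Python's standard XML parser. It wraps them in a hypothetical root element and assumes the autn prefix is bound to the namespace URI shown; check the full response from your own server before relying on either.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI that the autn prefix is bound to in the full response.
AUTN_NS = "http://schemas.autonomy.com/aci/"

# Example elements wrapped in a hypothetical root so the fragment parses on its own.
RESPONSE = f"""<root xmlns:autn="{AUTN_NS}">
<autn:name code="4" types="index,highlight,title,sourcefield,textparseindex">DOCUMENT/DRETITLE</autn:name>
<autn:name code="5" types="numeric">DOCUMENT/PRICE</autn:name>
<autn:name code="6" types="numericdate">DOCUMENT/MYDATE</autn:name>
<autn:name code="7" types="index,highlight,sourcefield,textparseindex">DOCUMENT/DRECONTENT</autn:name>
</root>"""

def parse_tag_names(xml_text: str) -> dict:
    """Map each numeric field code to (field identifier, list of property types)."""
    root = ET.fromstring(xml_text)
    fields = {}
    for node in root.iter(f"{{{AUTN_NS}}}name"):
        types = node.get("types", "").split(",")
        fields[int(node.get("code"))] = (node.text, types)
    return fields

fields = parse_tag_names(RESPONSE)
print(fields[5])
```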
You can use the output of the GetTagNames action to help you to interpret the response of the MemoryReport action, which uses the numeric field codes. For example, the DOCUMENT/PRICE field in the GetTagNames response above corresponds to Numeric Index 5 in the MemoryReport response below.
<name>Numeric Indexes</name>
<memoryusage>3333393</memoryusage>
<noncomponentusage>0</noncomponentusage>
<approx>false</approx>
<components>5</components>
<memory0>
  <name>Numeric Index 5</name>
  <memoryusage>627357</memoryusage>
  <noncomponentusage>96</noncomponentusage>
  <approx>false</approx>
  <components>3</components>
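The correspondence between the two responses can be sketched as a small lookup. The code below assumes, as the example suggests, that the trailing integer in a MemoryReport entry name is the field code from GetTagNames; the table is filled from the GetTagNames example, and resolve_index_name is a hypothetical helper name.

```python
import re

# Field codes and identifiers taken from the GetTagNames example.
FIELD_CODES = {
    4: "DOCUMENT/DRETITLE",
    5: "DOCUMENT/PRICE",
    6: "DOCUMENT/MYDATE",
    7: "DOCUMENT/DRECONTENT",
}

def resolve_index_name(report_name: str):
    """Map a name such as 'Numeric Index 5' to a field identifier, or None."""
    # Assumption: a trailing integer in the name is the GetTagNames field code.
    match = re.search(r"(\d+)$", report_name)
    if match is None:
        return None
    return FIELD_CODES.get(int(match.group(1)))

print(resolve_index_name("Numeric Index 5"))
```

Names without a trailing number, such as the Numeric Indexes summary entry, resolve to None.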
See Also: Case-Sensitive Search
By default, IDOL matches field names in a case-insensitive manner. If you index XML directly, you might need case-sensitive matching. You can configure this behavior by using the CaseSensitiveFieldNames configuration parameter.
When you index XML directly, you might also need to configure existing XML namespaces in your data in the AdditionalNameSpaces configuration parameter.
IDOL Server treats XML attributes as fields. When it creates the field name, it uses the format _ATTR_AttributeName, and it uses the name of the parent tag as part of the field identifier.
For the field <PRODUCT PRICE=value>, the PRICE attribute becomes PRODUCT/_ATTR_PRICE in the IDOL Server index.
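The naming rule for attributes can be illustrated by walking a small XML document and listing the identifiers it would produce. This is a sketch of the convention described here, not IDOL's parser; the sample document and the helper name field_identifiers are hypothetical, and details such as how the document root appears in the path are not covered.

```python
import xml.etree.ElementTree as ET

def field_identifiers(element, parent_path=""):
    """Yield an identifier for each tag and an _ATTR_ identifier for each attribute."""
    path = f"{parent_path}/{element.tag}" if parent_path else element.tag
    yield path
    # Attributes become fields named _ATTR_<AttributeName> under the parent tag.
    for attribute in element.attrib:
        yield f"{path}/_ATTR_{attribute}"
    for child in element:
        yield from field_identifiers(child, path)

# Hypothetical sample document.
document = ET.fromstring('<PRODUCT PRICE="9.99"><NAME>Widget</NAME></PRODUCT>')
identifiers = list(field_identifiers(document))
print(identifiers)
```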