When IDOL Server processes a document, it treats the text as a series of tokens (words), each of which is a unit of meaning. At a low level, this method is language independent. However, you can improve your query results by applying some language-dependent processing.
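To illustrate what a purely language-independent split into tokens looks like, and where it falls short, the following is a minimal Python sketch. It is not IDOL Server's tokenizer. The `tokenize` function and its rule (a token is any run of letters or digits) are hypothetical assumptions chosen only for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical, language-independent rule (not IDOL Server's algorithm):
    # a token is any maximal run of Unicode letters or digits; spaces,
    # punctuation, and underscores act as separators.
    return re.findall(r"[^\W_]+", text)


# Works reasonably for space-delimited text such as English.
print(tokenize("IDOL Server indexes documents, word by word."))
# ['IDOL', 'Server', 'indexes', 'documents', 'word', 'by', 'word']

# Fails for scripts that do not separate words with spaces: the whole
# Japanese phrase comes back as a single token.
print(tokenize("東京都に住む"))
# ['東京都に住む']
```

The second call shows why a single language-independent rule is not enough on its own: text in languages that do not mark word boundaries with spaces needs language-aware processing to produce useful tokens.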
Language-dependent configuration allows you to:
make sure that all your content is treated consistently, allowing cross-lingual search.
filter your searches to content in a specific language.
This section describes the most important language concepts and explains why you might use them.
Language Types. The language and encoding of a document.
Tokenization. The methods IDOL Server uses to split text into searchable tokens.
Stemming. Processing that reduces groups of related words to a common stem.
Stop Lists. Lists of common words that carry little or no meaning in documents.
Cross-Lingual Search. Search across documents in multiple languages.
Order of Language Processes. The order in which the language processing steps occur during indexing and querying.