The LangDetectUTF8 configuration parameter allows you to classify files that contain only 7-bit ASCII as UTF-8.
Automatic language detection uses the contents of the LangDetectType fields to determine the language of the document. If these fields contain only 7-bit ASCII characters, IDOL Server detects the document as ASCII. If additional fields in the document contain UTF-8, these might be converted incorrectly.
If you know that your documents are generally in UTF-8, set LangDetectUTF8 to True to classify these documents as UTF-8. For example, when you retrieve content using connectors, the connectors output most data in UTF-8.
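
As an illustration, the setting might look like the following configuration fragment. This is a minimal sketch, not a definitive configuration: it assumes that LangDetectUTF8 belongs in the [LanguageTypes] section alongside AutoDetectLanguagesAtIndex, which is shown here only to indicate that automatic language detection is enabled. Check the section name and parameter placement against the reference for your IDOL Server version.

```
[LanguageTypes]
// Assumption: automatic language detection is enabled at index time.
AutoDetectLanguagesAtIndex=True
// Classify documents whose language detection fields contain only 7-bit ASCII as UTF-8.
LangDetectUTF8=True
```

Because every 7-bit ASCII character has the same byte value in UTF-8, treating these documents as UTF-8 leaves ASCII-only fields unchanged and lets any other fields that contain UTF-8 be handled as UTF-8.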