

Micro Focus Compression Routines

Micro Focus compression routines are stored in modules named CBLDCnnn, where nnn is a number in the range 001 to 127. To use a Micro Focus compression routine, set fcd-data-compress in the FCD to the routine's number, a value between 001 and 127.
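The naming rule above can be illustrated with a short Python sketch. The helper name routine_module_name is hypothetical and not part of any Micro Focus interface; it only shows how a routine number maps to a module name of the form CBLDCnnn.

```python
def routine_module_name(routine_number: int) -> str:
    """Return the module name CBLDCnnn for a Micro Focus routine number.

    Hypothetical helper for illustration only; it applies the naming
    rule from the text (nnn in the range 001 to 127, zero-padded).
    """
    if not 1 <= routine_number <= 127:
        raise ValueError("Micro Focus routine numbers run from 1 to 127")
    return f"CBLDC{routine_number:03d}"
```

For example, routine_module_name(1) gives "CBLDC001", and routine_module_name(103) gives "CBLDC103".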

Routines CBLDC001 and CBLDC101

The compression routine CBLDC001 uses a form of run-length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.
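As a rough illustration of run detection in general (not the CBLDC001 implementation itself), the following Python sketch splits a byte string into runs of identical bytes. The function name find_runs is an assumption made for this example.

```python
from itertools import groupby
from typing import List, Tuple


def find_runs(data: bytes) -> List[Tuple[int, int]]:
    """Split data into (byte value, run length) pairs.

    Illustrative only: a run-length encoder would go on to replace each
    long run with an identifier, a count and one copy of the byte.
    """
    return [(value, len(list(group))) for value, group in groupby(data)]
```

A record that is padded with trailing spaces produces one long run at the end, which is the case where run-length encoding saves the most space.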

Note: This routine is not effective for files that contain a significant number of double-byte characters, including double-byte spaces, because these are not compressed.

The routine puts special emphasis on runs of spaces, binary zeros and character zeros (each of which can be reduced to a single character) and on runs of printable characters (which are reduced to two characters: a count followed by the repeated character).

The compression routine CBLDC101 behaves in exactly the same way as CBLDC001.

The difference between CBLDC001 and CBLDC101 is that CBLDC101 can handle strings of up to 256 kilobytes (KB) in length, whereas CBLDC001 can only handle strings of up to 65535 bytes.

In the compressed file, bytes have the following meanings (hexadecimal values shown):

20-7F (most printable characters) normal ASCII meaning.
80-9F 1-32 spaces respectively.
A0-BF 1-32 binary zeros respectively.
C0-DF 1-32 character zeros respectively.
E0-FF 1-32 occurrences of the character following.
00-1F 1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code.

The 00-1F codes are used when characters in the ranges 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, a single such character is expanded to two bytes; otherwise, the compression incurs no penalty.)
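The byte meanings listed above can be read as a decoding procedure. The following Python sketch is one interpretation of that table, not the Micro Focus implementation. It assumes ASCII data (space is hex 20, character zero is hex 30) and assumes that the lowest code in each range stands for a count of 1 and the highest for 32. The function name decode_cbldc001_style is hypothetical.

```python
def decode_cbldc001_style(data: bytes) -> bytes:
    """Expand data encoded with the byte codes in the table above.

    Assumptions (not confirmed by the source): ASCII character set, and
    count = code - start of its range + 1.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        if 0x20 <= code <= 0x7F:        # ordinary character, copied as-is
            out.append(code)
            i += 1
        elif 0x80 <= code <= 0x9F:      # 1-32 spaces
            out += b" " * (code - 0x80 + 1)
            i += 1
        elif 0xA0 <= code <= 0xBF:      # 1-32 binary zeros
            out += b"\x00" * (code - 0xA0 + 1)
            i += 1
        elif 0xC0 <= code <= 0xDF:      # 1-32 character zeros
            out += b"0" * (code - 0xC0 + 1)
            i += 1
        else:                           # 00-1F or E0-FF: count + character
            if i + 1 >= len(data):
                raise ValueError("count code without a following character")
            base = 0xE0 if code >= 0xE0 else 0x00
            out += bytes([data[i + 1]]) * (code - base + 1)
            i += 2
    return bytes(out)
```

Under these assumptions, the single byte 84 expands to five spaces, the pair E3 41 expands to "AAAA", and the pair 00 85 yields the single literal byte 85.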

Routines CBLDC003 and CBLDC103

Like CBLDC001, the CBLDC003 routine uses run-length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.

The CBLDC103 routine behaves in the same way as routine CBLDC003.

The difference between CBLDC003 and CBLDC103 is that CBLDC103 can handle strings of up to 256 kilobytes (KB) in length, whereas CBLDC003 can only handle strings of up to 65535 bytes.

The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:

bit 15 Unset - single character.
bit 14 Set - compressed sequence. Unset - uncompressed sequence.
bits 0-13 Compressed character(s) or count of uncompressed characters.

The length of the character string depends on the header bits:

Bits 14 and 15 set Two repeating characters.
Only bit 14 set One repeating character.
Otherwise Between 1 and 63 uncompressed characters.
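The header layout described above can be unpacked with a few bit masks. The Python sketch below is illustrative only. It assumes that the first header byte holds bits 15 to 8 (big-endian order), which the text does not state, and it returns bits 0-13 as a raw value without interpreting them. The names Header, parse_header and describe are hypothetical.

```python
from typing import NamedTuple


class Header(NamedTuple):
    bit15: bool     # unset means single character, per the text
    bit14: bool     # set: compressed sequence; unset: uncompressed
    low_bits: int   # bits 0-13, left uninterpreted here


def parse_header(two_bytes: bytes) -> Header:
    """Split a two-byte header into its bit fields."""
    if len(two_bytes) != 2:
        raise ValueError("a header is exactly two bytes")
    # Assumption: big-endian, so the first byte carries bits 15-8.
    value = int.from_bytes(two_bytes, "big")
    return Header(bool(value & 0x8000), bool(value & 0x4000), value & 0x3FFF)


def describe(header: Header) -> str:
    """Name the kind of character string that follows the header."""
    if header.bit14 and header.bit15:
        return "two repeating characters"
    if header.bit14:
        return "one repeating character"
    return "uncompressed characters"
```

With these assumptions, the header bytes C0 05 have both bits 14 and 15 set, so describe reports two repeating characters, while 40 0A has only bit 14 set and reports one repeating character.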