Micro Focus Compression Routines

Micro Focus compression routines are stored in modules named CBLDCnnn, where nnn is in the range 001 to 127. To use a Micro Focus compression routine, set fcd-data-compress in the FCD to the corresponding value between 001 and 127.
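The naming convention can be illustrated with a short Python sketch. The helper name compression_module_name is hypothetical and is not part of any Micro Focus API; it only shows how a routine number in the documented range maps to a CBLDCnnn module name.

```python
def compression_module_name(routine_number: int) -> str:
    """Return the CBLDCnnn module name for a routine number.

    Hypothetical helper for illustration only; the 001-127 range
    is the one documented for Micro Focus routines.
    """
    if not 1 <= routine_number <= 127:
        raise ValueError("Micro Focus routine numbers range from 001 to 127")
    # nnn is always written as three digits, zero-padded.
    return f"CBLDC{routine_number:03d}"
```

For example, compression_module_name(3) gives "CBLDC003", the module described later in this topic.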

Routine CBLDC001

The compression routine CBLDC001 uses a form of run-length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.
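As a rough illustration of the run-detection step only (not of the CBLDC001 output format), the following Python sketch splits a byte string into runs of identical bytes. The function name find_runs is an assumption made for this example.

```python
from itertools import groupby
from typing import List, Tuple


def find_runs(data: bytes) -> List[Tuple[int, int]]:
    """Split data into (byte value, run length) pairs.

    Generic run detection, shown for illustration; a run-length
    encoder would then replace each long run with a short code.
    """
    return [(value, len(list(group))) for value, group in groupby(data)]
```

A run-length encoder gains space wherever the run length is large, because the whole run is replaced by a code, a count and a single copy of the character.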

Note: The routine is not effective for use with files that contain significant occurrences of double-byte characters, including double-byte spaces, as these are not compressed.

The routine puts special emphasis on runs of spaces, binary zeros and character zeros (each of which can be reduced to a single character), and on runs of printable characters (which are reduced to two characters: a count followed by the repeated character).

In the compressed file, bytes have the following meanings (hexadecimal values shown):

20-7F	(most printable characters) normal ASCII meaning.
80-9F	1-32 spaces respectively.
A0-BF	1-32 binary zeros respectively.
C0-DF	1-32 character zeros respectively.
E0-FF	1-32 occurrences of the character following.
00-1F	1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code. This is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.)
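The byte meanings listed above can be sketched in Python as follows. This is a minimal illustration built only from the table, not the actual CBLDC001 implementation. The function names are hypothetical, and the encoder's choices (for example, when it uses a count code rather than plain literals) are assumptions; the real routine may make different choices that decode to the same data.

```python
def decode_cbldc001(data: bytes) -> bytes:
    """Expand data encoded with the byte codes in the table (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if 0x20 <= b <= 0x7F:          # normal ASCII meaning
            out.append(b)
        elif 0x80 <= b <= 0x9F:        # 1-32 spaces
            out += b" " * (b - 0x80 + 1)
        elif 0xA0 <= b <= 0xBF:        # 1-32 binary zeros
            out += b"\x00" * (b - 0xA0 + 1)
        elif 0xC0 <= b <= 0xDF:        # 1-32 character zeros
            out += b"0" * (b - 0xC0 + 1)
        else:                          # 00-1F or E0-FF: count, then the byte
            count = (b if b <= 0x1F else b - 0xE0) + 1
            if i >= len(data):
                raise ValueError("count code without a following byte")
            out += bytes([data[i]]) * count
            i += 1
    return bytes(out)


def encode_cbldc001(data: bytes) -> bytes:
    """Encode data with the byte codes in the table (assumed strategy)."""
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1                        # length of the run, capped at 32
        while run < 32 and i + run < len(data) and data[i + run] == ch:
            run += 1
        if ch == 0x20:
            out.append(0x80 + run - 1)
        elif ch == 0x00:
            out.append(0xA0 + run - 1)
        elif ch == 0x30:
            out.append(0xC0 + run - 1)
        elif 0x20 <= ch <= 0x7F:
            if run >= 3:               # assumption: shorter runs stay literal
                out += bytes([0xE0 + run - 1, ch])
            else:
                out += bytes([ch]) * run
        else:                          # byte that would look like a code
            out += bytes([run - 1, ch])
        i += run
    return bytes(out)
```

With these definitions, encode_cbldc001(b"A" + b" " * 5 + b"000") produces the three bytes 41 84 C2, and decoding that result returns the original nine bytes. A single byte such as hexadecimal 90 in the original data is stored as the two bytes 00 90, which is the one-byte penalty the table mentions.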

Routine CBLDC003

Like CBLDC001, the CBLDC003 routine uses run-length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.

The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:

bit 15	Unset - single character.
bit 14	Set - compressed sequence. Unset - uncompressed sequence.
bits 0-13	Compressed character(s) or count of uncompressed characters.

The length of the character string depends on the header bits:

Bits 14 and 15 set	Two repeating characters.
Only bit 14 set	One repeating character.
Otherwise	Between 1 and 63 uncompressed characters.
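The way the two flag bits select the kind of sequence can be sketched in Python. This is an illustration under stated assumptions, not the CBLDC003 implementation: the function name and the returned labels are hypothetical, and the header is passed as a 16-bit integer because the byte order of the two header bytes is not stated here. The value in bits 0-13 is returned as-is without interpreting it.

```python
from typing import Tuple


def classify_cbldc003_header(header: int) -> Tuple[str, int]:
    """Classify a 16-bit header value by bits 15 and 14 (sketch).

    Returns a label for the sequence kind and the raw value
    held in bits 0-13.
    """
    if not 0 <= header <= 0xFFFF:
        raise ValueError("header must fit in 16 bits")
    bit15 = bool(header & 0x8000)
    bit14 = bool(header & 0x4000)
    low_bits = header & 0x3FFF         # bits 0-13
    if bit14 and bit15:
        kind = "two repeating characters"
    elif bit14:
        kind = "one repeating character"
    else:                              # the "Otherwise" case in the table
        kind = "uncompressed characters"
    return kind, low_bits
```

For example, a header value of hexadecimal 4005 is classified as one repeating character, and C005 as two repeating characters, which is the case that covers runs of double-byte characters.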