Micro Focus compression routines are stored in modules named CBLDCnnn, where nnn is in the range 001 to 127. To use a Micro Focus compression routine, set fcd-data-compress in the FCD to the routine's number (a value between 001 and 127).
The compression routine CBLDC001 uses a form of run-length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.
The routine puts special emphasis on runs of spaces, binary zeros and character zeros (each of which can be reduced to a single character) and on runs of printable characters (which are reduced to two characters: a count followed by the repeated character).
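The run detection described above can be sketched as follows. This is an illustrative Python sketch only, not the CBLDC001 implementation; the function name `find_runs` and the `max_run` parameter are hypothetical, and the cap of 32 is taken from the 1-32 counts used in the byte table for this routine.

```python
from itertools import groupby


def find_runs(data: bytes, max_run: int = 32):
    """Split data into (byte_value, count) runs.

    Hypothetical helper: each run is capped at max_run so that its
    count fits in a single 1-32 code.
    """
    runs = []
    # groupby yields each distinct consecutive byte value with its group.
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        # A run longer than max_run is split into several shorter runs.
        while length > 0:
            chunk = min(length, max_run)
            runs.append((value, chunk))
            length -= chunk
    return runs
```

For example, `find_runs(b"AAAB  ")` returns `[(65, 3), (66, 1), (32, 2)]`: three occurrences of "A", one "B" and two spaces. An encoder would then choose a code for each run according to the kind of character it contains.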
In the compressed file, bytes have the following meanings (hexadecimal values shown):
20-7F | (most printable characters) normal ASCII meaning. |
80-9F | 1-32 spaces respectively. |
A0-BF | 1-32 binary zeros respectively. |
C0-DF | 1-32 character zeros respectively. |
E0-FF | 1-32 occurrences of the character following. |
00-1F | 1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code. This form is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.) |
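A decoder for the byte meanings listed above might look like the following. This is a minimal Python sketch based only on a literal reading of the table, not the Micro Focus code. It assumes that the first value in each range stands for a count of 1 and the last for a count of 32, so the count is the low five bits of the code plus one. The name `decode_cbldc001` is hypothetical.

```python
def decode_cbldc001(data: bytes) -> bytes:
    """Expand data encoded according to the byte table (sketch only)."""
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        i += 1
        # Assumption: each range maps its 32 values onto counts 1-32.
        count = (code & 0x1F) + 1
        if 0x20 <= code <= 0x7F:
            out.append(code)              # normal ASCII meaning
        elif 0x80 <= code <= 0x9F:
            out += b" " * count           # run of spaces
        elif 0xA0 <= code <= 0xBF:
            out += b"\x00" * count        # run of binary zeros
        elif 0xC0 <= code <= 0xDF:
            out += b"0" * count           # run of character zeros
        else:
            # 00-1F and E0-FF: count followed by the character to repeat.
            if i >= len(data):
                raise ValueError("count code is missing its character")
            out += bytes([data[i]]) * count
            i += 1
    return bytes(out)
```

Under these assumptions, the bytes `41 82 E2 42` expand to "A", three spaces and "BBB", and the bytes `00 90` expand to the single literal byte `90`.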
Like CBLDC001, the CBLDC003 routine uses run-length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.
The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:
bit 15 | Set - double-byte character. Unset - single character. |
bit 14 | Set - compressed sequence. Unset - uncompressed sequence. |
bits 0-13 | Compressed character(s) or count of uncompressed characters. |
The length of the character string depends on the header bits:
Bits 14 and 15 set | Two repeating characters. |
Only bit 14 set | One repeating character. |
Otherwise | Between 1 and 63 uncompressed characters. |
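The header layout can be illustrated by splitting the two header bytes into their bit fields. The Python sketch below is not the CBLDC003 implementation. The byte order of the header is not stated here, so big-endian is assumed, and the names `Header`, `parse_header` and `describe` are hypothetical. The `double_byte` field reflects the reading that bit 15 distinguishes double-byte from single-byte characters, as the "two repeating characters" row suggests.

```python
from typing import NamedTuple


class Header(NamedTuple):
    double_byte: bool  # bit 15
    compressed: bool   # bit 14
    value: int         # bits 0-13


def parse_header(two_bytes: bytes) -> Header:
    """Split a two-byte header into its bit fields (sketch only)."""
    if len(two_bytes) != 2:
        raise ValueError("a header is exactly two bytes")
    # Assumption: the most significant byte comes first.
    word = int.from_bytes(two_bytes, "big")
    return Header(
        double_byte=bool(word & 0x8000),
        compressed=bool(word & 0x4000),
        value=word & 0x3FFF,
    )


def describe(header: Header) -> str:
    """Name the kind of character string that follows the header."""
    if header.compressed and header.double_byte:
        return "two repeating characters"
    if header.compressed:
        return "one repeating character"
    return "uncompressed characters"
```

With these assumptions, `parse_header(b"\xC0\x05")` has bits 14 and 15 set and a value of 5, so `describe` reports two repeating characters, while `parse_header(b"\x00\x3F")` has neither bit set and a value of 63, which corresponds to an uncompressed sequence.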