Overview
This document examines the functions supported by CitSORT and includes examples intended to ease your adoption of, or transition to, CitSORT.
CitSORT is a command-line driven data transformation utility. It provides a high-performance Sort alternative and powerful data transformation capabilities.
Functionally, CitSORT provides syntax for using multiple processors, if available; naming input files; naming rules for interpreting the input files, which can include rules for transformation; and naming output files, for which different formats can be described.
To use CitSORT effectively, it helps to understand how the utility parses a typical command. In the simplest case, CitSORT sorts a file on a key, given by the “sort fields” parameter, into an output file. The example below starts with a sequential file that lists the Presidents and Vice Presidents of the United States from the first to the most recent. Our simple sort reverses the order of the list and writes the output in line sequential format.
Setting the Temporary Directory
CitSORT uses the system’s temporary directory to store the temporary files required for the intermediate stages of a SORT. As a general rule, Windows, Linux and UNIX operating systems each provide a default temporary directory for such files.
On Windows
TMP or TEMP is used to specify the directory to be used for temporary files. If neither TMP nor TEMP is defined, or if the variable is set to the name of a directory that does not exist, temporary files are created in the current working directory.
A typical path is:
%USERPROFILE%\AppData\Local\Temp.
Windows places no limit on the size of the temporary folder. You are limited only by the amount of free disk space available.
To change the temporary directory:
SET TMP=C:\NEWPATH\TMP
On Linux/UNIX
TMPDIR is used to specify the directory to be used for temporary files. A typical path is /tmp or /var/tmp.
If you need to change the maximum size of a file, or extend a limit placed on the size of the temporary folder in a Linux/UNIX operating environment, consult your System Administrator.
To change the temporary directory:
export TMPDIR=/usr/newpath/tmp
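The steps above can be sketched as a short shell session. This is an illustration only: the directory name citsort-tmp under $HOME is a hypothetical choice, and the sketch assumes that TMPDIR should name a directory that already exists before citsort is started.

```shell
# Hypothetical scratch location; substitute a directory on a volume
# with enough free space for the intermediate sort files.
SORT_TMP="$HOME/citsort-tmp"

# Create the directory if it does not exist yet.
mkdir -p "$SORT_TMP"

# Export TMPDIR so that child processes started from this shell inherit it.
export TMPDIR="$SORT_TMP"

# Confirm the setting and show the free space on that volume.
echo "TMPDIR=$TMPDIR"
df -k "$TMPDIR"
```

The export lasts only for the current shell session. To make it persistent, the same lines would normally go in a shell start-up file or in the script that launches the sort.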
Command-line examples: Windows and Linux/UNIX Considerations
The command-line examples in the Reference Manual are executed on a Windows platform, where the shell (CMD.exe) does not interpret parentheses.
If you execute the same command line in a Linux/UNIX environment, where the shell does interpret parentheses, you may see an error such as:
-bash: syntax error near unexpected token `('
This is a shell error, not a citsort error. To correct it, escape each parenthesis with a backslash (\).
See the difference below:
In a Windows environment
>citsort use presidents.dat
record F 85
sort fields (1,2,nu,d)
give pres2.txt org ls
In a Linux/UNIX environment
>citsort use presidents.dat
record F 85
sort fields \(1,2,nu,d\)
give pres2.txt org ls
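Single quotes are another standard way to keep a POSIX shell from interpreting parentheses. The sketch below does not call citsort at all. It only shows that the shell removes the backslashes or the quotes before a program sees the argument, so both spellings deliver the same text. The variable names are illustrative.

```shell
# Two spellings of the same sort key argument.
escaped=\(1,2,nu,d\)      # backslash-escaped, as in the example above
quoted='(1,2,nu,d)'       # single-quoted alternative

# Print what a program would actually receive in each case.
printf 'escaped: %s\n' "$escaped"
printf 'quoted:  %s\n' "$quoted"
```

Both lines print (1,2,nu,d). Because the resulting argument is identical, a command written as sort fields '(1,2,nu,d)' reaches the program in the same form as the escaped version.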
In a Linux/UNIX environment, you can avoid this error by transferring the commands into a parameter-file, and using the syntax:
>citsort take parameter-file.txt
Transferring commands into a parameter file
Transferring commands into a parameter-file is recommended when running citsort in Linux and UNIX environments.
Transferring a command line into a parameter-file is also useful if your command lines are very long, or if you wish to add comments to the different parts of the command. Comments can be inserted into a parameter-file after the “*” character.
>citsort take president-params.txt
(where president-params.txt contains the following:)
Command | Description |
---|---|
Use presidents.dat | input file to be sorted |
Record F 85 | fixed length, 85 bytes, sequential |
Sort Fields (1,2,nu,d) | sort key bytes 1-2, numeric, descending |
Give pres2.txt org ls | output to pres2.txt, line sequential |
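On Linux/UNIX, one way to create this parameter-file is with a shell here-document, as in the minimal sketch below. Placing each comment at the end of its statement line, after the “*” character, is an assumption based on the description above. Check the Reference Manual for the exact comment rules. The guard around the citsort call is also illustrative. It simply skips the sort when citsort or the input file is not available.

```shell
# Write the parameter file. Quoting the here-document delimiter ('EOF')
# stops the shell from interpreting the parentheses or the "*" characters.
# Assumption: a "*" starts a comment that runs to the end of the line.
cat > president-params.txt <<'EOF'
Use presidents.dat        * input file to be sorted
Record F 85               * fixed length, 85 bytes, sequential
Sort Fields (1,2,nu,d)    * sort key bytes 1-2, numeric, descending
Give pres2.txt org ls     * output to pres2.txt, line sequential
EOF

# Run the sort only if citsort is on the PATH and the input file exists.
if command -v citsort >/dev/null 2>&1 && [ -f presidents.dat ]; then
    citsort take president-params.txt
fi
```

Because the statements live in a file, the shell never parses them, so the parentheses need no escaping.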
The CitSORT Flow of Control for a SORT
As noted above, the general flow of logic is:
Description | Statement/Clause |
---|---|
1. Allow for multiprocessing | -processmax, -processrec clauses |
2. Name the input file(s) | USE Statement |
3. Re-format the input file(s) | INREC Statement |
4. Name the Sort Key(s) | SORT Statement |
5. Apply Sum algorithm(s) | SUM Statement |
6. Define Sort Filter(s) | INCLUDE/OMIT Statements |
7. Name the output file | GIVE Statement |
8. Allow different output files | OUTFIL Statement |
9. Re-format output file(s) | OUTREC Statement |