OCCURS Table Row Filtering
OCCURS tables are secondary tables generated by the Databridge Client when OCCURS clauses for items (or GROUPs) are not flattened, which is the default behavior of the Databridge Client. The Client creates a separate row in these tables for each occurrence of the item (or GROUP), with the keys of the primary table record duplicated and an additional column named index1, which contains the occurrence number (starting at 1), added to them. In the case of nested OCCURS clauses you end up with two tables, the first of which can be suppressed when it contains nothing but keys (i.e., you have a GROUP with an OCCURS clause that contains nothing but a GROUP, which also has an OCCURS clause). In the case of nested OCCURS clauses the second table has two columns, index1 and index2, added to it. These columns hold the occurrence numbers of the corresponding items (or GROUPs) within the OCCURS clauses.
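For illustration, consider a hypothetical data set CUSTOMER keyed by cust_id that contains an item monthly_sales with an OCCURS 12 TIMES clause (the names and data types here are invented for the example; the actual ones come from your DASDL and the define command). With the OCCURS clause left unflattened, the Client generates a secondary table shaped roughly as follows:

create table customer_monthly_sales
(
    cust_id       int not null,     -- key duplicated from the primary table
    index1        int not null,     -- occurrence number, 1 through 12
    monthly_sales dec(11,2)         -- value of this occurrence
);

Each CUSTOMER record thus produces 12 rows in customer_monthly_sales, one per occurrence, whether or not every occurrence holds meaningful data.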
Not all of the rows in such tables contain meaningful data; for this reason it is sometimes desirable to discard the rows with meaningless data. There are several advantages to doing this:
- It saves storage, as these secondary tables are quite expensive, particularly when the item with the OCCURS clause is a single item.
- The users of the database do not have to discard unwanted data when they fetch data from the secondary table.
- The number of updates is significantly reduced, resulting in better performance. This can be improved further by setting the optimize_updates parameter to true, which causes updates to be applied only to rows that actually changed, avoiding redundant updates and thus greatly improving performance.

Rows that do not contain meaningful data are discarded by defining a set of filters for such tables that describe the conditions under which the rows should be discarded. This requires access to the before and after images for updates, as a change in the data can affect whether the row is to be filtered or not. Since the before and after images are already available when filtering, enabling optimize_updates adds no additional overhead, other than the comparison of the before-image and after-image data to determine whether anything changed, which is a lot quicker than executing a redundant update (that is, SQL that does not change anything).
Filter Source File
The implementation of row filtering for secondary tables derived from items with OCCURS clauses does not involve changing any configuration file parameters. All you need to do is create a text file that specifies the filtering conditions for all such tables that need to be filtered. We refer to this text file as the filter source file. This file normally resides in the config sub-directory of the data source's working directory.

The filter source file, which is formatted in a somewhat similar manner to the row filtering sections of GenFormat, defines the filters for the various secondary tables using SQL-like statements. This file is then compiled using a utility called makefilter, which is included in the Client files. The makefilter utility checks the syntax of the filter source file and validates all the specified table and column names. It then creates a binary file named "dbfilter.cfg" in the config sub-directory of the Client's working directory. This file gets loaded and bound to the corresponding data tables and data items at the start of a Client process or clone command. The Client looks for the file "dbfilter.cfg" and loads it when it is present. The binding process replaces column numbers by pointers to the structures that hold the corresponding DATAITEMS control table entries. The Client uses a general purpose filtering procedure that interprets the filter pseudo-code using the DMSII data buffer for the update and returns a result that indicates whether or not the row should be discarded. The Client can thus determine whether or not to insert (load, in the case of data extraction) or update a row in the table. In the case of a delete we do not bother with filtering; we simply delete all rows that have the keys of the parent table record (i.e., for all values of index1). To make the Client run efficiently, we made it use host variables for this sort of operation, which we refer to as a DELETE_ALL operation (when using stored procedures, the z_tablename stored procedure is used for this purpose). This means that besides INSERT, DELETE and UPDATE statements we also have compound DELETE statements for OCCURS tables (i.e., delete from tabname where key1=val1 and ... keyn=valn; without specifying a value for index1).
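As an illustration (reusing the hypothetical customer_monthly_sales table sketched earlier), a DELETE_ALL operation for the parent record with key value 12345 would look like the following; index1 is deliberately omitted so that every occurrence belonging to that record is removed:

delete from customer_monthly_sales where cust_id = 12345;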
The Filter File
The filter source file, which is modeled after the row filtering in GenFormat, uses a syntax that defines the conditions under which a row is to be discarded, rather than when it is to be selected. The statements are free format and can extend over multiple lines, but they must be terminated by a semicolon. You can add comments using "// ...", which makes the scanner stop scanning the line just before the slashes, so the remainder of the line is ignored.
By using delete statements instead of select statements we make the "where" clause define the conditions under which a row is filtered out rather than selected. The reason for doing this is that it is easier to follow (there is no need to apply De Morgan's law). An example of a filter source file follows.
Sample Filter File
delete from customer_hold_information where hold_type = 0 or hold_type = 4;
delete from customer_account_abbr where account_abbr = " ";
delete from meter_readings where amount_read = NULL;
The makefilter program converts these filters into a list of tokens that contains all the information required to process them using the general purpose filtering procedure, which acts like a VM executing the filter pseudo-code.
Any table that is not specified in the filter file will have no filter and will be treated normally. Filtering is limited to secondary tables derived from items with OCCURS clauses (a.k.a. OCCURS tables). We allow testing for NULL by using "column_name = NULL" or "column_name != NULL" (or "column_name <> NULL"), which is not proper SQL. If the item is ALPHA, the fact that NULL is not in quotes is enough to distinguish it from the value "NULL". Unlike relational databases, NULL in DMSII is an actual value (typically all high values for numeric items and all low values for ALPHA items). All constants are stored in the data area of the filter using their corresponding DMSII representations. Character constants are automatically space padded, and numeric constants have leading zeroes added.
The three types of tokens involved in these expressions are variables (i.e., column names), constants and operators. Constants consist of a data type (which matches the first operand's type, the first operand always being a column name) and an offset into the filter's data area (the length is the same as that of the associated column). We do not need the declared length, as all comparisons work at the character or digit level (we already do this when testing for NULL). Operators also include an end-of-statement indicator, which corresponds to the semicolon in the pseudo-SQL statements in the filter source file. All comparisons must start with a column name, and the second operand must be a constant or the word NULL. Comparing two columns as a condition for filtering is not allowed. All object names are case sensitive and must be typed in lower-case; keywords and the word NULL are not case sensitive.
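The following statements illustrate these rules (hold_code and previous_reading are invented column names, used purely for illustration):

// valid: a column name on the left, a constant on the right
delete from customer_hold_information where hold_code = "XX";
// valid: NULL (unquoted) may be used as the right-hand operand
delete from meter_readings where amount_read != NULL;
// not allowed: two columns cannot be compared to each other
// delete from meter_readings where amount_read = previous_reading;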
Important
String constants must be enclosed in double quotes (the use of single quotes is not currently supported).
In the case of a DMSII structural reorganization, the filters must be recompiled if any of the data sets that have filters for secondary tables are affected by the reorganization. The Client automatically takes care of this by initiating a recompile of the filter at the end of define and redefine commands, or of an Administrative Console Configure command run, when there is a filter file present in the config directory.
The changes to the Client itself are fairly straightforward and involve using the filter routine on the image to determine whether it gets discarded or not. The Client handles the situation where a row that was not stored needs to be stored after an update (in this case the Client does an INSERT). Similarly, it handles the situation where a row that was being stored needs to be discarded after an update (in this case the Client does a DELETE). The remaining cases are handled normally: if the row was discarded and still needs to be discarded, we do nothing, and if the row was stored and still needs to be stored we update it, unless optimize_updates is True, in which case we skip the update if the values of all the columns are unchanged.
The following table summarizes the supported operators and their relative precedence.
| Level | Operators |
|---|---|
| 1 | =, >, <, >=, <=, != (or <>) |
| 2 | AND |
| 3 | OR |
The use of parentheses is allowed, but usually not necessary. There is no limit to the number of items that can be specified in the where clause, other than the actual number of non-key data items contained in the table.
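For example, since AND binds more tightly than OR, the first statement below discards rows where hold_type is 0 and hold_amount is 0, as well as all rows where hold_type is 4; the second, parenthesized form groups the conditions differently (hold_amount is an invented column used only to illustrate precedence):

delete from customer_hold_information where hold_type = 0 and hold_amount = 0 or hold_type = 4;
delete from customer_hold_information where hold_type = 0 and (hold_amount = 0 or hold_type = 4);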
DMSII items whose data type is REAL are restricted to tests for NULL and 0 in filters. Items that are not nullable in DMSII cannot be tested for NULL. When using items whose data type is BOOLEAN you must use 0 or 1 in the constants (the use of TRUE and FALSE is currently not supported).
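For instance, given an invented nullable REAL item adjustment_factor and an invented BOOLEAN item is_active, the following statements stay within these restrictions:

delete from meter_readings where adjustment_factor = NULL or adjustment_factor = 0;
delete from customer_account_abbr where is_active = 0;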
The makefilter program has two commands, import and display. The import command compiles the filter source file, which can be specified using the -f option, to create the binary filter file dbfilter.cfg. If no filter file is specified, the command tries to use the file dbfilter.txt in the config subdirectory of the data source's working directory. The display command produces a report that describes the content of the binary filter file. All makefilter log output is written to the file "prefix_flt_yyyymmdd[_hhmmss].log", keeping it separate from the Client log files.
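A typical invocation might look like the lines below, run from the data source's working directory. Only the -f option is described above, so treat the exact command-line form as an illustration and consult the makefilter usage output for the full argument list:

makefilter import -f config/dbfilter.txt
makefilter display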