Cloning a DMSII Database
This chapter covers the steps to clone a DMSII database.
Cloning Issues for All Relational Databases
We recommend that you read this section before you use the process or clone commands.
Topic | Description |
---|---|
Disk Space | You need to consider two types of disk space for the Databridge Client: space in the relational database for the cloned DMSII data, and disk space for the temporary files used by the bulk loader utilities during data extraction. |
Column Order | The columns in the Client database are built in a different order than the order in the DMSII database. Specifically, the key items are placed first, followed by the non-key items in DMSII column order. |
Databridge Client Log File | Logging and tracing are separate activities in the Databridge Client. Logging cannot be disabled. Log files are written to the logs subdirectory of the working directory. Trace files are only created when the -t or -d options are used, and they are placed in the working directory. Using -t1 is not allowed because this would create a second copy of the log file. You must specify at least one more bit in the trace mask for the option to be accepted. |
Bulk Loader Parameters
Both dbutility process and dbutility clone use a bulk loader utility to populate the Databridge tables in the relational database during the data extraction phase (not during change tracking). Using the relational database bulk loader utility greatly increases the speed with which the Databridge data tables are populated.
This section lists the configuration parameters that affect the Databridge Client operations when using the relational database bulk loader utility. You can use these parameters to do the following:
- Control temporary file storage (max_temp_storage parameter, Windows Client only)
- Control the bulk loader utility maximum error count (max_errors parameter)
You can set the bulk loader parameters from the Administrative Console. These and other configuration parameters are available in the Configure item of the Settings menu for the data source in question. For information, see the Administrative Console Help.
Parameters that are specific to the SQL*Loader and BCP API are discussed in the next sections.
Controlling Temporary File Storage for Windows Clients
During cloning on Windows platforms, the Oracle Client and the SQL Server Client (unless the latter is directed to use the BCP API, which does not involve temporary files) write bulk loader data to multiple temporary text files for each table being loaded.
These temporary text files are used as holding areas for the bulk loader
data. The Windows Client uses overlapped operations to write data to one
set of text files while the bulk loader is loading tables from another set of files.
The configuration file parameter max_temp_storage determines the maximum amount of storage to be used by all of the temporary files. The Databridge Client writes data to as many temporary files as it needs, while keeping track of the total amount of storage used. When the amount of storage used exceeds half of the configured value of max_temp_storage, the Databridge Client closes all the temporary files and queues the tables on the bulk loader thread's work queue. (The default setting for max_temp_storage is 400 MB.)
While the bulk loader thread is sequentially launching the loads for these tables (which run as separate processes), the Databridge Client starts filling a new set of temporary files for the next group of loads. This mode of operation significantly enhances performance on systems that have more than one CPU.
Bulk Loader Operations for UNIX Clients
UNIX Clients do not use temporary text files; instead, they use pipes (such as lpipe_nnn.dat) to communicate data between processes. This introduces a lot more overlap between the Client and the bulk loader, resulting in a much smoother flow of data.
Controlling the Bulk Loader Maximum Error Count
The max_errors parameter controls the number of data errors allowed before the bulk loader's operations are canceled. The default value for max_errors is 10, which means that the bulk loader aborts after encountering 10 bad records. These bad records are written to the discard file for the table in the discards directory, and information about the error is written to the bulk loader log file.
When several bulk loader errors occur, increasing the maximum error count allows you to gather all the errors in one run rather than finding 10 errors and then having to start over again. For more details, see the max_errors parameter description in [Bulk_Loader].
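Both parameters live in the [bulk_loader] section of the Client configuration file. As a rough sketch only (the values shown are illustrative, not recommendations, and the exact value syntax may differ in your release), the entries might look like this:

[bulk_loader]
max_temp_storage = 800M
max_errors = 100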
Oracle SQL*Loader Bulk Loader
This section lists the configuration file parameters that affect cloning with Oracle.
The enable_parallel_mode parameter, which is only meaningful when direct mode is enabled, causes the program to include the PARALLEL option in the SQL*Loader command line. In direct mode, the loader runs faster at the expense of system resources; however, enabling this option has a much more noticeable impact on Windows Clients than on UNIX Clients.
For non-US sites where the period (.) and comma (,) decimal characters are swapped, the Databridge Client automatically reads the database's NLS parameters and makes the necessary adjustments so that the SQL*Loader input records are formatted using the numeric characters that SQL*Loader expects.
The inhibit_direct_mode parameter applies when you run dbutility for a remote Oracle database using SQL*Net®. The following parameters are meaningful only when inhibit_direct_mode is enabled.
- The sqlld_rows parameter defines the value to be used for the ROWS specification for SQL*Loader operations.
- The sqlld_bindsize parameter defines the value to be used for the BINDSIZE parameter for SQL*Loader operations. Increasing this value can speed up SQL*Loader operations when not using DIRECT mode (for example, running remote to a database on a UNIX system).
For more information about the bulk loader parameters mentioned here, see [Bulk_Loader].
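For a remote Oracle database where direct mode is inhibited, the ROWS and BINDSIZE values can be tuned in the same [bulk_loader] section. The following is only a sketch; the values shown are illustrative, not recommendations, and the boolean syntax is assumed:

[bulk_loader]
inhibit_direct_mode = true
sqlld_rows = 1000
sqlld_bindsize = 1048576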
Files Related to SQL*Loader
Each execution of SQL*Loader uses a control file (load_nnn.ctl, which is a copy of the file sqlld.tablename.ctl created by the generate command) and a data file (lpipe_nnn.dat) as input.
As a result of the bulk loading process, SQL*Loader produces a log file (load_nnn.log) and, if there are any records that cannot be loaded due to data errors, a discard file (lpipe_nnn.bad) for each table. Discard files are placed into the subdirectory named discards.
Windows Log Files
In Windows, to prevent log files and discard files from being overwritten as a result of successive executions of SQL*Loader during segmented bulk load operations, the Databridge Client uses the SQL*Loader log and discard files as temporary files and does the following:
- At the end of the first load segment, the Databridge Client copies the temporary log file to the permanent log file (sqlld.tablename.log). If a discard file was produced, the Databridge Client also copies the temporary discard file to the permanent discard file (sqlld.tablename.bad) in the discards folder.
- At the end of every subsequent load segment, the Databridge Client appends the temporary log files to the end of the permanent log file (sqlld.tablename.log). If a temporary discard file was produced, the Databridge Client either copies it or appends it to the permanent discard file (sqlld.tablename.bad), depending on whether that file already exists.
- The Databridge Client deletes the temporary log and discard files as soon as they are appended to the permanent log and discard files.
UNIX Log Files
In order to maintain compatibility with the Windows Clients, the UNIX Client renames the log and discard files at the end of a SQL*Loader operation. In the SQL*Loader shell scripts, the log file load_nnn.log is renamed sqlld.tablename.log, and the discard file lpipe_nnn.bad is moved to the discards folder as sqlld.tablename.bad.
List of Files Related to SQL*Loader
The table below lists files related to SQL*Loader and Databridge Client operations. In some of the filenames below, nnn is the value in the table_number column of the DATATABLES Client control table. It is unique within each data source.
File | Description |
---|---|
sqlld.tablename.ctl |
The SQL*Loader control file created by the generate command. It describes the format of the data in the data file (lpipe_nnn.dat). |
lpipe_nnn.dat |
For Windows: This is a temporary file that the Databridge Client creates. It contains the data to be loaded into an Oracle table. Since the Client uses two copies of this file simultaneously when doing overlapped operations, it appends a suffix to the file name to make it unique every time a new file is created. The suffix is of the form "_nnn", where nnn is a number that starts at 1 and is incremented by 1 each time a new file is created for the table in question. Thus the name "lpipe_12.dat" is changed to "lpipe_12_1.dat" for the second file, and so on. This file is automatically deleted after a successful load of a table. If the table is not loaded successfully, the file is not deleted. This gives you the opportunity to manually run SQL*Loader to determine why it is failing. For UNIX: This is a UNIX pipe that the SQL*Loader shell script creates and uses to pass data to the SQL*Loader program. This pipe is automatically removed after a successful load of a table. If the Databridge Client or SQL*Loader abends, the pipe is not immediately deleted. If you run the Databridge Client again, you receive a warning message as the pipe is being deleted. You can safely ignore this warning, as this is not a fatal error. |
sqlld.tablename.log |
For Windows: This file is a concatenation of all of the load_nnn.log files created during the cloning process. For UNIX: This is the log file generated by SQL*Loader. CAUTION: Do not delete the sqlld.tablename.log file until you have looked at it. It can contain valuable information such as error messages about rows that were not loaded. |
sqlld.tablename.bad |
For Windows: This file is a concatenation of all of the load_nnn.bad files created during the cloning process. It is created in the discards subdirectory only if discard records exist. For UNIX: This is the discard file generated by SQL*Loader that has been moved to the discards folder. CAUTION: Do not delete the sqlld.tablename.bad file until you have looked at it to determine which records were rejected by SQL*Loader. Correct the bad data. Then use SQL*Loader to load these records into the appropriate table. |
The following temporary files are created while the bulk loader is being launched, but they are deleted before the run is completed:
- load_nnn.ctl
- load_nnn.log (renamed in UNIX to sqlld.tablename.log)
- lpipe_nnn.bad (renamed in UNIX to sqlld.tablename.bad)
You see these files only if the bulk loader operation abends.
Microsoft SQL Server BCP API and bcp utility
This section lists the configuration file parameters that affect cloning with BCP API and bcp utility. SQL Server Clients on Windows use the bcp utility by default.
You can make the SQL Server Client use the BCP API, which allows the program to perform bulk loader operations by making BCP API calls that operate like SQL statements. This produces more overlap between the Client and the load operations, resulting in a much smoother flow of data. However, we have had some problems with the BCP API, which led us to change the default mode of operation to use the bcp utility, which is more reliable.
We recommend that you use multi-threaded updates, as this allows multiple tables to be loaded simultaneously by different threads. This increases resource utilization and extends all the advantages of multi-threaded updates to data extraction.
Note
Bulk loader operations will run efficiently if the database recovery model is set to "Simple" or "Bulk-logged". If you are running a database with a recovery model of "Full", we recommend that you switch to "Bulk-logged" for the duration of the bulk load and then switch back to "Full" recovery.
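For example, on SQL Server you can switch the recovery model around the bulk load with standard T-SQL (replace MYDB with the name of your Client database):

ALTER DATABASE MYDB SET RECOVERY BULK_LOGGED;
-- run the Databridge Client data extraction (process or clone command)
ALTER DATABASE MYDB SET RECOVERY FULL;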
The following Client configuration file parameters affect the bcp utility or the BCP API calls made by the Client. For more details, see [Bulk_Loader].
Parameter | Description |
---|---|
bcp_batch_size |
The Databridge Client using the BCP API or the bcp utility can load a table in several batches instead of loading the entire table in a single operation. You can control the batch size using this parameter. |
bcp_code_page |
Adds the -C code_page to the bcp command line, which specifies the code page of the data in the file. For example, because the Japanese code page is 932, setting this parameter to 932 adds -C 932 to the bcp command line. This parameter is only applicable when using the bcp utility. |
bcp_packet_size |
Defines the network packet size value for the bcp utility (applies to remote servers only). If you have wide tables, setting this parameter to a packet size larger than the default (4096) can speed up loading the data into the table at the expense of system resources. This parameter is only applicable when using the bcp utility. |
bcp_copied_msg | Enables the bcp_auditor program to determine whether or not a bcp was successful in cases where the database language is not English. This parameter is only applicable when using the bcp utility. |
bcp_delim | Defines the delimiter character bcp uses (the TAB character, by default). If you want to preserve TAB characters in your data, set this parameter to a value that allows multiple characters. This parameter is only applicable when using the bcp utility. |
max_errors | Controls the bulk loader’s tolerance to records that are discarded due to data errors. |
max_temp_storage | Activates the segmented bulk load feature, which allows you to specify the maximum amount of storage that dbutility should use for temporary files. This parameter is only applicable when using the bcp utility. |
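As an illustration only (the values are examples, not recommendations), several of these parameters might appear in the [bulk_loader] section of the Client configuration file as follows:

[bulk_loader]
bcp_batch_size = 100000
bcp_packet_size = 16384
max_errors = 100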
bcp_auditor Utility
The bcp command files capture bcp execution output by redirecting the output to a temporary file. These command files then invoke the bcp_auditor utility to examine this file and determine if the bcp operation was successful. The bcp_auditor utility sets the exit code such that the Databridge Client can determine if the table load was successful.
Files Related to BCP
Note
The SQL Server Client will only use the BCP API when specifically directed to do so. You can globally control whether the Client uses the bcp utility or the BCP API by setting the use_bcp parameter accordingly in the [bulk_loader] section of the Client configuration file. You must run a redefine command with the -R option to make this change take effect. You can force the Client to always use the bcp utility by using the /l command-line option. Alternatively, you can control the loading of tables at the data set level by setting the DSOPT_Use_bcp (0x1000000) bit in the ds_options column for the corresponding entries in the DATASETS Client control table.
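For example, assuming the -R option is passed like any other dbutility command-line option, switching the bulk loader mode might involve editing use_bcp in the [bulk_loader] section of dbridge.cfg and then running:

dbutility -R redefine datasource

To force a single run to use the bcp utility regardless of the configuration, add the -l option to the command line, for example:

dbutility -l process datasource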
Each execution of bcp uses a format file (bcp.tablename.fmt) and a data file (bcppipe.tablename) as input.
As a result of the bulk loading process, bcp produces a log file (load_nnn.log) for each table. If there are any records that cannot be loaded due to data errors, bcp also produces a discard file (load_nnn.bad).
To prevent log files and discard files from being overwritten during segmented bulk load operations, Databridge Client treats bcp log and discard files as temporary files:
- At the end of the first load segment, the Databridge Client copies the temporary log file to the permanent log file (bcp.tablename.log). If a discard file was produced, the Databridge Client also copies the temporary discard file to the permanent discard file (bcp.tablename.bad).
- At the end of every subsequent load segment, the Databridge Client appends the temporary log files to the end of the permanent log file (bcp.tablename.log). If a temporary discard file was produced, the Databridge Client either copies it or appends it to the permanent discard file (bcp.tablename.bad), depending on whether that file already exists.
- The Databridge Client deletes the temporary log and discard files as soon as they are appended to the permanent log and discard files.
Files related to bcp and Databridge Client operations are listed in the following table. In some of the filenames below, nnn is the value of the table_number column in the DATATABLES Client control table. The table number is unique within each data source.
File | Description |
---|---|
bcp.tablename.fmt |
The bcp format file that is created by the generate command. It describes the format of the data in the data file (bcppipe.tablename). |
bcppipe.tablename |
A temporary file created by the Databridge Client. It contains the data to be loaded into a Microsoft SQL Server table. Since the Client uses two copies of this file simultaneously when doing overlapped operations, it appends a suffix to the file name to make it unique every time a new file is created. The suffix is of the form "_nnn", where nnn is a number that starts at 1 and is incremented by 1 each time a new file is created for the table in question. Thus the name "bcppipe.customer" is changed to "bcppipe.customer_1" for the second file, and so on. This file is automatically deleted after a successful load of a table. If the table is not loaded successfully, the file is not deleted. This gives you the opportunity to manually run bcp to determine why it is failing. Important: The bcppipe.tablename files can be quite large. When these files are no longer needed, make sure you delete them to prevent errors from occurring. |
bcp.tablename.log |
A concatenation of the bcp screen output created during the cloning process. The files are created in the working directory for the data source. CAUTION: Do not delete the bcp.tablename.log file until you have looked at it. It can contain valuable information such as error messages about rows that were not loaded. |
bcp.tablename.bad |
A concatenation of all of the load_nnn.bad files created during the cloning process. These files are created in the discards subdirectory. CAUTION: Do not delete the bcp.tablename.bad file until you have looked at it. It can contain valuable information such as which rows were not loaded. Correct the bad data and use bcp to load these records into the appropriate table. |
The following temporary files are created while the bulk loader is being launched, but they are deleted before the run is completed:
- load_nnn.log
- load_nnn.bad
You see these files only if the bulk loader operation abends.
Files related to the BCP API
When using the BCP API, all errors are logged to the Client log file. If there are discarded records, they are written to the tablename.bad file located in the discards folder; this is similar to the way discards are handled during the tracking phase.
Configuring Host Parameters
TCP/IP throughput is greatly affected by the BLOCKTIMEOUT parameter on the host. Typically, the default is 100, which is acceptable for character-oriented communications (for example, Telnet VT™ 100 emulation), but not good for record- and block-oriented communications, as with Databridge or FTP (file transfer protocol). For Databridge communications, you can increase throughput by reducing the BLOCKTIMEOUT parameter to a value of 2.
If the Databridge Client system is on a different subnet from the mainframe, put it on the same subnet so that Ethernet packets can be larger. If you cannot put the Databridge Client on the same subnet as the mainframe, you can improve throughput by adjusting BLOCKSIZE on the host and TCP/IP Window Size on the Windows Server PC.
Running tcptest
During the initial setup, use the tcptest command to determine if the TCP/IP interface is operating properly. Before you run the tcptest command, you must define a data source. For more information, see the table in dbutility Commands. An example of the test is shown below:
E:\>dbutility tcptest demodb 111.222.33.444 5555 100 1000
11:49:10 Databridge Client version 7.0.0.000 [OCI/Oracle]
11:49:10 (C) Copyright 2021 Micro Focus or one of its affiliates.
11:49:14 Connecting to 111.222.33.444, port 5555
11:49:16 TCP_Test: len=100, count=1000
11:49:17 Bytes Processed 100.00 KB of DMSII data in 1.000 secs, throughput = 100.00 KB/sec
11:49:17 Bytes Received 112.00 KB in 1.000 secs, total throughput = 112.00 KB/sec
11:49:17 TCP/IP_time = 0.841 secs, (84.10% of total time)
11:49:17 TCP Test completed successfully
11:49:17 Client exit code: 0 - Successful
Populating the Databridge Data Tables
Before you populate the Databridge data tables, determine whether you need to customize the character translation tables. If you do, modify the [EbcdictoAscii] section of the Client configuration file before you run either the process or clone command. For more information on character translation tables and modifying the configuration file, see [EbcdictoAscii] and Export or Import a Configuration File.
You can populate the Databridge data tables in the relational database using either of the following methods:
- dbutility process
- dbutility clone
The process and clone commands use the relational database bulk loader utility to populate the Databridge tables. We recommend that you read one of the previous sections, Oracle SQL*Loader Bulk Loader or Microsoft SQL Server BCP API and bcp utility, before you use the dbutility clone or dbutility process command.
The process command is typically used to populate the data tables. The clone command is a special case of the process command that allows you to clone a small number of data sets without changing the values of the corresponding entries in the active column of the DATASETS Client control table.
Data Validation and Discard Files
While processing DMSII extract and update records, Databridge validates all numeric and alpha fields. Fields that contain NULL values (data with all high-bits set) usually are recognized as DMSII NULLS. In this section, the following types of data validation and discard files are described:
- Numeric data validation
- Alpha data validation
- Date validation
- Special handling of key items in discard files
- The handling of blank character data for key items in the Databridge Client for Oracle
Numeric Data Validation
Numeric data that contains illegal digits (for example, values other than 0 through 9, excluding the sign field for signed numbers) is flagged as bad. If the da_options column of the corresponding DATAITEMS control table entry has the DAOPT_Allow_Nulls bit (1) set, Databridge treats numeric items that have bad digits as NULL.
The configuration parameter allow_nulls defines the default value for this bit, which can be altered by user scripts. If the bit is zero, the NULL or bad numeric data is stored as either all 9s or all 0s, based on the value of the configuration parameter null_digit_value (default value is 9). For more information, see bracket_tabnames and null_digit_value.
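For illustration only, a user script could set this bit for a single item with a SQL update similar to the following (SQL Server syntax; the dataset_name and item_name selection columns are assumed here and may differ from the actual DATAITEMS column names at your site):

update DATAITEMS
set da_options = da_options | 1          -- set the DAOPT_Allow_Nulls bit (1)
where dataset_name = 'CUSTOMER'          -- hypothetical data set name
and item_name = 'ACCT-BALANCE'           -- hypothetical item name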
Alpha Data Validation
With alpha data, bad characters are usually replaced with a question mark (?) instead of the whole field being set to NULL. The Client configuration file parameter inhibit_ctrl_chars determines whether or not control characters are to be treated as bad characters (the program treats a few control characters, such as NUL, CR, and LF, as bad regardless of the value of this parameter). The Client configuration file parameter inhibit_8_bit_data determines whether or not 8-bit characters are to be treated as bad characters. The Client configuration parameter convert_ctrl_char (which is incompatible with inhibit_ctrl_chars) replaces control characters with spaces instead of question marks. For more information, see [Bulk_Loader].
The Client configuration file parameter alpha_error_cutoff determines the percentage of bad characters in an ALPHA field that is tolerated before the entire field is declared bad and treated as NULL.
If ALPHA data is stored as binary data, no alpha data validation is performed because no invalid values exist in binary data. See the DIOPT_Clone_as_Binary option in the di_options column of DMS_ITEMS.
NULL data is treated as NULL if the da_options column of the corresponding DATAITEMS control table entry has the DAOPT_Allow_Nulls bit (1) set. Otherwise, the NULL data is stored as blanks.
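As a sketch only (the values are illustrative, and the configuration file section these parameters belong to is not shown here; see Appendix C for their exact placement), the alpha validation behavior described above might be tuned like this:

inhibit_ctrl_chars = false
convert_ctrl_char = true
inhibit_8_bit_data = false
alpha_error_cutoff = 10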
Date Validation
Whenever Databridge processes numeric or alpha items that are cloned as relational database date data types, it checks the validity of the data. Invalid dates are usually treated as NULL. The Databridge Client for Microsoft SQL Server stores bad or NULL dates as 1/1/1900 when the DAOPT_Allow_Nulls bit (1) in the da_options column of the corresponding DATAITEMS control table entry has not been set. The Databridge Client for Oracle uses the date 1/1/0001 instead. A numeric date of all 0s or all 9s is treated as NULL rather than an error. Similarly, an ALPHA date that is all blanks is treated as a NULL date.
Special Handling of Key Items in Discard Files
Because the stored procedures used during update processing use equality tests in the where clauses, key items (items that are used in the index for a table) can never be NULL. In relational databases, you cannot use equality tests for items that are NULL.
If a key item has a data error or it is NULL, Databridge places the entire record in a discard file named tablename.bad in the discards subdirectory. The syntax for discard file data is the calling sequence that would typically be used for the stored procedure that performs the update. Therefore, discarded records from both the data extraction and update phases are identical. Databridge preserves bad numeric digits and characters to help you better troubleshoot the problem.
Note
During data extraction, records discarded by the Client and those discarded by the bulk loader end up in different files in the discards folder (in the case of the SQL Server Client, tablename.bad and bcp.tablename.bad, respectively). Bulk loader discards are extremely rare, as the Client catches all the data errors.
Handling Blank Character Data for Key Items (Databridge Client for Oracle)
The Databridge Client strips all trailing blanks when constructing SQL statements that use varchar data, which greatly reduces the storage requirements for the SQL statements and bulk loader data files. When an application reads the records back from the database, the access routines put back the trailing blanks.
In Oracle, char or varchar items that have a length of 0 are treated as NULL. If any of the key items used in where clauses are NULL, the corresponding update or delete SQL statements fail, as mentioned above. To prevent the key item from becoming NULL, the Databridge Client for Oracle keeps the last blank of the item.
The Process Command
The process command is the main command of the Databridge Client. It populates and updates the tables for all data sets whose active column is 1 in the corresponding entries of the DATASETS Client control table. Since the define command initializes the ds_mode column, all the selected data sets are cloned the first time you run a process command.
Note
If you do not select specific data sets in the data set global mapping customization script, the Databridge Client automatically clones all data sets except for remaps, the restart data set, and the global data set. This operation may take a very long time and require a lot of disk space.
You can schedule the process command to update the Databridge data tables. The schedule becomes effective after you run the process command for the first time. For more information, see Scheduling dbutility Updates.
To populate the Databridge data tables in the relational database via the dbutility process command, you must first make sure that the current directory is set to the working directory you created for this data source. This must be the same working directory you used when you executed a generate command for this data source; otherwise, the Databridge Client cannot locate the scripts to create and populate the Databridge data tables.
Cloning a DMSII Database
Use the following procedure to clone a DMSII database via the process command.
To run the process command
- Make sure that Databridge Server is running. If it is not, the Databridge Client will try to connect to the host and eventually time out.
- Make sure that your signon parameters are configured appropriately.
- If you plan to use the [EbcdictoAscii] section to customize character translation or any other parameters in the dbridge.cfg file, set them before you continue. In particular, make sure you have appropriate settings for the following parameters. (For information on setting these parameters, see Appendix C: Client Configuration.)
- The following parameters affect the way a process or clone command operates. You can change these parameters before running these commands without running a redefine command, as they do not change anything in the control tables:

alpha_error_cutoff, aux_stmts, batch_job_period, century_break,
commit_absn_inc, commit_idle_database, commit_longtrans, commit_time_inc,
commit_txn_inc, commit_update_inc, controlled_execution (dbutility only),
convert_reversals, correct_bad_days, dbe_dflt_origin, defer_fixup_phase,
discard_data_errors, display_bad_data, eatran_dll_name, enable_af_stats,
enable_doc_records, enable_encryption, enable_minimized_col, enable_optimized_sql,
engine_workers, error_display_limits, inhibit_8_bit_data, inhibit_console,
inhibit_ctrl_chars, inhibit_drop_history, inhibit_init_values, keep_undigits,
linc_century_base, masking_parameter (SQL Server only; run a generate command when changed),
max_clone_count (only meaningful when using the -s option), max_discards,
max_retry_secs, max_srv_idle_time, max_temp_storage (Windows only), max_wait_secs,
n_dmsii_buffers, n_update_threads, null_datetime_value (SQL Server only),
null_datetime2_value (SQL Server only), null_digit_value, numeric_date_format,
preserve_deletes, set_blanks_to_null, set_lincday0_to_null, show_perf_stats,
show_statistics, show_table_stat, sql_exec_timeout, sql_heart_beat,
statistics_increment, stop_after_fixups, stop_after_gc_reorg,
stop_after_given_afn (dbutility only), stop_on_dbe_mode_chg, track_vfds_nolinks,
use_ctrl_tab_sp, use_dbwait, use_latest_si

- Enter the following command:

dbutility [signon_options misc_options] process datasource

Where | Is |
---|---|
signon_options | For each Databridge Client type, the following command-line options specify the relational database signon parameters: Oracle: [-U userid] [-P password] [-D database] SQL Server: [-U userid] [-P password] [-W] [-O ODBCdatasource] |
misc_options | See the table in the next section titled "Process Command Options". |
datasource | The name of the data source specified in the DBServer control file (DATA/SERVER/CONTROL) or via Enterprise Server. |
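For example, a SQL Server Client run that signs on with a user ID and password and processes the data source used in the tcptest example above might look like this (the credentials and data source name are placeholders):

dbutility -U dbuser -P secret process demodb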
If the Databridge Client connects to DBServer, it selects all the data sets whose corresponding active columns have a value of 1 in the DATASETS table. Next, the Databridge Client requests that DBServer clone all the selected data sets. At the end of the data extraction phase, the Databridge Client issues another request to start sending the fixup records followed by updates. The processing of audit files continues until there are no more audit files available.
If the Databridge Client connects to DBEnterprise, DBEnterprise supplies the data, either by reading the DMSII data set directly (direct disk) or by issuing a request to DBServer to have Databridge Engine read a block of data from a specific region of the disk (remote regions). DBEnterprise then processes this block of data. Since Databridge Engine only reads raw data in remote regions mode and does not do any processing of this data, this mode of operation is less expensive, in terms of mainframe resource utilization, than having the Client connect directly to DBServer. Direct disk mode offers the biggest mainframe resource savings, as DBServer is not involved in reading data sets.
In the case of audit file data, DBEnterprise either reads the data from its caches (if configured), or it reads the audit file directly by issuing a request to DBServer to have Databridge Engine read a block of data from a specific region of the disk.
After the cloning of the DMSII database completes, the tables in the relational database will contain the same data as DMSII. At this point you can execute SQL queries to view the data and make sure that all the tables have been populated. When you are ready to update the relational database with changes made to the DMSII database, see Updating the Databridge Data Tables.
Process Command Options
Command-line options related to the process command are as follows:
Option | Description |
---|---|
-c |
Toggles the defer_fixup_phase configuration file parameter. |
-d |
Enables default tracing. |
-f filename |
Specifies a configuration file other than the default dbridge.cfg file in the working directory. |
-l |
(SQL Server only) forces the Client to use the bcp utility instead of the BCP API. |
-s |
Tells the Client not to use the bulk loader. |
-t |
Enables selective tracing. |
-w |
Toggles the use_dbwait parameter in dbridge.cfg |
-K |
Inhibits the audit file removal WFL from being run on the host. |
-L |
Forces the Client to use a new log file for this run. |
-T |
Forces the Client to use a new trace file for this run, if tracing is enabled. |
For information on the command-line options, see dbutility Command-Line Options.
Terminate Cloning
Use the following procedures to stop the cloning process before it is complete.
To terminate cloning
- When using the service, from the Administrative Console, use the Abort command in the Run menu for the data source.
- When using dbutility, use the QUIT NOW command.
To terminate processing during the fixup and tracking phases
- When using the service, from the Administrative Console, use the Stop command in the Run menu for the data source. The Databridge Client will stop at the next quiet point.
- When using dbutility, use the QUIT command (or send the SIGTERM (15) signal on UNIX).
Note
If you issue a QUIT command or send a SIGTERM signal to the program during the data extraction phase, the Databridge Client stops only when the fixup phase begins.
When you terminate the Client during the fixup phase or during updates, the process command restarts from the last commit point. If you terminate the Client during the data extraction phase, only the data sets that have successfully completed the data extraction phase (ds_mode = 1) are recoverable. You can resume the process by running another process command.
In the unlikely event that all of these commands fail to terminate the Client, press Ctrl+C or kill the run.
Tracking the State of Data Sets
The DATASETS Client control table keeps track of the state of data sets. State information consists of the ds_mode value and the DMSII audit file location from which subsequent updates should be processed. The audit file location includes the AFN, the ABSN, the segment and index in the audit files, and the audit file time stamp. These values, which are collectively referred to as the stateinfo, are stored in the audit_filenum, audit_block, audit_seg, audit_inx, and audit_time6 columns of the DATASETS Client control table. The column audit_ts contains a date/time value, which corresponds to the audit_time6 data, which is binary and represents a DMSII TIME(6) value. This last column is not part of the stateinfo; it is there because knowing the audit time stamp value can sometimes be very useful.
Each subsequent time you run a process command, the Databridge Client passes the stateinfo and the mode of each data set to the Databridge Engine. The Engine uses this information to determine whether data sets should be cloned and the starting location in the audit trail. From that starting location, the Databridge Engine begins processing updates to the DMSII database. Every time a transaction group ends, the Databridge Client updates the stateinfo for the data sets in the DATASETS Client control table. At the end of the process command, the location of the last quiet point in the audit trail is saved in the DATASETS Client control table. This is the starting point for the next Client run (process command).
If the in_sync column of a data set has a value of 1, its stateinfo columns may be out-of-date. You can determine whether it is current by checking the Global_DataSet entry in the DATASETS control table. For more information, see Optimizing State Information Passing.
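To inspect the stateinfo directly, you can query the DATASETS Client control table; the following is only a sketch (the dataset_name column name is assumed, while the other columns are those described above):

select dataset_name, ds_mode, in_sync, audit_filenum, audit_block, audit_seg, audit_inx, audit_ts
from DATASETS
order by dataset_name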
ds_mode values
The following values are defined for the ds_mode column of the DATASETS Client control table:
Value | Name | Description |
---|---|---|
0 | CLONE | Initial state of ds_mode before the data set is cloned. |
1 | FIXUP | Data extraction completed, fixup processing not completed. |
2 | NORMAL | Normal update tracking mode. |
10 | BCP-FAILURE | The bulk loading of the table failed. Further processing is not possible until the problem is resolved. |
11 | PRE-FIXUP | Data extraction completed, fixup processing cannot be done due to index creation errors or lack of an index. |
12 | INVALID-AA | AA Values invalidated by a DMSII garbage collection reorganization. |
31 | NEEDREORG | The data set needs to be reorganized and the redefine command has created scripts to make the relational database table match the new layout that resulted from the reorganization of the DMSII data set. You must run the reorganize command in order to run the reorganization scripts created by the redefine command. |
33 | REORGFAILED | The data set needs to be reorganized and the scripts created by the redefine command for this data set failed when the reorganize command was run. In this case, you must manually alter the table or re-clone it. |
In the case of DMSII reorganizations, the status_bits column in the DATASETS table is used instead. The Databridge Client leaves the ds_mode column unchanged and sets the DS_Needs_Redefining bit (8) of the status_bits column of the DATASETS Client control table.
Following the initialization (purge) of a data set, the Client is notified of the purge. The Client drops the tables for the data set and recreates them. The ds_mode of the data set is set to 2, and the indexes for the empty tables are created. This enables normal update processing to repopulate the tables.
The Clone Command
From a command line, use the clone command to select the data sets you want to clone. You can use this command for cloning or recloning. To update the resulting Databridge data tables, you must use the process command. The process command is generally recommended instead of the clone command, unless you want to deal only with a specific data set without processing updates at the same time.
The clone command is basically a process command, except that it forces the data sets specified on the command line to have their ds_mode set to 0, and it treats all data sets not specified on the command line as if their active column is 0.
To populate the Databridge data tables in the relational database via the clone command, first make sure that the working directory is set to the directory you created for this data source. This must be the same directory as the working directory used when you executed a generate command for this data source; otherwise, the Databridge Client cannot locate the scripts to load the Databridge data tables.
Cloning Specific Data Sets
To use the clone command, follow the same procedure described in the section "Cloning a DMSII Database", but type the following command line:
dbutility [signon_opts misc_opts] clone source dataset1 [... datasetn]
Where | Is |
---|---|
signon_opts | For each Databridge Client type, the following command-line options specify the relational database signon parameters: Oracle: [-U userid] [-P password] [-D database] SQL Server: [-U userid] [-P password] [-W] [-O ODBCdatasource ] |
misc_opts | See the table in the next section titled "Clone Command Options". |
source | The name of the source specified in the DBServer control file or by Enterprise Server. |
dataset1 [... datasetn] | The names of the data sets you want to clone. You must specify at least one data set name. If you specify more than one data set name, separate the names with spaces. |
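For example, to clone two specific data sets from the demodb data source used earlier (the data set names are placeholders):

dbutility -U dbuser -P secret clone demodb CUSTOMER ORDERS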
Warning
If for some reason the clone command abends, do not rerun it before you determine whether or not some of the data sets completed the data extraction phase and are recoverable. Rerunning the clone command starts the cloning operations from scratch.
If you need to re-clone the entire database, you can use the process command with the -Y option. When using dbutility, you need to add the text "all_datasets" as an argument to the -Y option.
Clone Command Options
Command-line options related to the clone command are as follows:
Option | Description |
---|---|
-c |
Toggles the defer_fixup_phase configuration file parameter. When you use this option, dbutility clone does not enter the fixup phase at the end of data extraction. Instead of issuing a request to the Databridge Server to initiate the fixup phase, the Databridge Client terminates. The ds_mode values of all cloned data sets remain set to 1, with all of the necessary stateinfo stored in the Client control tables (for example, audit_filenum, audit_block, and host_info). The next process command then picks up where the clone command left off. |
-d |
Enables default tracing. |
-f filename |
Specifies a configuration file other than the default dbridge.cfg file in the working directory. |
-l |
(SQL Server only) forces the Client to use the bcp utility instead of the BCP API. |
-s |
Tells the Client not to use the bulk loader. |
-t |
Enables selective tracing. |
-x |
Reverses the meaning of the data set list for the clone command, as follows: Without the -x option, the Databridge Client clones the data sets listed on the command line. With the -x option, the Databridge Client clones all active data sets except those listed on the command line. |
-L |
Forces the Client to use a new log file for this run. |
-T |
Forces the Client to use a new trace file for this run, if tracing is enabled. |
For information on the command-line options, see dbutility Command-Line Options.
Configuring for Optimal Performance
Several configuration file parameters have a very visible effect on the performance of the Databridge Client. Databridge Clients operate efficiently with the following default configuration file parameters:
- max_temp_storage (Windows only)
- aux_stmts
- optimize_updates
This section discusses these parameters and other factors that can make Databridge Client run more efficiently.
Overlapped Bulk Loader Operations for Windows
Note
The SQL Server Client uses the bcp utility by default in this version, as using the BCP API did not prove to be very reliable.
The Windows Clients use a separate thread to bulk load tables using SQL*Loader or bcp during the data extraction phase of data set cloning. The Client creates a set of temporary files that hold the data for the records received from the Databridge Engine.
When the total temporary file size reaches half of the value specified by the max_temp_storage parameter, all the tables that have temporary files are placed on the bcp thread's work queue. The bcp thread processes the list of tables and performs the loads sequentially. While this is happening, the Client creates a new set of temporary files for tables that receive additional records and continues processing DMSII data until the total temporary file size again reaches the above-mentioned threshold. If the bcp thread has finished loading the table involved, processing continues. However, if the bcp thread has not finished loading the data for the table, the Client stops processing DMSII data until it is signaled that the loading of the table has completed.
Starting with the 6.6 Client, multi-threaded updates were extended to also cover extracts, which improves performance because the load is distributed among the update threads. This leads to better resource utilization, as multiple CPUs work concurrently on processing the data for the records received from the Databridge Engine.
Note
Configuration file parameters for increasing bulk loader speed are listed with the related bulk loader utility in the sections Oracle SQL*Loader Bulk Loader and Microsoft SQL Server BCP API and bcp utility. See a complete list in Appendix C's section on [Bulk_Loader].
Adjust the max_temp_storage value to determine what works best for your site. Setting this parameter too high tends to reduce the benefits of using multiple threads to launch bulk loader operations. Conversely, setting this parameter too low tends to increase overhead, particularly when the record sizes are large, by firing off too many bulk loader operations. A value between 400 MB (the default) and 1 GB seems to work best.
Caution
Do not set this parameter to 0, or the program will bulk load tables in one step, use a huge amount of temporary storage, and eliminate all overlapped processing.
For details about the max_temp_storage parameter, see [Bulk_Loader].
Using the BCP API eliminates the use of temporary files and allows for overlapped data extraction operations. Combined with optimized Client code for data extraction, clone speeds are much faster than before (especially in cases where the bottleneck is the CPU). The benefits of this mode of operation are much more dramatic when using a system with multiple CPUs.
Overlapped Index Creation
The Databridge Clients use a separate thread to execute the index creation scripts for tables after the data extraction for the table is completed. This allows lengthy index creation operations to be overlapped with the bulk loading of tables and has a noticeable impact on speeding up the data extraction process when many data sets are involved.
Optimizing State Information Passing
The Databridge Client optimizes the process of updating the stateinfo in the DATASETS Client control table, which is identical for all data sets that are in update tracking mode (ds_mode = 2). Instead of updating every data set each time the stateinfo is updated by the Databridge Engine prior to a COMMIT, it stores the common information in a dummy data set in the DATASETS table named Global_DataSet. When the Databridge Client is finished processing updates, the stateinfo in the Global_DataSet entry in the DATASETS Client control table is copied to all data sets that need to be updated with the stateinfo. The Global_DataSet row is stored in the DATASETS Client control table along with the in_sync column, which keeps track of stateinfo synchronization between updated data sets and the Global_DataSet. This significantly reduces the number of SQL update statements for the DATASETS Client control table when you have a large number of data sets.
To keep the DATASETS table current, particularly when READ ACTIVE AUDIT is set to TRUE in the Databridge Engine Control File, the Databridge Client copies the stateinfo in the Global_DataSet entry to all data sets whose in_sync column is 1 after an audit file switch.
When the Client is restarted after it abends, it detects the fact that the data sets are out of sync. The Client corrects this situation by copying the global stateinfo to all data sets whose in_sync column is 1 and then setting all of the in_sync columns to 0.
Multiple Statements and Pre-parsed SQL Statements
The aux_stmts parameter applies only to the Databridge Client during update processing (not cloning).
The aux_stmts parameter defines the maximum number of ODBC or OCI auxiliary statements that can be assigned to SQL statements. Using auxiliary statements allows SQL statements to be parsed once and executed multiple times, as long as the auxiliary statement is not reassigned to hold another SQL statement.
In general, higher values for aux_stmts result in faster update processing times at the expense of more memory usage. If you have the memory to spare, increase this parameter as needed. The optimum setting for this parameter will depend on the nature of your data. We recommend using a value of 100 or higher, which allows the SQL statements to be re-executed without having to reparse them and re-bind their host variables.
For details, see aux_stmts.
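For instance, to follow the recommendation above, the parameter might be set in dbridge.cfg as follows (the value and syntax are illustrative):

aux_stmts = 100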
Reducing the Number of Updates to Data Tables
If your most frequently updated data sets have a significant number of items with OCCURS clauses that are not flattened, you may want to set the DSOPT_Use_bi_ai bit in the ds_options column of the corresponding DATASETS entries. The configuration file parameter optimize_updates causes the define command to set this bit for all data sets that have active items with unflattened OCCURS clauses.
Note
To set this parameter from the Client Configurator, check the "Optimize SQL updates" checkbox on the Customizing > Advanced page of the Client Configuration dialog.
If the ratio of SQL rows to DMSII records is five or more during update processing, setting this parameter to True will likely improve performance. Note that this increases the TCP/IP and CPU overhead. If the ratio of SQL rows to DMSII records is low, you won't see any significant reduction in SQL overhead, which can hinder performance.
For best results, set the DSOPT_Use_bi_ai bit only for data sets that have a high ratio of SQL rows to DMSII records. For example, a data set that has only one item with an OCCURS 2 TIMES clause is a poor candidate for the DSOPT_Use_bi_ai bit (SQL rows/DMSII records = 3). Conversely, a data set that has 3 items with OCCURS 12 TIMES clauses is a good candidate to use the DSOPT_Use_bi_ai bit (SQL row/DMSII records = 37).
For details, see optimize_updates.
Commit Frequency
The size of transactions in the relational database can have a noticeable impact on performance during update processing. Large transaction sizes can result in the Client getting locked out during the commit when it tries to update a table. Conversely, small transactions can add overhead by constantly committing a small number of updates. Some sites use very low commit frequency parameters to get lower lag times during on-line periods (lag time is defined as the elapsed time between when an update is applied to the DMSII database and when it is applied to the relational database). This type of commit frequency might not work well during the processing of audit files created by batch jobs. To solve this problem, we implemented a second set of commit frequency parameters that are enabled by setting the batch_job_period parameter to define the time period during which batch jobs run, for example batch_job_period = 22:00, 01:00. This example specifies that the batch period spans from 10:00 pm to 1:00 am. When the Client detects that the audit timestamp crosses these boundaries, it switches to the appropriate values for the commit frequency parameters. On-line periods use the first set of values, while batch periods use the second set of values. You can override these settings from the Administrative Console; however, those overrides are replaced by the configured values at the next period switch.
Other Considerations
A few other factors that can significantly affect performance include:
- The number of CPUs (at least four are recommended)
- The type of CPU
- The amount of memory on your Client machine
- The type of disks you use. Redundant array of independent disks (RAID) or striped disks are recommended. During data extraction, do not use the same physical disks for temporary files and the database files. Ideally, use RAID for the database files and a separate disk for the Databridge Client files (bulk loader temporary files, script files, log files, and so on).
- The condition of your database
Tips for Efficient Cloning
When you first run the define command to populate the Client control tables, you will notice that most DMSII data sets are set to be cloned. Although you can accept the default data sets and their items for cloning, note the following:
- Cloning an entire DMSII database can take several hours or more. Most sites do not clone the entire DMSII database.
- The cloning of the DMSII restart data set is automatically disabled. The restart data set is related to restarting the DMSII database only.
- If you clone virtual data sets, do not disable the cloning of the data sets from which the virtual data set is derived. (Virtual data sets have a value of 17 in the subtype column of the DATASETS table.) Virtual data sets are created on the host and are explained in the Databridge Programmer's Reference.
- Make sure that you have enough disk space on the relational database server for the DMSII data. If there is not enough room, the cloning process stops. In addition to the space required for DMSII data, you must have some additional file space for the temporary files used by the bulk loader utilities (bcp for Microsoft SQL Server; SQL*Loader for Oracle).
Note
The bulk loader temporary files should not be on the same disk as the relational database.
- You do not have to clone all of the data sets at one time. One approach is to clone the most essential data sets and then determine how much space is still available.
- If you do not want to clone secondary tables (those tables generated from a data set), you have the following choices. In each case, the primary table is still generated, while the resulting secondary tables are not.
  - Set the value of the active column (for that table) in the corresponding DATATABLES Client control table entry to 0 (script.user_define.primary_tablename).
  - Set the value of the active column in the corresponding DMS_ITEMS Client control table entry to 0 for an item with an OCCURS clause (script.user_layout.primary_tablename).
  - Flatten the OCCURS in either the primary or the secondary table.
- If the active columns for all tables related to a data set are 0, Databridge sets the active column of the corresponding entry in the DATASETS table to 0.
- The cloning of DMSII remaps is automatically disabled because the remaps are just different views of the base data sets; the Databridge Client assumes that the base data set will be cloned. We recommend, therefore, that you clone the base data set and then set up a view in the relational database to achieve the same result as the REMAP.
  If you do want to clone a remap of a data set instead of the base data set, you can do so by changing the values of the active columns of the data sets in the DATASETS Client control table. You can identify remaps by their base structure number (base_strnum). For example, if structure numbers (strnum) 11, 121, and 227 are remaps of structure number 10, the base structure number for structures 11, 121, and 227 is 10.
  For data sets that are not remaps, the strnum and base_strnum columns are equal. If you do not want to clone anything related to a particular data set, set the value of the active column (for that data set) in the corresponding DATASETS Client control table entry to 0. No tables (primary or secondary) are generated from this data set.
  For more information about REMAPS, see the next section.
REMAPS
If the base structure of a REMAP is not visible to the Client (due to GenFormat filtering or the use of a logical DMSII database), the Client will set the active column of the first REMAP to 1 and clone it in place of the base structure. If the base structure of an embedded data set is not visible to the Client and it has a REMAP, the Client will set the active column of the REMAP to 1 and use it as the parent instead.
Note
If you are using the Administrative Console or Client Configurator and you want to change the active column of a data set that is 0, be aware that the data sets view has a filter.
The picture below shows this for the Administrative Console. To access the filter, click the Filters button above the data sets. To see the data sets that have their active column set to 0, click Inactive and then Apply. This causes data sets whose active column is 0 to be included in the data sets view. The items in question show "Inactive" in their Status column to indicate that their active columns are 0. You can change the active column by clicking the gear icon to the left of the data set and moving the slider labeled Active to the on position (dark blue).