Cloning a DMSII Database
This chapter covers the steps to clone a DMSII database.
Cloning Issues for All Relational Databases
We recommend that you read this section before you use the process
or clone
commands.
Disk Space | You need to consider two types of disk space for the Databridge Client, as follows:
|
Column Order | The columns in the Client database are built in a different order than the order in the DMSII database. Specifically, the key items are placed first, followed by the non-key items in DMSII column order. |
Databridge Client Log File | Logging and tracing are separate activities in the Databridge Client. Logging cannot be disabled. Log files are written to the logs subdirectory of the working directory. Trace files are only created when the -t or -d options are used and they are placed in the working directory.Using -t1 is not allowed because this would create a second copy of the log file. You must specify at least one more bit in the trace mask for the option to be accepted. |
Bulk Loader Parameters
Both dbutility process
and dbutility clone
use a bulk loader utility
to populate the Databridge tables in the relational database during the
data extraction phase (not during change tracking). Using the relational
database bulk loader utility greatly increases the speed with which the
Databridge data tables are populated.
This section lists the configuration parameters that affect the Databridge Client operations when using the relational database bulk loader utility. You can use these parameters to do the following:
- Control temporary file storage (
max_temp_storage
parameter, Windows Client only) - Control the bulk loader utility maximum error count (
max_errors
parameter)
You can set the bulk loader parameters from the Administrative Console. These and other configuration parameters are available in the Configure item of the Settings the menu for the data source in question. For information, see the Administrative Console help Help.
Parameters that are specific to the SQL*Loader
and BCP API are discussed in the next sections.
Controlling Temporary File Storage for Windows Clients
During cloning on Windows platforms, the Oracle Client and the SQLServer Client (unless directed to use the BCP API, which does not involve the use of temporary files) writes bulk loader data to multiple temporary text files for each table being loaded.
These temporary text files are used as holding areas for the bulk loader
data. The Windows Client uses overlapped operations to write data to one
set of text files while the bulk loader is loading tables from another set of files.
The configuration file parameter max_temp_storage
determines the
maximum amount of storage to be used by all of the temporary files.
The Databridge Client writes data to as many temporary files as it
needs, while keeping track of the total amount of storage used. When the
amount of storage used exceeds half of the configured value of the
configuration file parameter max_temp_storage
, the Databridge Client
closes all the temporary files and queue the tables on the bulk loader
thread's work queue.
(The default setting for max_temp_storage
is 400 MB.)
While the bulk loader thread is sequentially launching the loads for
for these tables (which run as separate processes), the Databridge Client
starts filling a new set of temporary files for the next group of loads.
This mode of operation significantly enhances performance on systems
that have more than one CPU.
Bulk Loader Operations for UNIX Clients
UNIX Clients do not use temporary text files; instead, they use pipes (such as lpipe_nnn.dat) to communicate data between processes. This introduces a lot more overlap between the Client and the bulk loader, resulting in a much smoother flow of data.
Controlling the Bulk Loader Maximum Error Count
The max_errors
parameter controls the number of data errors allowed
before the bulk loader's operations are canceled. The default value for
max_errors
is 10, which means that the bulk loader aborts after
encountering 10 bad records. These bad records are written to the
discard file for the table in the discards directory and information about
the error is written into the bulk loader log file.
When several bulk loader errors occur, increasing the maximum error
count allows you to gather all the errors in one run rather than finding
10 errors and then having to start over again. For more details, see the
max_errors
parameter description in [Bulk_Loader].
Oracle SQL*Loader
Bulk Loader
This section lists the configuration file parameters that affect cloning with Oracle.
The enable_parallel_mode
parameter, which is only meaningful when
direct mode is enabled, causes the program to include the PARALLEL
option in the SQL*Loader
command line. In direct mode, the loader runs
faster at the expense of system resources; however, enabling this option
has a much more noticeable impact on Windows Clients than on UNIX
Clients.
For non-US sites where the period (.) and comma (,) decimal characters
are swapped, the Databridge Client automatically reads the database's NLS
parameters and makes the necessary adjustments so that the SQL*Loader
input records are formatted using the numeric characters that
SQL*Loader
expects.
The inhibit_direct_mode
parameter applies when you run dbutility for a
remote Oracle database using SQL*Net®.
The following parameters are meaningful only when inhibit_direct_mode
is enabled.
-
The
sqlld_rows
parameter defines the value to be used for the ROWS specification forSQL*Loader
operations. -
The
sqlld_bindsize
parameter defines the value to be used for the BINDSIZE parameter forSQL*Loader
operations. Increasing this value can speed upSQL*Loader
operations when not using DIRECT mode (for example, running remote to a database on a UNIX system).
For more information about the bulk loader parameters mentioned here, see [Bulk_Loader].
Files related to SQL*Loader
Each execution of SQL*Loader
uses a control file (load_
nnn.ctl
, which is a copy of the file sqlld
.tablename.ctl
created by the generate
command) and
a data file (lpipe_
nnn.dat
) as input.
As a result of the bulk loading process, SQL*Loader
produces a log file (load_
nnn.log
) and, if there are any records that cannot be loaded due to data errors, a discard file (lpipe_
nnn.bad
) for each table. Discard files are placed into the subdirectory named discards.
Windows Log Files
In Windows, to prevent log files and discard files from being
overwritten as a result of successive executions of SQL*Loader
during
segmented bulk load operations, the Databridge Client uses the
SQL*Loader
log and discard files as temporary files and does the
following:
-
At the end of the first load segment, the Databridge Client copies the temporary log file to the permanent log file (
sqlld
.tablename.log
). If a discard file was produced, the Databridge Client also copies the temporary discard file to the permanent discard file (sqlld
.tablename.bad
) in the discards folder. -
At the end of every subsequent load segment, the Databridge Client appends the temporary log files to the end of the permanent log file (
sqlld
.tablename.log
). If a temporary discard file was produced, the Databridge Client either copies it or appends it to the permanent discard file (sqlld
.tablename.bad
), depending on whether this file exists or not. -
The Databridge Client deletes the temporary log and discard files as soon as they are appended to Databridge Client permanent log and discard files.
UNIX Log Files
In order to maintain compatibility with the Windows Clients, the UNIX Client renames the log and discard files at the end of a SQL*Loader
operation. Therefore, the log file load
nnn.log
is renamed sqlld
tablename.log
and the discard file lpipe_
nnn.bad
is moved to the discards folder as sqlld_
tablename.bad
in the SQL*Loader
shell scripts.
List of files related to SQL*Loader
The table below lists files related to SQL*Loader
and Databridge Client operations. In some of the filenames below, nnn is the value in the table_number
column of the DATATABLES Client control table. It is unique within each data source.
File | Description |
---|---|
sqlld.tablename.ctl |
The SQL*Loader control file created by the generate command. It describes the format of the data in the data file (lpipe _nnn.dat ). |
lpipe_nnn.dat |
For Windows: This is a temporary file that the Databridge Client creates. It contains the data to be loaded into an Oracle table. Since the Client uses two copies of this file simultaneously when doing overlapped operations, it appends a suffix to the file name to make it unique every time a new file is created. The suffix of the form "_nnn", where nnn is a number that starts at 1 and gets incremented by 1 each time a new file is created for the table in question. Thus the name "lpipe_12.dat" will be changed to lpipe_12_1.dat for the second file and so on.This file is automatically deleted after a successful load of a table. If the table is not loaded successfully, the file is not deleted. This gives you the opportunity to manually run SQL*Loader to determine why it is failing.For UNIX: This is a UNIX pipe that the SQL*Loader shell script creates and uses to pass data to the SQL*Loader program. This pipe is automatically removed after a successful load of a table.If the Databridge Client or SQL*Loader abends, the pipe is not immediately deleted. If you run the Databridge Client again, you receive a warning message as the pipe is being deleted. You can safely ignore this warning, as this is not a fatal error. |
sqlld.tablename.log |
For Windows: This file is a concatenation of all of the load_nnn.log files created during the cloning process. For UNIX: This is the log file generated by SQL*Loader .CAUTION: Do not delete the sqlld.tablename.log file until you have looked at it. It can contain valuable information such as error messages about rows that were not loaded. |
sqlld.tablename.bad |
For Windows: This file is a concatenation of all of the load_nnn.bad files created during the cloning process. It is created in the discards subdirectory only if discard records exist. For UNIX: This is the discard file generated by SQL*Loader that has been moved to the discards folder.CAUTION: Do not delete the sqlld.tablename.bad file until you have looked at it to determine which records were rejected by SQL*Loader . Correct the bad data. Then use SQL*Loader to load these records into the appropriate table. |
The following temporary files are created while the bulk loader is being launched, but they are deleted before the run is completed:
load_nnn.ctl
load_nnn.log
(renamed in UNIX tosqlld_tablename.log
)lpipe_nnn.bad
(renamed in UNIX tosqlld.tablename.bad
)
You see these files only if the bulk loader operation abends.
Microsoft SQL Server BCP API and bcp utility
This section lists the configuration file parameters that affect cloning with BCP API and bcp utility. SQL Server Clients on Windows use the bcp utility by default.
You can make the SQL Server Client use the BCP API, which allows the program to perform bulk loader operations by making BCP API calls, which operate like SQL statements. This produces more overlap between the Client and the load operations, resulting in a much smoother flow of data. We have had some problems with the BCP API, which lead us to change the default mode of operations to use the bcp utility which is more reliable.
It is recommended that you use multi-threaded updates, as this allows multiple tables to be loaded simultaneously by different threads which increases the resource utilization and offers all the advantages of multi-threaded updates when doing data extraction.
Note
Bulk loader operations will run efficiently if the database recovery model is set to "Simple" or "Bulk-logged". If you are running a database with a recovery model of "Full", we recommend that you switch to "Bulk Logged" for the duration of the bulk-load and then switch back to "Full" recovery.
The following Client configuration file parameters affect the bcp utility or BCP API calls made by the Client. For more details, see [EbcdictoAscii].
Parameter | Description |
---|---|
bcp_batch_size |
The Databridge Client using the BCP API or the bcp utility can load a table in several batches instead of loading the entire table in a single operation. You can control the batch size using this parameter. |
bcp_code_page |
Adds the -C code_page to the bcp command line, which specifies the code page of the data in the file. For example, because the Japanese code page is 932, setting this parameter to 932 adds -C 932 to the bcp command line. This parameter is only applicable when using the bcp utility. |
bcp_packet_size |
Defines the network packet size value for the bcp utility (applies to remote servers only). If you have wide tables, setting this parameter to a packet size larger than the default (4096) can speed up loading the data into the table at the expense of system resources. This parameter is only applicable when using the bcp utility. |
bcp_copied_msg | Enables the bcp_auditor program to determine whether or not a bcp was successful in cases where the database language is not English. This parameter is only applicable when using the bcp utility. |
bcp_delim | Defines the delimiter character bcp uses (the TAB character, by default). If you want to preserve TAB characters in your data, set this parameter to a value that allows multiple characters. This parameter is only applicable when using the bcp utility. |
max_errors | Controls the bulk loader’s tolerance to records that are discarded due to data errors. |
max_temp_storage | Activates the segmented bulk load feature, which allows you to specify the maximum amount of storage that dbutility should use for temporary files. This parameter is only applicable when using the bcp utility. |
bcp_auditor Utility
The bcp command files capture bcp execution output by redirecting the
output to a temporary file. These command files then invoke the
bcp_auditor
utility to examine this file to determine if the bcp
operation was successful. The bcp_auditor
utility sets the exit code
such that the Databridge Client can determine if the table load was
successful.
Files related to BCP
Note
The SQL Server Client will only use the BCP API when specifically directed to do so. You can globally control whether the Client is to use the bcp utility or the BCP API, by setting the use_bcp
parameter accordingly in the [bulk_loader] section of the Client configuration file. You must run a redefine
command with the -R
option to make this change take effect. You can force the Client to always use the bcp utility by using the /l
command line option. Alternatively, you can control the loading of tables at the data set level by setting the DSOPT_Use_bcp (0x1000000) bit in the ds_options
column for the corresponding entries in the DATASETS Client control table.
Each execution of bcp uses a format file (bcp
.tablename.fmt
) and a data file (bcppipe
.tablename) as input.
As a result of the bulk loading process, bcp produces a log file (load_*nnn*.log
) for each table. If there are any records that cannot be loaded due to data errors, bcp also produces a discard file (load
_nnn.bad
).
To prevent log files and discard files from being overwritten during segmented bulk load operations, Databridge Client treats bcp log and discard files as temporary files:
-
At the end of the first load segment, the Databridge Client copies the temporary log file to the permanent log file (
bcp
.tablename.log
). If a discard file was produced, the Databridge Client also copies the temporary discard file to the permanent discard file (bcp
.tablename.bad
). -
At the end of every subsequent load segment, the Databridge Client appends the temporary log files to the end of the permanent log file (
bcp
.tablename.log
). If a temporary discard file was produced, the Databridge Client either copies it or appends it to the permanent discard file (bcp
.tablename.bad
), depending on whether this file exists or not. -
The Databridge Client deletes the temporary log and discard files as soon as they are appended to Databridge Client permanent log and discard files.
Files related to bcp and Databridge Client operations are listed in the following table. In some of the filenames below, nnn is the value for the table_number
column in the DATATABLES Client control table. The table number is unique within each data source.
File | Description |
---|---|
bcp.tablename.fmt |
The bcp format file that is created by the generate command. It describes the format of the data in the data file (bcppipe .tablename). |
bcppipe.tablename |
A temporary file created by the Databridge Client. It contains the data to be loaded into a Microsoft SQL Server table. Since the Client uses two copies of this file simultaneously when doing overlapped operations, it appends a suffix to the file name to make it unique every time a new file is created. The suffix is of the form "_nnn" where nnn is a number that starts at 1 and gets incremented by 1 each time a new file is created for the table in question. Thus the name "bcppipe.customer " will be changed to "bcppipe.customer_1 " for the second file and so on.This file is automatically deleted after a successful load for a table. If the table is not loaded successfully, the file is not deleted. This gives you the opportunity to manually run bcp to determine why it is failing. Important: The bcppipe .tablename files can be quite large. When these files are no longer needed, make sure you delete them to prevent errors from occurring. |
bcp.tablename.log |
A concatenation of bcp screen output created during the cloning process. The files are created in the working directory for the data source. CAUTION: Do not delete the bcp .tablename.log file until you have looked at it. It can contain valuable information such as error messages about rows that were not loaded. |
bcp.tablename.bad |
A concatenation of all of the load_nnn.bad files created during the cloning process. These files are created in the discards subdirectory.CAUTION: Do not delete the bcp.tablename.bad file until you have looked at it. It can contain valuable information such as which rows were not loaded. Correct the bad data and use bcp to load these records into the appropriate table. |
The following temporary files are created while the bulk loader is being launched, but they are deleted before the run is completed:
load_nnn.log
load_nnn.bad
You see these files only if the bulk loader operation abends.
Files related to the BCP API
When using the BCP API all errors are logged to the Client log file. If there are discarded records, they are written to the tablename.bad
file located in the discards folder, this is similar to discards during the tracking phase.
PGLoader Utility
The PostgreSQL Client uses the PGLoader utility that is supplied with the PostgreSQL Client to load data into the relational database tables during data extraction.
The following Client configuration file parameters affect the PGLoader utility.
Parameter | Description |
---|---|
bcp_batch_size |
The Databridge Postgres Client using the PGLoader utility can load a table in several batches instead of loading the entire table in a single operation. You can control the batch size using this parameter. |
bcp_code_page |
Specifies the code page of the data in the file to be loaded. The default value is "iso-8859-1". |
max_errors | Controls the PGLoader’s tolerance to records that are discarded due to data errors. |
max_temp_storage | Activates the segmented bulk load feature, which allows you to specify the maximum amount of storage that the Databridge Client should use for temporary files. This parameter is only applicable to Windows platforms. |
Files related to PGLoader
Each execution of PGLoader uses a control file named pgpipe
.tablename.ctl
and a data file named pgpipe
.tablename_
nn as input.
As a result of the bulk loading process, PGLoader produces two log files named pgloader
nn.log
and pgloader
nn.log2
for each table. The files are combined into a single log file named pgloader
_tablename.log
. In the case of Windows this file contains all the log files for the segmented load. The original log files are treated as temporary and deleted after they are merged into the final log file.
If there are any records that cannot be loaded due to data errors, PGLoader also produces a discard file named tablename.bad
in the tmp subdirectory. These file are combined into a
PGLoader Utility
The PostgreSQL Client uses the PGLoader utility that is supplied with the PostgreSQL Client to load data into the relational database tables during data extraction.
The following Client configuration file parameters affect the PGLoader utility.
Parameter | Description |
---|---|
bcp_batch_size |
The Databridge Postgres Client using the PGLoader utility can load a table in several batches instead of loading the entire table in a single operation. You can control the batch size using this parameter. |
bcp_code_page |
Specifies the code page of the data in the file to be loaded. The default value is "iso-8859-1". |
max_errors | Controls the PGLoader’s tolerance to records that are discarded due to data errors. |
max_temp_storage | Activates the segmented bulk load feature, which allows you to specify the maximum amount of storage that the Databridge Client should use for temporary files. This parameter is only applicable to Windows platforms. |
Files related to PGLoader
Each execution of PGLoader uses a control file named pgpipe
.tablename.ctl
and a data file named pgpipe
.tablename_
nn as input.
As a result of the bulk loading process, PGLoader produces two log files named pgloader
nn.log
and pgloader
nn.log2
for each table. The files are combined into a single log file named pgloader
_tablename.log
. In the case of Windows this file contains all the log files for the segmented load. The original log files are treated as temporary and deleted after they are merged into the final log file.
If there are any records that cannot be loaded due to data errors, PGLoader also produces a discard file named tablename.bad
in the tmp subdirectory. These file are combined into a
Configuring Host Parameters
TCP/IP throughput is greatly affected by the BLOCKTIMEOUT
parameter on the host. Typically, the default is 100, which is acceptable for character-oriented communications (for example, Telnet VT™ 100 emulation), but not good for record and block-oriented communications, as with Databridge or FTP (file transfer protocol). For Databridge communications, you can increase throughput by reducing the BLOCKTIMEOUT
parameter to a value of 2.
If the Databridge Client system is on a different subnet from the mainframe, put it on the same subnet so that Ethernet packets can be larger. If you cannot put the Databridge Client on the same subnet as the mainframe, you can improve throughput by adjusting BLOCKSIZE on the host and TCP/IP Window Size on the Windows Server PC.
Running tcptest
During the initial setup, use the tcptest
command to determine if the TCP/IP interface is operating properly. Before you run the tcptest
command, you must define a data source. For more information, see the table in dbutility Commands. An example of the test is shown below:
E:\>dbutility tcptest demodb 111.222.33.444 5555 100 1000
11:49:10 Databridge Client version 7.1.0.000 [OCI/Oracle]
11:49:10 (C) 1995-2024 Rocket Software Inc. or its affiliates. All Rights Reserved.
11:49:14 Connecting to 111.222.33.444, port 5555
11:49:16 TCP_Test: len=100, count=1000
11:49:17 Bytes Processed 100.00 KB of DMSII data in 1.000 secs, throughput = 100.00 KB/sec
11:49:17 Bytes Received 112.00 KB in 1.000 secs, total throughput = 112.00 KB/sec
11:49:17 TCP/IP_time = 0.841 secs, (84.10% of total time)
11:49:17 TCP Test completed successfully
11:49:17 Client exit code: 0 - Successful
Populating the Databridge Data Tables
Before you populate the Databridge data tables, determine if you need to customize the character translation tables or not. If you do, modify the [EbcdictoAscii] section of the Client configuration file before you run either the process
or clone
command. For more information on character translation tables and modifying the configuration file, see [EbcdictoAscii] and Export or Import a Configuration File.
You can populate the Databridge data tables in the relational database using either of the following methods:
- dbutility
process
- dbutility
clone
The process
and clone
commands use the relational database bulk loader utility to populate the Databridge tables.
Note
We recommend that you read one of the previous sections, Oracle SQL*Loader
Bulk Loader or Microsoft SQL Server BCP API and bcp utility before you use the dbutility clone
or dbutility process
command.
The process
command is typically used to populate the data tables. The clone
command is a special case of the process
command that allows you to clone a small number of data sets without changing the values of the corresponding entries in the active
column of the DATASETS Client control table.
Data Validation and Discard Files
While processing DMSII extract and update records, Databridge validates all numeric and alpha fields. Fields that contain NULL values (data with all high-bits set) usually are recognized as DMSII NULLS. In this section, the following types of data validation and discard files are described:
- Numeric data validation
- Alpha data validation
- Date validation
- Special handling of key items in discard files
- The handling of blank character data for key items in the Databridge Client for Oracle
Numeric Data Validation
Numeric data that contains illegal digits (for example, values other than 0 through 9, excluding the sign field for signed numbers) are flagged as bad. If the da_options
column of the corresponding DATAITEMS control table entry has the DAOPT_Allow_Nulls bit (1) set, Databridge treats numeric items that have bad digits as NULL.
The configuration parameter allow_nulls
defines the default value for this bit, which can be altered by user scripts. If the bit is zero, the NULL or bad numeric data is stored as either all 9s or all 0s based on the value of the configuration parameter, null_digit_value
(default value is 9). For more information, see bracket_tabnames and null_digit_value.
Alpha Data Validation
With alpha data, bad characters are usually replaced with a question mark (?
) instead of the whole field being set to NULL. The Client configuration file parameter inhibit_ctrl_chars
determines whether or not control characters are to be treated as bad characters. (The program treats a few control characters, such as NUL, CR, and LF, as bad regardless of this parameter's value.)
The Client configuration file parameter inhibit_8_bit_data
determines whether or not 8-bit characters are to be treated as bad characters. In this context, "8-bit" refers to ASCII data larger than 0x7F and includes non-English accented letters. When the parameter inhibit_8-bit_data
is true, Databridge replaces the 8-bit character with ?
.
If your data contains non-English characters, the inhibit_8-bit_data
parameter should never be set to true.
Note
All data translations can be customized using the Translation table section in Appendix C: Client Configuration.
The Client configuration parameter convert_ctrl_char
(which is incompatible with inhibit_ctrl_chars
) replaces control characters by spaces instead or question marks. For more information, see [Bulk_Loader].
The Client configuration file parameter alpha_error_cutoff
determines the percentage of bad characters in an ALPHA field that are tolerated before the entire field is declared bad and treated as NULL.
If ALPHA data is stored as binary data, no alpha data validation is performed because no invalid values exist in binary data. See the DIOPT_Clone_as_Binary option in the di_options
column of DMS_ITEMS.
NULL data is treated as NULL if the da_options
column of the corresponding DATAITEMS control table entry has the DAOPT_Allow_Nulls bit (1) set. Otherwise, the NULL data is stored as blanks.
Date Validation
Whenever Databridge processes numeric or alpha items that are cloned as
relational database date data types, it checks the validity of the data.
Invalid dates are usually treated as NULL. The Databridge Client for
Microsoft SQL Server stores bad or NULL dates as 1/1/1900, when the
DAOPT_Allow_Nulls bit (1) in the da_options
column of the corresponding
DATAITEMS control table entry has not been set. The Databridge Client
for Oracle uses the date 1/1/0001 instead. A numeric date of all 0s or
all 9s is treated as NULL rather than an error. Similarly, an ALPHA date
that is all blanks is treated as a NULL date.
Special Handling of Key Items in Discard Files
Because the stored procedures used during update processing use equality tests in the where clauses, key items (items that are used in the index for a table) can never be NULL. In relational databases, you cannot use equality tests for items that are NULL.
If a key item has a data error or it is NULL, Databridge places the
entire record in a discard file named tablename.bad
in the discards subdirectory. The syntax for discard file data is
the calling sequence that would typically be used for the stored
procedure that performs the update. Therefore, discarded records from
both the data extraction and update phases are identical. Databridge
preserves bad numeric digits and characters to help you better
troubleshoot the problem.
Note
During data extraction records discarded by the Client and those discarded by the bulk loader end up in different files in the discards folder (in the case of the SQL Server Client tablename.bad
and bcp
.tablename.bad
respectively). Bulk loader discards are extremely rare as the Client catches all the data errors.
Handling Blank Character Data for Key Items (Databridge Client for Oracle)
The Databridge Client strips all trailing blanks when constructing SQL statements using varchar data. When an application reads the records back from the database, the access routines put back the trailing blanks, greatly reducing the storage requirements for the SQL statements and bulk loader data files.
In Oracle, char or varchar items that have a length of 0 are treated as
NULL. If any of the key items used in where
clauses are NULL, the
corresponding update or delete SQL statements fail as mentioned above.
To prevent the key item from becoming NULL, the Databridge Client for
Oracle keeps the last blank of the item.
The Process Command
The process
command is the main command of the Databridge Client. It
populates and updates the tables for all data sets whose active
column
is 1 in the corresponding entries of the DATASETS Client control table.
Since the define
command initializes the ds_mode
column, all the
selected data sets are cloned the first time you run a process
command.
Note
If you do not select specific data sets in the data set global mapping customization script, the Databridge Client automatically clones all data sets except for remaps, the restart data set, and the global data set. This operation may take a very long time and require a lot of disk space.
You can schedule the process
command to update the Databridge data
tables. The schedule becomes effective after you run the process
command for the first time. For more information, see Scheduling dbutility Updates.
To populate the Databridge data tables in the relational database via
the dbutility process
command, you must first make sure that the
current directory is set to the working directory you created for this
data source. This must be the same working directory you used when you
executed a generate
command for this data source; otherwise, the
Databridge Client cannot locate the scripts to create and populate the
Databridge data tables.
Cloning a DMSII Database
Use the following procedure to clone a DMSII database via the process
command.
To run the process
command
-
Make sure that Databridge Server is running. If it is not, the Databridge Client will try to connect to the host and eventually time out.
-
Make sure that your signon parameters are configured appropriately.
-
If you plan to use the [EbcdictoAscii] section to customize character translation or any other parameters in the
dbridge.cfg
file, set them before you continue. In particular, make sure you have appropriate settings for the following parameters. (For information on setting these parameters, see Appendix C: Client Configuration.) -
The following parameters affect the way a process or clone command operates. You can change these parameters before running these command without running a redefine command as they do not change anything in the control tables:
alpha_error_cutoff aux_stmts batch_job_period century_break commit_absn_inc commit_idle_database commit_longtrans commit_time_inc commit_txn_inc commit_update_inc controlled_execution (dbutility only) convert_reversals correct_bad_days dbe_dflt_origin defer_fixup_phase discard_data_errors display_bad_data eatran_dll_name enable_af_stats enable_doc_records enable_encryption enable_minimized_col enable_optimized_sql engine_workers error_display_limits inhibit_8_bit_data inhibit_console inhibit_ctrl_chars inhibit_drop_history inhibit_init_values keep_undigits linc_century_base masking_parameter (SQL Server only -- run a generate command when changed) max_clone_count (Only meaning when using the -s option) max_discards max_retry_secs max_srv_idle_time max_temp_storage (Windows only) max_wait_secs n_dmsii_buffers n_update_threads null_datetime_value (SQL Server only) null_datetime2_value (SQL Server only) null_digit_value numeric_date_format preserve_deletes set_blanks_to_null set_lincday0_to_null show_perf_stats show_statistics show_table_stat sql_exec_timeout sql_heart_beat statistics_increment stop_after_fixups stop_after_gc_reorg stop_after_given_afn (dbutility only) stop_on_dbe_mode_chg track_vfds_nolinks use_dbwait use_latest_si
-
Enter the following command:
dbutility [signon_options misc_options] process datasource
Where Is signon_options For each Databridge Client type, the following command-line options specify the relational database signon parameters:
Oracle: [-U userid] [-P password] [-D database]
SQL Server: [-U userid] [-P password] [-W] [-O ODBCdatasource]misc_options See table in the next section titled "Process Command Options". datasource The name of the data source specified in the DBServer control file (DATA/SERVER/CONTROL) or via Enterprise Server.
If the Databridge Client connects to DBServer, it selects all the data
sets whose corresponding active
columns have a value of 1 in the
DATASETS table. Next, the Databridge Client requests that DBServer clone
all the selected data sets. At the end of the data extraction phase, the
Databridge Client issues another request to start sending the fixup
records followed by updates. The processing of audit files continues
until there are no more audit files available.
If the Databridge Client connects to DBEnterprise, DBEnterprise supplies the data, either by reading the DMSII data set directly (direct disk) or by issuing a request to DBServer to have Databridge Engine read a block of data from a specific region of the disk (remote regions). DBEnterprise then processes this block of data. Since Databridge Engine is only reading raw data in remote regions mode and does not do any processing of this data, this mode of operations is less expensive than having the Client connect directly to DBServer in term mainframe resource utilization. Direct disk mode offers the biggest mainframe resource savings, as DBServer is not involved in reading data sets.
In the case of audit file data, DBEnterprise either reads the data from its caches (if configured), or it reads the audit file directly by issuing a request to DBServer to have Databridge Engine read a block of data from a specific region of the disk.
After the cloning of the DMSII database completes, the tables in the relational database will contain the same data as DMSII. At this point you can execute SQL queries to view the data and make sure that all the tables have been populated. When you are ready to update the relational database with changes made to the DMSII database, see Updating the Databridge Data Tables.
Process Command Options
Command-line options related to the process
command are as follows:
Option | Description |
---|---|
-c |
Toggles the defer_fixup_phase configuration file parameter. |
-d |
Enables default tracing. |
-f filename |
Specifies a configuration file other than the default dbridge.cfg file in the working directory. |
-l |
(SQL Server only) forces the Client to use the bcp utility instead of the BCP API. |
-s |
Tells the Client not to use the bulk loader. |
-t |
Enables selective tracing. |
-w |
Toggles the use_dbwait parameter in dbridge.cfg |
-K |
Inhibits the audit file removal WFL from being run on the host. |
-L |
Forces the Client to use a new log file for this run. |
-T |
Forces the Client to use a new trace file for this run, if tracing is enabled. |
For information on the command-line options, see dbutility Command-Line Options.
Terminate Cloning
Use the following procedures to stop the cloning process before it is complete.
To terminate cloning
-
When using the service, from the Administrative Console, use the
Abort
command in the Run menu for the data source. -
When using dbutility, use the
QUIT NOW
command.
To terminate processing during the fixup and tracking phases
-
When using the service, from the Administrative Console, use the
Stop
command in the Run menu for the data source. The Databridge Client will stop at the next quiet point. -
When using dbutility, use the
QUIT
command (or the SIGTERM (15) signal on UNIX).Note
If you issue a
QUIT
command or send a SIGTERM signal to the program during the data extraction phase, the Databridge Client stops only when the fixup phase begins.
When you terminate the Client during the fixup phase or during updates,
the process
command restarts from the last commit point. If you
terminate the Client during the data extraction phase, only the data
sets that have successfully completed the data extraction phase
(ds_mode = 1) are recoverable. You can resume the process by running
another process
command.
In the unlikely event that all of these commands fail to terminate the Client, press Ctrl+C or kill the run.
Tracking the State of Data Sets
The DATASETS Client control table keeps track of the state of data sets.
State information consists of the ds_mode
value and the DMSII audit file
location from which subsequent updates should be processed. The audit
file location includes the AFN, the ABSN, the segment and index in the
audit files, and the audit file time stamp. These values, which are
collectively referred to as the stateinfo, are stored in the
audit_filenum
, audit_block
, audit_seg
, audit_inx
, and audit_time6
columns of the
DATASETS Client control table. The column audit_ts
contains a date/time
value, which corresponds to the audit_time6
data, which is binary and
represents a DMSII TIME(6) value. This last column is not part of the
stateinfo; it is there because knowing the audit time stamp value can
sometimes be very useful.
Each subsequent time you run a process
command, the Databridge Client
passes the stateinfo and the mode of each data set to the Databridge
Engine. The Engine uses this information to determine whether data sets
should be cloned and the starting location in the audit trail. From that
starting location, the Databridge Engine begins processing updates to
the DMSII database. Every time a transaction group ends, the Databridge
Client updates the stateinfo for the data sets in the DATASETS Client
control table. At the end of the process
command, the location of the
last quiet point in the audit trail is saved in the DATASETS Client
control table. This is the starting point for the next Client run (process
command).
If the in_sync
column of a data set has a value of 1, its stateinfo
columns may be out-of-date. You can determine if it is current by
checking the Global_Dataset entry in the DATASETS control table. For more information, see Optimizing State Information Passing.
ds_mode values
The following values are defined for the ds_mode
column of the DATASETS
Client control table:
Value | Name | Description |
---|---|---|
0 | CLONE | Initial state of ds_mode before the data set is cloned. |
1 | FIXUP | Data extraction completed, fixup processing not completed. |
2 | NORMAL | Normal update tracking mode. |
10 | BCP-FAILURE | The bulk loading of the table failed. Further processing is not possible until the problem is resolved. |
11 | PRE-FIXUP | Data extraction completed, fixup processing cannot be done due to index creation errors or lack of an index. |
12 | INVALID-AA | AA Values invalidated by a DMSII garbage collection reorganization. |
31 | NEEDREORG | The data set needs to be reorganized and the redefine command has created scripts to make the relational database table match the new layout that resulted from the reorganization of the DMSII data set. You must run the reorganize command in order to run the reorganization scripts created by the redefine command. |
33 | REORGFAILED | The data set needs to be reorganized and the scripts created by the redefine command for this data set failed when the reorganize command was run. In this case, you must manually alter the table or re-clone it. |
In the case of DMSII reorganizations, the status_bits
column in the DATASETS table is used instead. The Databridge Client leaves the ds_mode
column unchanged and sets the DS_Needs_Redefining bit (8) of the status_bits
column of the DATASETS Client control table.
Following the initialization (purge) of a data set, the Client is notified of the purge. The Client drops the tables for the data set and recreates them. The ds_mode
of the data set is set to 2 and the index for the empty tables are created. This enables the normal update processing to repopulate the tables. .
The Clone Command
From a command line, use the clone
command to select the data sets you
want to clone. You can use this command for cloning or recloning. To
update the resulting Databridge data tables, you must use the process
command. The process
command is generally recommended instead the
clone
command, unless you want to deal only with a specific data set
without processing updates at the same time.
The clone
command is basically a process
command, except that it forces
the data sets specified on the command line to have their ds_mode
set to 0
and it treats all data sets not specified on the command line as if their
active
column is 0.
To populate the Databridge data tables in the relational database via
the clone
command, first make sure that the working directory is set
to the directory you created for this data source. This must be the same
directory as the working directory used when you executed a generate
command for this data source; otherwise, the Databridge Client cannot
locate the scripts to load the Databridge data tables.
Cloning Specific Data Sets
Using the clone command follow the exact same procedure described in the section on "Cloning a DMSII Database". type the following command line:
dbutility [signon_opts misc_opts] clone source dataset1 [... datasetn]
Where | Is |
---|---|
signon_opts | For each Databridge Client type, the following command-line options specify the relational database signon parameters: Oracle: [-U userid] [-P password] [-D database] SQL Server: [-U userid] [-P password] [-W] [-O ODBCdatasource ] PostgreSQL: [-U userid] [-P password] [-O ODBCdatasource ] |
misc_options | See table in the next section titled Clone Command Options. |
source | The name of the source specified in the DBServer control file or by Enterprise Server. |
dataset1 [... datasetn] |
The names of the data sets you want to clone. You must specify at least one data set name. If you specify more than one data set name, separate the names with spaces. Note the following:
|
Warning
If for some reason the clone
command abends, do not rerun it before you determine whether or not some of the data sets completed the data extraction phase and are recoverable. Rerunning the clone
command starts the cloning operations from scratch.
If you need to re-clone the entire database you can use the process
command with the -Y
option. When using dbutility you need to add the text "all_datasets" as an argument to the -Y
option.
Clone Command options
Command-line options related to the clone
command are as follows:
Option | Description |
---|---|
-c |
Toggles the defer_fixup_phase configuration file parameter . When you use this option, the dbutility clone does not enter the fixup phase at the end of data extraction. Instead of issuing request to the Databridge Server to initiate the fixup phase, the Databridge Client terminates. The ds_mode values of all cloned data sets remain set to 1 with all of the necessary stateinfo stored in the Client control tables (for example, audit_filenum , audit_block , and host_info ). The next process command then picks up where the clone command left off. |
-d |
Enables default tracing. |
-f filename |
Specifies a configuration file other than the default dbridge.cfg file in the working directory. |
-l |
(SQL Server only) forces the Client to use the bcp utility instead of the BCP API. |
-s |
Tells the Client not to use the bulk loader. |
-t |
Enables selective tracing. |
-x |
Reverses the meaning of the data set list for the clone command, as follows: Without the -x option, the Databridge Client clones the data sets listed on the command line. With the -x option, the Databridge Client clones all active data sets except those listed on the command line. |
-L |
Forces the Client to use a new log file for this run. |
-T |
Forces the Client to use a new trace file for this run, if tracing is enabled. |
For information on the command-line options, see dbutility Command-Line Options.
Configuring for Optimal Performance
Several configuration file parameters have a very visible effect on the performance of the Databridge Client. Databridge Clients operate efficiently with the following default configuration file parameters:
-
max_temp_storage
(Windows only) -
aux_stmts
-
optimize_updates
This section discusses these parameters and other factors that can make Databridge Client run more efficiently.
Overlapped Bulk Loader Operations for Windows
Note
The SQL Server Client uses the bcp utility by default in this version, as using the BCP API did not prove to be very reliable.
The Windows Clients use a separate thread to bulk load tables using SQL*Loader
or bcp during the data extraction phase of data
set cloning. The Client creates a set temporary files for the tables that hold the data for the records received from the Databridge Engine.
When the total temporary file size reaches half of the value specified by the parameter max_temp_storage
, all
the tables that have temporary files get placed on the bcp thread's work queue. The bcp thread processes the list of tables and does the
loads sequentially. While this is happening the Client creates a new set of temporary files for tables that that get additional
records and continue processing DMSII data until the total temporary file size reaches the above mentioned threshold. If the bcp thread is done
loading the table involved, processing continues. However, if the bcp is not finished loading the data for the table Client stops
processing DMSII data until it is signalled that the loading of the table completed.
Starting with the 6.6 Client multi-threaded updates were extended to also include extracts, which improves performance as the load is distributed among the update threads. This leads to better resource utilization, as multiple CPUs are working concurrently on processing the data for the records that are received from the Databridge Engine.
Note
Configuration file parameters for increasing bulk loader speed are listed with the related bulk loader utility in the sections Oracle SQL*Loader
Bulk Loader and Microsoft SQL Server BCP API and bcp utility. See a complete list in Appendix C's section on [Bulk_Loader]
Adjust the max_temp_storage
value to determine what works best for
your site. Setting this parameter too high tends to reduce the benefits
of using multiple threads to launch bulk loader operations. Conversely,
setting this parameter too low tends to increase overhead, particularly
when the record sizes are large, by firing off too many bulk loader
operations. A value in the range 400M (default) and 1G seems to work best.
Caution
Do not set this parameter to 0, or the program will bulk load tables in one step, use a huge amount of temporary storage, and eliminate all overlapped processing.
For details about max_temp_storage
parameter, see [Bulk_Loader].
Using the BCP API eliminates the use of temporary files and allows for overlapped data extraction operations. Combined with optimized Client code for data extraction, clone speeds are much faster than before (especially in cases where the bottleneck is the CPU). The benefits of this mode of operation are much more dramatic when using a system with multiple CPUs.
Overlapped Index Creation
The Databridge Clients use a separate thread to execute the index creation scripts for tables after the data extraction for the table is completed. This allows lengthy index creation operations to be overlapped with the bulk loading of tables and has a noticeable impact on speeding up the data extraction process when many data sets are involved.
Optimizing State Information Passing
The Databridge Client optimizes the process of updating the stateinfo in
the DATASETS Client control table, which is identical for all data sets
that are in update tracking mode (ds_mode=2). Instead of updating every
data set each time the stateinfo is updated by Databridge Engine prior
to a COMMIT, it stores the common information in a dummy data set in the
DATASETS table named Global_DataSet
. When the Databridge Client is
finished processing updates, the stateinfo in the Global_DataSet
entry
in the DATASETS Client control table is copied to all data sets that
need to be updated with the stateinfo. The Global_DataSet
row is stored
in the DATASETS Client control table along with the in_sync
column that
keeps track of stateinfo synchronization between updated data sets and
the Global_DataSet
. This significantly reduces the number of SQL update
statements for the DATASETS Client control table when you have a large
number of data sets.
To keep the DATASETS table current, particularly when READ ACTIVE AUDIT
is set to TRUE in the Databridge Engine Control File, the Databridge
Client copies the stateinfo in the Global_DataSet
entry to all data sets
whose in_sync
column is 1 after an audit file switch.
When the Client is restarted after it abends, it detects the fact that
the data sets are out of sync. The Client corrects this situation by
copying the global stateinfo to all data sets whose in_sync column
is 1
and setting all of the in_sync
columns to 0.
Multiple Statements and Pre-parsed SQL Statements
The aux_stmts
parameter applies only to the Databridge Client during
update processing (not cloning).
The aux_stmts
parameter defines the maximum number of ODBC or OCI
auxiliary statements that can be assigned to SQL statements. Using
auxiliary statements allows SQL statements to be parsed once and
executed multiple times, as long as the auxiliary statement is not
reassigned to hold another SQL statement.
In general, higher values for aux_stmts
result in faster update
processing times at the expense of more memory usage. If you have the
memory to spare, increase this parameter as needed. The optimum setting
for this parameter will depend on the nature of your data. We recommended
using a value of 100 or higher, which allows the SQL statements to be
re-executed without having to reparse them and re-bind their host variables.
For details, see aux_stmts.
Reducing the Number of Updates to Data Tables
If your most frequently updated data sets have a significant number of
items with OCCURS clauses that are not flattened, you may want to set
the DSOPT_Use_bi_ai bit in the ds_options
column of the corresponding
DATASETS entries. The configuration file parameter optimize_updates
causes the define
command to set this bit for all data sets that have
active items with unflattened OCCURS clauses.
Note
To set this parameter from the Client Configurator by checking the checkbox "Optimize SQL updates" in the Customizing > Advanced page of the Client Configuration dialog.
If the ratio of SQL rows to DMSII records is five or more during update processing, setting this parameter to True will likely improve performance. Note that this increases the TCP/IP and CPU overhead. If the ratio of SQL rows to DMSII records is low, you won't see any significant reduction in SQL overhead, which can hinder performance.
For best results, set the DSOPT_Use_bi_ai bit only for data sets that have a high ratio of SQL rows to DMSII records. For example, a data set that has only one item with an OCCURS 2 TIMES clause is a poor candidate for the DSOPT_Use_bi_ai bit (SQL rows/DMSII records = 3). Conversely, a data set that has 3 items with OCCURS 12 TIMES clauses is a good candidate to use the DSOPT_Use_bi_ai bit (SQL row/DMSII records = 37).
For details, see optimize_updates.
Commit Frequency
The size of transactions in the relational database can have a noticeable impact on performance during update processing. Large transaction sizes will result in the Client getting locked out
during the commit when it tries to update a table. Conversely small transactions can add overhead by constantly committing a small number of updates. Some site use very low commit frequency
parameters to get lower lag times during on-line periods (lag time is defined as the elapsed time between the time when an update is applied to the relational database and the time it was applied to DMSII). This type of commit frequency might not work well during the processing of audit files created by batch jobs. To solve this problem we implemented a second set of commit frequency parameters that are enabled by setting the batch_job_period
parameter to define the time period during which batch jobs run, for example batch_job_period
= 22:00, 01:00. This example specifies that batch period spans from 11:00 pm to 1 am. When the Client detects that the audit timestamp crosses these boundaries it switches to using the appropriate values for the commit frequency parameters. On-line periods use the first set of values, while batch period use the second set of values. You can override these settings from the Administrative Console, however these settings will be overridden by the configured values at the next period switch.
Other Considerations
A few other factors that can significantly affect performance include:
-
The number of CPUs (at least four are recommended)
-
The type of CPU
-
The amount of memory on your Client machine
-
The type of disks you use. Redundant array of independent disks (RAID) or striped disks are recommended. During data extraction, do not use the same physical disks for temporary files and the database files. Ideally, use RAID for the database files and a separate disk for the Databridge Client files (bulk loader temporary files, scripts files, log files, and so on).
-
The condition of your database
Tips for Efficient Cloning
When you first run the define
command to populate the Client control
tables, you will notice that most DMSII data sets are set to be cloned.
Although you can accept the default data sets and their items for
cloning, note the following:
-
Cloning an entire DMSII database can take several hours or more. Most sites do not clone the entire DMSII database.
-
The cloning of the DMSII restart data set is automatically disabled. The restart data set is related to restarting the DMSII database only.
-
If you clone virtual data sets, do not disable the cloning of the data sets from which the virtual data set is derived. (Virtual data sets have a value of 17 in the
subtype
column of the DATASETS table.) Virtual data sets are created on the host and are explained in the Databridge Programmer's Reference. -
Make sure that you have enough disk space on the relational database server for the DMSII data. If there is not enough room, the cloning process stops. In addition to the space required for DMSII data, you must have some additional file space for the temporary files used by the bulk loader utilities (bcp for Microsoft SQL Server;
SQL*Loader
for Oracle).Note
The bulk loader temporary files should not be on the same disk as the relational database.
-
You do not have to clone all of the data sets at one time. One approach is to clone the most essential data sets and then determine how much space is still available.
-
If you do not want to clone secondary tables (those tables generated from a data set), you have two choices. In either case, the primary table is still generated, while the resulting secondary tables are not.
-
Set the value of the
active
column (for that table) in the corresponding DATATABLES Client control table entry to 0 (script.user_define
.primary_tablename). -
Set the value of the
active
column in the corresponding DMS_ITEMS Client control table entry to 0 for an item with an OCCURS clause (script.user_layout
.primary_tablename). -
Flatten the OCCURS in either the primary or the secondary table.
-
-
If the
active
columns for all tables related to a data set are 0, Databridge sets theactive
column of the corresponding entry in the DATASETS table to 0. -
The cloning of DMSII remaps is automatically disabled because the remaps are just different views of the base data sets; the Databridge Client assumes that the base data set will be cloned. We recommend, therefore, that you clone the base data set and then set up a view in the relational database to achieve the same result as the REMAP.
If you do want to clone a remap of a data set instead of the base data set, you can do so by changing the values of the
active
columns of the data sets in the DATASETS Client control table. You can identify remaps by their base structure number (base_strnum). For example, if structure numbers (strnum) 11, 121, and 227 are remaps of structure number 10, the base structure number for structures 11, 121, and 227 is 10.For data sets that are not remaps, the
strnum
andbase_strnum
columns are equal. If you do not want to clone anything related to a particular data set, set the value of theactive
column (for that data set) in the corresponding DATASETS Client control table entry to 0. No tables (primary or secondary) are generated from this data set.For more information about REMAPS, see the next section.
REMAPS
If the base structure of a REMAP is not visible to the Client (due to
GenFormat filtering or the use of a logical DMSII database) the Client will set the active
column of the first
REMAP to 1, and clone it in place of the base structure. If the base
structure of an embedded data set is not visible to the Client and it
has a REMAP, the Client will set the active
column of the REMAP to 1 and
use it as the parent instead.
Note
If you are using the Administrative Console or Client Configurator and you want to change the active
column of a data set that is 0, you should be aware that the data sets view has a filter.
The picture below shows this for the Administrative Console. To access it, click on the Filters button above the data sets. To see the data sets that have their active
column set to 0, you need to click on Inactive and then on Apply. This causes data sets with their active
column set 0 to be included in the data set view. The items in question will have "Inactive" in their Status column to indicate that their active
columns are 0. You can change the active
column by clicking on the gear icon to the left of the data set and moving the slider labeled Active to the on position (dark blue).