CHAPTER 5: Exporting and Importing Data

5.1 Introduction

There are three utility programs related to the exporting and importing of data. These three utilities are:

empexpt exports tables from an Empress database.
empimpt imports tables to an Empress database using the data generated by empexpt.
empdbld imports data from an operating system file into Empress databases.

These utility programs may be used to export and import database tables across operating systems.

5.2 Exporting Data from a Database

empexpt exports tables from a database. The syntax for empexpt is:

   empexpt [-dehvx] outfile database [table {table}]

where:

`database`	is an Empress database.
`table`	is a table to be exported. If no tables are given on the command line all tables in the database are exported.
`outfile`	is the file to which output is written. It is removed if the program is interrupted. If `outfile` is "`-`" the output is directed to standard output.

The command options are:

`-d`	To suppress exporting of the data. Only the table definitions, comments, lock levels, indices, range checks, and referential constraints are exported.
`-e`	To ignore errors and continue exporting.
`-h`	To put a user information string into an export file header. The string is displayed when the import utility is used to display the header only.
`-v`	To execute in verbose mode. This prints status messages to standard output while the program is executing. Otherwise, the program executes silently except for error messages printed to standard error.
`-x`	To turn off the routine's check for transmission errors on table data. Turn checking off only if you are reasonably sure of accurate transmission. With checking on, the program terminates and reports any transmission error.

Please note that:

The tables named on the command line are exported. Comments, lock levels, indices, range checks, and referential constraints are exported with the data, but not privileges. Table names may not be preceded by the keyword TABLE and may not take the form database:table (as in Query Language conventions). Table names must be separated by blanks.
If no tables are given on the command line, all user tables are exported, including the comments, lock levels, and indices on the system tables sys_tables and sys_attrs.
The command options may be grouped together after a dash (-) given individually with each preceded by a dash and separated by spaces, or given in several groups, each preceded by a dash and separated by spaces. The options may appear in any order.
You should not attempt to explicitly export data from the data dictionary. However, you may export comments, lock levels, and indices on data dictionary tables using the -d option.
If views are named for export, their underlying tables should be exported first.
Pressing the <Interrupt> key will not interrupt the program during critical phases. Pressing it several times will usually stop the program.
You must have Empress DISPLAY privilege to export a table's definition, and both DISPLAY and SELECT privileges to export its data. Data may not be exported for the table without these privileges.
Files should be transferred in binary mode.
The file tabzero is not exported by empexpt.

5.3 Importing Data from an Empress Export File

empimpt imports tables from an Empress export file into a database. The syntax is:

   empimpt [-cdeghiklprtvx] infile database [table {table}]

where:

`database`	is an Empress database.
`table`	is a table to be imported. (The table name corresponds to the name specified when the table was exported.)
`infile`	is the file from which input is read, and must contain data in the format produced by `empexpt`. If `infile` is "`-`" input is read from standard input.

The command options are:

`-c`	To suppress the import of comments.
`-d`	To display the name and number of records of each table in the input file. A view is indicated by the word `VIEW`. If the `-d` is set, all other options are cancelled, no data is imported, and the `database` needs not be supplied.
`-e`	If a transmission error occurs and the `-e` option is set, the transaction is cancelled and the utility will attempt to import the next table or view. If `-e` is not set, the utility will not attempt to continue on errors.
`-g`	To import grant privilege information. If the `-g` option is not set, grant privilege information is not imported from the export file.
`-h`	To display only the header information.
`-i`	To suppress the import of indices.
`-k`	To suppress the import of referential constraints
`-l`	To suppress the import of lock levels.
`-p`	To grant `SELECT` and `DISPLAY` privileges to all users on all imported tables.
`-r`	To suppress the import of range checks.
`-t`	To suppress the creation of tables. With `-t` set, the routine assumes all tables already exist. If `-t` is not given, a table with no existing duplicate is created.
`-v`	To execute in verbose mode. This prints status messages to standard output while the program is executing. Normally the program executes silently except for error messages printed to standard error.
`-x`	To turn off the routine's check for transmission errors on table data. Turn checking off only if you are reasonably sure of accurate transmission. With checking on, the program terminates and reports any transmission error.

Please note that:

Table names may not be preceded by the keyword TABLE. If no tables are given on the command line all tables in the input file are imported. Table names must be separated by blanks.
The command options may be grouped together after a dash (-), given individually with each preceded by a dash and separated by spaces, or given in several groups, each preceded by a dash and separated by spaces. The options may appear in any order.
The sequence of actions taken by the program in importing a table is:
- Create the table
- Insert the data
- Create the indices
- Define the lock levels
- Create the table comments
- Create the range checks
- Create the attribute comments
An error occurs if a table (other than a system table) to be imported already exists in the database. If the -e option is not used, the following points apply.
- Indices, comments, lock levels, range checks, and referential constraints are not created for the table. No message is printed.
- If the table has any level of locking defined for it, the table is locked at the table level during the insertion. If the table cannot be locked (as when it is in use) no insertions are attempted for the table and a message is printed. You may try to insert the data for the table later.
- If the insertion proceeds, data will be mapped to attributes with the same names as the original attributes. The presence of data mapped to non-existent attributes causes the program to skip to the next table. Additional attributes in the table will be set to null; if they are defined as NOT NULL the program skips to the next table.
- Individual records will be inserted if the data are not incompatible with the data types of the corresponding attributes in the table. Conversion to the appropriate data type will be done automatically.
  
  For instance, data originating from an INTEGER attribute will be inserted into a LONGINTEGER attribute, but data originating from a LONGINTEGER attribute will fail to be inserted into an INTEGER attribute if the numbers are too large.
  
  Similarly, data originating from an INTEGER attribute will be inserted into a CHAR attribute of sufficient length, but data originating from a CHAR attribute will not be inserted into an INTEGER attribute unless they are digits. Again, data from a CHAR Type 3 attribute will be inserted into a TEXT attribute, but data from a TEXT attribute will be inserted into a CHAR Type 3 attribute only if the CHAR attribute is large enough for the value.
- If the first record to be inserted to a table contains data that is incompatible with the data type of the attribute to which it is mapped, and the insertion fails due to a data conversion error or range check, the program assumes that the data is incompatible with the table and no further insertions are attempted on that table. Incompatible data in subsequent records causes the records to be skipped. If inserting any record would cause the allowable number of duplicate tuples to be exceeded, the record is skipped. Care should be taken to avoid having some records inserted while others are rejected.
Table ownership is not preserved in the export/import process. Imported tables are created in the name of the user executing the import program.
All imported tables are created in the database directory.
To import tables you must have the privilege to create tables in the database, i.e., INSERT privilege in sys_tables. If any of the tables to be imported exists, you must have INSERT privilege on them.
You should not attempt to explicitly import data to the data dictionary. You may import comments and lock levels for all the data dictionary tables if you have ALTER privilege on them, and you may import indices for these tables if you have INDEX privilege on them.
Pressing the <Interrupt> key will not stop the program during critical phases. Pressing it several times will usually stop the program.
Files should be transferred in binary mode.
If the imported data is related to the Empress 4GL application, you must compile and link the applications using emp4gl with, respectively, the -c (compile) and -l (link) options.
If no tables are listed in the command line for empexpt, any comments, lock levels, and indices on the system tables sys_tables and sys_attrs will be exported. To import this information successfully to another database, you must have the appropriate privileges on these tables in the second database; that is, ALTER privilege to import comments and lock levels, and INDEX privilege to import indices.

Note that lock levels and indices used for implicit searches of system tables will only be effective if the appropriate variables are set properly in the file tabzero in the database directory. A message to this effect is displayed if lock levels or indices are imported.

5.4 Importing Data from an Operating System File

The utility, empdbld, is a general purpose program for loading data from an operating system file into Empress databases. The loader accepts fixed format ASCII data and allows you to:

Load data from a single input file into multiple tables in multiple databases.
Load data into all the attributes of a table, or into only some attributes, with the remaining attributes being set to null.
Load data into any Empress data types, with automatic data type conversion.
Map input record fields to particular attributes.
Generate log files for error messages and summary statistics.
Generate a reject file of records that cannot be loaded.
Examine the input file for invalid data.

The syntax of empdbld is:

   empdbld [-c] [-d [COUNT]] [-r reject_file] database spec_file [data_file]

where:

`database`	is the default Empress database into which the data is to be loaded.
`spec_file`	is a specification file describing how the data is to be loaded.
`data_file`	is the file containing the data to be loaded. It must contain fixed format ASCII data. If not otherwise specified, data is read from standard input.

The command line options are:

`-c`	To check the data file for invalid data. This option suppresses loading of the data. The program will check the format and data types of the data, but will not check whether excess duplicates would be loaded to an indexed table.
`-d`	To read the specification file and data file, and print each data value with the attribute to which it is mapped. This provides a visual check that the specification file correctly maps input record fields to attributes. The optional `COUNT` gives the number of input records to process. If no `COUNT` is used, all the records are read and all the data printed.
`-r`	To generate the file `reject_file` containing records that failed to be loaded (or would fail, given the `-c` option).

Please note that:

Records are read from the data file and loaded into the database according to the instructions in the specification file.
The data file must contain fixed format ASCII data. The specification file must conform to the syntax described in the next section.
An error occurs if any data to be loaded is incompatible with the data type of the attribute for which it is intended, or if excess duplicates are inserted for an indexed attribute. If the input specifications cause a null value to be inserted for an attribute defined as NOT NULL, the program prints a message and stops before loading data.
On completion, empdbld prints a count of records read and records rejected to standard output.
You must have INSERT privilege on the tables to be loaded.
Pressing the <Interrupt> key will stop the program.

5.4.1 The Specification File

The specification file spec_file describes how fields of data are to be mapped to databases, tables, and attributes. The syntax for this file is:

   LENGTH record_length table_definition {table_definition}

where:

`record_length`	gives the length of each record in the input file. (This includes end-of-record bytes where applicable.)
`table_definition`	specifies how the input data records are mapped to a table. Each input record will cause a single record to be inserted into the table specified. Note that if several `table_definition`s are used each table must accept the same number of records.

The syntax of a table_definition is:

   INTO [TABLE] [database:]table

([ATTRIBUTE] attribute |POSITION |POS | (start:end) | [format_clause] [nullif_clause]

{, [ATTRIBUTE] attribute |POSITION |POS | (start:end) | [format_clause] [nullif_clause]} )

table is the name of a table. It may be preceded by the keyword TABLE. The optional database (database name) prefix is used to override the default database specified on the command line. Listing a table more than once causes an error. A view into a table may be used instead of a table. However, at most one view into any given table may be used.

attribute is the name of an attribute in the table. It may be preceded by the keyword ATTRIBUTE, but may not take the form table.attribute or database:table.attribute (as in Empress SQL conventions). Listing an attribute more than once causes an error. Attributes of the table that are not named will have null values inserted. If an unnamed attribute is defined as NOT NULL, an error is reported and the program stops before loading data.

The keyword POSITION (optionally POS) must precede the field specification of the data being inserted into the attribute. The field specification consists of two integers, start and end. These integers determine the starting and ending positions for the data in the record. (The first position in each input record is position 1 and the last is record_length.) A POSITION is always given relative to the beginning of the input record, and end must be less than or equal to record_length.

Data from the field will be inserted into the attribute with automatic data conversion to the attribute's data type. Where appropriate, explicit decimal points are recognized, and floating point exponents may be indicated by "e", any letter (in upper- or lowercase), or a "+/-" sign.

The record length, record_length, in the specification file, spec_file, deserves some special attention. Note that the data file, datafile, will have often been created by the user either through an editor (like vi) or from a program. In either case, the non-printing end-of-line character(s) must be included in determining record_length. Note that the typical end-of-line characters is "\n".

Here, \n is linefeed (hex a), and \r is carriage return (hex d). Thus, a file containing three printing characters 123 would likely have a record_length of 5 in Windows, and 4 in UNIX. Also note that if the file has been created with an editor like vi, the last line of a file is often one character shorter than those preceding it. You can get around this by simply adding a \n character (or even a space) to the end. The important point is that all lines in the file be of the same length, and that record_length is calculated taking the non-printing characters into account.

5.4.2 FORMAT Clause

A FORMAT clause is an option that allows you to support implied data formats. It does not override explicit formats (such as decimal points and exponents) in the automatic conversion to the attribute's data type.

The syntax of the FORMAT clause is:

   FORMAT (|DECIMAL (n)     |)
           |DEC (n)         |
           |FLOAT (n, e)    |
           |DATE (picture)  |

The keyword DECIMAL (optionally DEC) is used to supply the number of implied decimal digits (given by n) in a numeric field. The last n bytes of the field are taken as the decimal digits. The data must be right-justified in the field.

Note that this format specification is ignored if the data contains an explicit decimal point.

The keyword FLOAT is used to supply the number of decimal digits n and the number of exponent digits e in a floating point field. The last e bytes of the field are taken as the exponent digits, and the n bytes before these as the decimal digits. The data must be right-justified in the field.

Note that the entire format specification will be ignored if the data contains an explicit decimal point. The exponent part of the format specification will be ignored if any letter or sign (other than a leading sign) appears in the field.

The keyword DATE is used to supply a picture for interpreting fixed format date fields. The picture is a quoted string in which special characters are used to control how corresponding characters in the input field are treated. The length of the picture may not exceed the length of the field. The picture uses the following conventions:

Table 5-1: Date Picture Conventions for the FORMAT Clause

Characters	Description

`Y,y`	Numeric year. Year values of two bytes or less are assumed to be 20th century.
`M,m`	Numeric month
`A,a`	Alphabetic month
`D,d`	Numeric day (of month)
`H,h`	Hour
`T,t`	Minute
`S,s`	Second
other	All other characters (e.g., blanks, "`/`" or "`:`") cause corresponding bytes in the input field to be ignored.

5.4.3 NULLIF Clause

A NULLIF clause allows you to specify the conditions under which NULL is to be entered as an attribute value. These conditions always depend on specific data values in the input record.

Specifying a NULLIF clause for an attribute defined as NOT NULL causes an error. The program will print a message and stop before data loading commences.

The syntax of the NULLIF clause is:

   NULLIF (null_spec {, null_spec})

Each null_spec specifies a condition for entering a null value. The conditions are implicitly OR'd together.

A null_spec is:

   |POSITION| (start:end) = "string"
   |POS     |

The condition specifies that the field starting at position start and ending at position end of the input record be equal to the quoted string. The field positions (start and end) are not given as offsets within the part of the data record mapped to the attribute. They are given as offsets from the start of the input record. (A position is always given relative to the beginning of the input record, and end must be less than or equal to record_length.). The length of the string must equal the length of the field, (i.e., end - start + 1).

5.4.4 Examples

The examples below are based on the database repairs. It is assumed that these tables are empty at the start of each example.

Example 1

Given the table personnel:

   number        integer
   name          char (25, 1)
   phone         char (15, 1)
   credit_limit  dollar (6, 1)

Load the following data into personnel:

   10Kilroy  426-9681 500.00
   5 Mosca   544-2243 750.00
   3 Jones   667-2951 500.00
   8 Peterson978-6060 250.00
   9 Jordan  964-3335 250.00

The length of each record is 26 (including 1 character for the new line). The specification file is:

   LENGTH 26
   INTO personnel
        (number POS (1:2), name POS (3:10),
         phone POS (11:18), credit_limit POS (20:25))

If the specification file is spec.1 and the data is in data.1, run a check with the command at the operating system prompt:

   empdbld -c repairs spec.1 data.1

It produces the following output:

     *** Summary ***
   No. records read = 5
   No. rejected     = 0

To load the data with the following command at the operating system prompt:

   empdbld repairs spec.1 data.1

It produces the following output:

     *** Summary ***
   No. records read = 5
   No. rejected     = 0

Example 2

If the information in personnel is split into two tables as:

       *** pers1 ***                 *** pers2 ***
   
   number     integer            number         integer
   name       char (25, 1)       credit_limit   dollar (6, 1)
   phone      char (15, 1)

Then you can load the same data using the specification file:

   LENGTH 26
   INTO pers1
        (number POS (1:2), name POS (3:10), phone POS (11:18))
   INTO pers2
        (number POS (1:2), credit_limit POS (20:25))

Note that when loading more than one table each table must accept the same number of records.

Example 3

Suppose that in addition to the personnel table there is a loans table:

   name      char (25, 1)
   date      date (1)
   amount    dollar (6, 1)

The data to be loaded into personnel and loans is in the file data.3 containing:

   10Kilroy  426-9681 500.00 16/02/92 250.00
   5 Mosca   544-2243 750.00 02/02/92 150.00
   5 Mosca   544-2243 750.00 04/05/92 200.00
   3 Jones   667-2951 500.00 07/02/92  33.95
   3 Jones   667-2951 500.00 03/04/92  25.00
   3 Jones   667-2951 500.00 12/08/92 300.00
   8 Peterson978-6060 250.00 06/06/92  50.00

Only one record for each name should be loaded to personnel, but the loan information in each record should be loaded to loans. The data may be loaded in either of two ways:

Load one table at a time using the same data file. This means running empdbld twice using the appropriate specification file each time.
Load the data into a temporary table, then use the Query Language to select data into the two tables.

Loading data one table at a time:

To see how empdbld handle rejected records, let's use the Query Language to place a unique index on the number attribute of the personal table. This will reject any duplicate records (there are three). For instance:

   CREATE UNIQUE INDEX ON personnel.number;

The specification file spec.3 would be:

   length 42
   into personnel
        (number pos (1:2), name pos (3:10),
         phone pos (11:18), credit_limit pos (20:25))

When you run empdbld you will see error messages due to the duplicate records:

   empdbld repairs spec.3 data.3
   
   *** Data Error:
   duplicate value for attribute 'number', record 3 rejected
   
   *** Data Error:
   duplicate value for attribute 'number', record 5 rejected
   
   *** Data Error:
   duplicate value for attribute 'number', record 6 rejected
   
        *** Summary ***
   No. records read = 7
   No. rejected     = 3

To load loans, use the specification file:

   LENGTH 42
   INTO loans
        (name POS (3:10), 'date' POS (27:34)
         FORMAT (date ("dd mm yy")),
         amount POS (36:41))

The above example specifies a format for the date to be sure that the data is loaded correctly.

Loading data using a temporary table:

The following SQL command creates a temporary table combining the attributes of the two tables:

   SELECT name, number, phone, credit_limit,
        date, amount FROM loans, personnel INSERT INTO temp;

This will create the table temp with the following definition:

   name          char (25, 1)
   number        integer
   phone         char (15, 1)
   credit_limit  dollar (6, 1)
   date          date (1)
   amount        dollar (6, 1)

Next, use empdbld to load data into the temporary table according to the following specification file:

   LENGTH 42
   INTO temp
        (number POS (1:2), name POS (3:10),
         'date' POS (27:34) FORMAT (date ("dd mm yy")),
         phone POS (11:18), credit_limit POS (20:25),
         amount POS (36:41))

Note that the order of the attributes does not matter. Finally, use the Query Language to move the data from temp to personnel and loans:

   SELECT UNIQUE number, name, phone, credit_limit FROM temp
        INSERT INTO personnel;
        SELECT name, date, amount FROM temp
        INSERT INTO loans;

Example 4

If the data file uses implied data formats, use a FORMAT clause. Suppose the following data is to be inserted into loans:

   Kilroy  25000Feb16'92
   Mosca   15000Feb 2'92
   Mosca   20000May 4'92
   Jones    3395Feb 7'92
   Jones    2500Apr 3'92
   Jones   30000Aug12'92
   Peterson 5000Jun 6'92

Here, position (1:8) is the name, position (9:13) is the amount with two implied decimal places (i.e., 25000 represents $250.00), and position (14:21) is the date.

The specification file to load this data is:

   LENGTH 22
   INTO loans
   (name POS (1:8), amount POS (9:13) FORMAT (dec(2)),
    'date' POS (14:21) FORMAT (date("aaadd yy")))

Example 5

If the loans table kept loan amounts as FLOAT data, as in:

   name     char (25, 1)
   date     date (1)
   amount   float

And the data could be read in as:

   Kilroy  2.500000e+02Feb16'92
   Mosca   1.500000e+02Feb 2'92
   Mosca   2.000000e+02May 4'92
   Jones   3.395000e+01Feb 7'92
   Jones   2.500000e+01Apr 3'92
   Jones   3.000000e+02Aug12'92
   Peterson5.000000e+01Jun 6'92

the specification file as follow:

   LENGTH 28
   INTO loans
        (name POS (1:8), amount POS (9:20),
         'date' POS (21:28) FORMAT (date("aaadd yy")))

If the data format for amount is implicit:

   Kilroy   250000002Feb16'92
   Mosca    150000002Feb 2'92
   Mosca    200000002May 4'92
   Jones    339500001Feb 7'92
   Jones    250000001Apr 3'92
   Jones    300000002Aug12'92
   Peterson 500000001Jun 6'92

the specification file becomes:

   LENGTH 26
   INTO loans
        (name POS (1:8), amount POS (9:17) FORMAT (float(6,2)),
         'date' POS (18:25) FORMAT (date("aaadd yy")))

Example 6

The NULLIF clause specifies conditions under which an attribute takes a null value. Suppose data being loaded into personnel is:

   10Kilroy    426-9681  500.00
   5 Mosca     544-2243  750.00
   3 Jones     667-2951  500.00
   8 Peterson  978-6060  250.00
   9 Jordan    964-3335  250.00

The last column represents credit limits to be loaded. Suppose there is no credit limit for Jones and Mosca. Then, the specification file becomes:

   LENGTH 26
   INTO personnel
        (number POS (1:2), name POS (3:10), phone POS (11:18),
         credit_limit POS (20:25) nullif (POS (3:7)="Jones",
                                          POS (3:8)="Kilroy"))