Chapter 5
Synchronization in Empress Replication

5.1 Introduction

This chapter explains synchronization in Empress Replication. It describes the client/server architecture of synchronization, that is, the relationship between a Replication Master Server and a Synchronization Client, and the use of the Synchronization Utility as a client.

5.2 Replication Master Server and Synchronization Client

A Replication Master Server is an Empress Server that is used for replication purposes. It resides on the RMT-Side and accesses RMTs. Its execution is controlled by the Empress Server Administration Utility (empsvadm).

A Replication Master Server must access and control the replication operations of any RMT that participates in a replication relation. It does so by residing on the RMT-Side and accessing that RMT. In a Replication World, only Replication Tables that are not RMTs do not need to be accessed by a Replication Master Server.

A Synchronization Client is a client for a Replication Master Server. It accesses the RRT and sends synchronization requests to a Replication Master Server. The relation between a Replication Master Server and a Synchronization Client is shown in [Figure: Replication Master Server and Synchronization Client].

Figure 5-1
Replication Master Server and Synchronization Client

5.2.1 Connection between an RMT-Side and an RRT-Side

A connection between an RMT-Side and an RRT-Side is a network connection between:

A Synchronization Client residing at RRT-Side, and
A Replication Master Server residing at RMT-Side.

In order to establish this connection, the following conditions must be satisfied:

On the RMT-Side, a Replication Master Server must be started and must be accessing the RMT.
The Synchronization Client must be configured to access the Replication Master Server on the RMT-Side.
The user of the Replicate Table must be authorized to access the Replication Master Server residing at the RMT-Side. This authorization is given by the Administrator of the Replication Master Server to control the access of Synchronization Clients to this server. It might require Synchronization Clients to identify themselves by sending their login name and password (see [User Authorization]).
A network connection between the Synchronization Utility and the Replication Master Server must exist.
The Synchronization Client must have the privilege to access the RRT.

5.2.2 Setting up a Replication Master Server

Setting up a Replication Master Server is done as follows:

1. Setting required environment variables (optional)
These environment variables are used by the Empress Server Administration Utility. Generally, four environment variables, MSUSERAUTHCONFIGFILE, MSNETSERVERCONFIGFILE, MSNETTYPECONFIGFILE and MSCONFIGFILEPATH, should be set before starting a Replication Master Server. Their default values are given in $EMPRESSPATH/config/initfile. They are explained in [Configuring Empress Server and its Clients]. If there is no need to change the contents of the Network Configuration Files or to set up User Authorization security, go directly to Step 3.
2. Network Configuration and User Authorization Configuration (optional, depends on Step 1)
Users can change the contents of the network server configuration file and the network type configuration file, such as server name, host name and port number. Refer to [Network Configuration] for setting Network Configurations. If increased security is required, by checking the user name and password of administrators or users, the corresponding password file and user authorization must be created. Refer to [User Authorization] for increasing the security and user authorization.
3. Creating an Empress Server Start Configuration File (optional)
This file optionally specifies the database and RMTs to be handled by a Replication Master Server. This is explained in [Starting an Empress Server].
4. Starting an Empress Server
In this step an Empress Server is started, using the Empress Server Administration Utility. This is explained in [Starting an Empress Server].
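As a rough illustration of the first step, a wrapper script could prepare the four environment variables before launching the Empress Server Administration Utility. The sketch below only builds the environment. The helper name `build_server_env` and the directory `/opt/empress/repl_config` are hypothetical placeholders, not Empress defaults; the meaning of each variable is described in [Configuring Empress Server and its Clients].

```python
import os

# Variable names are taken from the text above; values are site-specific.
SERVER_CONFIG_VARS = (
    "MSUSERAUTHCONFIGFILE",
    "MSNETSERVERCONFIGFILE",
    "MSNETTYPECONFIGFILE",
    "MSCONFIGFILEPATH",
)


def build_server_env(overrides, base=None):
    """Return a copy of the base environment with the overrides applied."""
    unknown = set(overrides) - set(SERVER_CONFIG_VARS)
    if unknown:
        raise ValueError(f"unexpected variables: {sorted(unknown)}")
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


# Hypothetical directory, for illustration only.
example_env = build_server_env(
    {"MSCONFIGFILEPATH": "/opt/empress/repl_config"}, base={}
)
```

The resulting dictionary can be passed as the `env` argument of Python's `subprocess.run` when the administration utility is started; its command-line arguments are not shown here because they are covered in [Starting an Empress Server].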

5.2.3 Setting up a Synchronization Client

1. Setting required environment variables (optional)
As when setting up a Replication Master Server, four environment variables should be set for the Empress Server Utility and the Synchronization Utility on the client side. Usually the client of an Empress Server needs to set the environment variable MSNETSERVERCONFIGFILE to point to a new network server configuration file. These environment variables are explained in [Configuring Empress Server and its Clients].
2. Network Configuration and User Authorization Configuration (optional, depends on Step 1)
Usually the client of an Empress Server needs to change the contents of a network server configuration file, so that the Synchronization Utility can access the Empress Server.
3. Using Synchronization Utility or Empress Server Administration Utility
In this step the [Synchronization Utility] sends requests to a running Empress Server. The functions of the [Empress Server Administration Utility] can also be used to perform administrative and non-administrative operations on Empress Servers.

5.3 Synchronization Utility

The Synchronization Utility, emprepsync, is a client for a Replication Master Server and runs on the synchronization client side. In order to establish the connection to a Replication Master Server, the network configuration must be set up before execution. Refer to [Configuring Empress Server and its Clients]. The utility usage for synchronizing table replicate_table in database replicate_database is as follows:

   $ emprepsync replicate_database replicate_table

Complete options and arguments to this utility are given in [References: Replication Synchronization Utility].
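When the utility is driven from a script (for example one started by a job scheduler), the invocation above could be wrapped as follows. This is a minimal sketch: the function names are hypothetical, only the two positional arguments shown above are used, and treating exit status 0 as success is an assumption, since exit codes are not described in this chapter.

```python
import subprocess


def emprepsync_command(database, table):
    """Build the argument list for the invocation shown above."""
    if not database or not table:
        raise ValueError("database and table names are required")
    return ["emprepsync", database, table]


def synchronize(database, table):
    """Run emprepsync once and report whether it exited with status 0.

    Assumption: a zero exit status means a successful synchronization.
    """
    result = subprocess.run(emprepsync_command(database, table))
    return result.returncode == 0
```

Calling `synchronize("replicate_database", "replicate_table")` requires emprepsync to be on the PATH and the network configuration to be in place.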

5.4 Synchronization

Synchronization in Empress Replication is the process of updating a Replicate Table with the data of its RMT that has changed since the last successful synchronization. Changes to the data of an RMT can be insertions of new records, or deletions and updates of existing records.

The synchronization request can be made manually, or can be automated through other facilities such as "cron" in a Unix environment. The Synchronization Client then updates the replicate table based on the changed data it receives. The RRT-Side of a Replication Relation does not have to be connected at all times to the host machine running the Replication Master Server, but only for the duration of the synchronization.

Note that Empress Replication is an asynchronous process. At any given time, the contents of an RRT might not be the same as those of its RMT. Only at the end of a "successful" synchronization are the contents of the RRT and its RMT synchronized.

5.4.1 Table Timestamp and Recovery Timestamp

In order to discuss synchronization algorithms, the following concepts are needed:

Current Master Table Start Timestamp
Table Timestamp
Recovery Timestamp

Current Master Table Start Timestamp (CMTS) is discussed in [Replication Table Switch].

Table Timestamp (TTS) is the time at which a replicate table holds a snapshot of its Master Table. A replicate table with Table Timestamp TTS is consistent with the state of its master table at TTS. Only a successful synchronization can change the table timestamp of a replicate table. For a master table, the current timestamp is its own table timestamp, which increases with time.

When a replicate table is switched to master table, its original table timestamp is defined as the Recovery Timestamp (RTS), and the current timestamp becomes the table timestamp of the new master table. When a replicate table is explicitly or implicitly synchronized with the new master table, it automatically inherits the recovery timestamp and broadcasts it to its own RRTs for their next synchronization. Generally, the Recovery Timestamp of a replication table is less than its Table Timestamp.
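The bookkeeping described above can be sketched as a small data structure. The class and function names are hypothetical, timestamps are modeled as plain integers, and setting CMTS to the switch time is an assumption of this sketch (CMTS itself is defined in [Replication Table Switch]).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReplicationTimestamps:
    cmts: int  # Current Master Table Start Timestamp
    tts: int   # Table Timestamp
    rts: int   # Recovery Timestamp


def switch_to_master(ts, now):
    """Timestamps of a replicate table after it is switched to master.

    The old table timestamp becomes the recovery timestamp and the
    current time becomes the table timestamp. Assumption: CMTS is set
    to the switch time.
    """
    return replace(ts, cmts=now, tts=now, rts=ts.tts)


def inherit_after_sync(replica, master):
    """Timestamps of a replicate table after a successful synchronization
    with the new master: it becomes consistent with the master at the
    master's TTS and inherits the master's RTS (and, by assumption, CMTS).
    """
    return replace(replica, cmts=master.cmts, tts=master.tts, rts=master.rts)


old = ReplicationTimestamps(cmts=10, tts=80, rts=5)
new_master = switch_to_master(old, now=100)
```

In this example the new master ends up with RTS 80 and TTS 100, which matches the remark that the Recovery Timestamp is generally less than the Table Timestamp.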

5.4.1 Choosing Replication Master

For synchronization, the Synchronization Client tries to establish a connection between the RRT-Side containing the RRT and the RMT-Side containing a "chosen" RMT. This is a network connection between the Synchronization Client (residing on the RRT-Side) and the Replication Master Server assigned to the "chosen" RMT (residing on the RMT-Side). Choosing an RMT for a Replicate Table RT means finding an appropriate RMT to serve as the source of data for updating the RT. The Synchronization Client chooses the RMT automatically, based on the Replication Master Entries for the RT.

The process of choosing a Replication Master is as follows. The Synchronization Utility opens the assigned replicate table and gets all of its enabled Replication Master Entries. Then, following the Replication Master Order, it tries to establish a connection to the Replication Master Server assigned to the chosen Replication Master Entry.

If the connection is established, synchronization utility sends the server the following information:

Database name and table name of RMT
Host name, database name and table name of itself (RRT)
Original master table information, current master table start timestamp, table timestamp of RRT and synchronization mode

After the server receives this information, it performs the following checks:

Whether the assigned RMT exists
Whether RMT is in the list of replication tables of the Replication Master Server Start Configuration File, if the Replication Master Server is started with a [Replication Master Server Start Configuration File]
Whether RRT is authorized to synchronize from the RMT
Whether RRT and RMT are in the same replication world

If all conditions are satisfied, the RMT is chosen. Otherwise, the synchronization fails.
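The four server-side checks can be sketched as a single validation function. Everything in this block is a hypothetical model for illustration: the class names, the string form of table names, and the use of a "world" identifier to decide whether the RRT and RMT belong to the same replication world are assumptions, not the server's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SyncRequest:
    rmt: str    # database and table name of the requested RMT
    rrt: str    # host, database and table name of the requesting RRT
    world: str  # hypothetical identifier of the RRT's replication world


@dataclass
class ServerState:
    tables: dict      # RMT name -> replication world identifier
    authorized: dict  # RMT name -> set of authorized RRT names
    # None means the server was started without a start configuration file.
    start_config_tables: Optional[Set[str]] = None


def check_request(server, req):
    """Return None if the RMT can be chosen, else the reason for failure."""
    if req.rmt not in server.tables:
        return "RMT does not exist"
    if (server.start_config_tables is not None
            and req.rmt not in server.start_config_tables):
        return "RMT not listed in start configuration file"
    if req.rrt not in server.authorized.get(req.rmt, set()):
        return "RRT not authorized"
    if server.tables[req.rmt] != req.world:
        return "RRT and RMT are in different replication worlds"
    return None
```

The second check is skipped when `start_config_tables` is `None`, mirroring the condition that it applies only if the server was started with a start configuration file.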

Note that the list of Replication Master Entries can be altered by:

Adding and dropping replication masters (see [Creating Replication Master] and [Dropping Replication Master]),
Enabling and disabling replication masters (see [Enabling and Disabling Replication Masters]), and
Changing the Replication Master Order (see [Changing Replication Master Order]).

These manual alterations to list of Replication Master Entries affect the way that Synchronization Client "chooses" an RMT. Only RMTs that are accessed by an "enabled" Replication Master Entry are "chosen". Empress RDBMS "chooses" the RMT that has the smallest [Replication Master Order].
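The effect of enabling, disabling and reordering entries on the choice can be illustrated with a short sketch. The names `MasterEntry`, `candidate_masters` and `choose_master` are hypothetical, and `can_connect` stands in for the real connection attempt. Falling back to the next entry when a connection attempt fails is an assumption of this sketch; the text above only states that entries are tried following the Replication Master Order.

```python
from dataclasses import dataclass


@dataclass
class MasterEntry:
    server: str           # name of the Replication Master Server
    order: int            # Replication Master Order
    enabled: bool = True


def candidate_masters(entries):
    """Enabled entries only, smallest Replication Master Order first."""
    return sorted((e for e in entries if e.enabled), key=lambda e: e.order)


def choose_master(entries, can_connect):
    """Return the first candidate that accepts a connection, else None.

    Assumption: after a failed connection attempt the next entry in
    order is tried.
    """
    for entry in candidate_masters(entries):
        if can_connect(entry):
            return entry
    return None


entries = [
    MasterEntry("server_a", order=2),
    MasterEntry("server_b", order=1, enabled=False),
    MasterEntry("server_c", order=3),
]
```

Here `server_b` has the smallest order but is disabled, so `server_a` is the first candidate.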

Start of a Synchronization Request updates Last Request Time information of the RRT. Arrival of Synchronization Request sent to the RMT updates Last Request Time information of the RMT. Successful Synchronization updates Last Successful Request Time and Last Request Status of both RMT-Side and RRT-Side.

5.4.2 Synchronization Requirements

A successful Synchronization has three main steps. These are:

"Choosing" an RMT, as described in [Choosing Replication Master].
Establishing a connection between the Synchronization Client on the RRT-Side where the RRT resides, and the Replication Master Server on the RMT-Side where the "chosen" RMT resides (as explained in [Connection between an RMT-Side and an RRT-Side]).
Synchronizing the RRT data, with the changed, up-to-date data of the RMT, since the last Synchronization.

5.4.3 Synchronization Algorithms

After the RMT is chosen, the server selects the algorithm that will complete the synchronization by comparing the current master table start timestamp (CMTS) and the table timestamp (TTS) of the RRT and the RMT. The flowchart representation of the synchronization algorithms is shown in [Figure 5-2: Flowchart for Synchronization Algorithms].

In this section, synchronization algorithms for subset replication are not discussed. They are similar to the full set synchronization algorithms, except that SRSC must be applied for synchronization.
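The selection can be sketched as one function over the case conditions listed for each algorithm in this section. The function name and the integer timestamps are hypothetical. Combinations that the case lists do not cover (for example equal table timestamps) are reported as "unspecified" rather than guessed.

```python
def select_algorithm(rrt_cmts, rrt_tts, rmt_cmts, rmt_tts, rmt_rts):
    """Map the timestamp comparison to the name of an algorithm."""
    if rrt_cmts == rmt_cmts:
        if rrt_tts < rmt_tts:
            return "forward"
        if rrt_tts > rmt_tts:
            return "backward"
        return "unspecified"
    if rrt_cmts < rmt_cmts:
        # The master table was switched since the RRT last synchronized.
        if rrt_tts < rmt_rts:
            return "forward"
        if rrt_tts > rmt_rts:
            return "recovery"
    return "unspecified"
```

For example, an RRT with CMTS 10 and TTS 90 that synchronizes from an RMT with CMTS 100 and RTS 80 falls into the recovery case, because it holds changes made after the recovery timestamp of the new master.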

5.4.3.1 Algorithm 1: Forward algorithm

Forward algorithm is applied for the following cases:

CMTS(RRT) == CMTS(RMT) and TTS(RRT) < TTS(RMT)
CMTS(RRT) < CMTS(RMT) and TTS(RRT) < RTS(RMT)

The Replication Master Server simply collects all data changed on the RMT between TTS(RRT) and TTS(RMT), i.e. it performs the following pseudo-query:

select * from RMT 
   where EMPRESS_TIMESTAMP > TTS (RRT) 
     and EMPRESS_TIMESTAMP <= TTS (RMT)

and sends it to synchronization client.

After the synchronization client gets the changed data, the synchronization utility automatically updates the replicate table and changes its TTS and synchronization status. If the CMTS of the RRT is less than the CMTS of its RMT, then its RTS and CMTS are also automatically modified.
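The forward algorithm can be illustrated with an in-memory model. This is a simplified sketch, not the server's implementation: tables are dictionaries that map a record key to a `(timestamp, value)` pair, the function names are hypothetical, and deleted records are not modeled.

```python
def forward_changes(rmt, tts_rrt, tts_rmt):
    """Server side: records changed on the RMT in (TTS(RRT), TTS(RMT)]."""
    return {k: rec for k, rec in rmt.items() if tts_rrt < rec[0] <= tts_rmt}


def apply_forward(rrt, changes, tts_rmt):
    """Client side: apply the changes and return the new TTS of the RRT."""
    rrt.update(changes)
    return tts_rmt


rmt = {1: (5, "a"), 2: (12, "b2"), 3: (15, "c")}
rrt = {1: (5, "a"), 2: (7, "b")}
changes = forward_changes(rmt, tts_rrt=10, tts_rmt=15)
new_tts = apply_forward(rrt, changes, tts_rmt=15)
```

Only records 2 and 3 are transferred, since record 1 was last changed before the table timestamp of the RRT.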

5.4.3.2 Algorithm 2: Backward algorithm

Backward algorithm is applied for the following case:

CMTS(RRT) == CMTS(RMT) and TTS(RRT) > TTS(RMT)

If the "force synchronization" option is not applied, no synchronization is done.

If the "force synchronization" option is applied, the Replication Master Server first asks the client to send back the change list, that is, the list of records changed on the RRT between TTS(RMT) and TTS(RRT). It then collects the data named in the change list from the RMT, i.e. it performs the following pseudo-query:
```
select * from RMT
   where EMPRESS_RECORD_NUMBER (or Primary key) in changed list
```
and sends the result to the client. After the synchronization client gets the data, it first physically deletes all data with EMPRESS_TIMESTAMP between TTS(RMT) and TTS(RRT) on the RRT:
```
delete from RRT 
  where EMPRESS_TIMESTAMP > TTS(RMT)
  and   EMPRESS_TIMESTAMP < TTS(RRT)
```
then applies the changed data to RRT:
```
insert into RRT
   (select * from RMT 
      where EMPRESS_TIMESTAMP > TTS(RMT)
      and   EMPRESS_TIMESTAMP < TTS(RRT))
```
then rolls the RRT back to its TTS(RMT) state, and finally updates TTS(RRT) and the synchronization status.
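The steps above can be sketched with the same kind of in-memory model, where each table is a dictionary from record key to a `(timestamp, value)` pair. The function name is hypothetical, the records restored from the RMT are selected by the keys in the change list (as in the first pseudo-query), and logically deleted records are not modeled.

```python
def backward_sync(rrt, rmt, tts_rrt, tts_rmt, force=False):
    """Roll the RRT back to the state of the RMT; return the new TTS(RRT)."""
    if not force:
        return tts_rrt  # no synchronization without "force synchronization"
    # Client: keys of records changed on the RRT after TTS(RMT).
    change_list = {k for k, (ts, _) in rrt.items() if tts_rmt < ts < tts_rrt}
    # Server: current versions of those records on the RMT, if any.
    master_records = {k: rmt[k] for k in change_list if k in rmt}
    # Client: physically delete the changed records, then restore them.
    for k in change_list:
        del rrt[k]
    rrt.update(master_records)
    return tts_rmt


rmt = {1: (5, "a"), 2: (7, "b")}
rrt = {1: (5, "a"), 2: (12, "b2"), 3: (14, "c")}
unchanged_tts = backward_sync(rrt, rmt, tts_rrt=15, tts_rmt=10)
rolled_back_tts = backward_sync(rrt, rmt, tts_rrt=15, tts_rmt=10, force=True)
```

In the example, record 2 is restored to the master's version and record 3, which does not exist on the RMT, disappears from the RRT.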

5.4.3.3 Algorithm 3: Recovery algorithm

Recovery algorithm is applied for the following case:

CMTS(RRT) < CMTS(RMT) and TTS(RRT) > RTS(RMT)

The Replication Master Server first asks the client to send back the change list between RTS(RMT) and TTS(RRT) on the RRT. It then collects, on the RMT, the data named in the change list together with the data changed between RTS(RMT) and TTS(RMT), i.e. it performs the following pseudo-query:

select * from RMT 
   where EMPRESS_RECORD_NUMBER (or Primary key) in changed list 
   or (EMPRESS_TIMESTAMP > RTS(RMT) and EMPRESS_TIMESTAMP <= TTS(RMT))

and sends it to the client. After the synchronization client gets the data, it first physically deletes all data with EMPRESS_TIMESTAMP between RTS(RMT) and TTS(RRT) on the RRT:

delete from RRT 
  where EMPRESS_TIMESTAMP > RTS(RMT)
  and   EMPRESS_TIMESTAMP < TTS(RRT)

then applies the changed data to RRT:

insert into RRT
   (select * from RMT 
      where EMPRESS_TIMESTAMP > RTS(RMT)
      and   EMPRESS_TIMESTAMP < TTS(RMT))

Finally, it updates TTS(RRT), RTS(RRT), CMTS(RRT) and the synchronization status.
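The recovery algorithm can be sketched in the same simplified model (dictionaries from record key to a `(timestamp, value)` pair, hypothetical function name, deletions not modeled). The set of records sent by the server follows the select pseudo-query above: records named in the change list plus records changed on the RMT after RTS(RMT).

```python
def recovery_sync(rrt, rmt, tts_rrt, rmt_cmts, rmt_tts, rmt_rts):
    """Bring the RRT in line with a switched master; return its new timestamps."""
    # Client: keys of records changed on the RRT after RTS(RMT).
    change_list = {k for k, (ts, _) in rrt.items() if rmt_rts < ts < tts_rrt}
    # Server: records in the change list, or changed in (RTS(RMT), TTS(RMT)].
    sent = {
        k: rec for k, rec in rmt.items()
        if k in change_list or rmt_rts < rec[0] <= rmt_tts
    }
    # Client: physically delete the changed records, then apply the data.
    for k in change_list:
        del rrt[k]
    rrt.update(sent)
    return {"tts": rmt_tts, "rts": rmt_rts, "cmts": rmt_cmts}


# The RMT was switched to master at time 100 with an old TTS of 80.
rmt = {1: (50, "a"), 2: (70, "b"), 4: (110, "d")}
# The RRT last synchronized with the previous master at time 90.
rrt = {1: (50, "a"), 2: (85, "b2"), 3: (88, "c")}
state = recovery_sync(rrt, rmt, tts_rrt=90, rmt_cmts=100, rmt_tts=120, rmt_rts=80)
```

Records 2 and 3 were changed on the RRT after the recovery timestamp, so they are discarded; record 2 is restored from the RMT and record 4, changed on the new master, is added.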

Figure 5-2
Flowchart for Synchronization Algorithms
