Multi-source replication

Multi-Source Replication

A basic MySQL or MariaDB replication arrangement uses a simple method to perform replication. Any changes to schema or data on the master are recorded in its binary log. The slave regularly asks the master for any new entries to its binary log, based on position numbers in that log. The slave will record any new entries the master sends it in a relay log and then execute them on its copy of the databases. This system works well with a straightforward master-slave configuration, when there is only one master and one or more slaves.

With recent releases of MySQL and MariaDB, a slave can have multiple sources for replication. This may seem absurd at first until you consider one of the inherent points of replication: load balancing. Slaves are typically used for Read load balancing—to spread traffic related to SELECT statements among several servers. With multi-source replication, Writes can also be load balanced. For a large organization with plenty of clients writing data to their databases, including scripts that load regularly and somewhat continuously large amounts of data, spreading the write load among several servers will greatly improve performance.

Incidentally, multi-source replication shouldn’t be confused with MySQL multi-master replication, which is a scenario in which each node in a cluster is both a master and a slave, making for circular or point-to-point replication. That arrangement doesn’t necessarily separate Write loads from Read loads and it’s more fragile to maintain.

Preparing for Multiple Sources

In order for the slaves not to be confused, some configuration and new features were needed to be able to give a slave the connection information about each master. These improvements were added starting with version 5.7.6 of MySQL and version 10.0 of MariaDB. For MySQL, in addition to some new system variables, the CHANGE MASTER statement was modified to allow for information on multiple masters to be entered—so it won’t overwrite information on another master entered previously. To do this, a FOR CHANNEL clause was added at the end for specifying an alias for the master. For MariaDB, you give an alias for the master near the start: CHANGE MASTER 'alias' TO ...

These aliases will then allow you to set variables for each source by specifying which channel (‘Connection’ in MariaDB) with the SET or other statements. For example, on MySQL you could execute START SLAVE FOR CHANNEL 'channel1' and on MariaDB you could execute something like, SHOW SLAVE 'master1' STATUS.

Better Identification for Transactions

Each transaction in the master’s binary log includes an identification number for the server based on the server_id variable, which can be set to any number (e.g., 1). To help users better configure servers related to recording transactions, Global Transaction Identifier (GTID) – which was launched in an earlier version of MySQL – provided a more detailed and simpler method of labeling transactions. Basically, transactions are recorded with a unique identification number for the master (i.e., it’s UUID), with a sequential, incremental transaction identification number (i.e., 1, 2, 3). So you get something like this for the first transaction: 32db96db-0da9-11e7-bbff-069099162383:51. The new server identification number is unambiguous and the transaction number located after the colon—51 in this example—is simpler. If a slave received already transactions 1 through 50 from a master with that UUID number, the next one it should request from that master is 51 and any subsequent entries. It’s as simple as that.

Incidentally, the slave keeps track of these various masters UUIDs and transaction entry numbers in a table, not in text files as it did previously. You’ll have to set to two variables to a value of TABLE at start-up: --master-info-repository and --relay-log-info-repository.

There is another bit of potential confusion for the slave: two masters send it transactions on the same database and table. Are the transactions redundant, overlapping, or what? Fortunately, the slave can resolve these conflicts with transactions it receives from the masters, before executing them on the databases. It will decide which goes first and if it should merge some transactions. It’s very slick.

Monitoring Channels

As amazing as all of this may sound, monitoring and maintaining such a voracious replication setup can be daunting: Think about it: multiple servers designated to handle Write traffic, as well as multiple servers taking in transactions from multiple sources, then sorting through them, while at the same time handling Read traffic. You’re going to need some good tools to monitor traffic and coordinate these servers‐and a dependable reporting and alert system to make sure you don’t end up with a mess across the replication setup.

In the upcoming release of Monyog, version 8.0.0 (version 7.1 has been renamed to version 8.0.0), a new Monitor Group called, Multi-Source Replication has been added. With it you can monitor all channels available, including channels for which the slave is not running, those for which the Slave I/O threads and the Slave SQL threads are not running. It also captures the errors provided by SHOW SLAVE STATUS, as well as the seconds behind the masters, but for each source. Monyog will then put this all together to be able to track activities over time. You can even set up alerts to let you know when things go wrong or start to exceed parameters that you set. Basically, running a cluster with multiple sources is no big deal with Monyog.

Download free trial here.

For existing customers, you can upgrade to v8.0.0 here.