Just setting up MySQL replication is not enough, you would need to periodically monitor your slaves to ensure they continue to work seamlessly. Here is a basic overview of the Slave variables to monitor and the tools that will help you monitor those with ease.
Top variables to monitor on your slaves
Replication is best monitored by checking the following variables-
- SLAVE_RUNNING: This is a global status variable and its value can be checked using SHOW GLOBAL STATUS like 'slave_running'. It can either be ‘ON’ or ‘OFF’.
If slave_running is ‘ON’, then the slave is up and working fine, which means both the SQL thread and the IO thread are running. If either the SQL thread or the IO thread is not running then this variable would be ‘OFF’. Use SHOW SLAVE STATUS and try to determine whether there was an error or the slave was stopped manually. Check the last 100 lines of the slave’s error log to see if that gives a clue. Look at Last_Error_Number and Last_Error_Message for specific error information and fix your slave.
The following variables are a part of SHOW SLAVE STATUS
- SLAVE_IO_RUNNING: It tells us if the Slave’s IO thread is able to connect to its master and is running fine. The possible values for this variable is ‘Yes’ or ‘No’ or ‘connecting’.
If this variable reads ‘NO’ then you will have to check the Last_Error_Number and Last_Error_Message and fix your slave. As of MySQL 5.1.20, these columns are aliases for Last_SQL_Errno and Last_SQL_Error. Before 5.1.20, they indicate the error number and error message returned by the most recently executed statement. An error number of 0 and message of the empty string mean “no error.”
- SLAVE_SQL_RUNNING: It tells if the Slave’s SQL thread has started and is working fine. The possible values of this variable are ‘Yes’ or ‘No’.
If this variable reads ‘No’, then the IO thread was caused to stop. You will have to check the Last_SQL_Errno and Last_SQL_Err for more information on the cause. An error number of 0 and message of the empty string mean “no error.” The Last_SQL_Error appears in the slave’s error log.
- SECONDS_BEHIND_MASTER: As the name suggests, this field tells how late your slave is. In other words, it tells the time in seconds that the Slave’s SQL thread lags while processing Master’s binary log. A continuous increase in this value is not a very good sign as it means that the slave is not able to catch up with its master. There is no threshold value for this variable to compare against, to determine if the value is high or low. It completely depends on your application, network speed etc.
NOTE: Although seconds_behind_master is the best option available for determining slave lag available in all MySQL versions, it has been criticised for not always being accurate.
MySQL 5.5 version has a status variable MASTER_HEARTBEAT_PERIOD which when set will send beat packages to the Slave. After the loss of a beat the Slave IO thread will disconnect and try connecting again. Various solutions for adding a ‘heartbeat’ mechanism have been proposed and patches and plugins are available for MySQL < 5.5. If you have added such ‘heartbeat’ mechanism you should monitor that as well.
What tools to use to monitor MySQL replicas?
There are numerous tools available in the market to monitor MySQL replication. Some are command line based tools, the others GUI based. I have taken up 2 of each type and explained the benefits of each.
- Manually: Execute SHOW GLOBAL STATUS like 'slave_running' and SHOW SLAVE STATUS periodically and store it in a file. Check the contents of the file periodically to see if any of your replicas have stopped replicating.
On Linux, you can have a cron job setup which executes these queries periodically and even configure to send email alert if slave_running value is ‘NO’.
This is not a tool as such, but one of the boring ways of doing things. But if you are comfortable doing everything by yourself manually, why not?
- Percona Toolkit: Percona Toolkit comes with a bunch of very useful tools that help in monitoring as well as managing MySQL replicas -
pt-heartbeat: Convenient tool to monitor slave lag in real time.
pt-slave-restart: Watches and restarts Slave on error.
pt-slave-find: Finds replication hierarchy of the slaves.
pt-table-checksum: Checks if databases on the slaves are in sync with their master.
- MySQL Enterprise Monitor: A “Virtual DBA Assistant” by Oracle is an agent based monitoring tool which has a neat web based GUI.
Specific to monitoring replicas, there is a ‘Replication’ tab which gives a topological view of all the Masters and their Slaves along with the output of SHOW SLAVE STATUS and SHOW MASTER STATUS.
Notification alerts can be configured to reach your inbox on the event of replication failure.
- MONyog – MySQL Monior and Advisor: A “MySQL DBA in a box” by Webyog is an agentless GUI based tool that helps MySQL DBAs manage MySQL servers. Replication monitoring and managing include-‘Replication’ tab that give a topological view of all the Masters and their Slaves along with the SHOW SLAVE STATUS and SHOW MASTER STATUS.
Auto registering of slaves, given the Master details.
Notification alerts through emails/SNMP traps that are sent out on the event of replication failure.
Depending on your needs, you can choose the tool of your choice. If you are interested in MONyog – MySQL Monitor and Advisor, you can try an unrestricted copy for a month. You can write to us at email@example.com to give us your feedback and suggestions. Cheers!