Blog

How to Monitor MySQL Replication?

how to monitor mysql replication

Just setting up MySQL replication is not enough, you would need to periodically monitor your slaves to ensure they continue to work seamlessly. Here is a basic overview of the Slave variables to monitor and the tools that will help you monitor those with ease.

Top variables to monitor on your slaves

Replication is best monitored by checking the following variables-

  1. SLAVE_RUNNING: This is a global status variable and its value can be checked using SHOW GLOBAL STATUS like 'slave_running'. It can either be ‘ON’ or ‘OFF’.

    If slave_running is ‘ON’, then the slave is up and working fine, which means both the SQL thread and the IO thread are running. If either the SQL thread or the IO thread is not running then this variable would be ‘OFF’. Use SHOW SLAVE STATUS and try to determine whether there was an error or the slave was stopped manually. Check the last 100 lines of the slave’s error log to see if that gives a clue. Look at Last_Error_Number and Last_Error_Message for specific error information and fix your slave.

    The following variables are a part of SHOW SLAVE STATUS

  2. SLAVE_IO_RUNNING: It tells us if the Slave’s IO thread is able to connect to its master and is running fine. The possible values for this variable is ‘Yes’ or ‘No’ or ‘connecting’.

    If this variable reads ‘NO’ then you will have to check the Last_Error_Number and Last_Error_Message and fix your slave. As of MySQL 5.1.20, these columns are aliases for Last_SQL_Errno and Last_SQL_Error. Before 5.1.20, they indicate the error number and error message returned by the most recently executed statement. An error number of 0 and message of the empty string mean “no error.”

  3. SLAVE_SQL_RUNNING: It tells if the Slave’s SQL thread has started and is working fine. The possible values of this variable are ‘Yes’ or ‘No’.

    If this variable reads ‘No’, then the IO thread was caused to stop. You will have to check the Last_SQL_Errno and Last_SQL_Err for more information on the cause. An error number of 0 and message of the empty string mean “no error.” The Last_SQL_Error appears in the slave’s error log.

  4. SECONDS_BEHIND_MASTER: As the name suggests, this field tells how late your slave is. In other words, it tells the time in seconds that the Slave’s SQL thread lags while processing Master’s binary log. A continuous increase in this value is not a very good sign as it means that the slave is not able to catch up with its master. There is no threshold value for this variable to compare against, to determine if the value is high or low. It completely depends on your application, network speed etc.

    NOTE: Although seconds_behind_master is the best option available for determining slave lag available in all MySQL versions, it has been criticised for not always being accurate.

    MySQL 5.5 version has a status variable MASTER_HEARTBEAT_PERIOD which when set will send beat packages to the Slave. After the loss of a beat the Slave IO thread will disconnect and try connecting again. Various solutions for adding a ‘heartbeat’ mechanism have been proposed and patches and plugins are available for MySQL < 5.5. If you have added such ‘heartbeat’ mechanism you should monitor that as well.

What tools to use to monitor MySQL replicas?

There are numerous tools available in the market to monitor MySQL replication. Some are command line based tools, the others GUI based. I have taken up 2 of each type and explained the benefits of each.

  1. Manually: Execute SHOW GLOBAL STATUS like 'slave_running' and SHOW SLAVE STATUS periodically and store it in a file. Check the contents of the file periodically to see if any of your replicas have stopped replicating.
    On Linux, you can have a cron job setup which executes these queries periodically and even configure to send email alert if slave_running value is ‘NO’.

    This is not a tool as such, but one of the boring ways of doing things. But if you are comfortable doing everything by yourself manually, why not?

  2. Percona Toolkit: Percona Toolkit comes with a bunch of very useful tools that help in monitoring as well as managing MySQL replicas -

    pt-heartbeat: Convenient tool to monitor slave lag in real time.
    pt-slave-restart: Watches and restarts Slave on error.
    pt-slave-find: Finds replication hierarchy of the slaves.
    pt-table-checksum: Checks if databases on the slaves are in sync with their master.

  3. Percona Toolkit pt heartbeat diagram

  4. MySQL Enterprise Monitor: A “Virtual DBA Assistant” by Oracle is an agent based monitoring tool which has a neat web based GUI.

    MySQL Enterprise Monitor MEM Replication

    Specific to monitoring replicas, there is a ‘Replication’ tab which gives a topological view of all the Masters and their Slaves along with the output of SHOW SLAVE STATUS and SHOW MASTER STATUS.

    Notification alerts can be configured to reach your inbox on the event of replication failure.

  5. MONyog – MySQL Monior and Advisor: A “MySQL DBA in a box” by Webyog is an agentless GUI based tool that helps MySQL DBAs manage MySQL servers. Replication monitoring and managing include-‘Replication’ tab that give a topological view of all the Masters and their Slaves along with the SHOW SLAVE STATUS and SHOW MASTER STATUS.

    MONyog Monitors with replication

    MONyog Replication page

    Auto registering of slaves, given the Master details.

    Notification alerts through emails/SNMP traps that are sent out on the event of replication failure.

Depending on your needs, you can choose the tool of your choice. If you are interested in MONyog – MySQL Monitor and Advisor, you can try an unrestricted copy for a month. You can write to us at support@webyog.com to give us your feedback and suggestions. Cheers!

10 thoughts on “How to Monitor MySQL Replication?

  1. I’m surprised you haven’t mentioned nagios, which I thought was pretty commonly used for enterprise monitoring. If you have any kind of on-call rotation setup, then triggering your alerts from all servers and services via the same mechanism is key.

    I’m quite fond of the check_mysql_health script, which among other things such as uptime gives me:

    slave-lag (Seconds behind master)
    slave-io-running (Slave io running: Yes)
    slave-sql-running (Slave sql running: Yes)

  2. @Joel: Percona Toolkit as I mentioned is free! These tools not only help you monitor but also manage MySQL.

    @Gregory: Yes! But note that Nagios is a generic monitoring tool which can do much more than MySQL replication monitoring. Our blog lists various options specific to MySQL monitoring.

  3. How accurate is MONyog’s “seconds behind master” measure and what is that based on? My DBA tells me that “it isn’t that accurate” but I’m not sure whether to believe him on that; I think he is just trying to wriggle our from under the microscope I have put on our MySQL replication lag and the potential impact on our DR RPO ;-)

  4. Yes .. default settings are WARNING = 450 seconds and CRITICAL = 600 seconds.
    You can customize this as you customize any counter (it is in the ‘replication’ group).

    This is not the best place for this discussion. If you need more help please use our Forums at http://forums.webyog.com/. In the Forums we can attach screenshot etc. what is not possible with a comment here.

  5. Pingback: Monitor MySQL Replication - Just just easy answers

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>