01
10/09
8:45 pm
Scripted MySQL Replication Consistency Checks
I’ve been fixing and breaking MySQL replication clusters for quite some time now, and I realized one of the biggest problems is the lack of consistency provided through MySQL replication.
Sure your data will be consistent most of the time, but how do you check if it really IS consistent across all your slaves? How do you make sure your slaves don’t have missing or invalid entries?
I’m sure you’ve all run:
mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
Not always a good idea…
Well today I present you with a little bash script I’ve written which performs all these verifications. I haven’t invented anything. On the contrary, I’m just using the methods and tools provided by Percona in their fantastic toolset called Maatkit.
Usage:
- Change the default “User Defined Variables” in the script, to reflect your MASTER mysql server.
- Configure slave reporting on each slave so that “SHOW SLAVE HOSTS\G” works from the MASTER
- Make sure your slaves are running properly: Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0
- Make sure you have Maatkit installed (apt-get install maatkit)
- Run the script on the MASTER like this: ./mysql_consistency.sh -c
I have tested this script on Debian Lenny (5.0) with maatkit version 4334-1 and MySQL 5.0.
How does it work?
When you run the script, after performing some necessary sanity checks, the MASTER will create a checksum of every database and every table. It will store those results in the default database called test in the table called checksum. It will then replicate the data to the SLAVES who will create their own checksums on the same databases and tables. Afterwards it will tell you which slaves are consistent and which ones are not.
[root@db01 /opt (353)]#: ./mysql_consistency.sh -c
Checking consistency
Replication Slave ID 3 on 172.16.0.63:3306 is consistent.
Replication Slave ID 4 on 172.16.0.64:3306 is consistent.
Replication Slave ID 5 on 172.16.0.65:3306 is inconsistent. Requires rebuild
You might get some error messages too.
Download the script here: mysql_consistency.sh.txt
Please notify me in the comments of any errors or adjustements as I’ve only used this in a small test-environment.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 | #!/bin/sh # Script to perform consistency checks on replicated MySQL databases # # (c) Alex Williams - 2009 - www.alexwilliams.ca # # v0.1 # # Options: # -c Check for inconsistent slaves # ############### # # Slaves *must* have reporting enabled in their my.cnf # example: # [mysqld] # report-host = 172.16.0.63 # report-port = 3306 ######################### # User Defined Variables ######################### MYSQL_HOST="172.16.0.60" # The MASTER database IP MYSQL_PORT="3306" # The MASTER database PORT MYSQL_USER="username" MYSQL_PASS="password" MYSQL_CHECKSUM="test.checksum" # The database (test) and table (checksum) to store checksum results # Mandatory commands for this script to work. COMMANDS="mysql mysqladmin mk-audit mk-table-checksum mk-checksum-filter awk" ############## # Exit Codes ############## E_INVALID_ARGS=65 E_INVALID_COMMAND=66 E_NO_SLAVES=67 E_DB_PROBLEM=68 ########################## # Script Functions ########################## error() { E_CODE=$? echo "Exiting: ERROR ${E_CODE}: $E_MSG" exit $E_CODE } usage() { echo -e "MySQL Replication Consistency - version 0.1 (c) Alex Williams - www.alexwilliams.ca" echo -e "\nOptions: " echo -e "\t-c\tCheck for inconsistent slave(s)" echo -e "" exit $E_INVALID_ARGS } ## # Perform sanity checks before allowing the script to run ## sanity_checks() { ## # Verify if commands exist ## for command in $COMMANDS do ## # Set the full path of the command ## PROG=`which $command` if [ ! ${PROG} ]; then ## # Error message if the command doesn't exist ## E_MSG="missing command '$command'" return $E_INVALID_COMMAND else ## # Create a variable (i.e: $prog_tar) # substitutes all - for _ (i.e: prog_mk-audit becomes prog_mk_audit) ## E_MSG="Command not found" eval prog_${command//-/_}=${PROG} || return fi done } ### # Check for inconsistent slaves ### check() { ## # Run the mk_table_checksum command ## E_MSG="Problem running '$prog_mk_table_checksum' at the top of check() function" $prog_mk_table_checksum --quiet --replicate=$MYSQL_CHECKSUM --create-replicate-table --empty-replicate-table h=$MYSQL_HOST,P=$MYSQL_PORT,u=$MYSQL_USER,p=$MYSQL_PASS || return $E_DB_PROBLEM SLAVE_LIST=`$prog_mysql --user=$MYSQL_USER --password=$MYSQL_PASS -e "SHOW SLAVE HOSTS\G"` ## # Create arrays for the slave ids, hosts, ports # To manually create the slave arrays, do something like this instead: # # slave_ids=(3 4 5) # slave_hosts=(172.16.0.63 172.16.0.64 172.16.0.65) # slave_ports=(3306 3306 3306) # ## slave_ids=(`echo "$SLAVE_LIST" | grep "Server_id" | $prog_awk -F ": " '{ print $2 }'`) slave_hosts=(`echo "$SLAVE_LIST" | grep "Host" | $prog_awk -F ": " '{ print $2 }'`) slave_ports=(`echo "$SLAVE_LIST" | grep "Port" | $prog_awk -F ": " '{ print $2 }'`) ## # Define the number of slaves by the number of entries in the slave_ids[] array ## num_slaves=${#slave_ids[*]} index=0 if [ $num_slaves -eq 0 ]; then echo "No Replication Slaves appear in 'SHOW SLAVE HOSTS'" return $E_NO_SLAVES fi ## # verify the checksums on each replicated slave ## while [ "$index" -lt "$num_slaves" ] do slave_id=${slave_ids[$index]} slave_host=${slave_hosts[$index]} slave_port=${slave_ports[$index]} CHECKSUM=`$prog_mk_table_checksum --replicate=$MYSQL_CHECKSUM --replicate-check 2 h=$slave_host,P=$slave_port,u=$MYSQL_USER,p=$MYSQL_PASS` || CHECKSUM="not consistent" if [ "$CHECKSUM" ]; then echo "Replication Slave ID $slave_id on $slave_host:$slave_port is inconsistent. Requires rebuild" else echo "Replication Slave ID $slave_id on $slave_host:$slave_port is consistent." fi let "index = $index + 1" done } for arg in "$@" do case $arg in -c) arg_c=true;; *) usage;; esac done if sanity_checks; then sanity=true if [ $arg_c ]; then echo "Checking consistency" check || error else usage fi else error fi |