01 10/09
8:45 pm

Scripted MySQL Replication Consistency Checks


I’ve been fixing and breaking MySQL replication clusters for quite some time now, and I realized one of the biggest problems is the lack of consistency provided through MySQL replication.

Sure your data will be consistent most of the time, but how do you check if it really IS consistent across all your slaves? How do you make sure your slaves don’t have missing or invalid entries?

I’m sure you’ve all run:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

Not always a good idea…

Well today I present you with a little bash script I’ve written which performs all these verifications. I haven’t invented anything. On the contrary, I’m just using the methods and tools provided by Percona in their fantastic toolset called Maatkit.

Usage:

  • Change the default “User Defined Variables” in the script, to reflect your MASTER mysql server.

  • Configure slave reporting on each slave so that “SHOW SLAVE HOSTS\G” works from the MASTER

  • Make sure your slaves are running properly: Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0

  • Make sure you have Maatkit installed (apt-get install maatkit)

  • Run the script on the MASTER like this: ./mysql_consistency.sh -c

I have tested this script on Debian Lenny (5.0) with maatkit version 4334-1 and MySQL 5.0.

How does it work?

When you run the script, after performing some necessary sanity checks, the MASTER will create a checksum of every database and every table. It will store those results in the default database called test in the table called checksum. It will then replicate the data to the SLAVES who will create their own checksums on the same databases and tables. Afterwards it will tell you which slaves are consistent and which ones are not.

[root@db01 /opt (353)]#: ./mysql_consistency.sh -c
Checking consistency
Replication Slave ID 3 on 172.16.0.63:3306 is consistent.
Replication Slave ID 4 on 172.16.0.64:3306 is consistent.
Replication Slave ID 5 on 172.16.0.65:3306 is inconsistent. Requires rebuild

You might get some error messages too.

Download the script here: mysql_consistency.sh.txt

Please notify me in the comments of any errors or adjustements as I’ve only used this in a small test-environment.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
#!/bin/sh
 
# Script to perform consistency checks on replicated MySQL databases
#
# (c) Alex Williams - 2009 - www.alexwilliams.ca
#
# v0.1
#
# Options: 
#	  -c      Check for inconsistent slaves
#
###############
#
# Slaves *must* have reporting enabled in their my.cnf
# example:
# 	[mysqld]
# 	report-host     = 172.16.0.63
# 	report-port     = 3306
 
 
#########################
# User Defined Variables
#########################
 
MYSQL_HOST="172.16.0.60"	# The MASTER database IP
MYSQL_PORT="3306"		# The MASTER database PORT
MYSQL_USER="username"
MYSQL_PASS="password"
MYSQL_CHECKSUM="test.checksum"	# The database (test) and table (checksum) to store checksum results
 
# Mandatory commands for this script to work.
COMMANDS="mysql mysqladmin mk-audit mk-table-checksum mk-checksum-filter awk"	
 
##############
# Exit Codes
##############
 
E_INVALID_ARGS=65
E_INVALID_COMMAND=66
E_NO_SLAVES=67
E_DB_PROBLEM=68
 
##########################
# Script Functions
##########################
 
error() {
        E_CODE=$?
        echo "Exiting: ERROR ${E_CODE}: $E_MSG"
 
        exit $E_CODE   
}
 
usage() {
	echo -e "MySQL Replication Consistency - version 0.1 (c) Alex Williams - www.alexwilliams.ca"
	echo -e "\nOptions: "
	echo -e "\t-c\tCheck for inconsistent slave(s)"
	echo -e ""
 
	exit $E_INVALID_ARGS
}
 
##
# Perform sanity checks before allowing the script to run
##
sanity_checks() {
	##
	# Verify if commands exist
	##
	for command in $COMMANDS
	do
		##
		# Set the full path of the command
		##
		PROG=`which $command`
		if [ ! ${PROG} ]; then
			##
			# Error message if the command doesn't exist
			##
			E_MSG="missing command '$command'"
			return $E_INVALID_COMMAND
		else
			##
			# Create a variable (i.e: $prog_tar)
			# 	substitutes all - for _ (i.e: prog_mk-audit becomes prog_mk_audit)
			##
			E_MSG="Command not found"
			eval prog_${command//-/_}=${PROG} || return
		fi
	done
}
 
 
###
# Check for inconsistent slaves
###
check() {
	##
	# Run the mk_table_checksum command
	##
	E_MSG="Problem running '$prog_mk_table_checksum' at the top of check() function"
 
	$prog_mk_table_checksum --quiet --replicate=$MYSQL_CHECKSUM --create-replicate-table --empty-replicate-table h=$MYSQL_HOST,P=$MYSQL_PORT,u=$MYSQL_USER,p=$MYSQL_PASS || return $E_DB_PROBLEM
 
	SLAVE_LIST=`$prog_mysql --user=$MYSQL_USER --password=$MYSQL_PASS -e "SHOW SLAVE HOSTS\G"`
 
	##
	# Create arrays for the slave ids, hosts, ports
	# To manually create the slave arrays, do something like this instead:
	#
	# slave_ids=(3 4 5)
	# slave_hosts=(172.16.0.63 172.16.0.64 172.16.0.65)
	# slave_ports=(3306 3306 3306)
	#
	##
	slave_ids=(`echo "$SLAVE_LIST" | grep "Server_id" | $prog_awk -F ": " '{ print $2 }'`)
	slave_hosts=(`echo "$SLAVE_LIST" | grep "Host" | $prog_awk -F ": " '{ print $2 }'`)
	slave_ports=(`echo "$SLAVE_LIST" | grep "Port" | $prog_awk -F ": " '{ print $2 }'`)
 
	##
	# Define the number of slaves by the number of entries in the slave_ids[] array
	##
	num_slaves=${#slave_ids[*]}
 
	index=0
 
	if [ $num_slaves -eq 0 ]; then
		echo "No Replication Slaves appear in 'SHOW SLAVE HOSTS'"
		return $E_NO_SLAVES
	fi
 
	##
	# verify the checksums on each replicated slave
	##
	while [ "$index" -lt "$num_slaves" ]
	do
		slave_id=${slave_ids[$index]}
		slave_host=${slave_hosts[$index]}
		slave_port=${slave_ports[$index]}
 
		CHECKSUM=`$prog_mk_table_checksum --replicate=$MYSQL_CHECKSUM --replicate-check 2 h=$slave_host,P=$slave_port,u=$MYSQL_USER,p=$MYSQL_PASS` || CHECKSUM="not consistent"
 
		if [ "$CHECKSUM" ]; then
			echo "Replication Slave ID $slave_id on $slave_host:$slave_port is inconsistent. Requires rebuild"
		else
			echo "Replication Slave ID $slave_id on $slave_host:$slave_port is consistent."
		fi
		let "index = $index + 1"
	done
}
 
 
for arg in "$@"
do
	case $arg in
	-c) arg_c=true;;
	*) usage;;
	esac
done
 
if sanity_checks; then
	sanity=true
 
	if [ $arg_c ]; then
		echo "Checking consistency"
		check || error
	else
		usage
	fi
else
	error
fi
Older Posts »