MySQL · 主从复制延迟

判断

Seconds_Behind_Master 值不为 0,单位是 s。

风险

  • 从库复制夯住,会导致备份失败(flush tables with read lock 900s 超时)
  • 以从库为基准进行的备份,数据不是最新的有延迟
  • 异常情况下,主从 HA 无法切换,HA 需要检查数据的一致性,延迟时主备不一致

原因

最根本原因是主库是多线程读写,从库读取主库的 binlog 的线程只有一个,常见的场景如下:

  • 无主键、无索引或索引区分度不高
1
2
3
show slave status;
show open tables where In_use=1;
desc table;
  • 主库上有大事务,导致从库延时
  • 主库写入频繁,从库压力跟不上导致延时
  • 大量 myisam 表,在备份时导致 slave 延迟

解决方法

  • 为夯住的表添加主键或者索引
1
2
3
4
5
6
7
8
9
10
# 找到表区分度比较高的几个字段
select count(*) from xx;
select count(*) from (select distinct xx from xxx) t;

# 添加主键
stop slave;
set sql_log_bin=0;
alter table xx add key xx(xx);
alter table 'xxxx' add id int(10) unsigned primary key AUTO_INCREMENT;
start slave
  • 调整数据库中 IO 相关参数
1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> select @@innodb_flush_log_at_trx_commit;
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
| 0 |
+----------------------------------+

mysql> select @@sync_binlog;
+---------------+
| @@sync_binlog |
+---------------+
| 0 |
+---------------+
  • 修改表存储引擎为 InnoDB

案例 1

现象:从库两个线程 Slave_IO_RunningSlave_SQL_Running 均是 Yes,主从复制链路正常,但是 Seconds_Behind_Master 不断增大,且主从数据延迟有不断扩大的趋势。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.133.0.51
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-binlog.000232
Read_Master_Log_Pos: 3108253
Relay_Log_File: relay-log.000004
Relay_Log_Pos: 169851605
Relay_Master_Log_File: mysql-binlog.000229
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
....
Seconds_Behind_Master: 42302
Master_SSL_Verify_Server_Cert: No
....
  • 检查当前数据库线程状态,未发现有明显异常
1
2
3
4
5
6
7
8
9
10
mysql> show processlist;
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+
| 1 | system user | | NULL | Connect | 172777 | Waiting for master to send event | NULL |
| 2 | system user | | NULL | Connect | 678456 | Reading event from the relay log | NULL |
| 4345 | cvbak | localhost | NULL | Query | 130546 | Waiting for table flush | FLUSH /*!40101 LOCAL */ TABLES |
| 17662 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 17671 | hdm | 10.133.0.52:37170 | information_schema | Sleep | 1 | | NULL |
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+
  • 检查当前正在使用的表:show open tables where In_use=1; 发现有一张表一直处于 In_use 状态。
1
2
3
4
5
6
7
mysql> show open tables where In_use=1;
+----------+---------------------------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------+---------------------------+--------+-------------+
| tdmetl | odsepg_zrfc_zco_zzfymx_sw | 1 | 0 |
+----------+---------------------------+--------+-------------+
1 row in set (0.00 sec)
  • 根据 Relay_Log_Pos 解析 relay-log.000004,查看当时正在执行的操作是 Delete_rows,操作的表是 tdmetl.odsepg_zrfc_zco_zzfymx_sw
1
mysqlbinlog -vv --base64-output=decode-rows relay-log.000004 --start-position=169851605 | more
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# at 169851605
#190718 4:02:58 server id 3306100 end_log_pos 169851451 CRC32 0x6683bf99 Anonymous_GTID last_committed=1831 sequence_number=1832 rbr_
only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;

# at 169851670
#190718 4:02:58 server id 3306100 end_log_pos 169851525 CRC32 0x9d910e5d Query thread_id=1254417 exec_time=0 error_code=0
SET TIMESTAMP=1563393778/*!*/;
SET @@session.pseudo_thread_id=1254417/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549128/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;

# at 169851744
#190718 4:02:58 server id 3306100 end_log_pos 169851626 CRC32 0x6b8bbd2d Table_map: `tdmetl`.`odsepg_zrfc_zco_zzfymx_sw` mapped to number 170
018

# at 169851845
#190718 4:02:58 server id 3306100 end_log_pos 169859832 CRC32 0x338e105a Delete_rows: table id 170018
  • 检查表结构,发现该表数据量很大且无主键。由此基本确定其问题根本原因:对无主键的表进行删除或者更新,导致从库夯住。该表数据量:77961221条。

image-20190822092437992

解决方法

  • 表添加自增主键。
1
ALTER TABLE 'xxxx' ADD id int UNSIGNED primary key AUTO_INCREMENT;
  • 大表删除使用 truncate 命令。

案例 2

原因分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
mysql> mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.133.7.5
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000279
Read_Master_Log_Pos: 663610631
Relay_Log_File: mysqld-relay-bin.000007
Relay_Log_Pos: 139799382
Relay_Master_Log_File: mysql-bin.000273
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
....
Seconds_Behind_Master: 475123

mysql> show open tables where In_use=1;
+----------+-------------------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------+-------------------+--------+-------------+
| hce | dim_r_cei_general | 1 | 0 |
+----------+-------------------+--------+-------------+

mysql> desc dim_r_cei_general;
+---------------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+-------------------+----------------+
| MATERIAL_CODE | varchar(30) | YES | | NULL | |
| MATERIAL_DESCRITION | varchar(383) | YES | | NULL | |
| PA_DATE_ACTUAL | varchar(30) | YES | | NULL | |
| HR_INDEX | varchar(30) | YES | | NULL | |
| SALE_CHAR | varchar(100) | YES | | NULL | |
| CREATE_DATE | datetime | YES | | CURRENT_TIMESTAMP | |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
+---------------------+------------------+------+-----+-------------------+----------------+

解决方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
mysql> show variables like 'sync_binlog';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog | 0 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'innodb_flush_log_at_trx_commit';
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 0 |
+--------------------------------+-------+

mysql> select count(*) from dim_r_cei_general;
+----------+
| count(*) |
+----------+
| 1933121 |
+----------+

mysql> select count(*) from (select distinct MATERIAL_CODE from dim_r_cei_general) t;
+----------+
| count(*) |
+----------+
| 1933111 |
+----------+

mysql> stop slave;
mysql> set sql_log_bin=0;
mysql> alter table dim_r_cei_general add key index_MATERIAL_CODE(MATERIAL_CODE);
mysql> start slave;

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.133.7.5
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000279
Read_Master_Log_Pos: 663999593
Relay_Log_File: mysqld-relay-bin.000010
Relay_Log_Pos: 336125628
Relay_Master_Log_File: mysql-bin.000274
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
....
Seconds_Behind_Master: 462357