MySQL · 案例分析 · 连接状态 init

问题现象

HDM 监控告警

您的自建DB实例

cosmo-aps|cosmo-aps-t@10.138.232.84:3306 ( 别名: COSMOAPS_S, 集群: 智能制造 )

MySQL活跃会话数达到551

查看当前会话情况,发现出现大量的 State is init.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------------+------------------------------------------------------------------------------------------------------+
| 34 | system user | | NULL | Connect | 4814602 | Queueing master event to the relay log | NULL |
| 312112 | sre | 10.138.232.84:39700 | information_schema | Query | 36129 | checking query cache for query | /* rds internal mark */ /* hdm internal mark */ /* rds internal mark */ /* hdm internal mark */ SHOW |
| 312113 | prom | 127.0.0.1:43690 | NULL | Query | 36131 | init | SHOW SLAVE STATUS |
| 312117 | prom | 127.0.0.1:43698 | NULL | Query | 36071 | init | SHOW SLAVE STATUS |
| 312120 | prom | 127.0.0.1:43704 | NULL | Query | 36011 | init | SHOW SLAVE STATUS |
| 312123 | prom | 127.0.0.1:43710 | NULL | Query | 35951 | init | SHOW SLAVE STATUS |
| 312126 | prom | 127.0.0.1:43716 | NULL | Query | 35891 | init | SHOW SLAVE STATUS |
| 312128 | prom | 127.0.0.1:43720 | NULL | Query | 35831 | init | SHOW SLAVE STATUS |
| 312131 | prom | 127.0.0.1:43726 | NULL | Query | 35771 | init | SHOW SLAVE STATUS |
| 312135 | prom | 127.0.0.1:43734 | NULL | Query | 35711 | init | SHOW SLAVE STATUS |
| 312138 | prom | 127.0.0.1:43740 | NULL | Query | 35651 | init | SHOW SLAVE STATUS |
| 312141 | prom | 127.0.0.1:43746 | NULL | Query | 35591 | init | SHOW SLAVE STATUS |
| 312143 | prom | 127.0.0.1:43750 | NULL | Query | 35531 | init | SHOW SLAVE STATUS |
| 312146 | prom | 127.0.0.1:43756 | NULL | Query | 35471 | init | SHOW SLAVE STATUS |
| 312149 | prom | 127.0.0.1:43762 | NULL | Query | 35411 | init | SHOW SLAVE STATUS |
| 312152 | prom | 127.0.0.1:43768 | NULL | Query | 35351 | init | SHOW SLAVE STATUS |
| 312155 | prom | 127.0.0.1:43774 | NULL | Query | 35291 | init | SHOW SLAVE STATUS |
| 312157 | prom | 127.0.0.1:43778 | NULL | Query | 35231 | init | SHOW SLAVE STATUS |
| 312160 | prom | 127.0.0.1:43784 | NULL | Query | 35171 | init | SHOW SLAVE STATUS |
| 312164 | prom | 127.0.0.1:43792 | NULL | Query | 35111 | init | SHOW SLAVE STATUS |
| 312167 | prom | 127.0.0.1:43798 | NULL | Query | 35051 | init | SHOW SLAVE STATUS |
| 312170 | prom | 127.0.0.1:43804 | NULL | Query | 34991 | init | SHOW SLAVE STATUS |
| 312172 | prom | 127.0.0.1:43808 | NULL | Query | 34931 | init | SHOW SLAVE STATUS |
| 312175 | prom | 127.0.0.1:43814 | NULL | Query | 34871 | init | SHOW SLAVE STATUS |
| 312178 | prom | 127.0.0.1:43820 | NULL | Query | 34811 | init | SHOW SLAVE STATUS |
| 312181 | prom | 127.0.0.1:43826 | NULL | Query | 34751 | init | SHOW SLAVE STATUS |
| 312184 | prom | 127.0.0.1:43832 | NULL | Query | 34691 | init | SHOW SLAVE STATUS |
| 312186 | prom | 127.0.0.1:43836 | NULL | Query | 34631 | init | SHOW SLAVE STATUS |
| 312189 | prom | 127.0.0.1:43842 | NULL | Query | 34571 | init | SHOW SLAVE STATUS |
| 312193 | prom | 127.0.0.1:43850 | NULL | Query | 34511 | init | SHOW SLAVE STATUS |
| 312196 | prom | 127.0.0.1:43856 | NULL | Query | 34451 | init | SHOW SLAVE STATUS |
| 312199 | prom | 127.0.0.1:43862 | NULL | Query | 34391 | init | SHOW SLAVE STATUS |
| 312201 | prom | 127.0.0.1:43866 | NULL | Query | 34331 | init | SHOW SLAVE STATUS |
| 312204 | prom | 127.0.0.1:43872 | NULL | Query | 34271 | init | SHOW SLAVE STATUS |
| 312207 | prom | 127.0.0.1:43878 | NULL | Query | 34211 | init | SHOW SLAVE STATUS |
| 312210 | prom | 127.0.0.1:43884 | NULL | Query | 34151 | init | SHOW SLAVE STATUS |
| 312213 | prom | 127.0.0.1:43890 | NULL | Query | 34091 | init | SHOW SLAVE STATUS |
| 312215 | prom | 127.0.0.1:43894 | NULL | Query | 34031 | init | SHOW SLAVE STATUS |
| 312218 | prom | 127.0.0.1:43900 | NULL | Query | 33971 | init | SHOW SLAVE STATUS |
| 312222 | prom | 127.0.0.1:43908 | NULL | Query | 33911 | init | SHOW SLAVE STATUS |
| 312225 | prom | 127.0.0.1:43914 | NULL | Query | 33851 | init | SHOW SLAVE STATUS |
| 312228 | prom | 127.0.0.1:43920 | NULL | Query | 33791 | init | SHOW SLAVE STATUS |
| 312230 | prom | 127.0.0.1:43924 | NULL | Query | 33731 | init | SHOW SLAVE STATUS |
| 312233 | prom | 127.0.0.1:43930 | NULL | Query | 33671 | init | SHOW SLAVE STATUS |
| 312236 | prom | 127.0.0.1:43936 | NULL | Query | 33611 | init | SHOW SLAVE STATUS |
| 312238 | prom | 127.0.0.1:43940 | NULL | Query | 33551 | init | SHOW SLAVE STATUS |
| 312241 | prom | 127.0.0.1:43946 | NULL | Query | 33491 | init | SHOW SLAVE STATUS |
| 312244 | prom | 127.0.0.1:43952 | NULL | Query | 33431 | init | SHOW SLAVE STATUS |
| 313865 | prom | 127.0.0.1:47194 | NULL | Query | 11 | init | SHOW SLAVE STATUS |
| 313866 | sre | localhost | NULL | Query | 0 | init | show processlist
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------------+------------------------------------------------------------------------------------------------------+

原因分析

从错误日志发现,2019-11-06 22:53:40 开始提醒磁盘空间不足

1
2
3
4
5
6
7
2019-11-06 22:53:40 5752 [Warning] Disk is full writing './relay-log.002275' (Errcode: 28 - No space left on device). Waiting for someone to free space...
2019-11-06 22:53:40 5752 [Warning] Retry in 60 secs. Message reprinted in 600 secs
2019-11-06 22:53:42 5752 [ERROR] Slave SQL: Could not execute Update_rows event on table cosmo-aps.bns_aps_versiontolineinfo; Error writing file '/data/mysql/tmp/MLpTCcJu' (Errcode: 28 - No space left on device), Error_code: 3; Error writing file 'mysql-bin' (errno: 28 - No space left on device), Error_code: 1026; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log mysql-bin.000528, end_log_pos 674303230, Error_code: 3
2019-11-06 22:53:42 5752 [Warning] Slave: Error writing file '/data/mysql/tmp/MLpTCcJu' (Errcode: 28 - No space left on device) Error_code: 3
2019-11-06 22:53:42 5752 [Warning] Slave: Error writing file 'mysql-bin' (errno: 28 - No space left on device) Error_code: 1026
2019-11-06 22:53:42 5752 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000528' position 661219322
2019-11-06 22:54:40 5752 [Warning] Disk is full writing './relay-log.002275' (Errcode: 28 - No space left on device). Waiting for someone to free space...

检查磁盘空间使用情况,发现 /data 目录已占满

1
2
3
4
5
6
7
8
9
10
11
12
[root@cosmoim-db04 log]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 100G 7.1G 93G 8% /
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 826M 63G 2% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda1 497M 172M 326M 35% /boot
/dev/mapper/centos-data 888G 888G 20K 100% /data
tmpfs 13G 12K 13G 1% /run/user/42
tmpfs 13G 0 13G 0% /run/user/1002
tmpfs 13G 0 13G 0% /run/user/0

/data 目录下磁盘占满的主要原因 Nov 6 14:44 出现主从复制中断导致 relay-log 疯狂增长

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
[root@cosmoim-db04 data]# ll -h
total 798G
-rw-rw---- 1 mysql mysql 56 Sep 12 15:27 auto.cnf
drwxr-x--- 2 mysql mysql 8.0K Oct 31 18:48 cosmo@002daps
drwxr-x--- 2 mysql mysql 8.0K Oct 21 20:27 cosmo@002daps@002dt
-rw-r----- 1 mysql mysql 10G Nov 6 22:53 ibdata1
-rw-r----- 1 mysql mysql 10G Sep 12 14:13 ibdata2
-rw-r----- 1 mysql mysql 512M Nov 6 22:53 ib_logfile0
-rw-r----- 1 mysql mysql 512M Oct 28 15:47 ib_logfile1
-rw-r----- 1 mysql mysql 512M Nov 6 22:53 ib_logfile2
-rw-r----- 1 mysql mysql 12M Sep 12 14:13 ibtmp1
-rw-rw---- 1 mysql mysql 139 Nov 7 09:23 master.info
drwxr-x--- 2 mysql mysql 4.0K Sep 12 14:13 mysql
-rw-rw---- 1 mysql mysql 1.1G Oct 28 14:44 mysql-bin.000002
-rw-rw---- 1 mysql mysql 895M Nov 6 22:27 mysql-bin.000003
-rw-rw---- 1 mysql mysql 38 Oct 28 14:44 mysql-bin.index
-rw-rw---- 1 mysql mysql 5 Sep 12 15:27 mysql.pid
drwxr-x--- 2 mysql mysql 4.0K Sep 12 14:13 performance_schema
-rw-rw---- 1 mysql mysql 214 Oct 28 15:23 relay-log.000009
-rw-rw---- 1 mysql mysql 1.1G Nov 6 14:44 relay-log.000010
-rw-rw---- 1 mysql mysql 4.4M Nov 6 14:44 relay-log.000011
-rw-rw---- 1 mysql mysql 214 Nov 6 14:44 relay-log.000012
-rw-rw---- 1 mysql mysql 1.1G Nov 6 14:45 relay-log.000013
-rw-rw---- 1 mysql mysql 17M Nov 6 14:45 relay-log.000014
-rw-rw---- 1 mysql mysql 214 Nov 6 14:45 relay-log.000015
-rw-rw---- 1 mysql mysql 1.1G Nov 6 14:45 relay-log.000016
-rw-rw---- 1 mysql mysql 17M Nov 6 14:45 relay-log.000017
-rw-rw---- 1 mysql mysql 214 Nov 6 14:45 relay-log.000018
-rw-rw---- 1 mysql mysql 1.1G Nov 6 14:46 relay-log.000019
-rw-rw---- 1 mysql mysql 17M Nov 6 14:46 relay-log.000020
-rw-rw---- 1 mysql mysql 214 Nov 6 14:46 relay-log.000021
....

释放部分磁盘空间后,连接会话恢复正常

1
2
3
4
5
6
7
8
9
mysql> show processlist;
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------+------------------+
| 34 | system user | | NULL | Connect | 4816502 | Waiting for master to send event | NULL |
| 313974 | sre | 10.138.232.84:43874 | information_schema | Sleep | 7 | | NULL |
| 313976 | sre | localhost | NULL | Query | 0 | init | show processlist |
+--------+-------------+---------------------+--------------------+---------+---------+----------------------------------+------------------+
3 rows in set (0.00 sec)