MySQL · 案例集锦

系列

MySQL · 案例分析 · 主从复制中断
MySQL · 案例分析 · 磁盘空间不足
MySQL · 案例分析 · information_schema.tables 与 count(*) 查询数不一致

数据库连接失败

问题现象

项目经理火急火燎的反馈，说数据库连接不上了，请求排查问题原因。

原因分析

测试 Navicate 是否能够连上数据库？
登录服务器，查看 mysqld 进程是否存在，如果进程不存在，可能是服务器断电导致进程未自启动。

$ ssh -p 2222 01510886@10.138.16.192

$ ps -ef | grep mysqld
root      64511  64440  0 10:27 pts/2    00:00:00 grep --color=auto mysqld
root      79355      1  0 Aug31 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/mysql.pid
mysql     79981  79355 99 Aug31 ?        2-01:38:14 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql/error.log --pid-file=/data/mysql/mysql.pid --socket=/data/mysql/mysql.sock --port=3306

1 2	$ mysql -S /data/mysql/mysql.sock -usre -p'sre@@p$i' mysql> show processlist;

解决方法

如果是第一种原因，启动 mysql 进程

$ cd /etc/init.d
$ ./mysqld stop/start

$ cd /usr/local/mysql/support-files
$ ./mysql.server stop/start

如果是第二种原因，删除正在执行的锁表线程

# 杀掉ID为1的线程, 关闭连接
kill id; 

# 查询出时间大于10的线程, 复制查询结果，批量执行
select concat('kill ', id, ';') from information_schema.processlist where time > 10;

备份夯住

现象描述

据备份供应商反映，昨天晚上 22:00 XX系统，从库备份失败，使用的备份工具是 mysqldump，请求协助定位备份失败原因。

原因分析

（1）检查从库两个线程 Slave_IO_Running 和 Slave_SQL_Running 均是 Yes，说明主从复制链路正常，但是 Seconds_Behind_Master 不断增大，且主从数据延迟有不断扩大的趋势。

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.133.0.51
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000232
          Read_Master_Log_Pos: 3108253
               Relay_Log_File: relay-log.000004
                Relay_Log_Pos: 169851605
        Relay_Master_Log_File: mysql-binlog.000229
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 169851386
              Relay_Log_Space: 3270427001
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 42302
Master_SSL_Verify_Server_Cert: No
              ....
       
1 row in set (0.00 sec)

（2）检查当前数据库线程状态，未发现有明显异常。

mysql> show processlist;
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+
| Id    | User        | Host              | db                 | Command | Time   | State                            | Info                           |
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+
|     1 | system user |                   | NULL               | Connect | 172777 | Waiting for master to send event | NULL                           |
|     2 | system user |                   | NULL               | Connect | 678456 | Reading event from the relay log | NULL                           |
|  4345 | cvbak       | localhost         | NULL               | Query   | 130546 | Waiting for table flush          | FLUSH /*!40101 LOCAL */ TABLES |
| 17662 | root        | localhost         | NULL               | Query   |      0 | starting                         | show processlist               |
| 17671 | hdm         | 10.133.0.52:37170 | information_schema | Sleep   |      1 |                                  | NULL                           |
+-------+-------------+-------------------+--------------------+---------+--------+----------------------------------+--------------------------------+

（3）检查当前正在使用的表：show open tables where In_use=1; 发现有一张表一直处于 In_use 状态。

mysql> show open tables where In_use=1;
+----------+---------------------------+--------+-------------+
| Database | Table                     | In_use | Name_locked |
+----------+---------------------------+--------+-------------+
| tdmetl   | odsepg_zrfc_zco_zzfymx_sw |      1 |           0 |
+----------+---------------------------+--------+-------------+
1 row in set (0.00 sec)

（4）根据 Relay_Log_Pos 解析 relay-log.000004，查看当时正在执行的操作是 Delete_rows，操作的表是 tdmetl.odsepg_zrfc_zco_zzfymx_sw。

1	# mysqlbinlog -vv --base64-output=decode-rows relay-log.000004 --start-position=169851605 \| more

# at 169851605
#190718  4:02:58 server id 3306100  end_log_pos 169851451 CRC32 0x6683bf99 	Anonymous_GTID	last_committed=1831	sequence_number=1832	rbr_
only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;

# at 169851670
#190718  4:02:58 server id 3306100  end_log_pos 169851525 CRC32 0x9d910e5d 	Query	thread_id=1254417	exec_time=0	error_code=0
SET TIMESTAMP=1563393778/*!*/;
SET @@session.pseudo_thread_id=1254417/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549128/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;

# at 169851744
#190718  4:02:58 server id 3306100  end_log_pos 169851626 CRC32 0x6b8bbd2d 	Table_map: `tdmetl`.`odsepg_zrfc_zco_zzfymx_sw` mapped to number 170
018

# at 169851845
#190718  4:02:58 server id 3306100  end_log_pos 169859832 CRC32 0x338e105a 	Delete_rows: table id 170018

（5）检查表结构，发现该表数据量很大且无主键。由此基本确定其问题根本原因：对无主键的表进行删除或者更新，导致从库夯住。该表数据量：77961221条。

解决方法

（1）表添加自增主键。

1	ALTER TABLE 'xxxx' ADD id int UNSIGNED primary key AUTO_INCREMENT;

（2）大表删除使用 truncate 命令。

建议

避免无 where 条件的 delete 或 update 操作大表，如果需要全表删除，使用 truncate 操作。
在 binlog row 模式下表结构最好能有主键。

主从复制中断

问题现象

监控系统，检测到xx项目主从复制中断，并发送了告警

原因分析

1）登录所报告警数据库，查看复制状态

ssh haieradmin@10.138.8.216
ssh root@10.159.37.148
ssh haieradmin@10.159.44.168

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.133.8.96
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002660
          Read_Master_Log_Pos: 343075905
               Relay_Log_File: relay-log-bin.003429
                Relay_Log_Pos: 555
        Relay_Master_Log_File: mysql-bin.001741
             Slave_IO_Running: Yes
            Slave_SQL_Running: No

经检查 Slave_SQL_Running 为 No，原因一般都很明显，记录在 Last_Error，根据报错原因选择合适地解决方法，最常用的解决办法是跳过这一条正在执行的 SQL。

解决方法

（1）登陆 10.133.0.53

（2）执行下面操作

pt-slave-restart -h 'x.x.x.x' -u'xx' -p'xx' -P 3306

#一般是事务回滚造成的
stop slave;
set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
start slave;

mysql> show slave status\G
************************* 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.135.22.70
                  Master_User: repluser
                  Master_Port: 3308
                Connect_Retry: 10
              Master_Log_File: mysql-bin.001935
          Read_Master_Log_Pos: 6587
               Relay_Log_File: relay-log.001487
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql-bin.001935
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: test.%
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6587
              Relay_Log_Space: 107
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1

1
2
3

Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

[root@hop02 mzjh]# ll | grep mysql-bin
-rw-rw----. 1 mysql mysql  1073742013 Sep 12 14:55 mysql-bin.001948
-rw-rw----. 1 mysql mysql  1073742073 Sep 13 11:53 mysql-bin.001949
-rw-rw----. 1 mysql mysql  1073742120 Sep 14 08:39 mysql-bin.001950
-rw-rw----. 1 mysql mysql  1073859355 Sep 15 06:21 mysql-bin.001951
-rw-rw----. 1 mysql mysql  1074054892 Sep 16 06:18 mysql-bin.001952
-rw-rw----. 1 mysql mysql  1075243641 Sep 17 06:18 mysql-bin.001953
-rw-rw----. 1 mysql mysql  1074801420 Sep 18 06:16 mysql-bin.001954
-rw-rw----. 1 mysql mysql  1075257243 Sep 19 06:16 mysql-bin.001955
-rw-rw----. 1 mysql mysql   619405162 Sep 19 13:32 mysql-bin.001956
-rw-rw----. 1 mysql mysql  1072898197 Sep 20 09:44 mysql-bin.001957
-rw-r-----. 1 mysql mysql         190 Sep 19 13:32 mysql-bin.index

从库在主库找不到 binlog 文件

原因

该错误发生在从库的io进程从主库拉取日志时，发现主库的mysql_bin.index文件中第一个文件不存在。出现此类报错可能是由于你的slave 由于某种原因停止了好长一段是时间，当你重启slave 复制的时候，在主库上找不到相应的binlog ,会报此类错误。或者是由于某些设置主库上的binlog被删除了，导致从库获取不到对应的binglog file。

表碎片

表碎片是 MySQL 运维过程中比较常见的问题，碎片的存在十分影响数据库的性能。

判断方法

MySQL 的碎片是否产生，通过查看

1	show table status from table_name\G;

这个命令中 Data_free 字段，如果该字段不为 0，则产生了数据碎片。

产生原因

1. 经常进行 delete 操作

经常进行 delete 操作，产生空白空间，如果进行新的插入操作，MySQL 将尝试利用这些留空的区域，但仍然无法将其彻底占用，久而久之就产生了碎片；

2. update 更新

update 更新可变长度的字段(例如：varchar 类型)，将长的字符串更新成短的。之前存储的内容长，后来存储是短的，即使后来插入新数据，那么有一些空白区域还是没能有效利用的。

产生影响

由于碎片空间是不连续的，导致这些空间不能充分被利用；
由于碎片的存在，导致数据库的磁盘 I/O 操作变成离散随机读写，加重了磁盘 I/O 的负担。

清理办法

MyISAM：optimize table 表名；（OPTIMIZE 可以整理数据文件，并重排索引）
InnoDB：

（1）重建表存储引擎：ALTER TABLE tablename ENGINE = InnoDB;

（2）进行一次数据的导入导出

碎片清理的性能对比：

引用我之前一个生产库的数据，对比一下清理前后的 SQL 执行速度差异：

1
2
3

select count(*) from test.twitter_11;
修改前：1 row in set (7.37 sec)
修改后：1 row in set (1.28 sec)

总结

通过对比，可以看到碎片清理前后，节省了很多空间，SQL执行效率更快。所以，在日常运维工作中，应该对表碎片进行定期清理，保证数据库有稳定的性能。

触发器创建失败

问题现象

创建触发器，报错提示”1419”

1 2	ERROR 1419 (HY000): You do not have the SUPER privilege and binary logging is enabled (you might want to use the less safe log_bin_trust_routine_creators variable)

原因分析

有些操作要禁止掉对 SUPER 权限的要求，要处理可以设置 log_bin_trust_routine_creators 全局系统变量为 1，默认地，这个变量值为 0。

解决方法

直接在数据库进行修改，也可以在启动服务器之时用 --log-bin-trust-routine-creators 选项来设置允许这个变量。

1	mysql> SET GLOBAL log_bin_trust_routine_creators = 1;