mariadb+galera集群

1.规划:

数据文件存储位置 /mysqldb/maridb/data
日志文件存储位置 /mysqldb/maridb/log
binlog文件存储位置 /mysqldb/maridb/binlog

缓存目录位置 /mysqldb/maridb/tmp

2.创建目录和授权:

复制代码
mkdir -p /mysqldb/maridb/data
mkdir -p /mysqldb/maridb/log
mkdir -p /mysqldb/maridb/binlog
mkdir -p /mysqldb/maridb/tmp
chown -R mysql:mysql /mysqldb/maridb
chmod -R 775 /mysqldb/maridb

复制代码
创建数据库目录和赋予数据库普通用户权限

yum install mariadb-server,mariadb-client,galera -y

mariadb 版本为:10.3.20-MariaDB

配置mysql的配置文件

复制代码

cat >> /etc/my.cnf << EOF
[client]
port = 13306
socket = /tmp/mysql.sock

[mysqld]
user = mysql
datadir = /mysqldb/maridb/data
pid-file = /mysqldb/maridb/data/mysql.pid
server-id = 101
relay_log =/mysqldb/maridb/log/mysql_relay.log
bind-address = 0.0.0.0
port = 13306

tmpdir=/mysqldb/maridb/tmp
#当数据库是大量存储时建议指导TMP目录,否则做全表操作时临时空间会不足

init_connect =’SET NAMES utf8’
character-set-server = utf8
skip-name-resolve
back_log = 300

max_connections = 8019
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 2048
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 256M

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 512M
thread_cache_size = 128

query_cache_type = 2
query_cache_size = 128M
query_cache_limit = 5M
thread_stack = 192k
ft_min_word_len = 4

log_bin = /mysqldb/maridb/binlog/mysql_bin.log
binlog_format = ROW
expire_logs_days = 7

log_error = /mysqldb/maridb/log/mysql_error.log
slow_query_log = 1
long_query_time = 1
log_slow_verbosity=query_plan
slow_query_log_file = /mysqldb/maridb/log/mysql_slow.log
performance_schema = 0

skip-external-locking #跳过外部锁定,避免external locking

bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1

###InnoDB###
default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2048M
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_purge_threads = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
###New ADD
innodb_rollback_on_timeout = 1
innodb_force_recovery=0
interactive_timeout = 28800
wait_timeout = 120
#skip-grant-tables

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name=”mariadb_galera_cluster”

wsrep_cluster_address=”gcomm://openstackc01,openstackc02,openstackc03”
wsrep_node_name=”openstackc03”
wsrep_node_address=”21.57.32.3”

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_slave_threads=4
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=1024M
wsrep_sst_method=rsync

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M
EOF

初始化数据库:
systemctl start mariadb

三节点集群,mariadb启动时需通过
galera_new_cluster
启动

后面两节点直接:
systemctl start mariadb 启动即可

mariadb密码为空,如无法登陆则参数增加:skip-grant-tables重启登陆

登陆数据库查看参数:
flush privileges;
alter user ‘root‘@’%’ identified by ‘HN-hepc@2020’;
update mysql.user set host=’%’ where user=’root’ and host=’localhost’;
flush privileges;
SHOW STATUS LIKE ‘wsrep_cluster_size’;
MariaDB [(none)]> show global status like ‘ws%’;
+——————————-+———————————————————-+
| Variable_name | Value |
+——————————-+———————————————————-+
| wsrep_applier_thread_count | 8 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_causal_reads | 0 |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_cert_index_size | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_cluster_conf_id | 5 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 |
| wsrep_cluster_status | Primary |
| wsrep_cluster_weight | 3 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_connected | ON |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000308077/0.00394484/0.0110307/0.00501106/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_flow_control_active | false |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_requested | false |
| wsrep_flow_control_sent | 0 |
| wsrep_gcomm_uuid | 4fd60274-523f-11eb-b7dd-476f9c9f0398 |
| wsrep_gmcast_segment | 0 |
| wsrep_incoming_addresses | 192.168.40.60:3306,192.168.40.70:3306,192.168.40.50:3306 |
| wsrep_last_committed | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_cached_downto | 18446744073709551615 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_commits | 0 |
| wsrep_local_index | 2 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_local_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 |
| wsrep_open_connections | 0 |
| wsrep_open_transactions | 0 |
| wsrep_protocol_version | 9 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy info@codership.com |
| wsrep_provider_version | 25.3.31(r0ede97d) |
| wsrep_ready | ON |
| wsrep_received | 2 |
| wsrep_received_bytes | 306 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 9 |
+——————————-+———————————————————-+
66 rows in set (0.001 sec)

wsrep_cluster_status为Primary,表示节点为主节点,正常读写

wsrep_ready为ON,表示集群正常运行

wsrep_cluster_size为 3,表示集群有三个节点

注:
三节点集群,mariadb初次启动时需通过
galera_new_cluster
启动,剩余节点正常启动即可

mysql galera 集群常见问题处理

一、mysql HA集群在断网过久或者所有节点都down了之后的恢复有以下的方法:
解决方案1:
1、等三台机器恢复网络通讯后,因为此时的mysql已经异常无法加入集群,因此需要先保证所有的mysql都是down的,再上台执行
systemctl set-environment _WSREP_NEW_CLUSTER=’–wsrep-new-cluster’ &&
systemctl start ${1:-mariadb}
这条命令,并进入mysql(只有一台机器能够成功执行,其他机器执行了过几秒钟都会异常退出这个进程,我们这里把能够成功执行的机器称为master)
2、此时三台只有一台能够成功进入mysql(即执行mysql这条命令),在非master上的两台上一台一台的执行/etc/init.d/mysql start,必须等一台成功了,另一台才能执行

3、在mysql中执行show status like “wsrep%”;

我们需要保证的第一项为synced,以及第二项必须为三个mysql的ip

4、保证3的结果是想要的说明集群已经恢复了,此时需要将master机器上面的 systemctl set-environment _WSREP_NEW_CLUSTER=’–wsrep-new-cluster’ &&
systemctl start ${1:-mariadb}这个进程kill掉,然后再执行 /etc/init.d/mysql start 即可

二、mysql HA集群某个节点无故down了并且有一段时间处于down的情况通过以下方式恢复:

1、 若日志里面出现以下日志

[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (eb9f50c6-bc95-11e5-a735-9f48e437dc03): 1 (Operation not permitted)

解决方法:删除/var/lib/mysql/grastate.dat 文件(若还存在无法同步的情况则删除galera.cache文件)

2、 若那个down了的节点出现以下日志

(异常情况集群挂了)[ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with –tc-heuristic-recover switch to commit or rollback pending transactions

解决方法:

1、systemctl start mariadb –innodb_force_recovery=6

(SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页
(SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash
(SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作
(SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作
(SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交
(SRV_FORCE_NO_LOG_REDO):不执行前滚的操作
如果配置后出现以下情况:
130507 14:14:01 InnoDB: Waiting for the background threads to start
130507 14:14:02 InnoDB: Waiting for the background threads to start
130507 14:14:03 InnoDB: Waiting for the background threads to start
130507 14:14:04 InnoDB: Waiting for the background threads to start
130507 14:14:05 InnoDB: Waiting for the background threads to start
130507 14:14:06 InnoDB: Waiting for the background threads to start
130507 14:14:07 InnoDB: Waiting for the background threads to start
130507 14:14:08 InnoDB: Waiting for the background threads to start
130507 14:14:09 InnoDB: Waiting for the background threads to start

需要在galera.cfg中添加这一下:
如果在设置 innodb_force_recovery >2 的同时innodb_purge_thread = 0
2、mysqld –tc-heuristic-recover=ROLLBACK
3、删除/var/lib/mysql/ib_logfile*
4、当某个mysql节点挂了,并且存在三个mysql所在host有不同的网段,当mysql想重新加入需要一个sst的过程,sst时会需要知道集群中某个节点的ip因此需要制定参数–wsrep-sst-receive-address否则可能出现同步的ip不在三台机器所共有的网段

三、一个mysql节点若down了一段时间。重新启动的时候需要一些时间去同步数据,服务的启动超时时间不够,导致服务无法启动,解决方法如下:
The correct way to adjust systemd settings so they don’t get overwritten is to create a directory and file as such:
/etc/systemd/system/mariadb.service.d/timeout.conf
[Service]

TimeoutStartSec=12min

或者直接修改/usr/lib/systemd/system/mariadb.service
[Service]

TimeoutStartSec=12min
这里的时间最少要大于90s,默认是90s之后执行 systemctl daemon-reload再重启服务即可

四、日志中出现类似如下错误:
160428 13:54:49 [ERROR] Slave SQL: Error ‘Table ‘manage_operations’ already exists’ on query. Default database: ‘horizon’. Query: ‘CREATE TABLE manage_operations (
id integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
name varchar(50) NOT NULL,
type varchar(20) NOT NULL,
operation varchar(20) NOT NULL,
status varchar(20) NOT NULL,
time date NOT NULL,
operator varchar(50) NOT NULL
) default charset=utf8’, Error_code: 1050
160428 13:54:49 [Warning] WSREP: RBR event 1 Query apply warning: 1, 28585
160428 13:54:49 [Warning] WSREP: Ignoring error for TO isolated action: source: 752eecd1-0ce0-11e6-83fc-3e0502d0bdd2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 24053 trx_id: -1 seqnos (l: 28668, g: 28585, s: 28584, d: 28584, ts: 80224119986850)
导致进程异常关闭,
此时可以通过执行mysqladmin flush-tables来刷新表项,这个问题的原因是三个节点之间的表同步存在问题,刷新一下表即可

五、日志出现以下错误:

160820 3:13:41 [ERROR] Error in accept: Too many open files
160820 3:19:42 [ERROR] Error in accept: Too many open files
160827 3:16:24 [ERROR] Error in accept: Too many open files
160831 17:20:52 [ERROR] Error in accept: Too many open files
160831 19:54:29 [ERROR] Error in accept: Too many open files
160831 20:21:53 [ERROR] Error in accept: Too many open files
160901 11:25:57 [ERROR] Error in accept: Too many open files

解决方法

vim /usr/lib/systemd/system/mariadb.service

[Service]
LimitNOFILE=10000

默认的mysql的open_file_limits是1024将该项增大,并且修改vim /etc/my.cnf.d/server.cnf该文件的open_files_limit值

systemctl daemon-reload

systemctl restart mariadb

查看mysql的open_file_limits值是否调整成功

cat /proc/$pid/limit

其中$pid为mysql进程的pid看看值是否调整成功,并看看日志是否还会出现上述错误

总结一下:

当所有节点都宕机的情况下,再次启动节点,Mariadb-Galera集群会出现无法启动的现象
Mariadb-Galera集群启动是有顺序的,遵循一个原则:最后宕机的最先启动,因为集群认为这个节点的数据是最新的
在生产环境下应该避免使用大事务,不建议在高并发写入场景下使用Galera Cluster架构,会导致集群限流,从而引起整个集群hang住,出现生产故障。针对这种情况可以考虑主从,实现读写分离等手段
对数据一致性要求较高,并且数据写入不频繁,数据库容量也不大(50GB左右),网络状况良好的情况下,可以考虑使用Galera方案