To periodically check the status of the cluster, create a script (/tmp/xx):

#!/bin/sh
# Print the key wsrep status variables for a quick cluster health check
mysql -e \
"SHOW GLOBAL STATUS WHERE Variable_name IN (
'wsrep_cert_deps_distance',
'wsrep_cluster_size',
'wsrep_cluster_status',
'wsrep_connected',
'wsrep_evs_delayed',
'wsrep_flow_control_paused',
'wsrep_flow_control_paused_ns',
'wsrep_flow_control_recv',
'wsrep_flow_control_sent',
'wsrep_local_index',
'wsrep_local_state',
'wsrep_local_state_comment',
'wsrep_ready',
'wsrep_replicated',
'wsrep_replicated_bytes');"

If the credentials are in a custom path, invoke mysql as: mysql --defaults-file=/path/to/.my.cnf -s -e
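
A minimal option file for this, assuming a dedicated monitoring account (user and password here are placeholders):

[client]
user=monitoring
password=secret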

And run it with:

watch sh /tmp/xx
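
On a healthy node the output looks roughly like this (trimmed, values illustrative; the key signs are wsrep_cluster_status = Primary, wsrep_ready = ON, wsrep_local_state_comment = Synced):

wsrep_cluster_size           3
wsrep_cluster_status         Primary
wsrep_connected              ON
wsrep_local_state            4
wsrep_local_state_comment    Synced
wsrep_ready                  ON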

If only a single node is left alive and clients get ERROR 1047 (WSREP has not yet prepared node for application use), bootstrap a new primary component on it:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

This node can now be used as the new primary, so the other nodes can recover from it.
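
To confirm the bootstrap worked, check that the node reports itself as Primary again (run on the same node):

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
-- should now report: Primary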

The statement below gives an idea of the amount of data replicated by Galera. Run it on one of the Galera nodes during peak hours (tested on MariaDB >10.0 and PXC >5.6, Galera >3.x):

SET @start := (SELECT SUM(VARIABLE_VALUE/1024/1024)
               FROM information_schema.global_status
               WHERE VARIABLE_NAME LIKE 'WSREP%bytes');
DO SLEEP(60);
SET @end := (SELECT SUM(VARIABLE_VALUE/1024/1024)
             FROM information_schema.global_status
             WHERE VARIABLE_NAME LIKE 'WSREP%bytes');
SET @gcache := (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@@GLOBAL.wsrep_provider_options, 'gcache.size = ', -1), 'M', 1));
SELECT ROUND((@end - @start), 2) AS `MB/min`,
       ROUND((@end - @start), 2) * 60 AS `MB/hour`,
       @gcache AS `gcache Size(MB)`,
       ROUND(@gcache / ROUND((@end - @start), 2), 2) AS `Time to full(minutes)`;

Will output something like:

+--------+---------+-----------------+-----------------------+
| MB/min | MB/hour | gcache Size(MB) | Time to full(minutes) |
+--------+---------+-----------------+-----------------------+
|   7.95 |  477.00 | 128             |                 16.10 |
+--------+---------+-----------------+-----------------------+
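
Based on that rate, gcache.size can be raised so a rejoining node gets an incremental state transfer (IST) instead of a full SST. For example, to cover roughly one hour at the rate above, something like the following in my.cnf (the 512M value is illustrative, derived from the MB/hour figure):

[mysqld]
wsrep_provider_options="gcache.size=512M"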

src: https://severalnines.com/database-blog/improve-performance-galera-cluster-mysql-or-mariadb

grastate.dat

Try to recover from the node with the highest seqno. For example, this is the grastate.dat of a node that shut down gracefully:

version: 2.1
uuid: cbd332a9-f617-11e2-b77d-3ee9fa637069
seqno: 43760
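
To compare positions across the cluster, a quick sketch (hostnames and the datadir path are assumptions, adjust to your setup):

for h in node1 node2 node3; do
  echo "== $h =="
  ssh "$h" cat /var/lib/mysql/grastate.dat
done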

The next grastate.dat shows -1 as the seqno; this node crashed during transaction processing:

version: 2.1
uuid: cbd332a9-f617-11e2-b77d-3ee9fa637069
seqno: -1

This node crashed during DDL (note the zeroed-out uuid):

version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
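
When seqno is -1, the real last-committed position can usually still be recovered before choosing which node to bootstrap. A minimal sketch, assuming mysqld is stopped (the error-log location and exact invocation vary by distribution):

mysqld --wsrep-recover --user=mysql
# writes a line like the following to the error log:
#   WSREP: Recovered position: <uuid>:<seqno>
# compare the recovered seqno across nodes and bootstrap from the highest one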