Redis Sentinel Mode

Introduction to Sentinel

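One concept to keep in mind: a Sentinel instance is a special kind of Redis instance. It runs as an independent process, and several Sentinel instances are deployed together as a cooperating group of peers (they do not form a Master-Slave relationship among themselves). Their responsibilities differ from those of an ordinary Redis instance. The official documentation introduces Sentinel as follows:
Redis Sentinel provides high availability for Redis. In practice, this means Sentinel can be used to build a Redis deployment that withstands certain kinds of failures without human intervention. Redis Sentinel also provides other functions such as monitoring and notification, and acts as a configuration provider for clients. The full list of Sentinel capabilities is:
Monitoring: Sentinel constantly checks whether the Master and Slave instances are working as expected.
Notification: Sentinel can notify the system administrator or other programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: If a Master is not working as expected, Sentinel can start a failover process in which one Slave is promoted to Master, the other Slaves are reconfigured to use the new Master, and applications using the Redis instances are informed of the new address to connect to.
Configuration provider: Sentinel acts as a source of authority for client service discovery. Clients connect to Sentinel and ask for the address of the current Master of a given Redis service. If a failover occurs, Sentinel reports the new Master's address.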
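
The distributed nature of Sentinel

Redis Sentinel is a distributed system: it is designed to run in a configuration where multiple Sentinel processes cooperate. The advantages of having multiple Sentinel processes cooperate are: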

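  1. Failure detection is performed only when multiple Sentinel instances agree that a given Master is no longer available, which lowers the probability of false positives.
  2. The Sentinel group keeps working and can still perform a failover even when not all Sentinel processes are available (as long as a majority of them can still reach each other).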
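
The two points above can be illustrated with a little arithmetic. The sketch below is not Sentinel code; the function names are hypothetical and it only models the two thresholds involved: the quorum (the last argument of `sentinel monitor`) decides when a Master is flagged as objectively down, while actually carrying out a failover additionally needs votes from a majority of all known Sentinels.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return total_sentinels // 2 + 1


def can_flag_objectively_down(agreeing: int, quorum: int) -> bool:
    """The Master is flagged as objectively down once `quorum` Sentinels agree."""
    return agreeing >= quorum


def can_authorize_failover(reachable: int, total_sentinels: int, quorum: int) -> bool:
    """A failover leader needs votes from a majority, and from at least `quorum` Sentinels."""
    return reachable >= max(majority(total_sentinels), quorum)
```

With three Sentinels and a quorum of 2, `can_authorize_failover(2, 3, 2)` holds, so the group tolerates the loss of one Sentinel. With only one Sentinel left, no failover can be authorized, even if the quorum were lowered to 1.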

Setting up Sentinel

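The current version of Sentinel is called Sentinel 2. Sentinel 1 shipped with Redis 2.6; it is now deprecated and should not be used. The deployment examples in the official documentation state that a robust deployment needs at least three Sentinel instances. In addition, the ordinary Redis instances are usually set up as a tree-shaped (chained) Master-Slave hierarchy for robustness, with at least three instances recommended. The deployment topology used here is: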
[Figure: deployment topology, redissaobing.png]

Environment

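Following the topology diagram, six Redis instances are deployed in total: three ordinary Redis instances forming a tree-shaped Master-Slave hierarchy, and three Redis Sentinel instances. For simplicity, the six instances run on three virtual machines. In a production or test environment the instances should be spread across separate machines, so that no single machine becomes a single point of failure for everything. The details are: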

Instance    Role      IP              Port   Remarks
Sentinel-1  Sentinel  192.168.200.70  26379
Sentinel-2  Sentinel  192.168.200.71  26379
Sentinel-3  Sentinel  192.168.200.72  26379
Redis-1     Master    192.168.200.70  6379
Redis-2     Slave     192.168.200.71  6379   slave of Redis-1
Redis-3     Slave     192.168.200.72  6379   slave of Redis-2

Sentinel configuration

[root@redis01 ~]# cat /usr/local/redis/bin/sentinel.conf|grep -v "#"
port 26379

daemonize yes
bind 0.0.0.0

pidfile "/var/run/redis-sentinel.pid"

logfile "/var/log/sentinel.log"


dir "/tmp"

sentinel deny-scripts-reconfig yes

# the Redis Master to monitor
sentinel monitor mymaster 192.168.200.71 6379 2

sentinel config-epoch mymaster 1

sentinel leader-epoch mymaster 1

sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000

The other two Sentinel configuration files are similar; only the IP differs.
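
Every per-master directive in sentinel.conf has to use the same master name that was declared with `sentinel monitor`; a directive that names an undeclared master is expected to be rejected when Sentinel loads the file. The following is a minimal sketch of a pre-flight check for that, assuming the config text has already been read into a string. The function name and the directive list are illustrative and not part of Redis.

```python
# Directives whose first argument is a master name (illustrative, not exhaustive).
PER_MASTER_DIRECTIVES = {
    "down-after-milliseconds", "parallel-syncs", "failover-timeout",
    "config-epoch", "leader-epoch", "auth-pass",
    "known-replica", "known-slave", "known-sentinel",
}


def undeclared_master_names(conf_text: str) -> set:
    """Return master names used by directives but never declared by `sentinel monitor`."""
    declared, referenced = set(), set()
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Skip blank lines, comments, and anything that is not `sentinel <directive> <name> ...`.
        if len(parts) < 3 or parts[0] != "sentinel":
            continue
        directive, name = parts[1], parts[2].strip('"')
        if directive == "monitor":
            declared.add(name)
        elif directive in PER_MASTER_DIRECTIVES:
            referenced.add(name)
    return referenced - declared
```

An empty result means every directive refers to a monitored master. Any name in the result points at a line that should be corrected before starting Sentinel.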

Redis configuration

[root@redis01 ~]# cat /usr/local/redis/bin/redis.conf|grep -v "#"
port 6379
daemonize yes
bind 0.0.0.0
protected-mode no
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile "/var/log/redis.log"

dir /data/redis
dbfilename "dump-6380.rdb"

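Each slave node needs one extra line naming the node it replicates from. For Redis-2 that is the Master (Redis-3, per the table above, would point at Redis-2 instead):

# IP and port of the Redis Master node
slaveof 192.168.200.70 6379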

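Start the Master, the two Slaves, and the three Sentinels, in that order (the commands can be collected in a start.sh and run with sh start.sh).
Inspecting the Sentinel configuration afterwards shows that Sentinel has rewritten it, adding the Master-Slave and Sentinel instance information it discovered: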

[root@redis01 bin]# cat sentinel.conf|grep -v "#"
port 26379

daemonize yes
bind 0.0.0.0
pidfile "/var/run/redis-sentinel.pid"
logfile "/var/log/sentinel.log"
dir "/tmp"
sentinel myid 2f50085ed043d7ca1cb5aa046f1cda5fb8b2c5ba

sentinel deny-scripts-reconfig yes

sentinel monitor mymaster 192.168.200.71 6379 2

sentinel config-epoch mymaster 1

sentinel leader-epoch mymaster 1

protected-mode no
sentinel known-replica mymaster 192.168.200.72 6379
sentinel known-replica mymaster 192.168.200.71 6379
sentinel known-sentinel mymaster 192.168.200.72 26379 c72a84b4882b989b6b2f0184cf66b055cdad15d7
sentinel known-sentinel mymaster 192.168.200.71 26379 b8d0a1f0955546e9fba85674dc20d99f16191206
sentinel current-epoch 1

Check the Sentinel instance log:

[root@redis02 bin]# tailf /var/log/sentinel.log
36305:X 09 May 2020 18:03:31.199 # +failover-state-select-slave master mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:31.251 # +selected-slave slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:31.251 * +failover-state-send-slaveof-noone slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:31.342 * +failover-state-wait-promotion slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.144 # +promoted-slave slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.144 # +failover-state-reconf-slaves master mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.215 # +failover-end master mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.215 # +switch-master mymaster 192.168.200.70 6379 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.215 * +slave slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.70 6379
36305:X 09 May 2020 18:03:32.283 * +slave slave 192.168.200.72:6379 192.168.200.72 6379 @ mymaster 192.168.200.70 6379

At this point both the Sentinels and the Redis services are running normally.

Simulating a failover

The official documentation suggests using a debug command to put a Redis instance to sleep for a while, which triggers a failover:

redis-cli -p [port] DEBUG sleep 30

First check the current Master instance:

[root@redis01 ~]# redis-cli -p 26379 
127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
1) "192.168.200.70"
2) "6379"

Then run the sleep command against the Master instance:

redis-cli -p 6379 DEBUG sleep 40

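The command blocks for 40 seconds, which is longer than the 30000 ms down-after-milliseconds configured above, so the Sentinels consider the Master down. Once the console returns, check the current Master instance again: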

127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
1) "192.168.200.72"
2) "6379"
127.0.0.1:26379>
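
The same query can be issued from a program, which is how a client discovers the current Master. In practice a Sentinel-aware client library would do this; the sketch below only illustrates the exchange with the Python standard library. All function names are hypothetical, and it assumes the whole reply arrives in a single read and that an unknown master name yields a null array reply.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def parse_addr_reply(reply: bytes):
    """Turn the two-element array reply into (host, port); return None otherwise."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 5:
        return None
    # Layout: *2, $<len>, host, $<len>, port
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host: str, sentinel_port: int, master_name: str,
                    timeout: float = 2.0):
    """Ask one Sentinel for the current Master address of `master_name`."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        reply = conn.recv(4096)  # assumption: the short reply fits in one read
    return parse_addr_reply(reply)
```

Calling `get_master_addr("192.168.200.70", 26379, "mymaster")` against the deployment described here should return the same host and port that redis-cli prints. A client would typically try each Sentinel in turn until one answers.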

As shown, the Master has been switched to redis03. So what does the Master-Slave topology look like now? First look at the Sentinel log:

[root@redis01 bin]# tail -f /var/log/sentinel.log
36305:X 09 May 2020 18:03:32.283 * +slave slave 192.168.200.72:6379 192.168.200.72 6379 @ mymaster 192.168.200.71 6379
36305:X 10 May 2020 12:10:45.610 # +sdown master mymaster 192.168.200.71 6379
36305:X 10 May 2020 12:10:46.059 # +new-epoch 2
36305:X 10 May 2020 12:10:46.060 # +vote-for-leader c72a84b4882b989b6b2f0184cf66b055cdad15d7 2
36305:X 10 May 2020 12:10:46.721 # +odown master mymaster 192.168.200.71 6379 #quorum 2/2
36305:X 10 May 2020 12:10:46.721 # Next failover delay: I will not start a failover before Sun May 10 12:16:46 2020
36305:X 10 May 2020 12:10:47.156 # +config-update-from sentinel c72a84b4882b989b6b2f0184cf66b055cdad15d7 192.168.200.72 26379 @ mymaster 192.168.200.71 6379
36305:X 10 May 2020 12:10:47.156 # +switch-master mymaster 192.168.200.71 6379 192.168.200.72 6379
36305:X 10 May 2020 12:10:47.157 * +slave slave 192.168.200.70:6379 192.168.200.70 6379 @ mymaster 192.168.200.72 6379
36305:X 10 May 2020 12:10:47.157 * +slave slave 192.168.200.71:6379 192.168.200.71 6379 @ mymaster 192.168.200.72 6379
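
When a log like this grows long, the lines that matter most for a failover are the `+switch-master` events, which carry the master name followed by the old and the new address. The following is a minimal sketch of pulling them out, assuming the line layout seen in the log above (pid and role, timestamp, a `#` or `*` marker, then the event). The regular expression and function name are illustrative.

```python
import re

# <pid>:X <dd Mon yyyy hh:mm:ss.mmm> <#|*> <+event|-event> <args...>
LOG_LINE = re.compile(
    r"^\d+:X (?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][\w-]+) (?P<args>.*)$"
)


def master_switches(log_text: str) -> list:
    """Collect every +switch-master event as a small dict."""
    switches = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match or match.group("event") != "+switch-master":
            continue
        fields = match.group("args").split()
        if len(fields) < 5:
            continue  # malformed line, ignore it
        name, old_ip, old_port, new_ip, new_port = fields[:5]
        switches.append({
            "time": match.group("ts"),
            "master": name,
            "old": (old_ip, int(old_port)),
            "new": (new_ip, int(new_port)),
        })
    return switches
```

Fed the log excerpt above, this yields a single entry whose new address is 192.168.200.72:6379.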


[root@redis01 bin]# redis-cli
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.200.71
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:13634287
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:6a6b532ddeeaeccfc231f8ead06a112aa91d29b2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:13634287
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:13629518
repl_backlog_histlen:4770

redis01 is now a slave, and redis03 has been elected as the Master:

[root@redis03 ~]# redis-cli 
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.200.70,port=6379,state=online,offset=13685974,lag=0
slave1:ip=192.168.200.71,port=6379,state=online,offset=13685974,lag=1
master_replid:6a6b532ddeeaeccfc231f8ead06a112aa91d29b2
master_replid2:68015972710ca1218953f9911a49ed1fbedb0819
master_repl_offset:13685974
second_repl_offset:13625177
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:12637399
repl_backlog_histlen:1048576
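
The replication section can also be read programmatically to see which slaves are attached directly to a node, which is a quick way to tell a tree from a flat one-master-many-slaves layout. The following is a minimal sketch, assuming the `key:value` layout of the `info replication` output above. The function names are hypothetical.

```python
def parse_info(text: str) -> dict:
    """Parse `key:value` lines of INFO output, skipping section headers and blanks."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def attached_slaves(info: dict) -> list:
    """Return (ip, port, state) for every slaveN entry of a master's INFO output."""
    slaves = []
    for key, value in info.items():
        # Only slave0, slave1, ...; not slave_priority or slave_read_only.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves.append((fields["ip"], int(fields["port"]), fields["state"]))
    return slaves
```

For the output above, `attached_slaves` lists both 192.168.200.70 and 192.168.200.71, meaning both slaves now replicate directly from the new Master.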

Next, check the Redis configuration of the old Master node:

[root@redis02 bin]# cat redis.conf|grep -v "#"
bind 0.0.0.0
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
......
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
replicaof 192.168.200.72 6379

The last line is newly added: the node has become a slave.
[Figure: redissb.png]

Summary

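Setting up Redis Sentinel is relatively simple, but both the Redis Master-Slave configuration and the Sentinel configuration need attention. Frequently used commands can be put into shell scripts for one-step shutdown or restart. Testing the failover also showed that the tree-shaped Master-Slave hierarchy turns into one Master with multiple Slaves; this will be analyzed later.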