Etcd is a distributed key-value store that Kubernetes uses as its backing datastore, so an etcd database must be prepared first. To avoid a single point of failure, etcd should be deployed as a cluster. Here we build a cluster of 3 machines, which can tolerate 1 machine failure. Because the etcd cluster must elect a leader, the number of nodes should be odd so that a majority (quorum) can always be reached.
Note:
A cluster of 5 machines can tolerate 2 machine failures.
A cluster of 7 machines can tolerate 3 machine failures.
A cluster of 9 machines can tolerate 4 machine failures.
The etcd cluster may also share machines with the k8s nodes, as long as the apiserver can reach it.
Here we deploy the etcd cluster on three dedicated servers.
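The fault-tolerance figures above follow directly from etcd's quorum rule: an n-member cluster needs a majority of floor(n/2)+1 members alive, so it tolerates n minus quorum failures. A quick shell sketch of the arithmetic:

```shell
# Quorum rule: an n-member etcd cluster needs floor(n/2)+1 members alive.
for n in 3 5 7 9; do
  quorum=$(( n / 2 + 1 ))
  echo "$n nodes: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done
```

This is also why even sizes add no resilience: a 4-node cluster needs a 3-node quorum and still tolerates only 1 failure.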
Building a k8s cluster: install the cfssl certificate generation tool first
https://www.osyunwei.com/archives/12072.html
Perform the following steps on one of the etcd servers first.
1. Generate the Etcd certificates
1.1 Self-signed etcd certificate authority (CA)
Create the working directory:
mkdir -p /opt/tls/etcd
cd /opt/tls/etcd
Create the CA config file:
cat > etcdca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "etcd": {
        "expiry": "87600h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
Create the CA certificate signing request (CSR) file:
cat > etcdca-csr.json << EOF
{
  "CN": "etcdca",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Beijing",
      "ST": "Beijing"
    }
  ]
}
EOF
Generate the CA certificate (etcdca.pem and etcdca-key.pem):
cfssl gencert -initca etcdca-csr.json | cfssljson -bare etcdca
1.2 Issue the Etcd HTTPS certificate with the self-signed CA
Create the certificate request file:
cd /opt/tls/etcd
cat > etcd-csr.json << EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.21.201",
    "192.168.21.202",
    "192.168.21.203",
    "k8s-master01",
    "k8s-master02",
    "k8s-master03"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "BeiJing",
      "ST": "BeiJing"
    }
  ]
}
EOF
Note: the IPs in the hosts field above are the addresses of the etcd cluster servers; none of them may be missing. To make future scaling easier, you can also list a few extra planned IPs.
Generate the certificate (etcd.pem and etcd-key.pem):
cfssl gencert -ca=etcdca.pem -ca-key=etcdca-key.pem -config=etcdca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
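Since every cluster IP must appear in the hosts field, it is worth verifying the SAN list that actually ended up in the issued certificate. A sketch of that check with openssl (on the real server, point it at /opt/tls/etcd/etcd.pem; here a throwaway self-signed demo cert stands in so the check can be shown end to end, assuming OpenSSL 1.1.1+ for the -addext/-ext options):

```shell
# On the real server the check is simply:
#   openssl x509 -in /opt/tls/etcd/etcd.pem -noout -ext subjectAltName
# Below, a throwaway self-signed cert with a SAN list stands in for etcd.pem.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo-key.pem -out demo.pem -subj "/CN=etcd" \
  -addext "subjectAltName=IP:127.0.0.1,IP:192.168.21.201,DNS:k8s-master01" \
  2>/dev/null
# Print only the Subject Alternative Name extension; every node IP/hostname
# from the CSR's hosts field should be listed here.
openssl x509 -in demo.pem -noout -ext subjectAltName
```

If an IP is missing from this output, clients connecting to that address will fail TLS verification, so regenerate the certificate before continuing.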
#The certificate files have now been generated in /opt/tls/etcd; they will be used later.
2. Install Etcd
Install from the binary release. Perform this on the k8s-master01 server first.
#Download the binary package etcd-v3.5.16-linux-amd64.tar.gz
https://github.com/etcd-io/etcd/releases/download/v3.5.16/etcd-v3.5.16-linux-amd64.tar.gz
#Create the working directories and unpack the binary package
mkdir /opt/etcd/{bin,cfg,ssl,data,check} -p
tar zxvf etcd-v3.5.16-linux-amd64.tar.gz
mv etcd-v3.5.16-linux-amd64/etcd* /opt/etcd/bin/
#Add execute permission
chmod +x /opt/etcd/bin/*
vi /etc/profile #add etcd to the system environment variables by appending the following line at the end
export PATH=$PATH:/opt/etcd/bin/
:wq! #save and exit
source /etc/profile #apply the change immediately
#Check the version
etcd --version
3. Create the Etcd configuration file
cat > /opt/etcd/cfg/etcd.yaml << EOF
# [Member]
name: "k8s-master01"
data-dir: "/opt/etcd/data/"
wal-dir: "/opt/etcd/data/"
listen-peer-urls: "https://192.168.21.201:2380"
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379"
logger: "zap"
# [Clustering]
initial-advertise-peer-urls: "https://192.168.21.201:2380"
advertise-client-urls: "https://192.168.21.201:2379"
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
initial-cluster-token: "etcd-cluster"
initial-cluster-state: "new"
# [Security]
client-transport-security:
  cert-file: "/opt/etcd/ssl/etcd.pem"
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
peer-transport-security:
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  cert-file: "/opt/etcd/ssl/etcd.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
EOF
#Pay special attention to the indentation in the YAML file: the keys under client-transport-security and peer-transport-security must be indented as shown above
4. Manage Etcd with systemd
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
User=root
Type=notify
WorkingDirectory=/opt/etcd/data/
Restart=always
#Restart=on-failure
RestartSec=10s
LimitNOFILE=65536
ExecStart=/opt/etcd/bin/etcd --config-file=/opt/etcd/cfg/etcd.yaml
[Install]
WantedBy=multi-user.target
EOF
5. Copy the Etcd certificate files
cp /opt/tls/etcd/etcdca.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcdca-key.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcd.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcd-key.pem /opt/etcd/ssl/
6. Distribute the Etcd installation and configuration files
After finishing on the k8s-master01 server, distribute the etcd installation and configuration files to every node in the etcd cluster.
Alternatively, you can repeat the steps above on each server in the cluster.
scp -r /opt/etcd/ root@192.168.21.202:/opt/
scp -r /opt/etcd/ root@192.168.21.203:/opt/
scp /usr/lib/systemd/system/etcd.service root@192.168.21.202:/usr/lib/systemd/system/
scp /usr/lib/systemd/system/etcd.service root@192.168.21.203:/usr/lib/systemd/system/
#Then, on the other two servers, edit the etcd.yaml configuration file and change the hostname and IP address to each server's own values
vi /opt/etcd/cfg/etcd.yaml
# [Member]
name: "k8s-master01" #change to this node's own hostname
data-dir: "/opt/etcd/data/"
wal-dir: "/opt/etcd/data/"
listen-peer-urls: "https://192.168.21.201:2380" #change to this node's own IP address
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379" #change to this node's own IP address
logger: "zap"
# [Clustering]
initial-advertise-peer-urls: "https://192.168.21.201:2380" #change to this node's own IP address
advertise-client-urls: "https://192.168.21.201:2379" #change to this node's own IP address
#the parameters below are identical on all three nodes
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
initial-cluster-token: "etcd-cluster"
initial-cluster-state: "new"
# [Security]
client-transport-security:
  cert-file: "/opt/etcd/ssl/etcd.pem"
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
peer-transport-security:
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  cert-file: "/opt/etcd/ssl/etcd.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
:wq! #save and exit
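The per-node edits above can also be scripted instead of done by hand in vi. A hypothetical sed sketch (NODE_NAME, NODE_IP, and the demo file name are placeholders; on a real node point CFG at /opt/etcd/cfg/etcd.yaml and set the node's own values). It anchors each substitution on the key name at the start of the line, so the shared initial-cluster line, which also contains the first node's name and IP, is never touched:

```shell
# Placeholder values for the node being configured (hypothetical helper).
NODE_NAME="k8s-master02"
NODE_IP="192.168.21.202"
CFG="etcd-demo.yaml"   # on a real node: CFG=/opt/etcd/cfg/etcd.yaml

# Sample of the copied file, still carrying the first node's values.
cat > "$CFG" << 'EOF'
name: "k8s-master01"
listen-peer-urls: "https://192.168.21.201:2380"
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379"
initial-advertise-peer-urls: "https://192.168.21.201:2380"
advertise-client-urls: "https://192.168.21.201:2379"
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
EOF

# Rewrite only the five per-node keys; initial-cluster stays identical on all nodes.
sed -i \
  -e "s|^name: .*|name: \"${NODE_NAME}\"|" \
  -e "s|^listen-peer-urls: .*|listen-peer-urls: \"https://${NODE_IP}:2380\"|" \
  -e "s|^listen-client-urls: .*|listen-client-urls: \"https://${NODE_IP}:2379,https://127.0.0.1:2379\"|" \
  -e "s|^initial-advertise-peer-urls: .*|initial-advertise-peer-urls: \"https://${NODE_IP}:2380\"|" \
  -e "s|^advertise-client-urls: .*|advertise-client-urls: \"https://${NODE_IP}:2379\"|" \
  "$CFG"
```

This uses GNU sed's -i (in-place) flag, which is available on the Linux servers this guide targets.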
#On both of the other servers, add etcd to the system environment variables
vi /etc/profile #append the following line at the end
export PATH=$PATH:/opt/etcd/bin/
:wq! #save and exit
source /etc/profile #apply the change immediately
7. Start Etcd and enable it at boot
Start etcd on all three servers at the same time (a node waits for its peers, so the first one will block until the others come up)
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
If anything goes wrong, check the logs first:
journalctl -u etcd
systemctl status etcd
Then troubleshoot based on what the logs report.
8. Check the cluster status
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379" --write-out=table endpoint health
+-----------------------------+--------+-------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+-------------+-------+
| https://192.168.21.203:2379 | true | 20.816075ms | |
| https://192.168.21.202:2379 | true | 22.193996ms | |
| https://192.168.21.201:2379 | true | 21.069051ms | |
+-----------------------------+--------+-------------+-------+
vi /opt/etcd/check/check_etcd.sh
#!/bin/bash
# Base etcdctl command: TLS certs plus all three cluster endpoints
ETCDCTL="/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --write-out=table --endpoints=https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
# Make sure the etcdctl v3 API is used
export ETCDCTL_API=3
# Run a different operation depending on the argument
case "$1" in
"health")
echo "Checking cluster endpoint health..."
$ETCDCTL endpoint health
;;
"status")
echo "Listing all endpoint statuses..."
$ETCDCTL endpoint status
;;
"list")
echo "Listing all cluster members..."
$ETCDCTL member list
;;
*)
echo "Usage: $0 {health|status|list}"
echo "Please specify a valid command."
exit 1
;;
esac
:wq! #save and exit
chmod +x /opt/etcd/check/check_etcd.sh #add execute permission
sh /opt/etcd/check/check_etcd.sh health
sh /opt/etcd/check/check_etcd.sh status
sh /opt/etcd/check/check_etcd.sh list
9. Simulate a cluster node failure
Simulate a failure of one cluster node, 192.168.21.203. After the fault is repaired, the node must rejoin the cluster with a new identity.
systemctl stop etcd #stop the etcd service on the failed node
9.1 Check the cluster health
/opt/etcd/bin/etcdctl endpoint health -w table --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
/opt/etcd/bin/etcdctl member list -w table --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
#Find the ID of the failed member in the output
9.2 Remove the failed member
/opt/etcd/bin/etcdctl member remove 83045a3c3a751464 --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.3 Clear the failed node's data directory (run on the failed node)
cd /opt/etcd/data
rm -rf *
9.4 Re-add the node to the cluster
Use the member add command to register the repaired node again
/opt/etcd/bin/etcdctl member add k8s-master03 --peer-urls=https://192.168.21.203:2380 --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.5 On the failed node, change the config parameter to initial-cluster-state: "existing" and start etcd
vi /opt/etcd/cfg/etcd.yaml
initial-cluster-state: "existing"
:wq! #save and exit
systemctl start etcd #start the service
9.6 Check the cluster status
/opt/etcd/bin/etcdctl endpoint status --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.7 Update the configuration parameter on all cluster nodes
When the cluster is first created, initial-cluster-state is "new". Once the cluster is running normally, set initial-cluster-state to "existing" on every node: a restarting node then joins the existing cluster directly instead of re-initializing a new cluster, which would leave the cluster inconsistent.
vi /opt/etcd/cfg/etcd.yaml
initial-cluster-state: "existing"
:wq! #save and exit
systemctl daemon-reload
systemctl restart etcd
This completes the deployment of the Etcd cluster for the k8s cluster.