Etcd is a distributed key-value store that Kubernetes uses as its backing datastore, so an etcd database must be prepared first. To avoid a single point of failure, etcd should be deployed as a cluster. Here we build a cluster of 3 machines, which can tolerate 1 machine failure. Because the etcd cluster must elect a leader, the number of nodes should be odd so that a majority (quorum) can always be reached.
Note:
A cluster of 5 machines can tolerate 2 machine failures.
A cluster of 7 machines can tolerate 3 machine failures.
A cluster of 9 machines can tolerate 4 machine failures.
The etcd cluster may also share machines with the k8s nodes, as long as the apiserver can reach it.
Here we deploy the etcd cluster on three dedicated servers.
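The fault-tolerance figures above follow directly from etcd's quorum rule: an n-member cluster needs a majority of floor(n/2)+1 members alive, so it tolerates n minus quorum failures. A quick shell sketch of the arithmetic:

```shell
# Quorum rule: an n-member etcd cluster needs floor(n/2)+1 members alive.
for n in 3 5 7 9; do
  quorum=$(( n / 2 + 1 ))
  echo "$n nodes: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done
```

This is also why even sizes add no resilience: a 4-node cluster needs a 3-node quorum and still tolerates only 1 failure.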
Building a k8s cluster: install the cfssl certificate generation tool first
https://www.osyunwei.com/archives/12072.html
Perform the following steps on one of the etcd servers first.
1. Generate the Etcd certificates
1.1 Self-signed etcd certificate authority (CA)
Create the working directory:
mkdir -p /opt/tls/etcd
cd /opt/tls/etcd
Create the CA config file:
cat > etcdca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "etcd": {
        "expiry": "87600h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
Create the CA certificate signing request (CSR) file:
cat > etcdca-csr.json << EOF
{
  "CN": "etcdca",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Beijing",
      "ST": "Beijing"
    }
  ]
}
EOF
Generate the CA certificate (etcdca.pem and etcdca-key.pem):
cfssl gencert -initca etcdca-csr.json | cfssljson -bare etcdca
1.2 Issue the Etcd HTTPS certificate with the self-signed CA
Create the certificate request file:
cd /opt/tls/etcd
cat > etcd-csr.json << EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.21.201",
    "192.168.21.202",
    "192.168.21.203",
    "k8s-master01",
    "k8s-master02",
    "k8s-master03"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "BeiJing",
      "ST": "BeiJing"
    }
  ]
}
EOF
Note: the IPs in the hosts field above are the addresses of the etcd cluster servers; none of them may be missing. To make future scaling easier, you can also list a few extra planned IPs.
Generate the certificate (etcd.pem and etcd-key.pem):
cfssl gencert -ca=etcdca.pem -ca-key=etcdca-key.pem -config=etcdca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
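Since every cluster IP must appear in the hosts field, it is worth verifying the SAN list that actually ended up in the issued certificate. A sketch of that check with openssl (on the real server, point it at /opt/tls/etcd/etcd.pem; here a throwaway self-signed demo cert stands in so the check can be shown end to end, assuming OpenSSL 1.1.1+ for the -addext/-ext options):

```shell
# On the real server the check is simply:
#   openssl x509 -in /opt/tls/etcd/etcd.pem -noout -ext subjectAltName
# Below, a throwaway self-signed cert with a SAN list stands in for etcd.pem.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo-key.pem -out demo.pem -subj "/CN=etcd" \
  -addext "subjectAltName=IP:127.0.0.1,IP:192.168.21.201,DNS:k8s-master01" \
  2>/dev/null
# Print only the Subject Alternative Name extension; every node IP/hostname
# from the CSR's hosts field should be listed here.
openssl x509 -in demo.pem -noout -ext subjectAltName
```

If an IP is missing from this output, clients connecting to that address will fail TLS verification, so regenerate the certificate before continuing.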
#The certificate files have now been generated in /opt/tls/etcd; they will be used later.
2. Install Etcd
Install from the binary release. Perform this on the k8s-master01 server first.
#Download the binary package etcd-v3.5.16-linux-amd64.tar.gz
https://github.com/etcd-io/etcd/releases/download/v3.5.16/etcd-v3.5.16-linux-amd64.tar.gz
#Create the working directories and unpack the binary package
mkdir /opt/etcd/{bin,cfg,ssl,data,check} -p
tar zxvf etcd-v3.5.16-linux-amd64.tar.gz
mv etcd-v3.5.16-linux-amd64/etcd* /opt/etcd/bin/
#Add execute permission
chmod +x /opt/etcd/bin/*
vi /etc/profile #add etcd to the system environment variables by appending the following line at the end
export PATH=$PATH:/opt/etcd/bin/
:wq! #save and exit
source /etc/profile #apply the change immediately
#Check the version
etcd --version
3. Create the Etcd configuration file
cat > /opt/etcd/cfg/etcd.yaml << EOF
# [Member]
name: "k8s-master01"
data-dir: "/opt/etcd/data/"
wal-dir: "/opt/etcd/data/"
listen-peer-urls: "https://192.168.21.201:2380"
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379"
logger: "zap"
# [Clustering]
initial-advertise-peer-urls: "https://192.168.21.201:2380"
advertise-client-urls: "https://192.168.21.201:2379"
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
initial-cluster-token: "etcd-cluster"
initial-cluster-state: "new"
# [Security]
client-transport-security:
  cert-file: "/opt/etcd/ssl/etcd.pem"
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
peer-transport-security:
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  cert-file: "/opt/etcd/ssl/etcd.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
EOF
#Pay special attention to the indentation in the YAML file: the keys under client-transport-security and peer-transport-security must be indented as shown above
4. Manage Etcd with systemd
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
User=root
Type=notify
WorkingDirectory=/opt/etcd/data/
Restart=always
#Restart=on-failure
RestartSec=10s
LimitNOFILE=65536
ExecStart=/opt/etcd/bin/etcd --config-file=/opt/etcd/cfg/etcd.yaml
[Install]
WantedBy=multi-user.target
EOF
5. Copy the Etcd certificate files
cp /opt/tls/etcd/etcdca.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcdca-key.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcd.pem /opt/etcd/ssl/
cp /opt/tls/etcd/etcd-key.pem /opt/etcd/ssl/
6. Distribute the Etcd installation and configuration files
After finishing on the k8s-master01 server, distribute the etcd installation and configuration files to every node in the etcd cluster.
Alternatively, you can repeat the steps above on each server in the cluster.
scp -r /opt/etcd/ root@192.168.21.202:/opt/
scp -r /opt/etcd/ root@192.168.21.203:/opt/
scp /usr/lib/systemd/system/etcd.service root@192.168.21.202:/usr/lib/systemd/system/
scp /usr/lib/systemd/system/etcd.service root@192.168.21.203:/usr/lib/systemd/system/
#Then, on the other two servers, edit the etcd.yaml configuration file and change the hostname and IP address to each server's own values
vi /opt/etcd/cfg/etcd.yaml
# [Member]
name: "k8s-master01" #change to this node's own hostname
data-dir: "/opt/etcd/data/"
wal-dir: "/opt/etcd/data/"
listen-peer-urls: "https://192.168.21.201:2380" #change to this node's own IP address
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379" #change to this node's own IP address
logger: "zap"
# [Clustering]
initial-advertise-peer-urls: "https://192.168.21.201:2380" #change to this node's own IP address
advertise-client-urls: "https://192.168.21.201:2379" #change to this node's own IP address
#the parameters below are identical on all three nodes
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
initial-cluster-token: "etcd-cluster"
initial-cluster-state: "new"
# [Security]
client-transport-security:
  cert-file: "/opt/etcd/ssl/etcd.pem"
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
peer-transport-security:
  key-file: "/opt/etcd/ssl/etcd-key.pem"
  cert-file: "/opt/etcd/ssl/etcd.pem"
  client-cert-auth: true
  trusted-ca-file: "/opt/etcd/ssl/etcdca.pem"
  auto-tls: true
:wq! #save and exit
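The per-node edits above can also be scripted instead of done by hand in vi. A hypothetical sed sketch (NODE_NAME, NODE_IP, and the demo file name are placeholders; on a real node point CFG at /opt/etcd/cfg/etcd.yaml and set the node's own values). It anchors each substitution on the key name at the start of the line, so the shared initial-cluster line, which also contains the first node's name and IP, is never touched:

```shell
# Placeholder values for the node being configured (hypothetical helper).
NODE_NAME="k8s-master02"
NODE_IP="192.168.21.202"
CFG="etcd-demo.yaml"   # on a real node: CFG=/opt/etcd/cfg/etcd.yaml

# Sample of the copied file, still carrying the first node's values.
cat > "$CFG" << 'EOF'
name: "k8s-master01"
listen-peer-urls: "https://192.168.21.201:2380"
listen-client-urls: "https://192.168.21.201:2379,https://127.0.0.1:2379"
initial-advertise-peer-urls: "https://192.168.21.201:2380"
advertise-client-urls: "https://192.168.21.201:2379"
initial-cluster: "k8s-master01=https://192.168.21.201:2380,k8s-master02=https://192.168.21.202:2380,k8s-master03=https://192.168.21.203:2380"
EOF

# Rewrite only the five per-node keys; initial-cluster stays identical on all nodes.
sed -i \
  -e "s|^name: .*|name: \"${NODE_NAME}\"|" \
  -e "s|^listen-peer-urls: .*|listen-peer-urls: \"https://${NODE_IP}:2380\"|" \
  -e "s|^listen-client-urls: .*|listen-client-urls: \"https://${NODE_IP}:2379,https://127.0.0.1:2379\"|" \
  -e "s|^initial-advertise-peer-urls: .*|initial-advertise-peer-urls: \"https://${NODE_IP}:2380\"|" \
  -e "s|^advertise-client-urls: .*|advertise-client-urls: \"https://${NODE_IP}:2379\"|" \
  "$CFG"
```

This uses GNU sed's -i (in-place) flag, which is available on the Linux servers this guide targets.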
#On both of the other servers, add etcd to the system environment variables
vi /etc/profile #append the following line at the end
export PATH=$PATH:/opt/etcd/bin/
:wq! #save and exit
source /etc/profile #apply the change immediately
7. Start Etcd and enable it at boot
Start etcd on all three servers at the same time (a node waits for its peers, so the first one will block until the others come up)
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
If anything goes wrong, check the logs first:
journalctl -u etcd
systemctl status etcd
Then troubleshoot based on what the logs report.
8. Check the cluster status
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379" --write-out=table endpoint health
+-----------------------------+--------+-------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+-------------+-------+
| https://192.168.21.203:2379 | true | 20.816075ms | |
| https://192.168.21.202:2379 | true | 22.193996ms | |
| https://192.168.21.201:2379 | true | 21.069051ms | |
+-----------------------------+--------+-------------+-------+
vi /opt/etcd/check/check_etcd.sh
#!/bin/bash
# Base etcdctl command: TLS certs plus all three cluster endpoints
ETCDCTL="/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --write-out=table --endpoints=https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
# Make sure the etcdctl v3 API is used
export ETCDCTL_API=3
# Run a different operation depending on the argument
case "$1" in
"health")
echo "Checking cluster endpoint health..."
$ETCDCTL endpoint health
;;
"status")
echo "Listing all endpoint statuses..."
$ETCDCTL endpoint status
;;
"list")
echo "Listing all cluster members..."
$ETCDCTL member list
;;
*)
echo "Usage: $0 {health|status|list}"
echo "Please specify a valid command."
exit 1
;;
esac
:wq! #save and exit
chmod +x /opt/etcd/check/check_etcd.sh #add execute permission
sh /opt/etcd/check/check_etcd.sh health
sh /opt/etcd/check/check_etcd.sh status
sh /opt/etcd/check/check_etcd.sh list
9. Simulate a cluster node failure
Simulate a failure of one cluster node, 192.168.21.203. After the fault is repaired, the node must rejoin the cluster with a new identity.
systemctl stop etcd #stop the etcd service on the failed node
9.1 Check the cluster health
/opt/etcd/bin/etcdctl endpoint health -w table --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
/opt/etcd/bin/etcdctl member list -w table --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
#Find the ID of the failed member in the output
9.2 Remove the failed member
/opt/etcd/bin/etcdctl member remove 83045a3c3a751464 --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.3 Clear the failed node's data directory (run on the failed node)
cd /opt/etcd/data
rm -rf *
9.4 Re-add the node to the cluster
Use the member add command to register the repaired node again
/opt/etcd/bin/etcdctl member add k8s-master03 --peer-urls=https://192.168.21.203:2380 --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.5 On the failed node, change the config parameter to initial-cluster-state: "existing" and start etcd
vi /opt/etcd/cfg/etcd.yaml
initial-cluster-state: "existing"
:wq! #save and exit
systemctl start etcd #start the service
9.6 Check the cluster status
/opt/etcd/bin/etcdctl endpoint status --cacert=/opt/etcd/ssl/etcdca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints="https://192.168.21.201:2379,https://192.168.21.202:2379,https://192.168.21.203:2379"
9.7 Update the configuration parameter on all cluster nodes
When the cluster is first created, initial-cluster-state is "new". Once the cluster is running normally, set initial-cluster-state to "existing" on every node: a restarting node then joins the existing cluster directly instead of re-initializing a new cluster, which would leave the cluster inconsistent.
vi /opt/etcd/cfg/etcd.yaml
initial-cluster-state: "existing"
:wq! #save and exit
systemctl daemon-reload
systemctl restart etcd
This completes the deployment of the Etcd cluster for the k8s cluster.