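What happened: After the k3s cluster was set up, two x86 hosts were added without problems, but adding an ARM machine failed with the error below (all of these machines are in the same data center):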
[info 2024-08-23 15:31:27 hostinfo.NewNIC(hostinfohelper.go:241)] IP 172.20.1.115/br0/enp1s0f0np0
[info 2024-08-23 15:31:27 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 172.20.1.115
[info 2024-08-23 15:31:27 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-08-23 15:31:27 hostinfo.NewNIC(hostinfohelper.go:241)] IP /br1/bond0
[info 2024-08-23 15:31:27 netutils2.(*SNetInterface).IsSecretAddress(netutils.go:352)] MASK ---
[info 2024-08-23 15:31:27 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-08-23 15:31:27 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"enp1s0f0np0", Bridge:"br0", Ip:"172.20.1.115", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0x4002578630), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0x4002579020)}
[info 2024-08-23 15:31:27 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br1", Ip:"", Wire:"bcast1", WireId:"", Mask:0, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0x4001e365a0), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0x4001e36ff0)}
[info 2024-08-23 15:31:27 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
goroutine 1 [running]:
runtime/debug.Stack()
	/usr/lib/go/src/runtime/debug/stack.go:24 +0x68
runtime/debug.PrintStack()
	/usr/lib/go/src/runtime/debug/stack.go:16 +0x20
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn.func1()
	/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:125 +0x40
panic({0x2644460, 0x40008d4780})
	/usr/lib/go/src/runtime/panic.go:838 +0x20c
yunion.io/x/onecloud/pkg/util/ovnutils.mustPrepOvsdbConfig({{0x40016e8120, 0x1b}, {0x40016c1930, 0x5}, {0x0, 0x0}, {0x0, 0x0}, 0x5dc, {0x40016c1958, ...}, ...})
	/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:93 +0x5a0
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn({{0x40016e8120, 0x1b}, {0x40016c1930, 0x5}, {0x0, 0x0}, {0x0, 0x0}, 0x5dc, {0x40016c1958, ...}, ...})
	/root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:130 +0xc0
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*OvnHelper).Init(...)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostovn.go:41
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).setupOvnChassis(0x4000b81ce0?)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:225 +0x15c
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).Init(0x51354d0?)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:210 +0xe8
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0x4000236398?)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:80 +0x5c
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0x400000e108)
	/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe0
yunion.io/x/onecloud/pkg/hostman.StartService(...)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:167
main.main()
	/root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x124
goroutine 1 [running]:
runtime/debug.Stack()
	/usr/lib/go/src/runtime/debug/stack.go:24 +0x68
runtime/debug.PrintStack()
	/usr/lib/go/src/runtime/debug/stack.go:16 +0x20
yunion.io/x/log.Fatalf({0x2b3a53c, 0x1c}, {0x400052be88, 0x1, 0x1})
	/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/log/log.go:138 +0x34
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0x4000236398?)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:81 +0x90
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0x400000e108)
	/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe0
yunion.io/x/onecloud/pkg/hostman.StartService(...)
	/root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:167
main.main()
	/root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x124
[fatal 2024-08-23 15:32:17 hostman.(*SHostService).RunService(host_services.go:81)] Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed: lookup default-ovn-north on 10.96.0.10:53: read udp 10.96.0.10:45069->10.96.0.10:53: i/o timeout
The likely cause is this error:
[fatal 2024-08-23 15:32:17 hostman.(*SHostService).RunService(host_services.go:81)] Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed: lookup default-ovn-north on 10.96.0.10:53: read udp 10.96.0.10:45069->10.96.0.10:53: i/o timeout
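The fatal line says the host agent timed out while asking the cluster DNS service at 10.96.0.10:53 for `default-ovn-north`. A minimal Go sketch for reproducing that lookup on the ARM host without starting the host service is shown below. It assumes the fully qualified name `default-ovn-north.onecloud.svc.cluster.local`, inferring the `onecloud` namespace and the `cluster.local` domain from the names seen in the packet capture; `probe` and `newClusterResolver` are hypothetical helpers, not Cloudpods APIs.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// probeTimeout bounds both the whole lookup and each dial attempt.
const probeTimeout = 5 * time.Second

// newClusterResolver returns a resolver that sends every query to server
// (for example the cluster DNS service "10.96.0.10:53") instead of the
// nameservers listed in /etc/resolv.conf.
func newClusterResolver(server string, timeout time.Duration) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so the custom Dial is used
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: timeout}
			return d.DialContext(ctx, network, server)
		},
	}
}

// probe resolves name through server and returns the addresses found.
func probe(server, name string, timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return newClusterResolver(server, timeout).LookupHost(ctx, name)
}

func main() {
	// Assumed FQDN: service name from the fatal log line, namespace and
	// cluster domain inferred from the names in the packet capture.
	name := "default-ovn-north.onecloud.svc.cluster.local"
	addrs, err := probe("10.96.0.10:53", name, probeTimeout)
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}
	fmt.Println(name, "->", addrs)
}
```

An i/o timeout from this probe on the ARM host, while the x86 hosts joined without trouble, would suggest the problem lies in the Pod network path rather than in the host service itself.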
------ Additional information
Following the Pod network troubleshooting guide at https://www.cloudpods.org/docs/operations/k8s/dnserror/#calico%E9%9A%A7%E9%81%93%E5%8D%8F%E8%AE%AE%E7%9A%84%E5%88%87%E6%8D%A2, I ran the checks below:
[root@anode-01 ~]# ipvsadm -Ln | grep -A 3 10.96.0.10
TCP 10.96.0.10:53 rr
  -> 10.40.75.28:53 Masq 1 0 0
TCP 10.96.0.10:9153 rr
  -> 10.40.75.28:9153 Masq 1 0 0
TCP 10.96.124.119:30357 rr
  -> 10.40.75.14:30357 Masq 1 0 0
--
UDP 10.96.0.10:53 rr
  -> 10.40.75.28:53 Masq 1 0 64
[root@anode-01 ~]# ip route | grep 10.40.75
10.40.75.0/26 via 172.20.1.200 dev tunl0 proto bird onlink
[root@anode-01 ~]# ping 172.20.1.200
PING 172.20.1.200 (172.20.1.200) 56(84) bytes of data.
64 bytes from 172.20.1.200: icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from 172.20.1.200: icmp_seq=2 ttl=64 time=0.143 ms
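The output above places the DNS path behind the tunnel: IPVS forwards the DNS service 10.96.0.10:53 to the pod at 10.40.75.28 (presumably CoreDNS, given the 9153 metrics port), and the route to 10.40.75.0/26 goes through `tunl0` via the control node 172.20.1.200, which itself answers ping on the underlay. A small Go check of that prefix arithmetic, using only values copied from the output, is sketched here; `viaRoute` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// viaRoute reports whether addr lies inside the destination prefix of a
// route. It checks containment only and does not model longest-prefix
// matching across a full routing table.
func viaRoute(addr, prefix string) (bool, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return false, err
	}
	return p.Contains(a), nil
}

func main() {
	const dnsPod = "10.40.75.28"      // IPVS real server behind 10.96.0.10:53
	const tunlRoute = "10.40.75.0/26" // "via 172.20.1.200 dev tunl0"
	ok, err := viaRoute(dnsPod, tunlRoute)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s in %s: %v\n", dnsPod, tunlRoute, ok)
}
```

It prints `10.40.75.28 in 10.40.75.0/26: true`. Containment is enough here because `ip route | grep 10.40.75` returned a single entry; combined with the successful ping, this points at encapsulated traffic over `tunl0` rather than at basic reachability of 172.20.1.200.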
Troubleshooting step 1: tried disabling GSO; in testing it made no difference.
[root@anode-01 ~]# ethtool -K enp1s0f0np0 gso off
[root@anode-01 ~]# ethtool -k enp1s0f0np0 | grep generic-segmentation-offload
generic-segmentation-offload: off
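Changes made with `ethtool -K` take effect immediately but are not persisted by ethtool, so they are lost on reboot; that matches the later observation that GSO was on again after restarting the machine. A minimal Go sketch for checking the effective GSO state after each reboot or service restart by parsing `ethtool -k` output is shown below. It assumes `ethtool` is on PATH; `gsoEnabled` is a hypothetical helper, and the default interface name is the one from this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// gsoEnabled scans `ethtool -k <iface>` output for the
// generic-segmentation-offload feature and reports whether it is on.
// A trailing annotation such as "[fixed]" or "[requested on]" is ignored;
// only the effective value (the first word) counts.
func gsoEnabled(ethtoolOutput string) (bool, error) {
	const key = "generic-segmentation-offload:"
	sc := bufio.NewScanner(strings.NewReader(ethtoolOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, key) {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, key))
		if len(fields) > 0 && fields[0] == "on" {
			return true, nil
		}
		if len(fields) > 0 && fields[0] == "off" {
			return false, nil
		}
		return false, fmt.Errorf("unexpected value in %q", line)
	}
	if err := sc.Err(); err != nil {
		return false, err
	}
	return false, fmt.Errorf("no %q line in ethtool output", key)
}

func main() {
	iface := "enp1s0f0np0" // NIC from this issue; pass another as the first argument
	if len(os.Args) > 1 {
		iface = os.Args[1]
	}
	out, err := exec.Command("ethtool", "-k", iface).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "ethtool failed:", err)
		os.Exit(1)
	}
	on, err := gsoEnabled(string(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: generic-segmentation-offload on = %v\n", iface, on)
}
```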
Troubleshooting step 2: packet capture on the control node
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:24:38.662983 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.34303 > 10.40.75.28.53: 56371+ AAAA? default-keystone.onecloud.svc.cluster.local. (61)
09:24:38.663066 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.49959 > 10.40.75.28.53: 16876+ A? default-keystone.onecloud.svc.cluster.local. (61)
09:24:38.806921 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.39662 > 10.40.75.28.53: 39831+ AAAA? default-keystone. (34)
09:24:38.806924 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.60606 > 10.40.75.28.53: 44761+ A? default-keystone. (34)
09:24:39.328664 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.16256 > 10.40.75.28.53: 48135+ A? default-apimap. (32)
09:24:39.328674 IP 172.20.1.115 > 172.20.1.200: IP 10.40.106.128.31164 > 10.40.75.28.53: 51897+ AAAA? default-apimap. (32)
Troubleshooting step 3: I set up a fresh management node with v3.11.5 and tried again; it worked normally.
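Comparing the changelogs of the two versions, I found that on compute nodes installed with v3.11.5, GSO is disabled. Could that be the cause? However, when I manually disabled GSO on the v3.11.6 compute node it had no effect, and after rebooting the machine GSO was enabled again. Is some extra configuration needed, or does a particular service need to be restarted?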
Solved: adding the following setting to host.conf and restarting brought everything back to normal:
ethtool_enable_gso: false
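Since the fix lives in host.conf rather than in a one-off `ethtool` call, it may be worth confirming that every ARM host actually carries the entry. A minimal Go sketch for that check follows, assuming host.conf sits at `/etc/yunion/host.conf` (verify the path on your installation) and uses flat `key: value` lines; `gsoDisabledInConf` is a hypothetical helper and not a YAML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// gsoDisabledInConf reports whether host.conf-style text sets the
// top-level key ethtool_enable_gso to false. It understands only flat
// "key: value" lines and '#' comments; indented (nested) lines are skipped,
// only the literal false (any case) counts, and the last match wins.
func gsoDisabledInConf(conf string) bool {
	disabled := false
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop comments
		}
		if line == "" || line[0] == ' ' || line[0] == '\t' {
			continue // empty, comment-only, or nested line
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "ethtool_enable_gso" {
			continue
		}
		disabled = strings.EqualFold(strings.TrimSpace(val), "false")
	}
	return disabled
}

func main() {
	path := "/etc/yunion/host.conf" // assumed location; pass another path as the first argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: ethtool_enable_gso disabled = %v\n", path, gsoDisabledInConf(string(data)))
}
```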
Environment:
cat /etc/os-release
NAME="openEuler"
VERSION="22.03 (LTS-SP3)"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 (LTS-SP3)"
ANSI_COLOR="0;31"
uname -a
Linux anode-01.icc.local 5.10.0-224.0.0.127.oe2203sp3.aarch64 #1 SMP Wed Aug 21 15:03:40 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
dmidecode | egrep -i 'manufacturer|product' |sort -u
Manufacturer: HiSilicon
Manufacturer: Huawei
Manufacturer: HUAWEI
Manufacturer: JINGSHIJI
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Module Manufacturer ID: Unknown
Module Product ID: Unknown
Product Name: BC82AMDYA
Product Name: TaiShan 200 (Model 2280)
kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list
[root@cloud-mgr ocboot]# climc version-list
Get "https://172.20.1.200:30898/version": dial tcp 172.20.1.200:30898: connect: connection refused
Get "https://172.20.1.200:30443/version": dial tcp 172.20.1.200:30443: connect: connection refused
+---------------+--------------------------------------------+
| Field         | Value                                      |
+---------------+--------------------------------------------+
| ansible       | release/3.11.6(6184774c6e24081713)         |
| apimap        | release/3.11.6(6184774c6e24081713)         |
| cloudmon      | release/3.11.6(6184774c6e24081713)         |
| cloudproxy    | release/3.11.6(6184774c6e24081713)         |
| compute_v2    | release/3.11.6(6184774c6e24081712)         |
| devtool       | release/3.11.6(6184774c6e24081712)         |
| identity      | release/3.11.6(6184774c6e24081713)         |
| image         | release/3.11.6(6184774c6e24081712)         |
| k8s           | heads/v3.11.6-20240815.1(e6c3e48724081712) |
| log           | release/3.11.6(6184774c6e24081712)         |
| monitor       | release/3.11.6(6184774c6e24081713)         |
| notify        | release/3.11.6(6184774c6e24081713)         |
| scheduledtask | release/3.11.6(6184774c6e24081713)         |
| scheduler     | release/3.11.6(6184774c6e24081712)         |
| vpcagent      | release/3.11.6(6184774c6e24081713)         |
| webconsole    | release/3.11.6(6184774c6e24081712)         |
| yunionconf    | release/3.11.6(6184774c6e24081712)         |
+---------------+--------------------------------------------+