Symptoms
ovs-vswitchd showed high CPU usage. Its stack showed the process repeatedly entering kernel code and then being scheduled out.
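- Check the process file descriptor limit (note: as recorded, the command uses -u, which reports the max user processes limit; the open-files limit is reported by ulimit -n)
[root@node-1 zjp]# ulimit -u
102400
- Count the files opened by the ovs-vswitchd process
[root@node-1 zjp]# lsof -p $(pidof ovs-vswitchd) |grep -c GENERIC
102409
The number of file handles has indeed reached the process limit.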
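The comparison above can also be made against the limit the kernel actually applies to the daemon, which is listed in /proc/<pid>/limits. The following is a minimal sketch, not part of the original investigation: the helper names are hypothetical, and the sample text only mirrors the usual layout of that file with the 102400 value from above.

```python
def max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return int(fields[0])  # raises ValueError if the limit is "unlimited"
    raise ValueError("no 'Max open files' line found")


def fd_headroom(open_fds, limits_text):
    """Descriptors left before the soft limit; <= 0 means the limit is exhausted."""
    return max_open_files(limits_text) - open_fds


# Hypothetical sample in the layout of /proc/<pid>/limits.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            102400               102400               files\n"
)

if __name__ == "__main__":
    # 102409 is the count reported by lsof above.
    print(fd_headroom(102409, sample))
```

On a live system the text would come from reading /proc/$(pidof ovs-vswitchd)/limits rather than from the sample string.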
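Initial analysis
- We have hit a similar problem before: for every port created, OVS opened one netlink socket per handler thread, so
sock_num = ports * n-handler-threads; n-handler-threads = online_cpu * 3/4 - 1;
By that formula, the information available so far does not put the ovs-vswitchd process over its max open files limit of 102400. That issue is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1526306 and our version already includes the fix.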
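As a worked example of the formula above, here is a small sketch of the arithmetic. The CPU count of 64 and the port count of 1000 are hypothetical values chosen for illustration, and the use of integer division is an assumption about how the formula rounds.

```python
def n_handler_threads(online_cpus):
    # Formula quoted in the text: online_cpu * 3/4 - 1.
    # Integer division and the floor of 1 are assumptions of this sketch.
    return max(online_cpus * 3 // 4 - 1, 1)


def netlink_socks(ports, online_cpus):
    # One netlink socket per port per handler thread (the pre-fix behavior).
    return ports * n_handler_threads(online_cpus)


def max_ports(fd_limit, online_cpus):
    # Largest port count whose sockets still fit under the fd limit.
    return fd_limit // n_handler_threads(online_cpus)


if __name__ == "__main__":
    cpus = 64  # hypothetical host size
    print(n_handler_threads(cpus))    # handler threads
    print(netlink_socks(1000, cpus))  # sockets for 1000 ports
    print(max_ports(102400, cpus))    # ports that fit under 102400 fds
```

Under these assumptions a 64-CPU host would need more than two thousand ports before the per-handler sockets alone reached 102400, so the leak observed here needs a different explanation.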
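- strace -p shows the process issuing system calls continuously, but it cannot run normally because it has already hit the fd limit. Each time the following line is printed, one more fd is left behind in ovs-vswitchd:
socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) = 1415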
- Check the OVS log
[root@node-1 ~]# tailf /var/log/openvswitch/ovs-vswitchd.log
2021-04-15T12:36:29.723Z|00595|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17744
2021-04-15T12:36:30.062Z|00596|bridge|INFO|bridge br-int: added interface ha-24939610-b4 on port 17745
2021-04-15T12:36:32.790Z|00597|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17744
2021-04-15T12:36:33.229Z|00598|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17746
2021-04-15T12:36:41.807Z|00599|bridge|INFO|bridge br-int: deleted interface ha-24939610-b4 on port 17745
2021-04-15T12:36:41.809Z|00600|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17746
2021-04-15T12:36:41.821Z|00601|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17746
2021-04-15T12:36:42.175Z|00602|bridge|INFO|bridge br-int: added interface ha-24939610-b4 on port 17747
2021-04-15T12:36:44.846Z|00603|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17746
2021-04-15T12:36:45.324Z|00604|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17748
2021-04-15T12:36:54.105Z|00605|bridge|INFO|bridge br-int: deleted interface ha-24939610-b4 on port 17747
2021-04-15T12:36:54.107Z|00606|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17748
2021-04-15T12:36:54.119Z|00607|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17748
2021-04-15T12:36:54.454Z|00608|bridge|INFO|bridge br-int: added interface ha-24939610-b4 on port 17749
2021-04-15T12:36:56.506Z|00609|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17748
2021-04-15T12:36:57.013Z|00610|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17750
OVS keeps adding and deleting ha-d489f511-9f. More precisely, every time "deleted interface ha-d489f511-9f" is logged, one fd is left behind. Combined with the strace output above, the leaked fd is most likely the port's netlink socket.
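The add/delete churn can be quantified by counting the bridge messages per interface. The sketch below is illustrative only: the regular expression is written against the log lines shown above, the function name is hypothetical, and the sample uses three of those lines.

```python
import re
from collections import Counter

# Matches the "bridge ...: added/deleted interface ... on port N" messages.
LINE_RE = re.compile(
    r"bridge (?P<bridge>\S+): (?P<action>added|deleted) interface "
    r"(?P<iface>\S+) on port (?P<port>\d+)"
)


def churn(lines):
    """Count added/deleted events per interface name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("iface"), match.group("action"))] += 1
    return counts


sample = [
    "2021-04-15T12:36:29.723Z|00595|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17744",
    "2021-04-15T12:36:32.790Z|00597|bridge|INFO|bridge br-int: deleted interface ha-d489f511-9f on port 17744",
    "2021-04-15T12:36:33.229Z|00598|bridge|INFO|bridge br-int: added interface ha-d489f511-9f on port 17746",
]

if __name__ == "__main__":
    for (iface, action), n in sorted(churn(sample).items()):
        print(iface, action, n)
```

Feeding it the whole ovs-vswitchd.log gives a per-interface count of deletions that can be compared with the growth in open fds.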
- Check the neutron-l3-agent log
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: c863cb21-1b5a-427a-aecc-bc6b1bb22571
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 533, in _process_router_update
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 468, in _process_router_if_compatible
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 473, in _process_added_router
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _router_added
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent self.force_reraise()
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 365, in _router_added
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 118, in initialize
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 360, in spawn_state_change_monitor
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/external_process.py", line 94, in enable
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 913, in execute
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 148, in execute
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility.
SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2021-04-16 16:36:50.925 24849 ERROR neutron.agent.l3.agent Traceback (most recent call last):
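The neutron.agent.l3.agent log shows that the agent fails while adding a router and then enters the delete path, which fails as well. We paused the neutron.agent.l3.agent service so that it no longer kept deleting and re-adding the tap device, and the fd count of ovs-vswitchd stopped growing. That confirms the neutron.agent.l3.agent service triggers the problem. Based on this, we tried running the following command after each failed neutron delete: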
ovs-vsctl del-port br-int ha-d489f511-9f
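This correctly releases the newly added fd.
Summary:
The neutron.agent.l3.agent log suggests this is the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1508091: an error "while cleaning up a router namespace" leaves OVS in a bad state as well.
Modifying the neutron code does make the problem go away, but why this happens and how OVS behaves in this situation still needs a deeper investigation.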
In-depth investigation
1. The relevant neutron code:
        except Exception:
            with excutils.save_and_reraise_exception():
                del self.router_info[router_id]
                LOG.exception(_LE('Error while initializing router %s'),
                              router_id)
                self.namespaces_manager.ensure_router_cleanup(router_id)
                try:
                    ri.delete()
                except Exception:
                    LOG.exception(_LE('Error while deleting router %s'),
                                  router_id)
The handler calls ensure_router_cleanup, which cleans up the network namespace:
def delete(self):
    ns_ip = ip_lib.IPWrapper(namespace=self.name)
    for d in ns_ip.get_devices(exclude_loopback=True,
                               exclude_gre_devices=True):
        if d.name.startswith(INTERNAL_DEV_PREFIX):
            # device is on default bridge
            self.driver.unplug(d.name, namespace=self.name,
                               prefix=INTERNAL_DEV_PREFIX)  # (1)
        elif d.name.startswith(ROUTER_2_FIP_DEV_PREFIX):
            ns_ip.del_veth(d.name)  # (2)
        elif d.name.startswith(EXTERNAL_DEV_PREFIX):  # (3)
            self.driver.unplug(
                d.name,
                bridge=self.agent_conf.external_network_bridge,
                namespace=self.name,
                prefix=EXTERNAL_DEV_PREFIX)
    super(RouterNamespace, self).delete()  # (4)
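Before the RouterNamespace itself is deleted at (4), the related netdev devices are removed: (1) and (3) unplug the OVS ports, and (2) deletes the veth device. The only prefixes handled, however, are: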
NS_PREFIX = 'qrouter-'
INTERNAL_DEV_PREFIX = 'qr-'
EXTERNAL_DEV_PREFIX = 'qg-'
# TODO(Carl) It is odd that this file needs this. It is a dvr detail.
ROUTER_2_FIP_DEV_PREFIX = 'rfp-'
None of them matches an internal device whose name starts with ha-, such as ha-d489f511-9f.
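The prefix dispatch in delete() can be restated as a small standalone function to make the gap visible. This is a sketch for illustration, not neutron code: cleanup_action is a hypothetical helper, the returned strings are labels invented here, and the qr-/qg-/rfp- device names in the usage lines are made up.

```python
INTERNAL_DEV_PREFIX = 'qr-'
EXTERNAL_DEV_PREFIX = 'qg-'
ROUTER_2_FIP_DEV_PREFIX = 'rfp-'


def cleanup_action(name):
    """Return which branch of delete() would handle the device, or None."""
    if name.startswith(INTERNAL_DEV_PREFIX):
        return "unplug from default bridge"   # branch (1)
    if name.startswith(ROUTER_2_FIP_DEV_PREFIX):
        return "del_veth"                     # branch (2)
    if name.startswith(EXTERNAL_DEV_PREFIX):
        return "unplug from external bridge"  # branch (3)
    return None  # no branch: the device is still there when the namespace goes


if __name__ == "__main__":
    for dev in ("qr-11111111-22", "qg-33333333-44", "rfp-55555555-6",
                "ha-d489f511-9f"):
        print(dev, "->", cleanup_action(dev))
```

Only the ha- device falls through every branch, so its OVS port is still present when the namespace is deleted.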
2. Try to reproduce with a script:
ip netns add ovs-test
ovs-vsctl --timeout=10 --oneline --format=json -- add-port br-int ha-d489f511-9f -- set Interface ha-d489f511-9f type=internal external_ids:iface-id=d489f511-9f9f-49e9-b772-f43bf3bfa04b external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:4a:52:36
ip link set netns ovs-test ha-d489f511-9f
ip netns del ovs-test
sleep 1
ovs-vsctl --timeout=10 --oneline --format=json -- --if-exists del-port ha-d489f511-9f
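This reproduces the problem seen in the environment. The key is deleting the namespace before del-port: the del-port operation then no longer closes the corresponding socket, so the fd stays open in OVS.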
3. Tracing with systemtap (script in the attachment) shows different call paths in the normal and abnormal cases:
Normal:
-----the delete strace-------
0xffff845a4b6c : dpif_netlink_port_del+0x0/0x134 [/usr/lib64/libopenvswitch-2.12.so.0.0.0]
0xffff844d05cc : dpif_port_del+0x80/0xe4 [/usr/lib64/libopenvswitch-2.12.so.0.0.0]
0xffff8474fee4 : port_del+0x9c/0xb4 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473fd4c : ofproto_port_del+0x68/0xd0 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xaaaac2e5e53c : bridge_delete_or_reconfigure_ports+0x120/0x370 [/usr/sbin/ovs-vswitchd]
0xaaaac2e5f704 : bridge_reconfigure+0x40c/0x2f0c [/usr/sbin/ovs-vswitchd]
0xaaaac2e629cc : bridge_run+0x220/0x19dc [/usr/sbin/ovs-vswitchd]
0xaaaac2e5a3a0 : main+0x3d4/0x534 [/usr/sbin/ovs-vswitchd]
0xffff83da1714 : __libc_start_main+0xf0/0x1cc [/usr/lib64/libc-2.17.so]
0xaaaac2e5a554 : _start+0x38/0x3c [/usr/sbin/ovs-vswitchd]
-----the del_cached_port strace-------
the name = ha-d489f511-9f
0xffff8457e1e0 : shash_find_and_delete+0x0/0x38 [/usr/lib64/libopenvswitch-2.12.so.0.0.0]
0xffff8473ae6c : ofport_destroy__+0xb0/0xdc [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473debc : ofport_remove+0x50/0x78 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473f308 : update_port+0x88/0x2e0 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473fd88 : ofproto_port_del+0xa4/0xd0 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xaaaac2e5e53c : bridge_delete_or_reconfigure_ports+0x120/0x370 [/usr/sbin/ovs-vswitchd]
0xaaaac2e5f704 : bridge_reconfigure+0x40c/0x2f0c [/usr/sbin/ovs-vswitchd]
0xaaaac2e629cc : bridge_run+0x220/0x19dc [/usr/sbin/ovs-vswitchd]
0xaaaac2e5a3a0 : main+0x3d4/0x534 [/usr/sbin/ovs-vswitchd]
0xffff83da1714 : __libc_start_main+0xf0/0x1cc [/usr/lib64/libc-2.17.so]
0xaaaac2e5a554 : _start+0x38/0x3c [/usr/sbin/ovs-vswitchd]
Abnormal:
-----the del_cached_port strace-------
the name = ha-d489f511-9f
0xffff8457e1e0 : shash_find_and_delete+0x0/0x38 [/usr/lib64/libopenvswitch-2.12.so.0.0.0]
0xffff8473ae6c : ofport_destroy__+0xb0/0xdc [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473debc : ofport_remove+0x50/0x78 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473f308 : update_port+0x88/0x2e0 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xffff8473fa9c : ofproto_run+0x53c/0x680 [/usr/lib64/libofproto-2.12.so.0.0.0]
0xaaaac2e5d450 : bridge_run__+0x17c/0x1d8 [/usr/sbin/ovs-vswitchd]
0xaaaac2e60824 : bridge_reconfigure+0x152c/0x2f0c [/usr/sbin/ovs-vswitchd]
0xaaaac2e629cc : bridge_run+0x220/0x19dc [/usr/sbin/ovs-vswitchd]
0xaaaac2e5a3a0 : main+0x3d4/0x534 [/usr/sbin/ovs-vswitchd]
0xffff83da1714 : __libc_start_main+0xf0/0x1cc [/usr/lib64/libc-2.17.so]
0xaaaac2e5a554 : _start+0x38/0x3c [/usr/sbin/ovs-vswitchd]
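In the normal flow, bridge_delete_or_reconfigure_ports calls ofproto_port_del, which then takes two branches:
bridge_delete_or_reconfigure_ports
  -->ofproto_port_del
    -->ofport_remove (1)
    -->port_del (2)
In the abnormal case ofproto_port_del is never reached; ofport_remove runs only because bridge_run detects that the ofport is in an abnormal state. The relevant part of bridge_delete_or_reconfigure_ports: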
    OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) {
        ofp_port_t requested_ofp_port;
        struct iface *iface;

        sset_add(&ofproto_ports, ofproto_port.name);

        iface = iface_lookup(br, ofproto_port.name);
        if (!iface) {
            /* No such iface is configured, so we should delete this
             * ofproto_port.
             *
             * As a corner case exception, keep the port if it's a bond fake
             * interface. */
            if (bridge_has_bond_fake_iface(br, ofproto_port.name)
                && !strcmp(ofproto_port.type, "internal")) {
                continue;
            }
            goto delete;
        }
Tracing confirms this: in the leaked-fd case, none of the ofproto_port.name values passed to iface_lookup(br, ofproto_port.name) is ha-d489f511-9f, so the goto delete path that leads to ofproto_port_del is never taken:
    delete:
        iface_destroy(iface);
        del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated);
    }
    for (i = 0; i < n; i++) {
        ofproto_port_del(br->ofproto, del[i]);
    }
    free(del);
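So the problem is that the network namespace is deleted before OVS deletes the port. dpif_netlink_port_del is where the socket gets closed; because the abnormal case never reaches that branch, the socket is left behind in the process.
This is the code OVS goes through when deleting a port, and it is where things go wrong: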
    OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) {  /* <== (1) */
        ofp_port_t requested_ofp_port;
        struct iface *iface;

        sset_add(&ofproto_ports, ofproto_port.name);
        iface = iface_lookup(br, ofproto_port.name);              /* <== (2) */
        if (!iface) {
            /* No such iface is configured, so we should delete this
             * ofproto_port.
             *
             * As a corner case exception, keep the port if it's a bond fake
             * interface. */
            if (bridge_has_bond_fake_iface(br, ofproto_port.name)
                && !strcmp(ofproto_port.type, "internal")) {
                continue;
            }
            goto delete;
        }
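To verify that the lookup really fails because the name is missing, we used gdb to set ofproto_port.name to ha-d489f511-9f by hand when execution reached (2); the netlink fd was then reclaimed correctly. The value at (2) comes from (1), which dumps all ports in the dpif, so this shows the port had already been removed from the dpif. The dpif debug log confirms it: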
2021-04-26T08:13:16.076Z|00938|dpif|DBG|system@ovs-system: failed to query port ha-d489f511-9f: No such device
2021-04-26T08:13:16.078Z|00939|dpif|DBG|system@ovs-system: failed to query port ha-d489f511-9f: No such device
The message comes from dpif_port_query_by_name:
dpif_port_query_by_name(const struct dpif *dpif, const char *devname,
                        struct dpif_port *port)
{
    ....
        VLOG_RL(&error_rl, error == ENODEV ? VLL_DBG : VLL_WARN,
                "%s: failed to query port %s: %s",
                dpif_name(dpif), devname, ovs_strerror(error));
    }
    return error;
}
At this point it is fairly clear that the virtual device is removed when the namespace is deleted, which is why OVS can no longer find it.
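The two deletion orders can be contrasted with a deliberately tiny model. This is a toy written for illustration, not OVS code: the class and method names are hypothetical, one socket per port is assumed (matching the one-fd-per-delete leak observed above), and the only behavior modeled is that sockets are closed solely for ports that still appear in the datapath dump.

```python
class ToyVswitchd:
    """Toy model: sockets are closed only for ports seen in the datapath dump."""

    def __init__(self):
        self.configured = set()      # interfaces present in the OVS database
        self.datapath_ports = set()  # what the dump at (1) would return
        self.open_socks = set()      # one netlink socket per port (assumption)

    def add_port(self, name):
        self.configured.add(name)
        self.datapath_ports.add(name)
        self.open_socks.add(name)

    def netns_deleted(self, name):
        # The kernel removes the device without telling the daemon first.
        self.datapath_ports.discard(name)

    def del_port(self, name):
        self.configured.discard(name)
        # Reconfigure pass: only dumped ports can reach the delete branch,
        # and only that branch closes the socket.
        for port in list(self.datapath_ports):
            if port not in self.configured:
                self.datapath_ports.discard(port)
                self.open_socks.discard(port)


def leaked_after(delete_netns_first):
    sw = ToyVswitchd()
    sw.add_port("ha-d489f511-9f")
    if delete_netns_first:
        sw.netns_deleted("ha-d489f511-9f")
    sw.del_port("ha-d489f511-9f")
    return len(sw.open_socks)


if __name__ == "__main__":
    print(leaked_after(False), leaked_after(True))
```

In the model, deleting the port first leaves no socket behind, while deleting the namespace first leaves one, which is the same asymmetry the systemtap traces show.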
- The kernel code involved in deleting a namespace
ops_exit_list
static void ops_exit_list(const struct pernet_operations *ops,
                          struct list_head *net_exit_list)
{
    struct net *net;
    if (ops->exit) {                                  /* <== (1) */
        list_for_each_entry(net, net_exit_list, exit_list)
            ops->exit(net);
    }
    if (ops->exit_batch)                              /* <== (2) */
        ops->exit_batch(net_exit_list);
}
The function called at (1) is default_device_exit:
static void __net_exit default_device_exit(struct net *net)
{
    struct net_device *dev, *aux;
    ......
        /* Leave virtual devices for the generic cleanup */
        if (dev->rtnl_link_ops)
            continue;
    .....
    rtnl_unlock();
}
When this function sees a virtual device it leaves it for the generic cleanup, which is step (2), default_device_exit_batch:
static void __net_exit default_device_exit_batch(struct list_head *net_list)
{
    .....
    rtnl_lock_unregistering(net_list);
    list_for_each_entry(net, net_list, exit_list) {
        for_each_netdev_reverse(net, dev) {
            if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
                dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
            else
                unregister_netdevice_queue(dev, &dev_kill_list);  /* <== (a) */
        }
    }
    unregister_netdevice_many(&dev_kill_list);
    rtnl_unlock();
}
Execution finally reaches (a), which cleans up and removes the netdev.
The code for the internal device:
static void do_setup(struct net_device *netdev)
{
    ether_setup(netdev);

    netdev->max_mtu = ETH_MAX_MTU;

    netdev->netdev_ops = &internal_dev_netdev_ops;
    ....
    netdev->rtnl_link_ops = &internal_dev_link_ops;
    ......
}
It sets netdev->rtnl_link_ops, and that structure is defined as:
static struct rtnl_link_ops internal_dev_link_ops __read_mostly = {
    .kind = "openvswitch",
};
dev->rtnl_link_ops->dellink is not set, so, as analyzed above, the device falls through to (a) in default_device_exit_batch, where it is actually removed.
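The two kernel checks discussed above can be restated as a small decision function. This is a simplified model for illustration, not kernel code: the dataclasses and function names are hypothetical, and the second device is a made-up contrast case whose link ops do define dellink.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkOps:
    kind: str
    has_dellink: bool = False


@dataclass
class Dev:
    name: str
    rtnl_link_ops: Optional[LinkOps] = None


def left_for_batch(dev):
    # Mirrors the check in default_device_exit: virtual devices are skipped.
    return dev.rtnl_link_ops is not None


def batch_action(dev):
    # Mirrors the branch in default_device_exit_batch.
    ops = dev.rtnl_link_ops
    if ops is not None and ops.has_dellink:
        return "dellink"
    return "unregister_netdevice_queue"  # branch (a)


if __name__ == "__main__":
    internal = Dev("ha-d489f511-9f", LinkOps(kind="openvswitch"))
    contrast = Dev("example0", LinkOps(kind="example", has_dellink=True))
    for dev in (internal, contrast):
        print(dev.name, left_for_batch(dev), batch_action(dev))
```

The internal device is skipped by the first check because rtnl_link_ops is set, then unregistered at (a) because that ops structure has no dellink.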
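Summary: deleting a namespace also removes the internal-type netdevices inside it, so it is expected that OVS can no longer find the device. The real fault is in the neutron code: on the error path it removes only the other device types and not the ha-xxxx device, which leaves the fd behind in OVS.
Attachment: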
#!/usr/bin/env stap
probe process("/usr/lib64/libopenvswitch-2.12.so.0").function("shash_find_and_delete"){
if(user_string($name)=="ha-d489f511-9f"){
printf("\n-----the del_cached_port strace-------\n");
printf("the name = %s\n", user_string($name));
print_ubacktrace();
}
}
probe process("/usr/lib64/libopenvswitch-2.12.so.0").function("dpif_netlink_port_del"){
printf("\n-----the delete strace-------\n");
print_ubacktrace();
}