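Background
On a k8s cluster we found that a service on a different node stays reachable whether net.bridge.bridge-nf-call-iptables is set to 0 or to 1. Our understanding was that with net.bridge.bridge-nf-call-iptables=0 the bridge does not call the iptables rules, so no DNAT would take place and the connection should fail. That discrepancy prompted this investigation into what the parameter actually does.
Environment: CentOS 8
kernel 4.18
The sections below trace the code path with the parameter set to 0 and to 1, and then compare the two.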
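Before comparing the two settings it helps to confirm which value is currently active. The following is a minimal sketch, not part of the original investigation: read_int_sysctl is a hypothetical helper name, and it assumes the br_netfilter module is loaded so that /proc/sys/net/bridge/bridge-nf-call-iptables exists.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer sysctl through its /proc/sys path.
 * Returns the parsed value, or -1 if the file is missing or unparsable
 * (for example when br_netfilter is not loaded). */
int read_int_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_int_sysctl("/proc/sys/net/bridge/bridge-nf-call-iptables") should then return 0 or 1 on a node like the one traced here, or -1 if the file is absent.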
sysctl -w net.bridge.bridge-nf-call-iptables=0
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:br_netif_receive_skb
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:__netif_receive_skb
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:ip_rcv
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.raw.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.mangle.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.filter.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_output
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.POSTROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.nat.POSTROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.mangle.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.filter.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_output
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.mangle.POSTROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_finish_output
[15:16:42 ][4026532008] cni0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:__dev_queue_xmit
[15:16:42 ][4026532008] cni0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:br_forward
[15:16:42 ][4026532008] cni0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:__br_forward
The trace above covers one step of a flow's handshake, SYN <--------> SYN,ACK. The SYN part again:
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:br_netif_receive_skb
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:__netif_receive_skb
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:ip_rcv
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.raw.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.mangle.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.filter.FORWARD.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_output
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.POSTROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.nat.POSTROUTING.ACCEPT
The code path is:
br_netif_receive_skb
---> __netif_receive_skb
---> ip_rcv // <=======(1)
       2.raw.PREROUTING.ACCEPT
       2.mangle.PREROUTING.ACCEPT
       2.nat.PREROUTING.ACCEPT
---> ip_rcv_finish
       2.mangle.FORWARD.ACCEPT
       2.filter.FORWARD.ACCEPT
---> ip_output
       2.mangle.POSTROUTING.ACCEPT
       2.nat.POSTROUTING.ACCEPT
At (1) the packet has been handed up to the protocol stack. Once it has passed 2.nat.PREROUTING.ACCEPT its destination IP has been rewritten by DNAT, as these two records show:
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT
[15:16:42 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish
This explains why DNAT still works correctly even with the parameter set to 0.
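Spotting the DNAT point by eye means comparing the destination address on consecutive trace lines. Here is a small sketch of that comparison, assuming the line layout seen in the traces above (src:port->dst:port); trace_dst_ip and dnat_between are hypothetical helper names, not part of skbtracer.

```c
#include <stddef.h>
#include <string.h>

/* Copy the destination IP (the text between "->" and the next ':' or space)
 * from one trace line into out. Returns 0 on success, -1 on failure. */
int trace_dst_ip(const char *line, char *out, size_t outlen)
{
    const char *p = strstr(line, "->");
    size_t n;

    if (!p || outlen == 0)
        return -1;
    p += 2;                    /* skip the "->" separator */
    n = strcspn(p, ": ");      /* address ends at the port colon or a space */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Returns 1 if the destination IP differs between two trace lines,
 * 0 if it is unchanged, -1 if either line cannot be parsed. */
int dnat_between(const char *before, const char *after)
{
    char a[64], b[64];

    if (trace_dst_ip(before, a, sizeof a) != 0 ||
        trace_dst_ip(after, b, sizeof b) != 0)
        return -1;
    return strcmp(a, b) != 0;
}
```

Fed the 2.nat.PREROUTING.ACCEPT line and the ip_rcv_finish line from the trace, dnat_between reports a change, which is the DNAT from 10.222.8.94 to 10.232.0.118.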
The relevant code:
/*
 * Main IP Receive routine.
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
    const struct iphdr *iph;
    struct net *net;
    u32 len;

    /* When the interface is in promisc. mode, drop all the crap
     * that it receives, do not try to analyse it.
     */
    if (skb->pkt_type == PACKET_OTHERHOST)
        goto drop;
    ...
    net = dev_net(dev);
    ...
    return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
                   net, NULL, skb, dev, NULL,
                   ip_rcv_finish); /* <========(1) */
drop:
    kfree_skb(skb);
out:
    return NET_RX_DROP;
}
The DNAT is performed at (1), inside the NF_INET_PRE_ROUTING hook.
sysctl -w net.bridge.bridge-nf-call-iptables=1
[15:57:50 ][4026544799] nil 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_output
[15:57:50 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_finish_output
[15:57:50 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_handle_frame_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_forward
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:__br_forward
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_forward_ip
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.mangle.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.filter.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_forward_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_forward_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_post_routing
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.mangle.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.nat.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_pre_routing_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_handle_frame_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_forward
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__br_forward
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_forward_ip
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.mangle.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.filter.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_forward_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_forward_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_post_routing
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.mangle.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__dev_queue_xmit
[15:57:50 ][4026544799] eth0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:netif_rx
[15:57:50 ][4026544799] eth0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__netif_receive_skb
[15:57:50 ][4026544799] eth0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:ip_rcv
[15:57:50 ][4026544799] eth0 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:ip_rcv_finish
[15:57:50 ][4026544799] nil 000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:ip_output
[15:57:50 ][4026544799] eth0 000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:ip_finish_output
[15:57:50 ][4026544799] eth0 000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_ACK:10.232.0.117:39270->10.232.0.118:80 ffff9fc2f20f5e00.0:br_nf_pre_routing_finish
The above is one handshake of a flow. The part that crosses cni0 is excerpted below:
[15:57:50 ][4026544799] nil 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_output
[15:57:50 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_finish_output
[15:57:50 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish
...
[15:57:50 ][4026532008] cni0 0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_pre_routing_finish
...
Lines 1-11 of the excerpt are the path taken by the SYN packet, and line 13 is the SYN,ACK reply. The code path is:
ip_output
---> ip_finish_output
---> __dev_queue_xmit
---> netif_rx
---> __netif_receive_skb
---> br_handle_frame
---> br_nf_pre_routing // <=======(1)
       2.raw.PREROUTING.ACCEPT
       2.mangle.PREROUTING.ACCEPT
       2.nat.PREROUTING.ACCEPT
---> br_nf_pre_routing_finish
At (1) the bridge invokes the iptables rules. Once the packet has passed 2.nat.PREROUTING.ACCEPT its destination IP has been rewritten by DNAT, as these two records show:
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish
This shows that with the parameter set to 1 the bridge does use the iptables rules to perform DNAT.
The relevant code:
/* Direct IPv6 traffic to br_nf_pre_routing_ipv6.
 * Replicate the checks that IPv4 does on packet reception.
 * Set skb->dev to the bridge device (i.e. parent of the
 * receiving device) to make netfilter happy, the REDIRECT
 * target in particular. Save the original destination IP
 * address to be able to detect DNAT afterwards. */
static unsigned int br_nf_pre_routing(void *priv,
                                      struct sk_buff *skb,
                                      const struct nf_hook_state *state)
{
    struct nf_bridge_info *nf_bridge;
    struct net_bridge_port *p;
    struct net_bridge *br;
    __u32 len = nf_bridge_encap_header_len(skb);
    ....
    if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb)) //<======(1)
        return NF_ACCEPT;
    ....
    NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb,
            skb->dev, NULL,
            br_nf_pre_routing_finish); // <=============(2)
    return NF_STOLEN;
}
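With the parameter set to 1 and an IPv4 frame, the function does not return early at (1), so execution reaches (2), which runs the iptables PREROUTING rules and performs the DNAT. Note that (1) itself only tests whether the frame carries IPv4; in the upstream 4.18 source the sysctl is tested in the elided lines just above it (if (!brnf_call_iptables && !br->nf_call_iptables) return NF_ACCEPT;).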
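The two early-return checks can be modelled in user space to make the gating explicit. This is a simplified sketch under stated assumptions, not the kernel code: bridge_runs_ipv4_prerouting and MODEL_ETH_P_IP are hypothetical names, VLAN and PPPoE encapsulation are ignored, and so is the per-bridge nf_call_iptables option.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ETH_P_IP 0x0800  /* EtherType of a plain IPv4 frame */

/* Simplified model: the bridge runs the IPv4 PREROUTING chains only when
 * the sysctl is enabled AND the frame carries IPv4. */
bool bridge_runs_ipv4_prerouting(int call_iptables, uint16_t ethertype)
{
    if (!call_iptables)
        return false;   /* sysctl off: the bridge skips iptables */
    if (ethertype != MODEL_ETH_P_IP)
        return false;   /* corresponds to (1): not IPv4, return NF_ACCEPT */
    return true;        /* corresponds to (2): NF_INET_PRE_ROUTING runs */
}
```

With call_iptables set to 0 the function returns false for every frame, which matches the trace where br_nf_pre_routing is followed directly by br_handle_frame_finish.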
Code path differences with the parameter on and off
Here are the two code paths side by side.
Parameter off:
[10:49:16 ][4026544799] nil 000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_output
[10:49:16 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_finish_output
[10:49:16 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:__dev_queue_xmit
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:netif_rx
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:__netif_receive_skb
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:br_handle_frame
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_nf_pre_routing
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_handle_frame_finish
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_pass_frame_up
[10:49:16 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_netif_receive_skb
[10:49:16 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:__netif_receive_skb
[10:49:16 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_rcv
[10:49:16 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:38850->10.232.0.118:80 ffff9fb9c1eadaf8.0:ip_rcv_finish
[10:49:16 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:38850->10.232.0.118:80 ffff9fb9c1eadaf8.0:ip_output
Parameter on:
[10:51:20 ][4026544799] nil 000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:ip_output
[10:51:20 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:ip_finish_output
[10:51:20 ][4026544799] eth0 000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:__dev_queue_xmit
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:netif_rx
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:__netif_receive_skb
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:br_handle_frame
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing
[10:51:20 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_handle_frame_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:__br_forward
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_forward_ip
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_forward_finish
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward_finish
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_post_routing
The difference between the two comes down to this:
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80
[10:51:20 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80
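As analysed above, DNAT is performed in both cases, but at a different point in the path.
Why do the paths differ on one and the same node?
Consider this function: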
/* note: already called with rcu_read_lock */
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    struct net_bridge_port *p = br_port_get_rcu(skb->dev);
    enum br_pkt_type pkt_type = BR_PKT_UNICAST;
    struct net_bridge_fdb_entry *dst = NULL;
    struct net_bridge_mdb_entry *mdst;
    bool local_rcv, mcast_hit = false;
    struct net_bridge *br;
    u16 vid = 0;
    ...
    // If the bridge device is in promiscuous mode, also deliver a copy locally
    local_rcv = !!(br->dev->flags & IFF_PROMISC);
    if (is_multicast_ether_addr(eth_hdr(skb)->h_dest)) {
        /* by definition the broadcast is also a multicast address */
        if (is_broadcast_ether_addr(eth_hdr(skb)->h_dest)) {
            pkt_type = BR_PKT_BROADCAST;
            local_rcv = true;
        } else {
            pkt_type = BR_PKT_MULTICAST;
            if (br_multicast_rcv(br, p, skb, vid))
                goto drop;
        }
    }
    ...
    // Handle the frame according to its packet type
    switch (pkt_type) {
    // Multicast frame
    case BR_PKT_MULTICAST:
        mdst = br_mdb_get(br, skb, vid);
        if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
            br_multicast_querier_exists(br, eth_hdr(skb))) {
            if ((mdst && mdst->host_joined) ||
                br_multicast_is_router(br)) {
                local_rcv = true;
                br->dev->stats.multicast++;
            }
            mcast_hit = true;
        } else {
            local_rcv = true;
            br->dev->stats.multicast++;
        }
        break;
    // Unicast frame: look up the destination MAC in the FDB (the MAC address table)
    case BR_PKT_UNICAST:
        dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
    default:
        break;
    }
    // An FDB entry was found for the destination MAC
    if (dst) {
        unsigned long now = jiffies;
        if (dst->is_local) // Destination is local: call br_pass_frame_up to hand the frame to the kernel protocol stack
            return br_pass_frame_up(skb);
        if (now != dst->used)
            dst->used = now;
        // Destination is not local: go through br_forward
        br_forward(dst->dst, skb, local_rcv, false);
    } else { // No entry found: flood to all ports
        if (!mcast_hit)
            br_flood(br, skb, pkt_type, local_rcv, false);
        else
            br_multicast_flood(mdst, skb, local_rcv, false);
    }
    if (local_rcv)
        return br_pass_frame_up(skb);
out:
    return 0;
drop:
    kfree_skb(skb);
    goto out;
}
With the parameter set to 1:
[10:51:20 ][4026532008] cni0 0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_handle_frame_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward
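By the time br_handle_frame_finish runs, the destination MAC has already changed from 0a580ae80001 to 0a580ae80076, which is not a local FDB entry, so execution goes straight to br_forward(dst->dst, skb, local_rcv, false);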
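The MAC column becomes easier to read once you notice that the addresses in this trace look like 0a:58 followed by the IPv4 address of the interface: 0a580ae80076 decodes to 10.232.0.118, the DNAT target. The sketch below decodes that layout. It assumes the CNI plugin derived the MACs this way, which the trace is consistent with but does not prove; cni_mac_to_ipv4 is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Decode a 12-hex-digit MAC, as printed in the trace, into dotted IPv4,
 * ASSUMING the 0a:58 + IPv4 layout. Returns 0 on success, -1 otherwise. */
int cni_mac_to_ipv4(const char *mac_hex, char *out, size_t outlen)
{
    unsigned int b[6];

    if (strlen(mac_hex) != 12)
        return -1;
    if (sscanf(mac_hex, "%2x%2x%2x%2x%2x%2x",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6)
        return -1;
    if (b[0] != 0x0a || b[1] != 0x58)
        return -1;      /* not the assumed 0a:58 prefix */
    snprintf(out, outlen, "%u.%u.%u.%u", b[2], b[3], b[4], b[5]);
    return 0;
}
```

Under the same assumption, 0a580ae80001 is 10.232.0.1 and 0a580ae80075 is 10.232.0.117, the client seen in the trace.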
With the parameter set to 0:
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_nf_pre_routing
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_handle_frame_finish
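The destination MAC is 0a580ae80001. This is the bridge's own MAC address, which the FDB treats as local, so the frame is passed up to the protocol stack.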
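The unicast branch of br_handle_frame_finish reduces to a three-way decision on the FDB lookup. A minimal user-space model of that decision follows, assuming a flat array in place of the kernel's FDB; struct fdb_entry, enum bridge_action and unicast_action are hypothetical names, and multicast, VLANs and local_rcv are left out.

```c
#include <stddef.h>
#include <string.h>

enum bridge_action { ACT_PASS_UP, ACT_FORWARD, ACT_FLOOD };

struct fdb_entry {
    unsigned char mac[6];
    int is_local;   /* 1 if the address belongs to the bridge itself */
};

/* Decide what happens to a unicast frame with the given destination MAC. */
enum bridge_action unicast_action(const struct fdb_entry *fdb, size_t n,
                                  const unsigned char dst_mac[6])
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (memcmp(fdb[i].mac, dst_mac, 6) == 0)
            return fdb[i].is_local ? ACT_PASS_UP   /* br_pass_frame_up */
                                   : ACT_FORWARD;  /* br_forward */
    }
    return ACT_FLOOD;   /* no FDB entry: br_flood */
}
```

With 0a580ae80001 stored as a local entry and 0a580ae80076 as a port entry, the model passes the first up to the stack and forwards the second, which mirrors the two traces.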
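Summary: as we understood it, enabling net.bridge.bridge-nf-call-iptables does let the bridge call the iptables rules. Setting net.bridge.bridge-nf-call-iptables=0, however, does not mean that a flow escapes DNAT. A Linux bridge is not only a "switch" but also a "network device", so the packet can still be DNATed by iptables at the network layer.
PS: the traces above were captured with skbtracer, an eBPF-based tool. Take a look if you are interested.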