What net.bridge.bridge-nf-call-iptables does

Background

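In a k8s cluster we found that services on other nodes stay reachable whether net.bridge.bridge-nf-call-iptables is set to 0 or to 1. Our understanding was that with net.bridge.bridge-nf-call-iptables=0 the bridge does not invoke the iptables rules, so no DNAT would take place and the connection should fail. This post investigates what the parameter actually does.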

Environment: CentOS 8

kernel 4.18

The sections below trace the code path taken with the parameter set to 0 and to 1, and compare the two.
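Before comparing the two settings it can help to confirm which value is actually in effect on the node. The sketch below reads the sysctl through procfs; the helper name read_bridge_nf_call_iptables is made up for illustration, and the file is only present while the br_netfilter module is loaded.

```python
from pathlib import Path

# Standard procfs location of the sysctl. The file only exists while the
# br_netfilter kernel module is loaded.
SYSCTL_PATH = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")


def read_bridge_nf_call_iptables(path=SYSCTL_PATH):
    """Return the current value as an int, or None if the sysctl is absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None
```

Calling read_bridge_nf_call_iptables() on the node returns 0 or 1, or None when the bridge netfilter module is not loaded at all.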

sysctl -w net.bridge.bridge-nf-call-iptables=0

[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:br_netif_receive_skb                                                                                 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:__netif_receive_skb                                                                                  
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:ip_rcv
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.raw.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.mangle.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.filter.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_output
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.POSTROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.nat.POSTROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.mangle.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.filter.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_output
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:45538 ffff9fcb89d5e700.0:2.mangle.POSTROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:ip_finish_output
[15:16:42 ][4026532008] cni0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:__dev_queue_xmit
[15:16:42 ][4026532008] cni0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:br_forward
[15:16:42 ][4026532008] cni0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:45538 ffff9fcb89d5e700.0:__br_forward

The trace above covers one part of the handshake for a single flow: SYN <--------> SYN,ACK. The SYN half is repeated here:

[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:br_netif_receive_skb                                                                                 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:__netif_receive_skb                                                                                  
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:ip_rcv
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.raw.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.mangle.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.filter.FORWARD.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_output
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.mangle.POSTROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:2.nat.POSTROUTING.ACCEPT 

The code path that can be read off the trace is:

br_netif_receive_skb
---> __netif_receive_skb
  --->ip_rcv                   //  <=======(1)
    2.raw.PREROUTING.ACCEPT 
    2.mangle.PREROUTING.ACCEPT 
    2.nat.PREROUTING.ACCEPT 
  --->ip_rcv_finish
    2.mangle.FORWARD.ACCEPT
    2.filter.FORWARD.ACCEPT 
      ---> ip_output
      2.mangle.POSTROUTING.ACCEPT
      2.nat.POSTROUTING.ACCEPT

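At (1) the packet has been passed up to the IP stack. Once it has traversed the nat PREROUTING chain, the destination IP is rewritten by DNAT, as these two records show: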

[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.222.8.94:80 ffff9fc21973aaf8.0:2.nat.PREROUTING.ACCEPT 
[15:16:42 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:45538->10.232.0.118:80 ffff9fc21973aaf8.0:ip_rcv_finish

This explains why DNAT still works correctly even with the parameter set to 0.
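The point where the destination changes can also be located mechanically rather than by eye. The following is a minimal sketch that scans trace lines in the layout shown in this post and reports the first skb whose destination IP differs between two of its records. The regular expression is inferred from the lines above, not from skbtracer documentation, and find_dst_rewrite is a made-up helper name.

```python
import re

# Layout assumed from the trace lines in this post:
#   ... SRC:SPORT->DST:DPORT <skb address>.<n>:<probe name>
RECORD_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+):\d+->(\d+\.\d+\.\d+\.\d+):\d+\s+([0-9a-f]+)\.\d+:(\S+)"
)


def find_dst_rewrite(lines):
    """Find the first skb whose destination IP changes between two records.

    Returns (probe_before, probe_after, old_dst, new_dst), or None if no
    skb in the input changes its destination.
    """
    last_seen = {}  # skb address -> (dst ip, probe name) of its latest record
    for line in lines:
        m = RECORD_RE.search(line)
        if m is None:
            continue  # not a trace record
        _src, dst, skb, probe = m.groups()
        if skb in last_seen and last_seen[skb][0] != dst:
            old_dst, old_probe = last_seen[skb]
            return (old_probe, probe, old_dst, dst)
        last_seen[skb] = (dst, probe)
    return None
```

Fed the two records quoted above, it reports the rewrite from 10.222.8.94 to 10.232.0.118 between 2.nat.PREROUTING.ACCEPT and ip_rcv_finish. Records are grouped by skb address so that the reply packet, which naturally has a different destination, is not mistaken for a rewrite.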

The relevant code:

/*
 *  Main IP Receive routine.
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
    const struct iphdr *iph;
    struct net *net;
    u32 len;

    /* When the interface is in promisc. mode, drop all the crap
     * that it receives, do not try to analyse it.
     */
    if (skb->pkt_type == PACKET_OTHERHOST)
        goto drop;

    ...
    net = dev_net(dev);

    ...
    return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
               net, NULL, skb, dev, NULL,
               ip_rcv_finish); // <========(1)

drop:
    kfree_skb(skb);
out:
    return NET_RX_DROP;
}

The NF_INET_PRE_ROUTING hook at (1) is where the PREROUTING chains run, and that is where DNAT is performed.

sysctl -w net.bridge.bridge-nf-call-iptables=1

[15:57:50 ][4026544799] nil          000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_output
[15:57:50 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_finish_output
[15:57:50 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_handle_frame_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_forward
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:__br_forward
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_forward_ip
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.mangle.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.filter.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_forward_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_forward_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_post_routing
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.mangle.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:2.nat.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_pre_routing_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_handle_frame_finish
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_forward
[15:57:50 ][4026532008] veth5d4ad362 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__br_forward
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_forward_ip
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.mangle.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.filter.FORWARD.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_forward_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_forward_finish
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_post_routing
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:2.mangle.POSTROUTING.ACCEPT
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__dev_queue_xmit
[15:57:50 ][4026544799] eth0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:netif_rx
[15:57:50 ][4026544799] eth0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:__netif_receive_skb
[15:57:50 ][4026544799] eth0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:ip_rcv
[15:57:50 ][4026544799] eth0         0a580ae80075 T_ACK,SYN:10.222.8.94:80->10.232.0.117:39270 ffff9fc2f20f5200.0:ip_rcv_finish
[15:57:50 ][4026544799] nil          000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:ip_output
[15:57:50 ][4026544799] eth0         000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:ip_finish_output
[15:57:50 ][4026544799] eth0         000000000000 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0         0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_ACK:10.232.0.117:39270->10.222.8.94:80 ffff9fc2f20f5e00.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_ACK:10.232.0.117:39270->10.232.0.118:80 ffff9fc2f20f5e00.0:br_nf_pre_routing_finish

The trace above is the handshake of one flow. The part that passes through cni0 is excerpted below:

[15:57:50 ][4026544799] nil          000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_output
[15:57:50 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:ip_finish_output
[15:57:50 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:__dev_queue_xmit
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:netif_rx
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:__netif_receive_skb
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.3:br_handle_frame
[15:57:50 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.raw.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.mangle.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish
...
[15:57:50 ][4026532008] cni0         0a580ae80001 T_ACK,SYN:10.232.0.118:80->10.232.0.117:39270 ffff9fc2f20f5200.0:br_nf_pre_routing_finish
...

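Lines 1-11 of the excerpt follow the packet carrying the SYN, and line 13 belongs to the SYN,ACK reply. The code path that can be read off the trace is: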

ip_output
  --->ip_finish_output
    --->__dev_queue_xmit
      --->netif_rx
        ---->__netif_receive_skb
          ----> br_handle_frame
            ----> br_nf_pre_routing //  <=======(1)
                2.raw.PREROUTING.ACCEPT 
                2.mangle.PREROUTING.ACCEPT 
                2.nat.PREROUTING.ACCEPT 
            ---> br_nf_pre_routing_finish

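At (1) the bridge invokes the iptables rules. Once the packet has traversed the nat PREROUTING chain, the destination IP is rewritten by DNAT, as these two records show: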

[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.222.8.94:80 ffff9fb92e3fa4f8.0:2.nat.PREROUTING.ACCEPT
[15:57:50 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:39270->10.232.0.118:80 ffff9fb92e3fa4f8.0:br_nf_pre_routing_finish

This shows that with the parameter set to 1 the bridge really does use the iptables rules to perform DNAT.

The relevant code:

/* Direct IPv6 traffic to br_nf_pre_routing_ipv6.
 * Replicate the checks that IPv4 does on packet reception.
 * Set skb->dev to the bridge device (i.e. parent of the
 * receiving device) to make netfilter happy, the REDIRECT
 * target in particular.  Save the original destination IP
 * address to be able to detect DNAT afterwards. */
static unsigned int br_nf_pre_routing(void *priv,
                      struct sk_buff *skb,
                      const struct nf_hook_state *state)
{
    struct nf_bridge_info *nf_bridge;
    struct net_bridge_port *p;
    struct net_bridge *br;
    __u32 len = nf_bridge_encap_header_len(skb);

    ....

    if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb)) //<======(1)
        return NF_ACCEPT;
    ....

    NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb,
        skb->dev, NULL,
        br_nf_pre_routing_finish); // <=============(2)

    return NF_STOLEN;
}

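With the parameter set to 1 the function does not return early. The sysctl itself is tested in the code elided before (1) (in the upstream 4.18 source: if (!brnf_call_iptables && !br->nf_call_iptables) return NF_ACCEPT;), and for an IPv4 packet the condition at (1) is false. Execution therefore reaches (2), which runs the iptables PREROUTING rules and performs DNAT.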
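The early exits can be summarised in a toy model. This is only a sketch of the control flow, assuming the sysctl check sits ahead of the protocol check as in the upstream 4.18 source. The function name and the run_prerouting callback are invented for illustration and stand in for the kernel's NF_HOOK call.

```python
NF_ACCEPT = "NF_ACCEPT"
NF_STOLEN = "NF_STOLEN"


def br_nf_pre_routing_model(is_ipv4, call_iptables, run_prerouting):
    """Simplified model of the early exits in br_nf_pre_routing.

    run_prerouting is a hypothetical callback standing in for the iptables
    PREROUTING chains. It is only invoked when the bridge hands the packet
    to the IPv4 netfilter hooks.
    """
    if not call_iptables:
        # sysctl is 0: the bridge skips iptables and keeps bridging
        return NF_ACCEPT
    if not is_ipv4:
        # check (1): non-IP frames are not handed to iptables
        return NF_ACCEPT
    # point (2): raw/mangle/nat PREROUTING run here, DNAT included
    run_prerouting()
    return NF_STOLEN
```

This matches the traces: with the parameter at 0, br_nf_pre_routing is still entered but no PREROUTING records follow it, while with the parameter at 1 the raw, mangle and nat PREROUTING records appear right after it.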

Code path differences with the parameter on and off

Here are the two code paths side by side.
Parameter off:

[10:49:16 ][4026544799] nil          000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_output                                                                                            
[10:49:16 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_finish_output                                                                                     
[10:49:16 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:__dev_queue_xmit                                                                                     
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:netif_rx                                                                                             
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:__netif_receive_skb                                                                                  
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.3:br_handle_frame
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_nf_pre_routing
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_handle_frame_finish
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_pass_frame_up
[10:49:16 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_netif_receive_skb
[10:49:16 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:__netif_receive_skb
[10:49:16 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:ip_rcv
[10:49:16 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:38850->10.232.0.118:80 ffff9fb9c1eadaf8.0:ip_rcv_finish
[10:49:16 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:38850->10.232.0.118:80 ffff9fb9c1eadaf8.0:ip_output

Parameter on:

[10:51:20 ][4026544799] nil          000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:ip_output                                                                                            
[10:51:20 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:ip_finish_output                                                                                     
[10:51:20 ][4026544799] eth0         000000000000 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:__dev_queue_xmit                                                                                     
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:netif_rx                                                                                             
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:__netif_receive_skb                                                                                  
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.3:br_handle_frame                                                                                      
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing
[10:51:20 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_handle_frame_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:__br_forward
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_forward_ip
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_forward_finish
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward_finish
[10:51:20 ][4026532008] veth5d4ad362 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_post_routing

The difference between the two comes down to this:

[10:51:20 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:44078->10.222.8.94:80 
[10:51:20 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80

As analysed above, DNAT happens in both cases, but at a different point in the path.

Why do the two paths differ when everything happens on the same node?

Let's look at this function:

/* note: already called with rcu_read_lock */
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    struct net_bridge_port *p = br_port_get_rcu(skb->dev);
    enum br_pkt_type pkt_type = BR_PKT_UNICAST;
    struct net_bridge_fdb_entry *dst = NULL;
    struct net_bridge_mdb_entry *mdst;
    bool local_rcv, mcast_hit = false;
    struct net_bridge *br;
    u16 vid = 0;

    ...
    // if the bridge device is in promiscuous mode, a copy is also delivered locally
    local_rcv = !!(br->dev->flags & IFF_PROMISC);
    if (is_multicast_ether_addr(eth_hdr(skb)->h_dest)) {
        /* by definition the broadcast is also a multicast address */
        if (is_broadcast_ether_addr(eth_hdr(skb)->h_dest)) {
            pkt_type = BR_PKT_BROADCAST;
            local_rcv = true;
        } else {
            pkt_type = BR_PKT_MULTICAST;
            if (br_multicast_rcv(br, p, skb, vid))
                goto drop;
        }
    }

    ...
    // handle the packet according to its type
    switch (pkt_type) {
    // multicast packet
    case BR_PKT_MULTICAST:
        mdst = br_mdb_get(br, skb, vid);
        if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
            br_multicast_querier_exists(br, eth_hdr(skb))) {
            if ((mdst && mdst->host_joined) ||
                br_multicast_is_router(br)) {
                local_rcv = true;
                br->dev->stats.multicast++;
            }
            mcast_hit = true;
        } else {
            local_rcv = true;
            br->dev->stats.multicast++;
        }
        break;
    // unicast packet: look up the destination MAC in the fdb (forwarding database)
    case BR_PKT_UNICAST:
        dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
    default:
        break;
    }
    // an fdb entry was found for the destination
    if (dst) {
        unsigned long now = jiffies;

        if (dst->is_local)   // destination is local: call br_pass_frame_up to hand the frame to the kernel protocol stack
            return br_pass_frame_up(skb);

        if (now != dst->used)
            dst->used = now;
        // destination is not local: go through br_forward
        br_forward(dst->dst, skb, local_rcv, false);
    } else { // no fdb entry: flood to all ports
        if (!mcast_hit)
            br_flood(br, skb, pkt_type, local_rcv, false);
        else
            br_multicast_flood(mdst, skb, local_rcv, false);
    }

    if (local_rcv)
        return br_pass_frame_up(skb);

out:
    return 0;
drop:
    kfree_skb(skb);
    goto out;
}

With the parameter set to 1:

[10:51:20 ][4026532008] cni0         0a580ae80001 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_nf_pre_routing_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_handle_frame_finish
[10:51:20 ][4026532008] veth4b9160ed 0a580ae80076 T_SYN:10.232.0.117:44078->10.232.0.118:80 ffff9fb91c5f1cf8.0:br_forward

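By the time br_handle_frame_finish runs, the destination MAC has already changed from 0a580ae80001 to 0a580ae80076. The change shows up between the br_nf_pre_routing and br_handle_frame_finish records, that is, after DNAT in the bridge netfilter path. The fdb lookup therefore finds a non-local entry and the code goes straight to br_forward(dst->dst, skb, local_rcv, false);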

With the parameter set to 0:

[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_nf_pre_routing
[10:49:16 ][4026532008] veth4b9160ed 0a580ae80001 T_SYN:10.232.0.117:38850->10.222.8.94:80 ffff9fb9c1eadaf8.0:br_handle_frame_finish

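The destination MAC is still 0a580ae80001, the bridge's own MAC address. That entry is marked local in the fdb, so the frame is passed up to the protocol stack.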
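The unicast branch of that decision can be expressed as a small lookup. This is a simplified sketch, not the kernel's data structure: the fdb is modelled as a plain dict, and the two entries use the MAC addresses and port names that appear in the traces above.

```python
def bridge_unicast_decision(fdb, dst_mac):
    """Simplified unicast branch of br_handle_frame_finish.

    fdb maps a MAC string to a dict with keys "port" and "is_local"
    (a hypothetical stand-in for the kernel's forwarding database).
    Returns (action, egress port or None).
    """
    entry = fdb.get(dst_mac)
    if entry is None:
        return ("flood", None)           # unknown MAC: br_flood
    if entry["is_local"]:
        return ("pass_up", None)         # bridge's own MAC: br_pass_frame_up
    return ("forward", entry["port"])    # known port: br_forward


# Entries taken from the traces: cni0's own MAC and the target pod's MAC.
fdb = {
    "0a580ae80001": {"port": "cni0", "is_local": True},
    "0a580ae80076": {"port": "veth5d4ad362", "is_local": False},
}
```

With this table, bridge_unicast_decision(fdb, "0a580ae80001") yields the pass-up action seen with the parameter at 0, and bridge_unicast_decision(fdb, "0a580ae80076") yields a forward to veth5d4ad362 as seen with the parameter at 1.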

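Summary: as we expected, when net.bridge.bridge-nf-call-iptables is enabled the bridge invokes the iptables rules. However, net.bridge.bridge-nf-call-iptables=0 does not mean that a flow escapes DNAT. A Linux bridge is not only a "switch" but also a "network device": a frame addressed to the bridge's own MAC is passed up to the network layer, where iptables still gets a chance to DNAT it.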
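The two outcomes can be put side by side in one end-to-end toy model. It is a sketch under stated assumptions: the addresses are the ones from the traces, the DNAT rule and neighbour table are reduced to dicts, and trace_syn is an invented name. It does not reproduce kernel behaviour beyond the two paths observed in this post.

```python
# Values taken from the traces in this post.
SERVICE_DNAT = {"10.222.8.94": "10.232.0.118"}   # service IP -> pod IP
BRIDGE_MAC = "0a580ae80001"                      # MAC of cni0
POD_MAC = {"10.232.0.118": "0a580ae80076"}       # pod IP -> pod MAC


def trace_syn(call_iptables, dst_ip="10.222.8.94", dst_mac=BRIDGE_MAC):
    """Return (where DNAT happened, how the bridge handled the frame, final dst IP)."""
    dnat_at = None
    if call_iptables and dst_ip in SERVICE_DNAT:
        # Bridge netfilter path: DNAT first, after which the traces show the
        # destination MAC already pointing at the new target.
        dst_ip = SERVICE_DNAT[dst_ip]
        dst_mac = POD_MAC[dst_ip]
        dnat_at = "br_nf_pre_routing"
    if dst_mac == BRIDGE_MAC:
        # Frame addressed to the bridge itself: passed up, and ip_rcv runs
        # the PREROUTING hook at the network layer.
        if dst_ip in SERVICE_DNAT:
            dst_ip = SERVICE_DNAT[dst_ip]
            dnat_at = "ip_rcv"
        return (dnat_at, "br_pass_frame_up", dst_ip)
    return (dnat_at, "br_forward", dst_ip)
```

Both trace_syn(False) and trace_syn(True) end with destination 10.232.0.118. Only the place of the rewrite (ip_rcv versus br_nf_pre_routing) and the bridge action (br_pass_frame_up versus br_forward) differ.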

PS: the traces above were captured with skbtracer, an eBPF-based tool. Take a look if you are interested.
