Outbound Internet Traffic Path for a VM Without a Floating IP

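Building networks with Neutron draws on a wide range of knowledge, and virtual networking can be implemented in very flexible ways, which often leaves newcomers confused. This article walks through one scenario, the path that traffic takes from a VM without a floating IP to the public Internet, and briefly introduces the components and concepts involved. We hope it helps readers who are interested in virtual networking.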

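There are many ways to implement virtual networking inside the company, so why single out this scenario? Its path is relatively long, it touches most of the relevant concepts, and it is not tied to any hardware vendor, which makes it broadly applicable.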

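The article has three parts, which can be read selectively: 1. Topology and traffic path. 2. A review of networking concepts. 3. Packet captures at each node.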

Part 1: Topology and Traffic Path

Topology:

image.png

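Traffic path for this scenario:

Traffic follows the U-shaped path in the figure. The implementation differs somewhat from the community (upstream) version of Neutron.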

image.png

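Part 2: Review of Networking Concepts

Readers who are familiar with VLAN/VXLAN, DVR, OSPF, ARP, OVS, namespaces, and bridges should find this article easy going. If these terms are unfamiliar, don't worry: this part reviews the basics, and Part 3 reinforces them with step-by-step packet captures.

The OSI Seven-Layer Model

In this article we focus on two layers: L2 (data link) and L3 (network).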

image.png

VLAN Frame Format

The VLAN frame format based on 802.1Q:

image.png

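  • Valid VLAN IDs range from 1 to 4094 (0 and 4095 are reserved).
  • LAN: a LAN is one broadcast domain; a broadcast sent by any member of the LAN is received by all of its members.
    VLAN stands for Virtual LAN. A VLAN-capable switch can belong to several LANs at the same time.
  • An access port can belong to only one VLAN.
  • A trunk port can belong to several VLANs and can send and receive frames for all of them.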
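
To make the tag layout concrete, here is a minimal Python sketch that decodes the 4-byte 802.1Q tag (TPID 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit DEI, and the 12-bit VLAN ID). The function name `parse_dot1q` and the sample header are illustrative assumptions, not taken from any Neutron or OVS code.

```python
import struct

def parse_dot1q(frame: bytes):
    """Return (pcp, dei, vid, inner_ethertype) for an 802.1Q-tagged frame."""
    # Bytes 0-11 hold the destination and source MAC; the tag follows them.
    tpid, tci, inner_type = struct.unpack("!HHH", frame[12:18])
    if tpid != 0x8100:
        raise ValueError("frame is not 802.1Q tagged")
    pcp = tci >> 13           # 3-bit priority code point
    dei = (tci >> 12) & 0x1   # 1-bit drop eligible indicator
    vid = tci & 0x0FFF        # 12-bit VLAN ID
    return pcp, dei, vid, inner_type

# Hypothetical header: two MAC addresses, VLAN 103, IPv4 payload type.
dst = bytes.fromhex("5cc99960d271")
src = bytes.fromhex("fa173ef7cdbc")
header = dst + src + struct.pack("!HHH", 0x8100, 103, 0x0800)
print(parse_dot1q(header))
```

Because the VLAN ID field is 12 bits wide it can hold 0 to 4095, which is where the usable range of 1 to 4094 comes from.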

VXLAN Packet Format

VXLAN is a tunneling mode that encapsulates Ethernet frames in UDP at the transport layer (OVS uses UDP port 4789 by default).

image.png

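  • VXLAN builds tunnels between VTEPs and carries the encapsulated Layer 2 data across a Layer 3 network (the underlay).
    The outermost IP/UDP headers are used for transport over the underlay network.
  • In the middle is the VXLAN header. After the VTEP receives the packet and strips the outer IP/UDP headers, it uses this header to carry out the VXLAN logic, mainly delivering the frame to the target VM according to the VNI.
  • Innermost is the original frame, which is what the VM itself sees.
  • Encapsulation adds 50 bytes of overhead.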
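
The 50-byte figure can be checked with simple arithmetic. The sketch below assumes an IPv4 underlay, no 802.1Q tag on the outer frame, and an underlay IP MTU of 1500 bytes; the constant and function names are illustrative.

```python
OUTER_ETHERNET = 14  # outer MAC header, assuming no 802.1Q tag
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # MAC header of the encapsulated frame

def vxlan_overhead() -> int:
    """Bytes added on the wire in front of the original Ethernet frame."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER

def inner_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay IP packet."""
    # The underlay MTU already excludes the outer Ethernet header, but the
    # inner Ethernet header travels inside the outer IP payload.
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET)

print(vxlan_overhead(), inner_mtu(1500))
```

Under these assumptions the inner MTU works out to 1450, which matches the `mtu 1450` that the VM reports in the Part 3 captures.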

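ARP (Address Resolution Protocol)

ARP resolves an IP address to a MAC address.

  • Static ARP
  • Gratuitous ARP
    • Detects IP address conflicts
    • Announces a new MAC address
  • Dynamic ARP
    • Resolves addresses through a broadcast ARP request followed by a unicast ARP reply
    • Bridges and NICs learn entries automatically and manage their lifetime
  • Proxy ARP
  • Common commands:
    ip neigh / arp -n
    ovs-appctl fdb/show br-int
    brctl showmacs <bridge_name>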

Policy-Based Routing (PBR)

Forwards packets according to user-defined policies.

image.png

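Tips:

  • Three priorities are already taken by default: 0 (the local table), 32766 (the main table), and 32767 (the default table)
  • The smaller the number, the higher the priority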
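
The ordering rule can be illustrated with a small Python sketch: rules whose selector matches the packet are consulted in ascending priority, and the kernel stops at the first table that yields a route. The three default rules mirror `ip rule list`; the rule with priority 100 and the table name `custom` are hypothetical.

```python
import ipaddress

# (priority, source selector, table name)
RULES = [
    (32766, "0.0.0.0/0", "main"),
    (0, "0.0.0.0/0", "local"),
    (100, "192.168.1.0/24", "custom"),  # hypothetical user-defined rule
    (32767, "0.0.0.0/0", "default"),
]

def tables_in_lookup_order(src: str):
    """Return the routing tables consulted for a packet from src, in order."""
    addr = ipaddress.ip_address(src)
    matching = [r for r in RULES if addr in ipaddress.ip_network(r[1])]
    # Lower priority numbers are evaluated first.
    return [table for _, _, table in sorted(matching)]

print(tables_in_lookup_order("192.168.1.7"))
print(tables_in_lookup_order("10.0.0.5"))
```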

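Network Namespaces

A mechanism for isolation: resources in different namespaces are invisible to each other.
Each namespace has its own network stack (interfaces, routing and forwarding tables, iptables).

A device (Linux device) can belong to only one namespace at a time.

Devices in different namespaces can be connected with a veth pair.

Common namespaces:

fip-xxx
qrouter-xxx
snat-xxx
qdhcp-xxx

Commands:

ip netns # list namespaces
ip netns exec ns1  ip addr # run a command inside a namespace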

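Neutron Basic Concepts

Network:
An isolated L2 domain; it can be virtual, logical, or switched.

Subnet:
An isolated L3 domain, that is, a block of IP addresses. Each machine in it has an IP address, and hosts in the same subnet can reach each other at L3.

Port:
A virtual, logical, or switched port on a network. All of these entities are virtual, have an automatically generated unique ID, support CRUD operations, and have their state tracked in the database.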

Linux Bridge

image.png

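OVS (Open vSwitch)

Open vSwitch is virtual switching software.

A virtual switch has two main jobs:

  • Carry traffic between VMs
  • Connect VMs to the outside network

OVS bridges:

  • br-int: bridge-integration, the integration bridge, which usually implements the main internal network functions.
  • br-ex: bridge-external, the external bridge, which usually handles communication with external networks.
  • br-tun: bridge-tunnel, the tunnel bridge.

OVS flow tables:

Matching:

  • The larger the number, the higher the priority
  • Match on port number
  • Match on source/destination MAC
  • Match on protocol

Actions:

  • NORMAL (ordinary L2 switching)
  • resubmit
  • output to a port
  • drop
  • learn
  • Modify the MAC address, or push/strip a VLAN tag or tunnel header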
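
The match-then-act model can be sketched as a toy in Python: among the flows whose match fields all agree with the packet, the one with the highest priority wins. This is a single-table simplification (real OVS pipelines chain several tables with resubmit), and the `Flow` class and sample flows are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    priority: int
    match: dict   # field -> required value; an empty dict matches everything
    actions: str

def select_flow(flows, packet):
    """Return the highest-priority flow whose match fields all equal the packet's."""
    candidates = [
        f for f in flows
        if all(packet.get(k) == v for k, v in f.match.items())
    ]
    return max(candidates, key=lambda f: f.priority, default=None)

flows = [
    Flow(0, {}, "drop"),
    Flow(1, {"in_port": 1}, "NORMAL"),
    Flow(10, {"in_port": 1, "dl_dst": "fa:16:3e:d1:d8:49"}, "output:74"),
]

print(select_flow(flows, {"in_port": 1, "dl_dst": "fa:16:3e:d1:d8:49"}).actions)
```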

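Iptables

Tables provide specific functionality; the built-in tables include nat, filter, and mangle.
Chains are the paths that packets travel along.

  • User-defined chains have no default policy
  • Once a terminating target (such as ACCEPT, DROP, or REJECT) has run, no further rules in the same chain are evaluated

A policy is made up of rules chained together; in essence, a rule describes how to treat an incoming IP packet.

  • Match (which conditions apply):
    • interface / destination address / source address / protocol / state / ...
  • Action (what to do):
    • ACCEPT
    • DROP
    • REJECT
    • RETURN: continue with the rule after the jump in the calling chain
    • DNAT: --to-destination
    • SNAT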
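
The traversal rules above (terminating targets stop processing, RETURN resumes in the calling chain, and only built-in chains fall back to a policy) can be sketched as a toy evaluator. This is an illustrative model, not how the kernel implements netfilter; the chain names and rules are hypothetical.

```python
def run_chain(chains, name, packet):
    """Walk one chain; return a final verdict, or None when the chain returns."""
    for match, target in chains[name]:
        if not match(packet):
            continue
        if target in ("ACCEPT", "DROP", "REJECT"):
            return target                  # terminating target: stop here
        if target == "RETURN":
            return None                    # resume in the calling chain
        verdict = run_chain(chains, target, packet)  # jump to a user chain
        if verdict is not None:
            return verdict
    return None                            # fell off the end of the chain

def filter_packet(chains, builtin, packet, policy):
    """Evaluate a built-in chain and apply its policy if no rule decides."""
    verdict = run_chain(chains, builtin, packet)
    return verdict if verdict is not None else policy

chains = {
    "INPUT": [
        (lambda p: p["proto"] == "tcp", "web"),
        (lambda p: p["src"] == "10.0.0.9", "DROP"),
    ],
    "web": [
        (lambda p: p["dport"] != 80, "RETURN"),
        (lambda p: True, "ACCEPT"),
    ],
}

print(filter_packet(chains, "INPUT", {"proto": "tcp", "dport": 22, "src": "10.0.0.9"}, "ACCEPT"))
```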

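DVR (Distributed Virtual Router)

Goal: reduce the load on network nodes

Core approach:

  • Use flow rules to resolve MAC conflicts among the multiple router instances
  • Make local requests reach the local router (ARP)
  • Keep the router interface MAC from being exposed directly on the external network by intercepting it with flow rules
  • Request a unique MAC address from the Neutron server and use flow rules to rewrite the MAC of inbound and outbound traffic

Notes:

  • The qrouter interfaces distributed across compute nodes all share the same MAC address
  • OVS flows must be updated to support DVR
  • When a VM boots and its port becomes active, an RPC notification tells the Neutron agent to update its rules
  • North-south SNAT traffic still has to pass through a network node
  • North-south traffic with a floating IP leaves directly from the compute node
  • The company's DVR differs somewhat from the community version
  • The Neutron L2 agent maintains chains and rules using iptables
  • The L3 agent provides routing between subnets and manages the related iptables rules

Neutron DVR deployment settings:

  • Deploy the L2 and L3 agents on both network nodes and compute nodes
  • Set the L3 agent mode to dvr on compute nodes and dvr_snat on network nodes
  • l2_population = True
  • router_distributed = True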

OSPF

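Open Shortest Path First (OSPF): routers advertise the state of their network interfaces to one another to build a link-state database and compute a shortest-path tree from it; each OSPF router then uses these shortest paths to build its routing table.

Purpose:

  • Advertise the floating IPs of VMs
  • Advertise the SNAT IPs used for VPC public network connectivity

The neutron-l3-agent creates the fip-xxx namespace, which runs OSPF with the access switch.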
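
The shortest-path-tree step of OSPF is Dijkstra's algorithm run over the link-state database. The sketch below is a minimal illustration on a hypothetical four-router topology with made-up link costs; it is not Quagga's implementation and ignores areas, LSA types, and equal-cost paths.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra over {node: {neighbor: cost}}; returns {node: (cost, next_hop)}."""
    best = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry, a cheaper path was already found
        for neigh, link_cost in graph[node].items():
            new_cost = cost + link_cost
            # The next hop is the first router on the path away from the source.
            hop = neigh if node == source else best[node][1]
            if neigh not in best or new_cost < best[neigh][0]:
                best[neigh] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neigh))
    return best

# Hypothetical topology; costs are illustrative only.
graph = {
    "A": {"B": 10, "C": 30},
    "B": {"A": 10, "C": 10, "D": 40},
    "C": {"A": 30, "B": 10, "D": 10},
    "D": {"B": 40, "C": 10},
}
print(shortest_paths(graph, "A"))
```

The `(cost, next_hop)` pairs correspond to what ends up in the routing table: for each destination, the total cost and the neighbor to forward to.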

Common commands

tcpdump # capture packets
ip rule list # list policy routing rules
ip route list table table_name # show the routes in one routing table
ovs-ofctl dump-ports-desc br-int # show bridge ports
ovs-ofctl dump-flows br-int # show the flow table
brctl showmacs <bridge> # show MACs learned by a Linux bridge
ovs-appctl fdb/show br-int  # show MACs learned by an OVS bridge

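Part 3: Packet Captures at Each Node

After the long list of basic concepts in Part 2, we now capture packets in the order of the traffic path to reinforce them. Refer to the figures in Part 1 as you read.

The captures are taken in this order:

  1. Capture inside the VM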
[root@Server-be9f76b6 ~]# tcpdump -i eth0 icmp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:23:29.858571 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 53781, length 108
17:23:29.892335 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 53781, length 76

[root@Server-be9f76b6 ~]# ip r l
169.254.169.254 via 192.168.1.1 dev eth0  proto static
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.7
default via 192.168.1.1 dev eth0  proto static

[root@Server-be9f76b6 ~]# ip r g 8.8.8.8
8.8.8.8 via 192.168.1.1 dev eth0  src 192.168.1.7
    cache  mtu 1450 hoplimit 64

[root@Server-be9f76b6 ~]# traceroute -n 8.8.8.8
 1  192.168.1.1  0.178 ms  0.209 ms  0.135 ms
 2  192.168.1.6  0.355 ms  0.352 ms  0.340 ms
 3  169.254.96.33  0.558 ms  0.546 ms  0.531 ms
 4  10.206.221.193  1.459 ms  2.400 ms  1.967 ms
 5  10.206.223.56  1.458 ms 10.206.223.60  1.407 ms 10.206.223.62  1.880 ms
 6  10.206.223.62  2.370 ms 10.206.223.60  2.279 ms  1.749 ms
 (omitted)

[root@Server-be9f76b6 ~]# ip neigh
192.168.1.6 dev eth0 lladdr fa:16:3e:d1:d8:49 STALE
192.168.1.2 dev eth0 lladdr fa:16:3e:93:a3:f2 STALE
192.168.1.1 dev eth0 lladdr fa:16:3e:06:d8:7b REACHABLE

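Notes:
The route lookup for 8.8.8.8 goes through the default gateway.
Outbound: the destination MAC belongs to 192.168.1.1.
Reply: the source MAC belongs to 192.168.1.6.
Packet format: no VLAN, no VXLAN.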
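
The route decision the VM makes can be reproduced with a longest-prefix-match sketch over the three routes shown in the `ip r l` output above. The `lookup` helper is an illustrative assumption, not how the kernel stores its FIB.

```python
import ipaddress

# Routes copied from the VM's `ip r l` output: (prefix, next hop or None if on-link).
ROUTES = [
    ("169.254.169.254/32", "192.168.1.1"),
    ("192.168.1.0/24", None),
    ("0.0.0.0/0", "192.168.1.1"),
]

def lookup(dst: str):
    """Return (matched prefix, next hop) using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop

print(lookup("8.8.8.8"))      # falls through to the default route
print(lookup("192.168.1.6"))  # on-link, no gateway needed
```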

  2. Capture on the tap port of the compute node
[root@w07 ~]# tcpdump -i tap2ab77d0f-99 -nnee

listening on tap2ab77d0f-99, link-type EN10MB (Ethernet), capture size 262144 bytes
17:16:11.866127 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 53368, length 108
17:16:11.899884 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 53368, length 76

[root@w07 ~]# brctl show
bridge name bridge id       STP enabled interfaces
qbr134a0c76-2f      8000.3286bc9479d4   no      qvb134a0c76-2f
                            tap134a0c76-2f
qbr2ab77d0f-99      8000.966e9ba8b1fc   no      qvb2ab77d0f-99
                            tap2ab77d0f-99
qbr5a70c226-ca      8000.a602f52928ba   no      qvb5a70c226-ca
                            tap5a70c226-ca
qbr8e2a2b2f-ea      8000.ea797eb5b738   no      qvb8e2a2b2f-ea
                            tap8e2a2b2f-ea
[root@w07 ~]# brctl showmacs qbr2ab77d0f-99
port no mac addr        is local?   ageing timer
  1 46:4f:72:0f:95:bf   no        23.98
  1 96:6e:9b:a8:b1:fc   yes        0.00
  1 96:6e:9b:a8:b1:fc   yes        0.00
  2 fa:16:3e:b2:ef:af   no         0.06
  1 fa:16:3e:d1:d8:49   no         0.06
  2 fe:16:3e:b2:ef:af   yes        0.00
  2 fe:16:3e:b2:ef:af   yes        0.00

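Notes:

The length is still 108.
Packet format: no VLAN, no VXLAN.
Bridge ports:
qvb2ab77d0f-99 (1)
tap2ab77d0f-99 (2)
The MACs learned on the bridge are listed in the brctl showmacs output above.

  3. Packets on qvo-xxx/qvb-xxx (similar to the tap capture, omitted)

  4. br-int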

[root@w07 ~]# ovs-vsctl show
...
    Bridge br-int
        fail_mode: secure
...
        Port "qvo2ab77d0f-99"
            tag: 10
            Interface "qvo2ab77d0f-99"
        Port "qr-178fc8e2-fd"
            tag: 10
            Interface "qr-178fc8e2-fd"
                type: internal
...
[root@w07 ~]# ovs-appctl fdb/show br-int
 port  VLAN  MAC                Age
   45    10  fa:16:3e:95:02:b6    4
   40    10  fa:16:3e:06:d8:7b    0
    1    10  fa:16:3e:d1:d8:49    0
   23     1  fa:16:3e:4d:50:f1    0
   47     1  fa:16:3e:78:e7:8a    0
   41    10  fa:16:3e:b2:ef:af    0
   24     1  fa:16:3e:0c:69:32    0

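Notes:

br-int dump-ports-desc:
1(patch-tun)
40(qr-178fc8e2-fd)
The flow table ultimately matches NORMAL.
tag 10 means that the port's VLAN ID is 10; it does not mean that the captured packet carries VLAN ID 10.
The packet is still a plain frame, with no VLAN tag.

  5. Capture in the qrouter-xxx namespace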
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 tcpdump -i qr-178fc8e2-fd -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-178fc8e2-fd, link-type EN10MB (Ethernet), capture size 262144 bytes
17:28:46.130180 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54121, length 108
17:28:46.130209 fa:16:3e:06:d8:7b > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54121, length 108

[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip rule list
0:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default
3232235777: from 192.168.1.1/24 lookup 3232235777
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip r s t 3232235777
default via 192.168.1.6 dev qr-178fc8e2-fd

[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip neigh
192.168.1.7 dev qr-178fc8e2-fd lladdr fa:16:3e:b2:ef:af PERMANENT
192.168.1.13 dev qr-178fc8e2-fd lladdr fa:16:3e:b6:fb:42 PERMANENT
192.168.1.6 dev qr-178fc8e2-fd lladdr fa:16:3e:d1:d8:49 PERMANENT
192.168.1.10 dev qr-178fc8e2-fd lladdr fa:16:3e:95:02:b6 PERMANENT
192.168.1.3 dev qr-178fc8e2-fd lladdr fa:16:3e:a9:38:b7 PERMANENT
192.168.1.2 dev qr-178fc8e2-fd lladdr fa:16:3e:93:a3:f2 PERMANENT

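Notes:

The length is still 108.
Packet format: no VLAN, no VXLAN.
The neighbor (MAC) entries are static (l2 population).
Path:
The packet enters through qr-xxx.
Policy routing selects the next hop 192.168.1.6, and its MAC is looked up.
The packet leaves through qr-xxx toward 1(patch-tun).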
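
The table number 3232235777 in the `ip rule list` output looks arbitrary, but it is the subnet gateway address 192.168.1.1 written as a 32-bit integer. The sketch below shows the conversion in both directions; that Neutron derives the table ID this way is an inference from the numbers in this capture, and the helper names are illustrative.

```python
import ipaddress

def table_id_to_ip(table_id: int) -> str:
    """Interpret a routing table ID as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(table_id))

def ip_to_table_id(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip))

print(table_id_to_ip(3232235777))
print(ip_to_table_id("192.168.1.1"))
```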

  6. br-tun bridge and flow table on the compute node
[root@w07 ~]# ovs-vsctl  show
76001df8-48a5-4185-8de4-a035fc4b2d72
    Bridge br-tun
        fail_mode: secure
        Port "vxlan-0ace6b9c"
            Interface "vxlan-0ace6b9c"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="10.206.107.238", out_key=flow, remote_ip="10.206.107.156"}
...

[root@w07 ~]# netstat -nlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:111             0.0.0.0:*                           12153/rpcbind
udp        0      0 127.0.0.1:323           0.0.0.0:*                           1047/chronyd
udp        0      0 0.0.0.0:874             0.0.0.0:*                           12153/rpcbind
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -

[root@w07 ~]# ovs-appctl fdb/show br-tun
 port  VLAN  MAC                Age

Other common commands:

ovs-appctl dpif/dump-flows br-tun
ovs-appctl dpif/show

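Notes:
br-tun dump-ports-desc:
1(patch-int)
5(vxlan-0ace6b9c)
Flow tables: 1(mod_dl_src) > 2 > 20(output 5)
Table 1 rewrites the source MAC (mod_dl_src), which is why the inner frame captured on eth2 in the next step has source MAC fa:16:3f:13:77:c5 instead of the qr-xxx MAC fa:16:3e:06:d8:7b.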
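
The VXLAN port name encodes the tunnel's remote IP in hexadecimal, which is handy when reading `dump-ports-desc` output. A minimal sketch of the decoding follows; the helper name is an illustrative assumption.

```python
import ipaddress

def vxlan_port_remote_ip(port_name: str) -> str:
    """Decode the remote IP embedded in a port name such as 'vxlan-0ace6b9c'."""
    prefix, _, hex_ip = port_name.partition("-")
    if prefix != "vxlan" or len(hex_ip) != 8:
        raise ValueError(f"unexpected port name: {port_name!r}")
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))

print(vxlan_port_remote_ip("vxlan-0ace6b9c"))
```

For the port above this yields 10.206.107.156, the same address as the `remote_ip` option in the `ovs-vsctl show` output.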

  7. Capture on eth2 of the compute node
[root@w07 ~]# tcpdump -i eth2 udp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
17:35:05.742902 b4:96:91:5a:31:78 > 5c:c9:99:60:e0:3c, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3f:13:77:c5 > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54500, length 108
17:35:05.776442 5c:c9:99:60:e0:3c > b4:96:91:5a:31:78, ethertype IPv4 (0x0800), length 160: 10.206.107.156.56414 > 10.206.107.238.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 54500, length 76

[root@w07 ~]# ip neigh | grep "5c:c9:99:60:e0:3c"
10.206.107.193 dev eth2 lladdr 5c:c9:99:60:e0:3c REACHABLE

image.png

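Notes:
The packet is in VXLAN format:
eth:ethertype:ip:udp:vxlan:eth:ethertype:ip:icmp:data
Outer header: the source is the local eth2 and the destination IP is the network node's eth2, UDP port 4789. The outer destination MAC (5c:c9:99:60:e0:3c) is that of the physical gateway 10.206.107.193, as the ip neigh output shows.
Encapsulation adds 50 bytes of overhead (192 - 142).
The original packet is carried inside.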
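
The `flags [I] (0x08), vni 103` fields that tcpdump prints come from the 8-byte VXLAN header: one flags byte, 24 reserved bits, a 24-bit VNI, and 8 more reserved bits. The sketch below packs and parses that layout; the function names are illustrative.

```python
import struct

def pack_vxlan(vni: int) -> bytes:
    """Build an 8-byte VXLAN header with the I (valid VNI) flag set."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte 0x08 followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", 0x08 << 24, vni << 8)

def parse_vxlan(header: bytes):
    """Return (flags, vni) from the first 8 bytes of a VXLAN payload."""
    word1, word2 = struct.unpack("!II", header[:8])
    flags = word1 >> 24
    if not flags & 0x08:
        raise ValueError("I flag not set, VNI is not valid")
    return flags, word2 >> 8

print(pack_vxlan(103).hex())
print(parse_vxlan(pack_vxlan(103)))
```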

  8. Capture on eth2 of the network node
[root@w02 ~]# tcpdump -i eth2 udp and host 10.206.107.238  -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
17:37:16.969152 5c:c9:99:db:69:c9 > b4:96:91:5a:32:54, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3f:13:77:c5 > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54631, length 108
17:37:17.002678 b4:96:91:5a:32:54 > 5c:c9:99:db:69:c9, ethertype IPv4 (0x0800), length 160: 10.206.107.156.56414 > 10.206.107.238.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 54631, length 76
17:37:17.970994 5c:c9:99:db:69:c9 > b4:96:91:5a:32:54, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103

[root@w02 ~]# ip neigh | grep -E "5c:c9:99:db:69:c9"
10.206.107.129 dev eth2 lladdr 5c:c9:99:db:69:c9 REACHABLE

[root@w02 ~]# netstat -nlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:45492           0.0.0.0:*                           1630/haproxy
udp        0      0 0.0.0.0:111             0.0.0.0:*                           787644/rpcbind
udp        0      0 127.0.0.1:323           0.0.0.0:*                           794/chronyd
udp        0      0 0.0.0.0:871             0.0.0.0:*                           787644/rpcbind
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -
udp        0      0 0.0.0.0:8472            0.0.0.0:*                           -

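Notes:
The packet is in VXLAN format:
eth:ethertype:ip:udp:vxlan:eth:ethertype:ip:icmp:data
The inner packet is unchanged (source and destination MAC and IP).
UDP port 4789 is open locally in kernel space.

  9. br-tun bridge and flow table on the network node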
[root@w02 ~]# ovs-vsctl show
c67632e5-75ed-4f73-b4ab-cf32f95a8770
...
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-0ace6bee"
            Interface "vxlan-0ace6bee"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="10.206.107.156", out_key=flow, remote_ip="10.206.107.238"}
...

Notes:

br-tun ports:
4(vxlan-0ace6bee)
1(patch-int)
Flow tables matched on br-tun:
0 > 4 > 9(dl_src) > patch-int

  10. br-int bridge and flow table on the network node
[root@w02 ~]# ovs-appctl fdb/show br-int
 port  VLAN  MAC                Age
    1    20  fa:16:3e:95:02:b6   28
    1    20  fa:16:3e:b2:ef:af    7
   74    20  fa:16:3e:d1:d8:49    0

Notes:

br-int ports:
1(patch-tun)
74(sg-499291dc-d8)
Flow tables matched on br-int:
0(dl_src) > 1(dl_vlan, dl_dst) > sg-xxx

  11. Capture in the snat-xxx namespace
[root@w02 ~]# ip netns  exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: rfp-c4d4b760-4@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 36:66:d5:bf:5e:e0 brd ff:ff:ff:ff:ff:ff
    inet 169.254.96.32/31 scope global rfp-c4d4b760-4
       valid_lft forever preferred_lft forever
    inet 112.65.210.200/32 brd 112.65.210.200 scope global rfp-c4d4b760-4
       valid_lft forever preferred_lft forever
125: sg-499291dc-d8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:d1:d8:49 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.6/24 brd 192.168.1.255 scope global sg-499291dc-d8
       valid_lft forever preferred_lft forever

[root@w02 ~]# ip netns  exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip r l
default via 169.254.96.33 dev rfp-c4d4b760-4
169.254.96.32/31 dev rfp-c4d4b760-4  proto kernel  scope link  src 169.254.96.32
192.168.1.0/24 dev sg-499291dc-d8  proto kernel  scope link  src 192.168.1.6

[root@w02 ~]# ip netns  exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip neigh
192.168.1.7 dev sg-499291dc-d8 lladdr fa:16:3e:b2:ef:af REACHABLE
169.254.96.33 dev rfp-c4d4b760-4 lladdr 82:99:8f:5a:6b:ec DELAY
192.168.1.10 dev sg-499291dc-d8 lladdr fa:16:3e:95:02:b6 STALE

[root@w02 ~]# ip netns exec snat-47c9415f-f30a-4a7c-820d-b7322a064f20 iptables -t nat -S
...
-A neutron-l3-agent-POSTROUTING ! -i rfp-47c9415f-f ! -o rfp-47c9415f-f -m conntrack ! --ctstate DNAT -j ACCEPT
-A neutron-l3-agent-snat -o rfp-47c9415f-f -j SNAT --to-source 112.65.210.208
-A neutron-l3-agent-snat -m mark ! --mark 0x2/0xffff -m conntrack --ctstate DNAT -j SNAT --to-source 112.65.210.208
...

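Notes:

The default route is 169.254.96.33 (on fpr-xxx in the fip-xxx namespace).
rfp-xxx and fpr-xxx are the two ends of a veth pair that connects the two namespaces.

Firewall rules:
SNAT: when the outgoing interface is rfp-xxx, the packet's source address is rewritten to 112.65.210.200. (The iptables listing above appears to come from another router's namespace, snat-47c9415f-..., which is why it shows 112.65.210.208.)
In other words, by the time the packet reaches the rfp-xxx interface, its source IP has already been rewritten.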
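
What SNAT does to the ICMP flow can be modelled as a toy translator: outbound packets get the public source address and a session entry, and replies are mapped back through that entry. This is a deliberately simplified stand-in for conntrack, keyed only on the remote IP and ICMP echo ID; the class name is an illustrative assumption.

```python
class SnatGateway:
    """Toy source NAT keyed on (remote ip, ICMP echo id)."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.sessions = {}  # (remote ip, icmp id) -> private source ip

    def outbound(self, src: str, dst: str, icmp_id: int):
        """Rewrite the source address and remember the original."""
        self.sessions[(dst, icmp_id)] = src
        return self.public_ip, dst, icmp_id

    def inbound(self, src: str, dst: str, icmp_id: int):
        """Restore the private destination for a reply, or None if unknown."""
        private = self.sessions.get((src, icmp_id))
        if dst != self.public_ip or private is None:
            return None  # no matching session: not translated
        return src, private, icmp_id

gw = SnatGateway("112.65.210.200")
print(gw.outbound("192.168.1.7", "8.8.8.8", 60691))
print(gw.inbound("8.8.8.8", "112.65.210.200", 60691))
```

The two printed tuples correspond to the request seen on rfp-xxx and the reply delivered back toward sg-xxx in the captures that follow.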

[root@w02 ~]# ip netns  exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 tcpdump -i sg-499291dc-d8 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on sg-499291dc-d8, link-type EN10MB (Ethernet), capture size 262144 bytes
17:45:23.761053 fa:16:3e:06:d8:7b > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 55117, length 108
17:45:23.809919 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 55117, length 76

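sg-xxx traffic:

Plain packet, length 108, no VLAN, no VXLAN.
Source MAC: fa:16:3e:06:d8:7b (the MAC shared by the qr-xxx interfaces; the frame was not sent by a qr-xxx on this node, the MAC was restored by a br-tun rule). The destination MAC is that of the sg-xxx interface.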

[root@w02 ~]# ip netns  exec snat-c4d4b760-41b9-45e1-a607-d054da99c479  tcpdump -i rfp-c4d4b760-4 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on rfp-c4d4b760-4, link-type EN10MB (Ethernet), capture size 262144 bytes
17:46:13.847083 36:66:d5:bf:5e:e0 > 82:99:8f:5a:6b:ec, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55167, length 108
17:46:13.880503 82:99:8f:5a:6b:ec > 36:66:d5:bf:5e:e0, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55167, length 76

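rfp-xxx traffic:

Plain packet, length 108, no VLAN, no VXLAN.
Outbound: the source MAC is that of rfp-xxx and the destination MAC is that of the default gateway (note that the source address has already been through SNAT).

  12. The fip-xxx namespace on the network node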
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    ...
4: fpr-c4d4b760-4@fpr-3bbebb1a-5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 82:99:8f:5a:6b:ec brd ff:ff:ff:ff:ff:ff
    inet 169.254.96.33/31 scope global fpr-c4d4b760-4
       valid_lft forever preferred_lft forever
124: fip-vif.103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:17:3e:f7:cd:bc brd ff:ff:ff:ff:ff:ff
    inet 10.206.221.195/26 brd 10.206.221.255 scope global fip-vif.103
       valid_lft forever preferred_lft forever

[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:2601          0.0.0.0:*               LISTEN      2063367/zebra
tcp        0      0 127.0.0.1:2604

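Notes:

Ports in fip-xxx:
fpr-xxx (one end of the patch)
fip-vif.xxx (the port that forms the OSPF adjacency with the switch)
The default route points to the physical switch.
Two processes are listening: zebra and ospfd.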


[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 tcpdump -i fpr-c4d4b760-4 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fpr-c4d4b760-4, link-type EN10MB (Ethernet), capture size 262144 bytes
17:48:54.127102 36:66:d5:bf:5e:e0 > 82:99:8f:5a:6b:ec, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55327, length 108
17:48:54.160524 82:99:8f:5a:6b:ec > 36:66:d5:bf:5e:e0, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55327, length 76

Capture on fpr-xxx inside fip-xxx:
The destination MAC is that of fpr-xxx.

[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 tcpdump -i fip-vif.103 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fip-vif.103, link-type EN10MB (Ethernet), capture size 262144 bytes
17:49:40.208078 fa:17:3e:f7:cd:bc > 5c:c9:99:60:d2:71, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55373, length 108
17:49:40.241501 5c:c9:99:60:d2:71 > fa:17:3e:f7:cd:bc, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55373, length 76

Capture on fip-vif.xxx inside fip-xxx:
The destination MAC is that of the default gateway (the physical gateway).


[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 ip r l
default via 10.206.221.193 dev fip-vif.103
10.206.221.192/26 dev fip-vif.103  proto kernel  scope link  src 10.206.221.195
112.65.210.200 via 169.254.96.32 dev fpr-c4d4b760-4
112.65.210.204 via 169.254.113.28 dev fpr-3bbebb1a-5
169.254.96.32/31 dev fpr-c4d4b760-4  proto kernel  scope link  src 169.254.96.33
169.254.113.28/31 dev fpr-3bbebb1a-5  proto kernel  scope link  src 169.254.113.29

[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 telnet 0 ospfd
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.

Hello, this is Quagga (version 1.0.0.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

User Access Verification

Password:
localhost> show ip ospf route
============ OSPF network routing table ============
N    10.206.221.192/26     [10] area: 10.206.221.193
                           directly attached to fip-vif.103
N    112.65.210.200/32     [10] area: 10.206.221.193
                           directly attached to lo
N    112.65.210.204/32     [10] area: 10.206.221.193
                           directly attached to lo

============ OSPF router routing table =============
R    10.206.221.193        [10] area: 10.206.221.193, ABR, ASBR
                           via 10.206.221.193, fip-vif.103

============ OSPF external routing table ===========

localhost> show ip ospf nei
localhost> show ip ospf neighbor

    Neighbor ID Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
10.206.221.193   10 Full/DR           34.730s 10.206.221.193  fip-vif.103:10.206.221.195     0     0     0
10.206.221.196    1 Full/DROther      31.489s 10.206.221.196  fip-vif.103:10.206.221.195     0     0     0

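Notes:

Routes are maintained by OSPF:
ospfd forms neighbor adjacencies with the physical switch.
Configuration:
Both sides must use the same area, stub setting, MTU, and VLAN.
The L3 agent maintains the fip-xxx namespace.

  13. br-ex bridge on the network node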
[root@w02 ~]# ovs-vsctl show
c67632e5-75ed-4f73-b4ab-cf32f95a8770
...
    Bridge br-ex
        Port "fip-vif.103"
            tag: 103
            Interface "fip-vif.103"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth3"
            Interface "eth3"
    ovs_version: "2.5.5"

[root@w02 ~]# ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    1     0  5c:c9:99:60:d2:6e  171
    1   103  fa:17:3e:a7:81:25    8
    5   103  fa:17:3e:f7:cd:bc    0
    1   103  5c:c9:99:60:d2:71    0
    ...

Notes:
Ports:
1(eth3)
5(fip-vif.103)
Flow table: NORMAL
eth3 carries VLAN 103.

  14. eth3 port on the network node
[root@w02 ~]# tcpdump -i eth3 icmp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth3, link-type EN10MB (Ethernet), capture size 262144 bytes
17:54:36.698733 fa:17:3e:f7:cd:bc > 5c:c9:99:60:d2:71, ethertype 802.1Q (0x8100), length 146: vlan 103, p 0, ethertype IPv4, 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55669, length 108
17:54:36.732114 5c:c9:99:60:d2:71 > fa:17:3e:f7:cd:bc, ethertype 802.1Q (0x8100), length 114: vlan 103, p 0, ethertype IPv4, 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55669, length 76

Notes:

The packet carries VLAN tag 103 (added when it leaves br-ex).
Traffic is handed to the physical switch through eth3.
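
The frame length in this capture is 146 rather than the 142 seen earlier because the 802.1Q tag adds 4 bytes between the source MAC and the EtherType. A minimal sketch of that insertion, using a zero-filled placeholder frame rather than real captured bytes:

```python
import struct

def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the two MAC addresses of an untagged frame."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    tag = struct.pack("!HH", 0x8100, (pcp << 13) | vid)
    return frame[:12] + tag + frame[12:]

untagged = bytes(142)  # placeholder with the length of the untagged echo request
tagged = add_vlan_tag(untagged, 103)
print(len(untagged), len(tagged), tagged[12:16].hex())
```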

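The packet has now travelled from the VM to the host, then on to the network node, the physical switch, and finally the Internet.

Conclusion

The traffic path described here differs somewhat from the community version. If you are interested in networking, we recommend drawing the traffic path yourself for your own environment, using the concepts covered here; you will then be able to reason about other scenarios by analogy.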
