Message ID | 20240212195030.190385-2-mkp@redhat.com |
---|---|
State | Changes Requested |
Delegated to: | Simon Horman |
Headers | show |
Series | [ovs-dev,v2,1/4] Userspace: Software fallback for UDP encapsulated TCP segmentation. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/intel-ovs-compilation | success | test: success |
Note, this failed github CI with the following error message: ../../tests/testsuite: line 4034: ./atconfig: No such file or directory This appears to be a false negative. -M On Mon, Feb 12, 2024 at 2:50 PM Mike Pattrick <mkp@redhat.com> wrote: > > Previously if an OVS configuration nested multiple layers of UDP tunnels > like VXLAN or GENEVE on top of each other through netdev-linux > interfaces, the vnet header would be incorrectly set to the outermost > UDP tunnel layer instead of the intermediary tunnel layer. > > This resulted in the middle UDP tunnel not checksum offloading properly. > > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") > Reported-by: David Marchand <david.marchand@redhat.com> > Signed-off-by: Mike Pattrick <mkp@redhat.com> > --- > lib/netdev-linux.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > index 1b2e5b6c2..7a156cc28 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -7239,14 +7239,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) > vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( > struct tcp_header, tcp_csum); > } else if (dp_packet_hwol_l4_is_udp(b)) { > - struct udp_header *udp_hdr = dp_packet_l4(b); > + /* Favour the inner packet when indicating checksum offsets. */ > + void *l3_off = dp_packet_inner_l3(b); > + void *l4_off = dp_packet_inner_l4(b); > + > + if (!l3_off || !l4_off) { > + l3_off = dp_packet_l3(b); > + l4_off = dp_packet_l4(b); > + } > + struct udp_header *udp_hdr = l4_off; > + > ovs_be16 csum = 0; > > if (dp_packet_hwol_is_ipv4(b)) { > - const struct ip_header *ip_hdr = dp_packet_l3(b); > + const struct ip_header *ip_hdr = l3_off; > csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr)); > } else if (dp_packet_hwol_tx_ipv6(b)) { > - const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b); > + const struct ovs_16aligned_ip6_hdr *ip6_hdr = l4_off; > csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr)); > } > > -- > 2.39.3 >
Hello Mike, On Mon, Feb 12, 2024 at 8:50 PM Mike Pattrick <mkp@redhat.com> wrote: > > Previously if an OVS configuration nested multiple layers of UDP tunnels > like VXLAN or GENEVE on top of each other through netdev-linux > interfaces, the vnet header would be incorrectly set to the outermost > UDP tunnel layer instead of the intermediary tunnel layer. > > This resulted in the middle UDP tunnel not checksum offloading properly. > > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") > Reported-by: David Marchand <david.marchand@redhat.com> > Signed-off-by: Mike Pattrick <mkp@redhat.com> I have some trouble relating this patch to the issue I faced :-). Could you detail a test that shows the issue you fix here? After applying (only this patch), I still reproduce an issue with inner checksums. As I reported this issue to you offlist, let me put the details in public here. I wrote a system-traffic.at unit test that stacks 3 vxlan tunnels (separate topic, but for the context, my goal was to stress DPDK dp-packets wrt headroom). If I try this unit test before commit 084c8087292c ("userspace: Support VXLAN and GENEVE TSO."), I have no issue. The topology is as follows: ############################################################## # # at_ns0 . init_net # . # at_vxlan1 (10.1.1.1/24) . br0 (10.1.1.100/24) # (remote 172.31.1.100) . | # . at_vxlan0 # . (remote 172.31.1.1) # . # at_vxlan3 (172.31.1.1/24) . br-underlay0 (172.31.1.100/24) # (remote 172.31.2.100) . | # . at_vxlan2 # . (remote 172.31.2.1) # . # at_vxlan5 (172.31.2.1/24) . br-underlay1 (172.31.2.100/24) # (remote 172.31.3.100) . | # . at_vxlan4 # . (remote 172.31.3.1) # . # p0 (172.31.3.1/24) . br-underlay2 (172.31.3.100/24) # | . | # \-------------------------.-ovs-p0 # ############################################################## (gmail will probably bust this copy/paste, so putting a link to the actual test: https://github.com/david-marchand/ovs/commit/manyvxlan~2#diff-45a77f85f9679bc66ac97300392c0d5d9f5c53264fa8a82d735a553246e71faeR400) With this setup, I try to ping, from at_ns0 netns, the ip address of the br tap iface plugged with the other side of each tunnel: - Most outter level, no encapsulation, all good: 16:24:51.590966 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 63550, offset 0, flags [DF], proto ICMP (1), length 84) 172.31.3.1 > 172.31.3.100: ICMP echo request, id 26707, seq 1, length 64 16:24:51.591084 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 28720, offset 0, flags [none], proto ICMP (1), length 84) 172.31.3.100 > 172.31.3.1: ICMP echo reply, id 26707, seq 1, length 64 - One tunnel encap all good: 16:24:54.140629 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 61052, offset 0, flags [none], proto UDP (17), length 134) 172.31.3.1.36831 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 0 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54399, offset 0, flags [DF], proto ICMP (1), length 84) 172.31.2.1 > 172.31.2.100: ICMP echo request, id 51488, seq 1, length 64 16:24:54.140772 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 134) 172.31.3.100.39912 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 0 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 29701, offset 0, flags [none], proto ICMP (1), length 84) 172.31.2.100 > 172.31.2.1: ICMP echo reply, id 51488, seq 1, length 64 - Two tunnels encap: 16:24:58.578900 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 64, id 61719, offset 0, flags [none], proto UDP (17), length 128) 172.31.3.1.50673 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 0 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length 92: (tos 0x0, ttl 64, id 35175, offset 0, flags [none], proto UDP (17), length 78) 172.31.2.1.44060 > 172.31.2.100.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 1 62:53:3f:82:da:56 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 172.31.1.100 tell 172.31.1.1, length 28 16:24:58.579021 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 142: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 128) 172.31.3.100.56325 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 0 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length 92: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 78, bad cksum 0 (->ddfb)!) 172.31.2.100.56325 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 1 26:8d:e7:0e:41:45 > 62:53:3f:82:da:56, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 172.31.1.100 is-at 26:8d:e7:0e:41:45, length 28 In this last packet capture, the inner vxlan tunnel header has wrong (well no) IP checksum. If I set the tunnel endpoint neighbors in each netns, the same issue can be seen for icmp encapsulated traffic: 17:41:39.303170 32:a4:65:eb:df:49 > a6:0a:bf:e2:f3:f2, ethertype IPv4 (0x0800), length 198: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 184) 172.31.3.100.59550 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 0 fe:72:7b:39:24:49 > 7e:9d:b3:a5:a2:f8, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 134, bad cksum 0 (->ddc3)!) 172.31.2.100.59550 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 1 56:21:48:f3:f3:42 > 72:64:65:f9:38:0a, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 49609, offset 0, flags [none], proto ICMP (1), length 84) 172.31.1.100 > 172.31.1.1: ICMP echo reply, id 35387, seq 1, length 64 Running in GHA: https://github.com/david-marchand/ovs/actions/runs/7903675971/job/21572151269 Afaics, this issue is not reproduced when using the kernel datapath (check-kernel) but it is seen in GHA for check-system-userspace / check-dpdk / check-afxdp. I can reproduce those locally. There is also a failure with check-offloads in GHA which I can't reproduce locally (for the latter, it may be a kernel related issue (and I think there were some known issues..)). Now, if I apply below diff, my issue seems to go away for check-system-userspace / check-dpdk / check-afxdp. $ git diff -b HEAD^ diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index 0d6d803fe4..7f3e2c0ae4 100644 --- a/lib/netdev-native-tnl.c +++ b/lib/netdev-native-tnl.c @@ -280,6 +280,7 @@ dp_packet_tnl_ol_process(struct dp_packet *packet, dp_packet_hwol_set_tx_ipv6(packet); } } + } /* Attention please, tunnel inner l2 len is consist of udp header * len and tunnel header len and inner l2 len. */ @@ -299,7 +300,6 @@ dp_packet_tnl_ol_process(struct dp_packet *packet, (char *) dp_packet_eth(packet) + VXLAN_HLEN); } - } } void But IPv6 tso tests have some failures, and some nsh test fails too regardless of which datapath is used, so this is not a right fix. Actual commit: https://github.com/david-marchand/ovs/commit/manyvxlan https://github.com/david-marchand/ovs/actions/runs/7904287294
On Wed, Feb 14, 2024 at 12:09 PM David Marchand <david.marchand@redhat.com> wrote: > > Hello Mike, > > On Mon, Feb 12, 2024 at 8:50 PM Mike Pattrick <mkp@redhat.com> wrote: > > > > Previously if an OVS configuration nested multiple layers of UDP tunnels > > like VXLAN or GENEVE on top of each other through netdev-linux > > interfaces, the vnet header would be incorrectly set to the outermost > > UDP tunnel layer instead of the intermediary tunnel layer. > > > > This resulted in the middle UDP tunnel not checksum offloading properly. > > > > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") > > Reported-by: David Marchand <david.marchand@redhat.com> > > Signed-off-by: Mike Pattrick <mkp@redhat.com> > > I have some trouble relating this patch to the issue I faced :-). > Could you detail a test that shows the issue you fix here? I made a slight modification to the test suite you provided adding a UDP traffic test to your test; which unveiled this issue. But you are correct, I got so distracted by one issue that I missed other issues. > > After applying (only this patch), I still reproduce an issue with > inner checksums. > As I reported this issue to you offlist, let me put the details in public here. > > I wrote a system-traffic.at unit test that stacks 3 vxlan tunnels > (separate topic, but for the context, my goal was to stress DPDK > dp-packets wrt headroom). > If I try this unit test before commit 084c8087292c ("userspace: > Support VXLAN and GENEVE TSO."), I have no issue. > > The topology is as follows: > ############################################################## > # > # at_ns0 . init_net > # . > # at_vxlan1 (10.1.1.1/24) . br0 (10.1.1.100/24) > # (remote 172.31.1.100) . | > # . at_vxlan0 > # . (remote 172.31.1.1) > # . > # at_vxlan3 (172.31.1.1/24) . br-underlay0 (172.31.1.100/24) > # (remote 172.31.2.100) . | > # . at_vxlan2 > # . (remote 172.31.2.1) > # . > # at_vxlan5 (172.31.2.1/24) . br-underlay1 (172.31.2.100/24) > # (remote 172.31.3.100) . | > # . at_vxlan4 > # . (remote 172.31.3.1) > # . > # p0 (172.31.3.1/24) . br-underlay2 (172.31.3.100/24) > # | . | > # \-------------------------.-ovs-p0 > # > ############################################################## > > (gmail will probably bust this copy/paste, so putting a link to the > actual test: https://github.com/david-marchand/ovs/commit/manyvxlan~2#diff-45a77f85f9679bc66ac97300392c0d5d9f5c53264fa8a82d735a553246e71faeR400) > > With this setup, I try to ping, from at_ns0 netns, the ip address of > the br tap iface plugged with the other side of each tunnel: > > - Most outter level, no encapsulation, all good: > 16:24:51.590966 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 > (0x0800), length 98: (tos 0x0, ttl 64, id 63550, offset 0, flags [DF], > proto ICMP (1), length 84) > 172.31.3.1 > 172.31.3.100: ICMP echo request, id 26707, seq 1, length 64 > > 16:24:51.591084 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 > (0x0800), length 98: (tos 0x0, ttl 64, id 28720, offset 0, flags > [none], proto ICMP (1), length 84) > 172.31.3.100 > 172.31.3.1: ICMP echo reply, id 26707, seq 1, length 64 > > - One tunnel encap all good: > 16:24:54.140629 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 > (0x0800), length 148: (tos 0x0, ttl 64, id 61052, offset 0, flags > [none], proto UDP (17), length 134) > 172.31.3.1.36831 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags > [I] (0x08), vni 0 > 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length > 98: (tos 0x0, ttl 64, id 54399, offset 0, flags [DF], proto ICMP (1), > length 84) > 172.31.2.1 > 172.31.2.100: ICMP echo request, id 51488, seq 1, length 64 > > 16:24:54.140772 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 > (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], > proto UDP (17), length 134) > 172.31.3.100.39912 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] > (0x08), vni 0 > 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length > 98: (tos 0x0, ttl 64, id 29701, offset 0, flags [none], proto ICMP > (1), length 84) > 172.31.2.100 > 172.31.2.1: ICMP echo reply, id 51488, seq 1, length 64 > > - Two tunnels encap: > 16:24:58.578900 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4 > (0x0800), length 142: (tos 0x0, ttl 64, id 61719, offset 0, flags > [none], proto UDP (17), length 128) > 172.31.3.1.50673 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags > [I] (0x08), vni 0 > 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length > 92: (tos 0x0, ttl 64, id 35175, offset 0, flags [none], proto UDP > (17), length 78) > 172.31.2.1.44060 > 172.31.2.100.vxlan: [udp sum ok] VXLAN, flags > [I] (0x08), vni 1 > 62:53:3f:82:da:56 > Broadcast, ethertype ARP (0x0806), length 42: > Ethernet (len 6), IPv4 (len 4), Request who-has 172.31.1.100 tell > 172.31.1.1, length 28 > > 16:24:58.579021 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4 > (0x0800), length 142: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], > proto UDP (17), length 128) > 172.31.3.100.56325 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] > (0x08), vni 0 > 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length > 92: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), > length 78, bad cksum 0 (->ddfb)!) > 172.31.2.100.56325 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I] > (0x08), vni 1 > 26:8d:e7:0e:41:45 > 62:53:3f:82:da:56, ethertype ARP (0x0806), length > 42: Ethernet (len 6), IPv4 (len 4), Reply 172.31.1.100 is-at > 26:8d:e7:0e:41:45, length 28 > > In this last packet capture, the inner vxlan tunnel header has wrong > (well no) IP checksum. > > If I set the tunnel endpoint neighbors in each netns, the same issue > can be seen for icmp encapsulated traffic: > > 17:41:39.303170 32:a4:65:eb:df:49 > a6:0a:bf:e2:f3:f2, ethertype IPv4 > (0x0800), length 198: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], > proto UDP (17), length 184) > 172.31.3.100.59550 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I] > (0x08), vni 0 > fe:72:7b:39:24:49 > 7e:9d:b3:a5:a2:f8, ethertype IPv4 (0x0800), length > 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), > length 134, bad cksum 0 (->ddc3)!) > 172.31.2.100.59550 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I] > (0x08), vni 1 > 56:21:48:f3:f3:42 > 72:64:65:f9:38:0a, ethertype IPv4 (0x0800), length > 98: (tos 0x0, ttl 64, id 49609, offset 0, flags [none], proto ICMP > (1), length 84) > 172.31.1.100 > 172.31.1.1: ICMP echo reply, id 35387, seq 1, length 64 > > > Running in GHA: > https://github.com/david-marchand/ovs/actions/runs/7903675971/job/21572151269 > > Afaics, this issue is not reproduced when using the kernel datapath > (check-kernel) but it is seen in GHA for check-system-userspace / > check-dpdk / check-afxdp. > I can reproduce those locally. > > There is also a failure with check-offloads in GHA which I can't > reproduce locally (for the latter, it may be a kernel related issue > (and I think there were some known issues..)). > > > Now, if I apply below diff, my issue seems to go away for > check-system-userspace / check-dpdk / check-afxdp. > $ git diff -b HEAD^ > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c > index 0d6d803fe4..7f3e2c0ae4 100644 > --- a/lib/netdev-native-tnl.c > +++ b/lib/netdev-native-tnl.c > @@ -280,6 +280,7 @@ dp_packet_tnl_ol_process(struct dp_packet *packet, > dp_packet_hwol_set_tx_ipv6(packet); > } > } > + } > > /* Attention please, tunnel inner l2 len is consist of udp header > * len and tunnel header len and inner l2 len. */ > @@ -299,7 +300,6 @@ dp_packet_tnl_ol_process(struct dp_packet *packet, > (char *) dp_packet_eth(packet) + > VXLAN_HLEN); > } > - } > } > > void > > But IPv6 tso tests have some failures, and some nsh test fails too > regardless of which datapath is used, so this is not a right fix. One of the problems here is when the tunnel offload flag is set, we set dp_packet_hwol_set_tx_outer_ipv4_csum() instead of dp_packet_hwol_set_tx_ipv4_csum(). But when modifying IP addresses, we check for dp_packet_hwol_tx_ip_csum() not the outer ip csum flag. Likewise, in netdev_tnl_ip_extract_tnl_md we also only check for ip csum, not the outer flag. Since dp_packet_l3*() functions always refer to the outermost packet but the ipv4_csum() functions refer to the inner packet there is some tension here, and some incorrect behaviour. I've made a branch where we properly account for outer and inner checksums, and it passes the tests mostly, except for afxdp. For afxdp we crash in dp_packet_prealloc_headroom(). netdev-afxdp has a hardcoded OVS_XDP_HEADROOM=128 bytes and the multiple layers of tunneling exceeds that. I ran a test where I set this to 256 and the test passes, but that seems like a non-ideal solution. We probably shouldn't abort() in dp_packet_resize(), as it could be possible to accidentally run into this. Dropping the packet is probably preferable IMO, but that is also a very large change, as none of the calling functions have return codes themselves and some of the 2rd degree call backs don't either, so many functions will need to change. You can see the branch here: https://github.com/mkp-rh/ovs/tree/multitun And the test run here: https://github.com/mkp-rh/ovs/actions/runs/7911539363 I'll clean up this a bit and address some of the other things mentioned, like the incorrect Fixes tag. Cheers, M > > Actual commit: > https://github.com/david-marchand/ovs/commit/manyvxlan > https://github.com/david-marchand/ovs/actions/runs/7904287294 > > > > -- > David Marchand >
On Thu, Feb 15, 2024 at 7:03 AM Mike Pattrick <mkp@redhat.com> wrote: > I've made a branch where we properly account for outer and inner > checksums, and it passes the tests mostly, except for afxdp. > > For afxdp we crash in dp_packet_prealloc_headroom(). netdev-afxdp has > a hardcoded OVS_XDP_HEADROOM=128 bytes and the multiple layers of > tunneling exceeds that. I ran a test where I set this to 256 and the > test passes, but that seems like a non-ideal solution. We probably > shouldn't abort() in dp_packet_resize(), as it could be possible to > accidentally run into this. This is exactly the point I wanted to stress with DPDK dp-packets. The reason behind was to check this old patch of mine: https://patchwork.ozlabs.org/project/openvswitch/patch/20220318153339.31083-1-david.marchand@redhat.com/ DPDK dp-packets data are supposed to be located at RTE_PKTMBUF_HEADROOM == 128 bytes, on rx. But I uncovered recently that we won't hit this headroom limit with net/af_xdp backing netdev-dpdk ports... The net/af_xdp driver tries to be smart and avoid copies by using the unaligned chunk af_xdp feature. https://git.dpdk.org/dpdk/commit/?id=d8a210774e1d4c295fd93b983538da0d15312edd A consequence is that this driver places received data with a 384 bytes headroom (RTE_PKTMBUF_HEADROOM + XDP_PACKET_HEADROOM). Which then defeats my unit test... This placement of data looks incorrect to me, from the DPDK mbuf API "spirit". Applications expect a RTE_PKTMBUF_HEADROOM headroom, and they size their buffers accordingly. This extra headroom would mean applications need to account for this peculiarity when using this driver... I will need to spend more time on this, but not now. > > Dropping the packet is probably preferable IMO, but that is also a > very large change, as none of the calling functions have return codes > themselves and some of the 2rd degree call backs don't either, so many > functions will need to change. Or extend dp_packet_resize() for af_xdp dp-packet. The tricky part is that the dp-packet is part of a umem buffer. If we make a af_xdp dp-packet points at a different malloc'd data buffer, we need to distinguish for this case when freeing this dp-packet. I can put this on my todolist. > > You can see the branch here: https://github.com/mkp-rh/ovs/tree/multitun > And the test run here: https://github.com/mkp-rh/ovs/actions/runs/7911539363 > > I'll clean up this a bit and address some of the other things > mentioned, like the incorrect Fixes tag. We don't need to fix all issues, the main point is the inner checksum issue, as it is something that got broken in 3.3. If we strip the 3rd layer of tunnel from my unit test, it would be enough to reproduce without hitting af_xdp headroom limit. Or do you think we can extend an existing test? At least, fixes should be isolated from the new features like one introduced in patch 1 of this series.
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 1b2e5b6c2..7a156cc28 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -7239,14 +7239,23 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( struct tcp_header, tcp_csum); } else if (dp_packet_hwol_l4_is_udp(b)) { - struct udp_header *udp_hdr = dp_packet_l4(b); + /* Favour the inner packet when indicating checksum offsets. */ + void *l3_off = dp_packet_inner_l3(b); + void *l4_off = dp_packet_inner_l4(b); + + if (!l3_off || !l4_off) { + l3_off = dp_packet_l3(b); + l4_off = dp_packet_l4(b); + } + struct udp_header *udp_hdr = l4_off; + ovs_be16 csum = 0; if (dp_packet_hwol_is_ipv4(b)) { - const struct ip_header *ip_hdr = dp_packet_l3(b); + const struct ip_header *ip_hdr = l3_off; csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr)); } else if (dp_packet_hwol_tx_ipv6(b)) { - const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b); + const struct ovs_16aligned_ip6_hdr *ip6_hdr = l4_off; csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr)); }
Previously if an OVS configuration nested multiple layers of UDP tunnels like VXLAN or GENEVE on top of each other through netdev-linux interfaces, the vnet header would be incorrectly set to the outermost UDP tunnel layer instead of the intermediary tunnel layer. This resulted in the middle UDP tunnel not checksum offloading properly. Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> --- lib/netdev-linux.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)