diff mbox series

[ovs-dev,1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify.

Message ID 20220606101218.63772-1-salems@nvidia.com
State Changes Requested
Headers show
Series [ovs-dev,1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Salem Sol June 6, 2022, 10:12 a.m. UTC
In case of modifying an IPv6 packet src/dst address the checksum should be
recalculated only for the last frag. Currently it's done for all frags,
leading to incorrect reassembled packet checksum.
Fix it by adding a new flag to recalculate the checksum only for last frag.

Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
Signed-off-by: Salem Sol <salems@nvidia.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
---
 lib/packets.c           | 20 ++++++++++++++++++--
 tests/system-traffic.at | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

Comments

Ilya Maximets June 7, 2022, 10:14 a.m. UTC | #1
On 6/6/22 12:12, Salem Sol via dev wrote:
> In case of modifying an IPv6 packet src/dst address the checksum should be
> recalculated only for the last frag. Currently it's done for all frags,
> leading to incorrect reassembled packet checksum.
> Fix it by adding a new flag to recalculate the checksum only for last frag.

Hi.  Thanks for working on this!

There is definitely a problem with checksum calculation, but I don't
understand the logic of the fix.  We're talking about re-calculation
of the L4 checksum, but the *last* fragment should not contain the L4
header.  So, we're just corrupting the data of the last fragment.  If
we want to re-calculate the checksum, we should do that for the *first*
fragment instead, because it's the only fragment that will likely
contain the L4 header.  Or am I missing something?

BTW, it looks like we have the same issue for ipv4 packets.  We should
update checksum only for the first fragment.  This can be a separate
fix though.

Some more comments inline.

Best regards, Ilya Maximets.

> 
> Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
> Signed-off-by: Salem Sol <salems@nvidia.com>
> Acked-by: Mike Pattrick <mkp@redhat.com>
> ---
>  lib/packets.c           | 20 ++++++++++++++++++--
>  tests/system-traffic.at | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/packets.c b/lib/packets.c
> index d0fba8176..b7aef41a5 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet *packet,
>      put_16aligned_be32(addr, new_addr);
>  }
>  
> +static bool
> +packet_is_last_ipv6_frag(struct dp_packet *packet)
> +{
> +    const struct ovs_16aligned_ip6_frag *frag_hdr;
> +    uint8_t *data = dp_packet_l3(packet);
> +
> +    data += sizeof (struct ovs_16aligned_ip6_hdr);
> +    frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
> +    return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
> +           !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG);
> +}
> +
>  /* Returns true, if packet contains at least one routing header where
>   * segements_left > 0.
>   *
> @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet, const struct in6_addr *src,
>  {
>      struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
>      uint8_t proto = 0;
> +    bool recalc_csum;
>      bool rh_present;
>  
>      rh_present = packet_rh_present(packet, &proto);
> +    recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ?
> +        packet_is_last_ipv6_frag(packet) : true;

The fragmentation header is usually the last extension header,
not the first.  This check will work for the simple case with
only one extension header, but it will not work in the general
case.  We should search for the fragmentation header the same
way as we look for the routing header in packet_rh_present().
We should, probably, modify the packet_rh_present() function
to also say if the packet is fragmented / is it the first
fragment.

>  
>      if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
> -        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
> +        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
> +                             src, recalc_csum);
>      }
>  
>      if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
>          packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
> -                             !rh_present);
> +                             !rh_present && recalc_csum);
>      }
>  
>      packet_set_ipv6_tc(&nh->ip6_flow, key_tc);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 239105e89..16ba42d84 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PI
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([datapath - ping6 between two ports with header modify])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 dev p0])
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0], [], [stdout], [stderr])
> +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1], [], [stdout], [stderr])
> +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p1,ipv6,ipv6_dst=fc00::1,ipv6_src=fc00::2,actions=set_field:fc00::3-\>ipv6_src,ovs-p0], [], [stdout], [stderr])

The ', [], [stdout], [stderr]' part in above commands should
not be needed.

> +
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([datapath - ping over bond])
>  OVS_TRAFFIC_VSWITCHD_START()
>
Salem Sol June 7, 2022, 3 p.m. UTC | #2
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org>
> Sent: Tuesday, June 7, 2022 1:14 PM
> To: Salem Sol <salems@nvidia.com>; dev@openvswitch.org
> Cc: Eli Britstein <elibr@nvidia.com>; Michael Pattrick <mkp@redhat.com>;
> i.maximets@ovn.org
> Subject: Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only
> for last frags upon modify.
> 
> External email: Use caution opening links or attachments
> 
> 
> On 6/6/22 12:12, Salem Sol via dev wrote:
> > In case of modifying an IPv6 packet src/dst address the checksum
> > should be recalculated only for the last frag. Currently it's done for
> > all frags, leading to incorrect reassembled packet checksum.
> > Fix it by adding a new flag to recalculate the checksum only for last frag.
> 
> Hi.  Thanks for working on this!
> 
> There is definitely a problem with checksum calculation, but I don't
> understand the logic of the fix.  We're talking about re-calculation of the L4
> checksum, but the *last* fragment should not contain the L4 header.  So,
> we're just corrupting the data of the last fragment.  If we want to re-calculate
> the checksum, we should do that for the *first* fragment instead, because
> it's the only fragment that will likely contain the L4 header.  Or am I missing
> something?
> 
> BTW, it looks like we have the same issue for ipv4 packets.  We should
> update checksum only for the first fragment.  This can be a separate fix
> though.
> 
> Some more comments inline.
> 
> Best regards, Ilya Maximets.
> 
Thanks for the catch, you're right, my bad, in my testing I used only one extension header and the original patchset seemed to resolve the issue.
New implementation can be found in V3.
> >
> > Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
> > Signed-off-by: Salem Sol <salems@nvidia.com>
> > Acked-by: Mike Pattrick <mkp@redhat.com>
> > ---
> >  lib/packets.c           | 20 ++++++++++++++++++--
> >  tests/system-traffic.at | 35 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/packets.c b/lib/packets.c index d0fba8176..b7aef41a5
> > 100644
> > --- a/lib/packets.c
> > +++ b/lib/packets.c
> > @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet
> *packet,
> >      put_16aligned_be32(addr, new_addr);  }
> >
> > +static bool
> > +packet_is_last_ipv6_frag(struct dp_packet *packet) {
> > +    const struct ovs_16aligned_ip6_frag *frag_hdr;
> > +    uint8_t *data = dp_packet_l3(packet);
> > +
> > +    data += sizeof (struct ovs_16aligned_ip6_hdr);
> > +    frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
> > +    return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
> > +           !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG); }
> > +
> >  /* Returns true, if packet contains at least one routing header where
> >   * segements_left > 0.
> >   *
> > @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet,
> > const struct in6_addr *src,  {
> >      struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
> >      uint8_t proto = 0;
> > +    bool recalc_csum;
> >      bool rh_present;
> >
> >      rh_present = packet_rh_present(packet, &proto);
> > +    recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ?
> > +        packet_is_last_ipv6_frag(packet) : true;
> 
> The fragmentation header is usually the last extension header, not the first.
> This check will work for the simple case with only one extension header, but
> it will not work in the general case.  We should search for the fragmentation
> header the same way as we look for the routing header in
> packet_rh_present().
> We should, probably, modify the packet_rh_present() function to also say if
> the packet is fragmented / is it the first fragment.
> 
Thanks for the comment, I followed your suggestion and changed packet_rh_present() to update if it's the first frag in V3.
> >
> >      if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
> > -        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
> > +        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
> > +                             src, recalc_csum);
> >      }
> >
> >      if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
> >          packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
> > -                             !rh_present);
> > +                             !rh_present && recalc_csum);
> >      }
> >
> >      packet_set_ipv6_tc(&nh->ip6_flow, key_tc); diff --git
> > a/tests/system-traffic.at b/tests/system-traffic.at index
> > 239105e89..16ba42d84 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i
> > 0.3 -w 2 fc00:1::2 | FORMAT_PI  OVS_TRAFFIC_VSWITCHD_STOP
> AT_CLEANUP
> >
> > +AT_SETUP([datapath - ping6 between two ports with header modify])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
> > +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
> > +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr
> > +e4:11:22:33:44:54 dev p0]) NS_CHECK_EXEC([at_ns0], [ip -6 neigh add
> > +fc00::2 lladdr e4:11:22:33:44:54 dev p0]) NS_CHECK_EXEC([at_ns0], [ip
> > +-6 neigh add fc00::1 lladdr e4:11:22:33:44:55 dev p0])
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order.
> > +Without dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0], [], [stdout],
> > +[stderr]) AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0
> > +in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_fie
> > +ld:fc00::2-\>ipv6_dst,ovs-p1], [], [stdout], [stderr])
> > +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0
> > +in_port=ovs-p1,ipv6,ipv6_dst=fc00::1,ipv6_src=fc00::2,actions=set_fie
> > +ld:fc00::3-\>ipv6_src,ovs-p0], [], [stdout], [stderr])
> 
> The ', [], [stdout], [stderr]' part in above commands should not be needed.
ACK.
> 
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::3 |
> > +FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00::3 |
> > +FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::3 |
> > +FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> >  AT_SETUP([datapath - ping over bond])
> >  OVS_TRAFFIC_VSWITCHD_START()
> >
diff mbox series

Patch

diff --git a/lib/packets.c b/lib/packets.c
index d0fba8176..b7aef41a5 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1148,6 +1148,18 @@  packet_set_ipv4_addr(struct dp_packet *packet,
     put_16aligned_be32(addr, new_addr);
 }
 
+static bool
+packet_is_last_ipv6_frag(struct dp_packet *packet)
+{
+    const struct ovs_16aligned_ip6_frag *frag_hdr;
+    uint8_t *data = dp_packet_l3(packet);
+
+    data += sizeof (struct ovs_16aligned_ip6_hdr);
+    frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
+    return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
+           !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG);
+}
+
 /* Returns true, if packet contains at least one routing header where
  * segements_left > 0.
  *
@@ -1334,17 +1346,21 @@  packet_set_ipv6(struct dp_packet *packet, const struct in6_addr *src,
 {
     struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
     uint8_t proto = 0;
+    bool recalc_csum;
     bool rh_present;
 
     rh_present = packet_rh_present(packet, &proto);
+    recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ?
+        packet_is_last_ipv6_frag(packet) : true;
 
     if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
-        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
+        packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
+                             src, recalc_csum);
     }
 
     if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
         packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
-                             !rh_present);
+                             !rh_present && recalc_csum);
     }
 
     packet_set_ipv6_tc(&nh->ip6_flow, key_tc);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 239105e89..16ba42d84 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -192,6 +192,41 @@  NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PI
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping6 between two ports with header modify])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 dev p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 dev p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 dev p0])
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0], [], [stdout], [stderr])
+AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1], [], [stdout], [stderr])
+AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p1,ipv6,ipv6_dst=fc00::1,ipv6_src=fc00::2,actions=set_field:fc00::3-\>ipv6_src,ovs-p0], [], [stdout], [stderr])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over bond])
 OVS_TRAFFIC_VSWITCHD_START()