diff mbox series

[ovs-dev] netdev-native-tnl: Fix use of uninitialized RSS hash.

Message ID 20241129163646.2722101-1-i.maximets@ovn.org
State Accepted
Commit 40ba3fc93612c5bbafa693f460cecf525921c1eb
Headers show
Series [ovs-dev] netdev-native-tnl: Fix use of uninitialized RSS hash. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Ilya Maximets Nov. 29, 2024, 4:36 p.m. UTC
RSS hash calculation for a packet may be skipped in some cases.  One
of them is a simple match optimization.  Packet is not fully parsed
for the simple match, so there is no enough data to calculate the full
5-tuple hash.  However, when such a packet needs tunnel encapsulation,
we need RSS hash to calculate the source port for the outer UDP header.
And netdev_tnl_get_src_port() function doesn't check if the hash is
valid before using it.  So, such packets will likely end up with
different and unpredictable source ports potentially causing packet
reordering or other issues in the network:

 WARNING: MemorySanitizer: use-of-uninitialized-value
  0 0x10c129c in dp_packet_get_rss_hash lib/dp-packet.h:1029:5
  1 0x10b264c in netdev_tnl_get_src_port lib/netdev-native-tnl.h:131:12
  2 0x10b171a in netdev_tnl_push_udp_header lib/netdev-native-tnl.c:286:20
  3 0xb772fe in netdev_push_header lib/netdev.c:1037:13
  4 0x9673c4 in push_tnl_action lib/dpif-netdev.c:9067:11
  5 0x961abe in dp_execute_cb lib/dpif-netdev.c:9226:13
  6 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
  7 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
  8 0x968f3f in dp_execute_userspace_action lib/dpif-netdev.c:9093:9
  9 0x962e54 in dp_execute_cb lib/dpif-netdev.c:9307:17
 10 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
 11 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
 12 0x950fef in packet_batch_per_flow_execute lib/dpif-netdev.c:8271:5
 13 0x8ec8db in dp_netdev_input__ lib/dpif-netdev.c:8899:9
 14 0x8eb8ec in dp_netdev_input lib/dpif-netdev.c:8908:5
 15 0x92d5e8 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5660:19
 16 0x8ee2c4 in dpif_netdev_run lib/dpif-netdev.c:6993:25
 17 0x9b442f in dpif_run lib/dpif.c:471:16
 18 0x5f8e3a in type_run ofproto/ofproto-dpif.c:367:9
 19 0x56c508 in ofproto_type_run ofproto/ofproto.c:1879:31
 20 0x4cb388 in bridge_run__ vswitchd/bridge.c:3281:9
 21 0x4c9b00 in bridge_run vswitchd/bridge.c:3346:5
 22 0x526043 in main vswitchd/ovs-vswitchd.c:130:9
 23 0x7f1192 in __libc_start_call_main
 24 0x7f1192 in __libc_start_main@GLIBC_2.2.5
 25 0x432b24 in _start (vswitchd/ovs-vswitchd+0x432b24)

The issue is caught by running the 'debug_slow' test under the memory
sanitizer.  Another way to reproduce is by sending two packets at once
through the datapath.  The first one will get the same memory chunk as
the upcalled packet with already calculated RSS, the second one will
get the brand new memory chunk without the calculated RSS, so these
two packets will have different source ports after encapsulation.
The test is updated to cover this case.

Fix the issue by checking if the hash is valid before using, re-parsing
and calculating if it is not.  The netdev_tnl_get_src_port() function
moved to the .c file, since there is no real reason for it to be in the
header.  Compiler can decide on inlining it.  The declaration kept in
the header, since all the other functions declared there, even if there
is no reason for that.

In the future we may want to consolidate all the places where we
re-calculate RSS hash into a single function, but it's a little tricky.
This is also a larger change that would be harder to backport.  So, not
touching that aspect for now.

Re-parsing the packet eliminates advantages of the simple match, but
it was designed primarily for very simple setups that do not involve
tunneling or any other complex processing, so it should not be a big
problem.  And simple match can still be used with tunneling when the
input port provides the RSS hash.

Also, checking if the hash is valid is a right thing to do anyways.

Next step might be to not use simple match when there is no RSS hash
and there is a tunnel push action, but it seems hard to implement,
especially since we don't know the actions until we lookup the flow.

Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 lib/netdev-native-tnl.c  | 36 +++++++++++++++++++++++++++++++++---
 lib/netdev-native-tnl.h  | 11 +----------
 tests/tunnel-push-pop.at | 10 ++++++++++
 3 files changed, 44 insertions(+), 13 deletions(-)

Comments

Eelco Chaudron Dec. 2, 2024, 9:01 a.m. UTC | #1
On 29 Nov 2024, at 17:36, Ilya Maximets wrote:

> RSS hash calculation for a packet may be skipped in some cases.  One
> of them is a simple match optimization.  Packet is not fully parsed
> for the simple match, so there is no enough data to calculate the full
> 5-tuple hash.  However, when such a packet needs tunnel encapsulation,
> we need RSS hash to calculate the source port for the outer UDP header.
> And netdev_tnl_get_src_port() function doesn't check if the hash is
> valid before using it.  So, such packets will likely end up with
> different and unpredictable source ports potentially causing packet
> reordering or other issues in the network:
>
>  WARNING: MemorySanitizer: use-of-uninitialized-value
>   0 0x10c129c in dp_packet_get_rss_hash lib/dp-packet.h:1029:5
>   1 0x10b264c in netdev_tnl_get_src_port lib/netdev-native-tnl.h:131:12
>   2 0x10b171a in netdev_tnl_push_udp_header lib/netdev-native-tnl.c:286:20
>   3 0xb772fe in netdev_push_header lib/netdev.c:1037:13
>   4 0x9673c4 in push_tnl_action lib/dpif-netdev.c:9067:11
>   5 0x961abe in dp_execute_cb lib/dpif-netdev.c:9226:13
>   6 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
>   7 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
>   8 0x968f3f in dp_execute_userspace_action lib/dpif-netdev.c:9093:9
>   9 0x962e54 in dp_execute_cb lib/dpif-netdev.c:9307:17
>  10 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
>  11 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
>  12 0x950fef in packet_batch_per_flow_execute lib/dpif-netdev.c:8271:5
>  13 0x8ec8db in dp_netdev_input__ lib/dpif-netdev.c:8899:9
>  14 0x8eb8ec in dp_netdev_input lib/dpif-netdev.c:8908:5
>  15 0x92d5e8 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5660:19
>  16 0x8ee2c4 in dpif_netdev_run lib/dpif-netdev.c:6993:25
>  17 0x9b442f in dpif_run lib/dpif.c:471:16
>  18 0x5f8e3a in type_run ofproto/ofproto-dpif.c:367:9
>  19 0x56c508 in ofproto_type_run ofproto/ofproto.c:1879:31
>  20 0x4cb388 in bridge_run__ vswitchd/bridge.c:3281:9
>  21 0x4c9b00 in bridge_run vswitchd/bridge.c:3346:5
>  22 0x526043 in main vswitchd/ovs-vswitchd.c:130:9
>  23 0x7f1192 in __libc_start_call_main
>  24 0x7f1192 in __libc_start_main@GLIBC_2.2.5
>  25 0x432b24 in _start (vswitchd/ovs-vswitchd+0x432b24)
>
> The issue is caught by running the 'debug_slow' test under the memory
> sanitizer.  Another way to reproduce is by sending two packets at once
> through the datapath.  The first one will get the same memory chunk as
> the upcalled packet with already calculated RSS, the second one will
> get the brand new memory chunk without the calculated RSS, so these
> two packets will have different source ports after encapsulation.
> The test is updated to cover this case.
>
> Fix the issue by checking if the hash is valid before using, re-parsing
> and calculating if it is not.  The netdev_tnl_get_src_port() function
> moved to the .c file, since there is no real reason for it to be in the
> header.  Compiler can decide on inlining it.  The declaration kept in
> the header, since all the other functions declared there, even if there
> is no reason for that.
>
> In the future we may want to consolidate all the places where we
> re-calculate RSS hash into a single function, but it's a little tricky.
> This is also a larger change that would be harder to backport.  So, not
> touching that aspect for now.
>
> Re-parsing the packet eliminates advantages of the simple match, but
> it was designed primarily for very simple setups that do not involve
> tunneling or any other complex processing, so it should not be a big
> problem.  And simple match can still be used with tunneling when the
> input port provides the RSS hash.
>
> Also, checking if the hash is valid is a right thing to do anyways.
>
> Next step might be to not use simple match when there is no RSS hash
> and there is a tunnel push action, but it seems hard to implement,
> especially since we don't know the actions until we lookup the flow.
>
> Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Thanks for finding this! The change looks good to me.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ilya Maximets Dec. 2, 2024, 11:38 p.m. UTC | #2
On 12/2/24 10:01, Eelco Chaudron wrote:
> 
> 
> On 29 Nov 2024, at 17:36, Ilya Maximets wrote:
> 
>> RSS hash calculation for a packet may be skipped in some cases.  One
>> of them is a simple match optimization.  Packet is not fully parsed
>> for the simple match, so there is no enough data to calculate the full
>> 5-tuple hash.  However, when such a packet needs tunnel encapsulation,
>> we need RSS hash to calculate the source port for the outer UDP header.
>> And netdev_tnl_get_src_port() function doesn't check if the hash is
>> valid before using it.  So, such packets will likely end up with
>> different and unpredictable source ports potentially causing packet
>> reordering or other issues in the network:
>>
>>  WARNING: MemorySanitizer: use-of-uninitialized-value
>>   0 0x10c129c in dp_packet_get_rss_hash lib/dp-packet.h:1029:5
>>   1 0x10b264c in netdev_tnl_get_src_port lib/netdev-native-tnl.h:131:12
>>   2 0x10b171a in netdev_tnl_push_udp_header lib/netdev-native-tnl.c:286:20
>>   3 0xb772fe in netdev_push_header lib/netdev.c:1037:13
>>   4 0x9673c4 in push_tnl_action lib/dpif-netdev.c:9067:11
>>   5 0x961abe in dp_execute_cb lib/dpif-netdev.c:9226:13
>>   6 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
>>   7 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
>>   8 0x968f3f in dp_execute_userspace_action lib/dpif-netdev.c:9093:9
>>   9 0x962e54 in dp_execute_cb lib/dpif-netdev.c:9307:17
>>  10 0xbcb4b1 in odp_execute_actions lib/odp-execute.c:1008:17
>>  11 0x8e939f in dp_netdev_execute_actions lib/dpif-netdev.c:9524:5
>>  12 0x950fef in packet_batch_per_flow_execute lib/dpif-netdev.c:8271:5
>>  13 0x8ec8db in dp_netdev_input__ lib/dpif-netdev.c:8899:9
>>  14 0x8eb8ec in dp_netdev_input lib/dpif-netdev.c:8908:5
>>  15 0x92d5e8 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5660:19
>>  16 0x8ee2c4 in dpif_netdev_run lib/dpif-netdev.c:6993:25
>>  17 0x9b442f in dpif_run lib/dpif.c:471:16
>>  18 0x5f8e3a in type_run ofproto/ofproto-dpif.c:367:9
>>  19 0x56c508 in ofproto_type_run ofproto/ofproto.c:1879:31
>>  20 0x4cb388 in bridge_run__ vswitchd/bridge.c:3281:9
>>  21 0x4c9b00 in bridge_run vswitchd/bridge.c:3346:5
>>  22 0x526043 in main vswitchd/ovs-vswitchd.c:130:9
>>  23 0x7f1192 in __libc_start_call_main
>>  24 0x7f1192 in __libc_start_main@GLIBC_2.2.5
>>  25 0x432b24 in _start (vswitchd/ovs-vswitchd+0x432b24)
>>
>> The issue is caught by running the 'debug_slow' test under the memory
>> sanitizer.  Another way to reproduce is by sending two packets at once
>> through the datapath.  The first one will get the same memory chunk as
>> the upcalled packet with already calculated RSS, the second one will
>> get the brand new memory chunk without the calculated RSS, so these
>> two packets will have different source ports after encapsulation.
>> The test is updated to cover this case.
>>
>> Fix the issue by checking if the hash is valid before using, re-parsing
>> and calculating if it is not.  The netdev_tnl_get_src_port() function
>> moved to the .c file, since there is no real reason for it to be in the
>> header.  Compiler can decide on inlining it.  The declaration kept in
>> the header, since all the other functions declared there, even if there
>> is no reason for that.
>>
>> In the future we may want to consolidate all the places where we
>> re-calculate RSS hash into a single function, but it's a little tricky.
>> This is also a larger change that would be harder to backport.  So, not
>> touching that aspect for now.
>>
>> Re-parsing the packet eliminates advantages of the simple match, but
>> it was designed primarily for very simple setups that do not involve
>> tunneling or any other complex processing, so it should not be a big
>> problem.  And simple match can still be used with tunneling when the
>> input port provides the RSS hash.
>>
>> Also, checking if the hash is valid is a right thing to do anyways.
>>
>> Next step might be to not use simple match when there is no RSS hash
>> and there is a tunnel push action, but it seems hard to implement,
>> especially since we don't know the actions until we lookup the flow.
>>
>> Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Thanks for finding this! The change looks good to me.
> 
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> 

Thanks!  Applied and backported down to 3.3.  The patch didn't apply
cleanly on 3.2 and older, so I didn't backport there.  It's not a
critical change, AFAICT, but I could backport down to 2.17 later if
someone needs it.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 3e609cf54..ede5e1686 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -61,6 +61,27 @@  static struct vlog_rate_limit err_rl = VLOG_RATE_LIMIT_INIT(60, 5);
 uint16_t tnl_udp_port_min = 32768;
 uint16_t tnl_udp_port_max = 61000;
 
+ovs_be16
+netdev_tnl_get_src_port(struct dp_packet *packet)
+{
+    uint32_t hash;
+
+    if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
+        hash = dp_packet_get_rss_hash(packet);
+    } else {
+        struct flow flow;
+
+        flow_extract(packet, &flow);
+        hash = flow_hash_5tuple(&flow, 0);
+
+        dp_packet_set_rss_hash(packet, hash);
+    }
+
+    hash = ((uint64_t) hash * (tnl_udp_port_max - tnl_udp_port_min)) >> 32;
+
+    return htons(hash + tnl_udp_port_min);
+}
+
 void *
 netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
                   unsigned int *hlen)
@@ -276,14 +297,18 @@  netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED,
     uint16_t l3_ofs = packet->l3_ofs;
     uint16_t l4_ofs = packet->l4_ofs;
     struct udp_header *udp;
+    ovs_be16 udp_src;
     int ip_tot_size;
 
+    /* We may need to re-calculate the hash and this has to be done before
+     * modifying the packet. */
+    udp_src = netdev_tnl_get_src_port(packet);
+
     dp_packet_tnl_ol_process(packet, data);
     udp = netdev_tnl_push_ip_header(packet, data->header, data->header_len,
                                     &ip_tot_size, 0);
 
-    /* set udp src port */
-    udp->udp_src = netdev_tnl_get_src_port(packet);
+    udp->udp_src = udp_src;
     udp->udp_len = htons(ip_tot_size);
 
     if (udp->udp_csum) {
@@ -831,13 +856,18 @@  netdev_gtpu_push_header(const struct netdev *netdev,
     struct netdev_vport *dev = netdev_vport_cast(netdev);
     struct udp_header *udp;
     struct gtpuhdr *gtpuh;
+    ovs_be16 udp_src;
     int ip_tot_size;
     unsigned int payload_len;
 
+    /* We may need to re-calculate the hash and this has to be done before
+     * modifying the packet. */
+    udp_src = netdev_tnl_get_src_port(packet);
+
     payload_len = dp_packet_size(packet);
     udp = netdev_tnl_push_ip_header(packet, data->header, data->header_len,
                                     &ip_tot_size, 0);
-    udp->udp_src = netdev_tnl_get_src_port(packet);
+    udp->udp_src = udp_src;
     udp->udp_len = htons(ip_tot_size);
     /* Postpone checksum to the egress netdev. */
     dp_packet_hwol_set_csum_udp(packet);
diff --git a/lib/netdev-native-tnl.h b/lib/netdev-native-tnl.h
index eb55dd041..5d8f1672a 100644
--- a/lib/netdev-native-tnl.h
+++ b/lib/netdev-native-tnl.h
@@ -123,16 +123,7 @@  netdev_tnl_ip_build_header(struct ovs_action_push_tnl *data,
 extern uint16_t tnl_udp_port_min;
 extern uint16_t tnl_udp_port_max;
 
-static inline ovs_be16
-netdev_tnl_get_src_port(struct dp_packet *packet)
-{
-    uint32_t hash;
-
-    hash = dp_packet_get_rss_hash(packet);
-
-    return htons((((uint64_t) hash * (tnl_udp_port_max - tnl_udp_port_min)) >> 32) +
-                 tnl_udp_port_min);
-}
+ovs_be16 netdev_tnl_get_src_port(struct dp_packet *);
 
 void *
 netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 99b1b02bf..cf4e62201 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -774,6 +774,16 @@  dnl Sending again to exercise the non-miss upcall path.
 AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4"])
 OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep -E "${encap}${packet}4" | wc -l` -ge 2])
 
+dnl Send two more packets at the same time to make sure they are distinct
+dnl memory buffers.
+AT_CHECK([ovs-appctl netdev-dummy/receive int-br "${packet}4" "${packet}4"])
+OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep -E "${encap}${packet}4" | wc -l` -ge 4])
+
+dnl Make sure all the packets are the same, i.e. have the same source port.
+AT_CHECK([ovs-pcap p0.pcap | sed 's/.$//' | sort | uniq \
+            | grep -E -c "${encap}${packet}"], [0], [1
+])
+
 dnl Output to tunnel from the controller.
 AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out int-br CONTROLLER "debug_slow,output:2" "${packet}5"])
 OVS_WAIT_UNTIL([test `ovs-pcap p0.pcap | grep -E "${encap}${packet}5" | wc -l` -ge 1])