diff mbox series

[ovs-dev] dpif-netdev: Resolved flow table reference issue.

Message ID 20240523075049.1936263-1-wushaohua@chinatelecom.cn
State Rejected
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev] dpif-netdev: Resolved flow table reference issue. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

wushaohua@chinatelecom.cn May 23, 2024, 7:50 a.m. UTC
From: Shaohua Wu <wushaohua@chinatelecom.cn>

description:
The statistics for the cached DP flow table are updated in
packet_batch_per_flow_execute. There might be a timing gap
where the dp flow in the batch reaches its aging time and
is prematurely aged. If actions on the dp flows  are
executed at this time,and the memory for the DP flow
has been released, it can lead to a crash.

0  raise () from /lib64/libc.so.6
1  abort () from /lib64/libc.so.6
2  odp_execute_actions at lib/odp-execute.c:1166
3  dp_netdev_execute_actions at lib/dpif-netdev.c:11339
4  packet_batch_per_flow_execute  at lib/dpif-netdev.c:8537
5  dp_netdev_input__  at lib/dpif-netdev.c:10722
6  dp_netdev_input at lib/dpif-netdev.c:10731
7  dp_netdev_process_rxq_port at lib/dpif-netdev.c:6332
8  dpif_netdev_run at lib/dpif-netdev.c:7343
9  dpif_run at lib/dpif.c:479
10 type_run at ofproto/ofproto-dpif.c:370
11 ofproto_type_run at ofproto/ofproto.c:1789
12 bridge_run__ at vswitchd/bridge.c:3245
13 bridge_run at vswitchd/bridge.c:3310
14 main  at vswitchd/ovs-vswitchd.c:127
(gdb) f 4
(gdb) p flow->ref_cnt
$4 = {count = 0}
(gdb) p flow->dead
$5 = true

Signed-off-by: Shaohua Wu <wushaohua@chinatelecom.cn>
---
 lib/dpif-netdev.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Comments

Ilya Maximets May 28, 2024, 4:03 p.m. UTC | #1
On 5/23/24 09:50, wushaohua@chinatelecom.cn wrote:
> From: Shaohua Wu <wushaohua@chinatelecom.cn>
> 
> description:
> The statistics for the cached DP flow table are updated in
> packet_batch_per_flow_execute. There might be a timing gap
> where the dp flow in the batch reaches its aging time and
> is prematurely aged. If actions on the dp flows  are
> executed at this time,and the memory for the DP flow
> has been released, it can lead to a crash.
> 
> 0  raise () from /lib64/libc.so.6
> 1  abort () from /lib64/libc.so.6
> 2  odp_execute_actions at lib/odp-execute.c:1166
> 3  dp_netdev_execute_actions at lib/dpif-netdev.c:11339
> 4  packet_batch_per_flow_execute  at lib/dpif-netdev.c:8537
> 5  dp_netdev_input__  at lib/dpif-netdev.c:10722
> 6  dp_netdev_input at lib/dpif-netdev.c:10731
> 7  dp_netdev_process_rxq_port at lib/dpif-netdev.c:6332
> 8  dpif_netdev_run at lib/dpif-netdev.c:7343
> 9  dpif_run at lib/dpif.c:479
> 10 type_run at ofproto/ofproto-dpif.c:370
> 11 ofproto_type_run at ofproto/ofproto.c:1789
> 12 bridge_run__ at vswitchd/bridge.c:3245
> 13 bridge_run at vswitchd/bridge.c:3310
> 14 main  at vswitchd/ovs-vswitchd.c:127
> (gdb) f 4
> (gdb) p flow->ref_cnt
> $4 = {count = 0}
> (gdb) p flow->dead
> $5 = true

Hmm.  Thanks for the patch!  Though the flow is supposed to be
protected by RCU and PMD threads are not supposed to enter
quiescent state in the middle of packet processing, so the flow
should not be freed until the processing is over.

The flow will be dead in this case with a zero counter, but it
is not freed, i.e. should be accessible.

Do you know what exactly failed in odp_execute_actions ?
Your version of the code seems to be very different from the
upstream main branch, so line numbers do not help much.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c7f9e1490..8f96efde0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -8255,12 +8255,13 @@  static inline void
 packet_batch_per_flow_init(struct packet_batch_per_flow *batch,
                            struct dp_netdev_flow *flow)
 {
-    flow->batch = batch;
-
-    batch->flow = flow;
-    dp_packet_batch_init(&batch->array);
-    batch->byte_count = 0;
-    batch->tcp_flags = 0;
+    if (dp_netdev_flow_ref(flow)) {
+        flow->batch = batch;
+        batch->flow = flow;
+        dp_packet_batch_init(&batch->array);
+        batch->byte_count = 0;
+        batch->tcp_flags = 0;
+    }
 }
 
 static inline void
@@ -8322,9 +8323,11 @@  packet_enqueue_to_flow_map(struct dp_packet *packet,
                            size_t index)
 {
     struct dp_packet_flow_map *map = &flow_map[index];
-    map->flow = flow;
-    map->packet = packet;
-    map->tcp_flags = tcp_flags;
+    if (dp_netdev_flow_ref(flow)) {
+        map->flow = flow;
+        map->packet = packet;
+        map->tcp_flags = tcp_flags;
+    }
 }
 
 /* SMC lookup function for a batch of packets.
@@ -8888,6 +8891,7 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
         }
         dp_netdev_queue_batches(map->packet, map->flow, map->tcp_flags,
                                 batches, &n_batches);
+        dp_netdev_flow_unref(map->flow);
      }
 
     /* All the flow batches need to be reset before any call to
@@ -8905,6 +8909,7 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
 
     for (i = 0; i < n_batches; i++) {
         packet_batch_per_flow_execute(&batches[i], pmd);
+        dp_netdev_flow_unref(batches[i].flow);
     }
 }