diff mbox series

[ovs-dev] dpif-netdev: Fix a race condition in deletion of offloaded flows

Message ID 20220127064207.133027-1-sriharsha.basavapatna@broadcom.com
State Changes Requested
Headers show
Series [ovs-dev] dpif-netdev: Fix a race condition in deletion of offloaded flows | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Sriharsha Basavapatna Jan. 27, 2022, 6:42 a.m. UTC
In dp_netdev_pmd_remove_flow() we schedule the deletion of an
offloaded flow, if a mark has been assigned to the flow. But if
this occurs in the window in which the offload thread completes
offloading the flow and assigns a mark to the flow, then we miss
deleting the flow. This problem has been observed while adding
and deleting flows in a loop. To fix this, always enqueue flow
deletion regardless of the flow->mark being set.

Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
---
 lib/dpif-netdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Gaetan Rivet Jan. 27, 2022, 9:15 a.m. UTC | #1
On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
>
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>      dp_netdev_simple_match_remove(pmd, flow);
>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -    if (flow->mark != INVALID_FLOW_MARK) {
> -        queue_netdev_flow_del(pmd, flow);
> -    }
> +    queue_netdev_flow_del(pmd, flow);

Hi Sriharsha,

It makes sense, thanks for the patch.
Additionally, the mark is written in one thread and read in another, but the access is not atomic. Without a fence, coherence might take time, making the described issue more likely on some architectures.

Acked-by: Gaetan Rivet <grive@u256.net>
Sriharsha Basavapatna Jan. 27, 2022, 9:55 a.m. UTC | #2
On Thu, Jan 27, 2022 at 2:45 PM Gaëtan Rivet <grive@u256.net> wrote:
>
> On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      dp_netdev_simple_match_remove(pmd, flow);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -    if (flow->mark != INVALID_FLOW_MARK) {
> > -        queue_netdev_flow_del(pmd, flow);
> > -    }
> > +    queue_netdev_flow_del(pmd, flow);
>
> Hi Sriharsha,
>
> It makes sense, thanks for the patch.
> Additionally, the mark is written in one thread and read in another, but the access is not atomic. Without a fence, coherence might take time, making the described issue more likely on some architectures.

Right, good point.
>
> Acked-by: Gaetan Rivet <grive@u256.net>

Thanks Gaetan.
-Harsha
>
> --
> Gaetan Rivet
Ilya Maximets Jan. 27, 2022, 10:38 a.m. UTC | #3
On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> offloaded flow, if a mark has been assigned to the flow. But if
> this occurs in the window in which the offload thread completes
> offloading the flow and assigns a mark to the flow, then we miss
> deleting the flow. This problem has been observed while adding
> and deleting flows in a loop. To fix this, always enqueue flow
> deletion regardless of the flow->mark being set.
> 
> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> ---
>  lib/dpif-netdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..22c5f19a8 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>      dp_netdev_simple_match_remove(pmd, flow);
>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> -    if (flow->mark != INVALID_FLOW_MARK) {
> -        queue_netdev_flow_del(pmd, flow);
> -    }
> +    queue_netdev_flow_del(pmd, flow);
>      flow->dead = true;
>  
>      dp_netdev_flow_unref(flow);
> 

Thanks for the patch, but it looks like it's not that simple...
A lot of tests are crashing in GHA and ASAN reports use-after-free
on flow disassociation if the dpif is already destroyed:
  https://github.com/ovsrobot/ovs/actions/runs/1754866321

I think the problem is that the offload thread holds a ref count on
the PMD thread, but the PMD thread structure doesn't hold a ref count
on the dp, because nothing expects the dp_netdev_pmd structure to be
used after the PMD thread is destroyed.  I guess we should fix
that.  OTOH, I'm not sure the offload thread actually needs a reference
to the dp_netdev_pmd structure.  If I didn't miss something, it only
uses the pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
should hold and ref the dp pointer instead?

What do you think?  Gaetan?

Another point is that queue_netdev_flow_del() should check
netdev_is_flow_api_enabled() to avoid creating offload threads
when offloading is disabled.  But it's good that we didn't have that
check, as it exposed the refcount issue.  Otherwise it would have
been hard to reproduce.

Best regards, Ilya Maximets.

Asan report below, for convenience:

=================================================================
==17076==ERROR: AddressSanitizer: heap-use-after-free on address 0x616000003080 at pc 0x0000005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8
READ of size 8 at 0x616000003080 thread T5 (hw_offload4)
    #0 0x5e0e37 in mark_to_flow_disassociate /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62
    #1 0x5dfaf1 in dp_offload_flow /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15
    #2 0x5df8bd in dp_netdev_flow_offload_main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17
    #3 0x73a2fc in ovsthread_wrapper /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12
    #4 0x7fe0619506da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)

0x616000003080 is located 0 bytes inside of 576-byte region [0x616000003080,0x6160000032c0)
freed by thread T0 here:
    #0 0x49640d in free (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d)
    #1 0x5d6652 in dp_netdev_free /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5
    #2 0x5f0722 in dp_netdev_unref /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13
    #3 0x5cf5e5 in dpif_netdev_close /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5
    #4 0x5fc393 in dpif_uninit /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9
    #5 0x5fc0c0 in dpif_close /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9
    #6 0x52a972 in close_dpif_backer /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5
    #7 0x517f08 in destruct /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1860:5
    #8 0x4f09e0 in ofproto_destroy /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:1728:5
    #9 0x4c71e4 in bridge_destroy /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3608:9
    #10 0x4c6f4a in bridge_exit /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:553:9
    #11 0x4e109a in main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/ovs-vswitchd.c:146:5
    #12 0x7fe060dcfbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

previously allocated by thread T0 here:
    #0 0x496802 in calloc (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x496802)
    #1 0x7af6b2 in xcalloc__ /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/util.c:121:31
    #2 0x5d5a89 in create_dp_netdev /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1810:10
    #3 0x5cdb5e in dpif_netdev_open /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1896:26
    #4 0x5fbbf1 in do_open /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:348:13
    #5 0x5fbfc8 in dpif_create_and_open /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:417:13
    #6 0x529547 in open_dpif_backer /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:776:13
    #7 0x5174eb in construct /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1672:13
    #8 0x4ec78d in ofproto_create /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:553:13
    #9 0x4c7fa7 in bridge_reconfigure /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:882:21
    #10 0x4c74d5 in bridge_run /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3331:9
    #11 0x4e0fb1 in main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/ovs-vswitchd.c:129:9
    #12 0x7fe060dcfbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

Thread T5 (hw_offload4) created by T0 here:
    #0 0x480e1a in pthread_create (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x480e1a)
    #1 0x739f17 in ovs_thread_create /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:486:13
    #2 0x5df689 in dp_netdev_offload_init /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:415:9
    #3 0x5df580 in dp_netdev_append_offload /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2689:5
    #4 0x5deddb in dp_netdev_pmd_remove_flow /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:3032:5
    #5 0x5f5082 in flow_del_on_pmd /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:4297:9
    #6 0x5f3ec5 in dpif_netdev_flow_del /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:4340:17
    #7 0x5d1beb in dpif_netdev_operate /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:4602:25
    #8 0x5fec3e in dpif_operate /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1372:13
    #9 0x5fe9f4 in dpif_flow_del /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1062:5
    #10 0x5fdfdb in dpif_probe_feature /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:977:13
    #11 0x52ad76 in check_recirc /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:904:21
    #12 0x52a131 in check_support /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1630:37
    #13 0x5298f6 in open_dpif_backer /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:820:5
    #14 0x5174eb in construct /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:1672:13
    #15 0x4ec78d in ofproto_create /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto.c:553:13
    #16 0x4c7fa7 in bridge_reconfigure /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:882:21
    #17 0x4c74d5 in bridge_run /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/bridge.c:3331:9
    #18 0x4e0fb1 in main /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../vswitchd/ovs-vswitchd.c:129:9
    #19 0x7fe060dcfbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

SUMMARY: AddressSanitizer: heap-use-after-free /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62 in mark_to_flow_disassociate
Shadow bytes around the buggy address:
  0x0c2c7fff85c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff85f0: fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2c7fff8600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2c7fff8610:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff8620: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff8640: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2c7fff8650: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c2c7fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==17076==ABORTING
Sriharsha Basavapatna Jan. 27, 2022, 12:09 p.m. UTC | #4
On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      dp_netdev_simple_match_remove(pmd, flow);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -    if (flow->mark != INVALID_FLOW_MARK) {
> > -        queue_netdev_flow_del(pmd, flow);
> > -    }
> > +    queue_netdev_flow_del(pmd, flow);
> >      flow->dead = true;
> >
> >      dp_netdev_flow_unref(flow);
> >
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>
> Another point is that queue_netdev_flow_del() should check for
> netdev_is_flow_api_enabled() to avoid creation of offload threads
> if offloading is disabled.  But that's good that we didn't have it,
> as the refcount issue got exposed.  Otherwise it would be hard
> to reproduce.
>
> Best regards, Ilya Maximets.
>
> Asan report below, for convenience:
>
<snip>

Thanks Ilya for catching these issues. Are these specific to 2.17? If
possible, can you please try this patch with 2.16?

Thanks,
-Harsha
Ilya Maximets Jan. 27, 2022, 1:47 p.m. UTC | #5
On 1/27/22 13:09, Sriharsha Basavapatna wrote:
> On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>> -        queue_netdev_flow_del(pmd, flow);
>>> -    }
>>> +    queue_netdev_flow_del(pmd, flow);
>>>      flow->dead = true;
>>>
>>>      dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
>> Another point is that queue_netdev_flow_del() should check for
>> netdev_is_flow_api_enabled() to avoid creation of offload threads
>> if offloading is disabled.  But that's good that we didn't have it,
>> as the refcount issue got exposed.  Otherwise it would be hard
>> to reproduce.
>>
>> Best regards, Ilya Maximets.
>>
>> Asan report below, for convenience:
>>

<snip>
> 
> Thanks Ilya for catching these issues. Are these specific to 2.17? If
> possible, can you please try this patch with 2.16?

This is not specific to 2.17, but it's harder to reproduce on 2.16, because
on 2.16 the main thread waits for the offloading thread to wake up.  But the
race is still there and the following change exposes it:

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8cca57f1f..d7036f6a7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2667,6 +2667,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
                                 &dp_flow_offload.mutex);
             ovsrcu_quiesce_end();
         }
+        xnanosleep(64000000);
         list = ovs_list_pop_front(&dp_flow_offload.list);
         offload = CONTAINER_OF(list, struct dp_flow_offload_item, node);
         ovs_mutex_unlock(&dp_flow_offload.mutex);
---

With the delay above, unit tests on 2.16 fail in the same way as
they do on current master.  Therefore, if the offloading thread is
re-scheduled or delayed for any other reason, it may use an already
freed datapath structure.

Best regards, Ilya Maximets.
Gaetan Rivet Jan. 31, 2022, 10:42 a.m. UTC | #6
On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>> offloaded flow, if a mark has been assigned to the flow. But if
>> this occurs in the window in which the offload thread completes
>> offloading the flow and assigns a mark to the flow, then we miss
>> deleting the flow. This problem has been observed while adding
>> and deleting flows in a loop. To fix this, always enqueue flow
>> deletion regardless of the flow->mark being set.
>> 
>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>> ---
>>  lib/dpif-netdev.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index e28e0b554..22c5f19a8 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>      dp_netdev_simple_match_remove(pmd, flow);
>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>> -    if (flow->mark != INVALID_FLOW_MARK) {
>> -        queue_netdev_flow_del(pmd, flow);
>> -    }
>> +    queue_netdev_flow_del(pmd, flow);
>>      flow->dead = true;
>>  
>>      dp_netdev_flow_unref(flow);
>> 
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>

Hi Ilya,

I've been looking into this issue, you are right that the offload threads don't
actually need the pmd pointer, only the dp one.

The problem I see with taking a proper reference on dp to protect against this
UAF is that managing those dp references is prohibitively slow, and it relies on
a global lock.

I have tried to find an alternative (the flush op could be tagged to mark that it was issued during a dp_netdev_free), but I always end up with a race condition, even if the window is reduced.

So given that the refcounting is too slow but necessary, I think the only practical solution
is to rewrite it to be fast enough. I will send an RFC to that end; let me know if you think
of something better.
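To make the cost Gaetan describes concrete, here is a hypothetical illustration (not OVS code): suppose each enqueued offload item took a reference on the datapath, with the refcount guarded by a global lock. Every enqueue and completion then pays for that lock on the hot path. All names (`fake_dp`, `global_lock`, etc.) are stand-ins invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not OVS code: each offload item holds a
 * reference on a 'dp'-like object.  Because the refcount is guarded
 * by a global lock, every ref/unref pays for that lock, which is the
 * contention point described above.  'global_lock_acquisitions'
 * simply counts how often the stand-in lock is taken. */

static int global_lock_acquisitions;

static void global_lock(void)   { global_lock_acquisitions++; }
static void global_unlock(void) { }

struct fake_dp {
    int ref_cnt;    /* guarded by the global lock */
    bool freed;
};

static void
fake_dp_ref(struct fake_dp *dp)
{
    global_lock();              /* hot-path global lock */
    dp->ref_cnt++;
    global_unlock();
}

static bool
fake_dp_unref(struct fake_dp *dp)
{
    bool last;

    global_lock();
    last = (--dp->ref_cnt == 0);
    global_unlock();
    if (last) {
        dp->freed = true;       /* stand-in for dp_netdev_free() */
    }
    return last;
}
```

In this scheme the UAF goes away because the dp cannot be freed while any queued op still holds a reference, but every offload op serializes on the global lock, which is why a faster refcounting design is needed.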
Ilya Maximets Jan. 31, 2022, 7:33 p.m. UTC | #7
On 1/31/22 11:42, Gaëtan Rivet wrote:
> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>> offloaded flow, if a mark has been assigned to the flow. But if
>>> this occurs in the window in which the offload thread completes
>>> offloading the flow and assigns a mark to the flow, then we miss
>>> deleting the flow. This problem has been observed while adding
>>> and deleting flows in a loop. To fix this, always enqueue flow
>>> deletion regardless of the flow->mark being set.
>>>
>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>> ---
>>>  lib/dpif-netdev.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index e28e0b554..22c5f19a8 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>> -        queue_netdev_flow_del(pmd, flow);
>>> -    }
>>> +    queue_netdev_flow_del(pmd, flow);
>>>      flow->dead = true;
>>>  
>>>      dp_netdev_flow_unref(flow);
>>>
>>
>> Thanks for the patch, but it looks like that it's not that simple...
>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>> on flow disassociation if the dpif already destroyed:
>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>
>> I think that the problem is that offload thread holds the ref count
>> for PMD thread, but PMD thread structure doesn't hold the ref count
>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>> will be used while PMD thread is destroyed.  I guess, we should fix
>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>> should hold and ref the dp pointer instead?
>>
>> What do you think?  Gaetan?
>>
> 
> Hi Ilya,
> 
> I've been looking into this issue, you are right that the offload threads don't
> actually need the pmd pointer, only the dp one.
> 
> The problem I see with a proper reference taking on dp to protect against this
> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
> a global lock.
> 
> I have tried to find an alternative (the flush op could be tagged to mark that
> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
> even if the window is reduced.
Hmm.  Is it with flow API enabled or disabled?
With the flow API enabled, we should have a flush operation that should,
in theory, wait for all the previous offloading operations to be
completed.  If it's disabled, though, flow deletion will initialize
the offload thread and enqueue the operation while a flush will not
actually be requested.

I missed before that we have flush on that path.  Maybe it's enough
to just add a !netdev_is_flow_api_enabled() check to the deletion path?
Or am I missing something else?
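A minimal sketch of what the suggested check could look like, with stand-in names (`flow_api_enabled` for netdev_is_flow_api_enabled(), `deletions_queued` for the effect of queue_netdev_flow_del()); as the follow-up discussion notes, this alone does not close the race:

```c
#include <assert.h>
#include <stdbool.h>

/* Hedged sketch of the suggestion: only enqueue the offload deletion
 * when the flow API is enabled, so a datapath that never offloads
 * does not spin up the offload thread just to process a doomed
 * request.  Both globals are hypothetical stand-ins. */

static bool flow_api_enabled;   /* stand-in for netdev_is_flow_api_enabled() */
static int deletions_queued;    /* counts stand-in queue_netdev_flow_del() calls */

static void
remove_flow_sketch(void)
{
    if (flow_api_enabled) {
        deletions_queued++;     /* queue_netdev_flow_del(pmd, flow); */
    }
    /* ... flow->dead = true and dp_netdev_flow_unref(flow) would
     * still happen unconditionally here ... */
}
```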

> 
> So given that the refcounting is too slow but necessary, I think the only practical
> solution is to rewrite it to be fast enough. I will send an RFC to that end, let
> me know if you think of something better.
Gaetan Rivet Feb. 1, 2022, 11:38 a.m. UTC | #8
On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> On 1/31/22 11:42, Gaëtan Rivet wrote:
>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>> this occurs in the window in which the offload thread completes
>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>> deleting the flow. This problem has been observed while adding
>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>> deletion regardless of the flow->mark being set.
>>>>
>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>>> ---
>>>>  lib/dpif-netdev.c | 4 +---
>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index e28e0b554..22c5f19a8 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>> -        queue_netdev_flow_del(pmd, flow);
>>>> -    }
>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>      flow->dead = true;
>>>>  
>>>>      dp_netdev_flow_unref(flow);
>>>>
>>>
>>> Thanks for the patch, but it looks like that it's not that simple...
>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>> on flow disassociation if the dpif already destroyed:
>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>
>>> I think that the problem is that offload thread holds the ref count
>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>> should hold and ref the dp pointer instead?
>>>
>>> What do you think?  Gaetan?
>>>
>> 
>> Hi Ilya,
>> 
>> I've been looking into this issue, you are right that the offload threads don't
>> actually need the pmd pointer, only the dp one.
>> 
>> The problem I see with a proper reference taking on dp to protect against this
>> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
>> a global lock.
>> 
>> I have tried to find an alternative (the flush op could be tagged to mark that
>> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
>> even if the window is reduced.
> Hmm.  Is it with flow API enabled or disabled?
> With flow api enabled we should have a flush operation that should
> in theory wait for all the previous offloading operations to be
> completed.  If it's disabled though, flow deletion will initialize
> the offload thread and enqueue the operation while flush will not
> actually be requested.

Yes, on your second point: with the API disabled, the deletion path relied on
the test flow->mark != INVALID to know whether a flow was offloaded or not, and did
not then check the API status.
The API state should now be checked on the deletion path as well.

>
> I missed before that we have flush on that path.  Maybe it's enough
> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
> Or am I missing something else?
>

After the flush, there will still be ops remaining in the queue that would trigger the UAF,
so the api_enabled() check when emitting the del is not sufficient.
During the flush, with a special tag marking that it is part of deleting the whole
datapath, the API could be marked disabled, and api_enabled() could be
checked again in the offload thread before any dereference of dp.

I hadn't considered it, as the API was not meant to be disabled and then possibly
re-enabled after a datapath re-creation. That could work, with some changes to
the api_enabled() flag: the current flag should hold the user intent as set in the config,
while a second flag should hold the API liveness, reflecting the dp_netdev status.
That second flag would be the one read during the checks, and written on open and after
a flush, before getting past the barrier. All offload threads would simply discard
offload requests if this liveness flag is false.
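The two-flag scheme Gaetan outlines could be sketched like this (hypothetical, not merged code; all names are invented for illustration): one flag mirrors the configured user intent, the other tracks datapath liveness, and offload threads read only the liveness flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch of the two-flag design: 'user_intent' mirrors
 * the configuration setting, while 'api_live' tracks whether the
 * datapath is alive.  Offload threads read only 'api_live' and drop
 * requests once it goes false, e.g. after the final flush issued
 * during datapath destruction. */

static atomic_bool user_intent;  /* set from configuration */
static atomic_bool api_live;     /* set on dp open, cleared on final flush */

static void
dp_open_sketch(void)
{
    if (atomic_load(&user_intent)) {
        atomic_store(&api_live, true);
    }
}

static void
final_flush_sketch(void)
{
    /* Cleared before the flush barrier is released, so any op still
     * queued behind the flush is discarded instead of touching dp. */
    atomic_store(&api_live, false);
}

static bool
offload_thread_accepts(void)
{
    return atomic_load(&api_live);
}
```

With this split, re-creating a datapath later would simply set `api_live` again on open, so disabling liveness during destruction does not conflict with the user's configured intent.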
Ilya Maximets Feb. 1, 2022, 11:48 a.m. UTC | #9
On 2/1/22 12:38, Gaëtan Rivet wrote:
> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>> this occurs in the window in which the offload thread completes
>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>> deleting the flow. This problem has been observed while adding
>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>> deletion regardless of the flow->mark being set.
>>>>>
>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>>>> ---
>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index e28e0b554..22c5f19a8 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>> -    }
>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>      flow->dead = true;
>>>>>  
>>>>>      dp_netdev_flow_unref(flow);
>>>>>
>>>>
>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>> on flow disassociation if the dpif already destroyed:
>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>
>>>> I think that the problem is that offload thread holds the ref count
>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>> should hold and ref the dp pointer instead?
>>>>
>>>> What do you think?  Gaetan?
>>>>
>>>
>>> Hi Ilya,
>>>
>>> I've been looking into this issue, you are right that the offload threads don't
>>> actually need the pmd pointer, only the dp one.
>>>
>>> The problem I see with a proper reference taking on dp to protect against this
>>> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
>>> a global lock.
>>>
>>> I have tried to find an alternative (the flush op could be tagged to mark that
>>> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
>>> even if the window is reduced.
>> Hmm.  Is it with flow API enabled or disabled?
>> With flow api enabled we should have a flush operation that should
>> in theory wait for all the previous offloading operations to be
>> completed.  If it's disabled though, flow deletion will initialize
>> the offload thread and enqueue the operation while flush will not
>> actually be requested.
> 
> Yes, on your second point, with the API disabled the deletion path relied on
> the test flow->mark != INVALID to know whether a flow was offload or not, and did not
> care then to check the API status.
> The API state should now be checked as well for the deletion path.
> 
>>
>> I missed before that we have flush on that path.  Maybe it's enough
>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>> Or am I missing something else?
>>
> 
> After the flush, there will be remaining ops in the queue that would trigger the UAF.

What are these ops?

> The api_enabled() check on del emission is not sufficient.
> During the flush, with a special tag telling that it is part of deleting the whole
> datapath, the API could be marked disabled, and the api_enabled() could be
> checked again in the offload thread, before any dereference of dp.
> 
> I didn't consider it as the API was not meant to be disabled then possible re-enabled
> after a datapath re-creation potentially. That could work, with some changes on
> the api_enabled() flag. The current flag should hold the user intent as set in the config,
> a second flag should hold the API liveness as set related to the dp_netdev status.
> That second flag would be the one read during the checks, and written on open and after
> a flush, before getting past the barrier. All offload threads should just discard
> offload requests if this liveness flag is false.
>
Gaetan Rivet Feb. 1, 2022, 1:38 p.m. UTC | #10
On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
> On 2/1/22 12:38, Gaëtan Rivet wrote:
>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>>> this occurs in the window in which the offload thread completes
>>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>>> deleting the flow. This problem has been observed while adding
>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>>> deletion regardless of the flow->mark being set.
>>>>>>
>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>>>>> ---
>>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index e28e0b554..22c5f19a8 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>>> -    }
>>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>>      flow->dead = true;
>>>>>>  
>>>>>>      dp_netdev_flow_unref(flow);
>>>>>>
>>>>>
>>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>>> on flow disassociation if the dpif already destroyed:
>>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>>
>>>>> I think that the problem is that offload thread holds the ref count
>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>>> should hold and ref the dp pointer instead?
>>>>>
>>>>> What do you think?  Gaetan?
>>>>>
>>>>
>>>> Hi Ilya,
>>>>
>>>> I've been looking into this issue, you are right that the offload threads don't
>>>> actually need the pmd pointer, only the dp one.
>>>>
>>>> The problem I see with a proper reference taking on dp to protect against this
>>>> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
>>>> a global lock.
>>>>
>>>> I have tried to find an alternative (the flush op could be tagged to mark that
>>>> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
>>>> even if the window is reduced.
>>> Hmm.  Is it with flow API enabled or disabled?
>>> With flow api enabled we should have a flush operation that should
>>> in theory wait for all the previous offloading operations to be
>>> completed.  If it's disabled though, flow deletion will initialize
>>> the offload thread and enqueue the operation while flush will not
>>> actually be requested.
>> 
>> Yes, on your second point, with the API disabled the deletion path relied on
>> the test flow->mark != INVALID to know whether a flow was offload or not, and did not
>> care then to check the API status.
>> The API state should now be checked as well for the deletion path.
>> 
>>>
>>> I missed before that we have flush on that path.  Maybe it's enough
>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>> Or am I missing something else?
>>>
>> 
>> After the flush, there will be remaining ops in the queue that would trigger the UAF.
>
> What are these ops?
>

Any add/mod/del can be found there.

Flushes are scheduled by the main thread in response to user commands.
While a flush executes, PMDs and revalidators are running, potentially enqueuing
offload ops. The only restriction on those ops is that it is impossible to have
multiple flushes in the queue: only one can exist at a time.
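That restriction can be sketched as a simple enqueue guard (a simplified stand-in, not the actual OVS queue code): a flush is refused while one is already pending, but ordinary add/mod/del ops are always accepted, which is why ops can still sit in the queue behind a completed flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the offload queue's "at most one flush
 * pending" rule.  Ordinary flow ops are always accepted; only a
 * second concurrent flush is refused. */

enum op_type { OP_FLOW, OP_FLUSH };

static bool flush_pending;
static int queued_ops;

static bool
enqueue_sketch(enum op_type type)
{
    if (type == OP_FLUSH) {
        if (flush_pending) {
            return false;       /* only one flush at a time */
        }
        flush_pending = true;
    }
    queued_ops++;
    return true;
}
```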
Ilya Maximets Feb. 1, 2022, 2 p.m. UTC | #11
On 2/1/22 14:38, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>> On 2/1/22 12:38, Gaëtan Rivet wrote:
>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>>>> this occurs in the window in which the offload thread completes
>>>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>>>> deleting the flow. This problem has been observed while adding
>>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>>>> deletion regardless of the flow->mark being set.
>>>>>>>
>>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>>>>>> ---
>>>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>> index e28e0b554..22c5f19a8 100644
>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>>>> -    }
>>>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>>>      flow->dead = true;
>>>>>>>  
>>>>>>>      dp_netdev_flow_unref(flow);
>>>>>>>
>>>>>>
>>>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>>>> on flow disassociation if the dpif already destroyed:
>>>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>>>
>>>>>> I think that the problem is that offload thread holds the ref count
>>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>>>> should hold and ref the dp pointer instead?
>>>>>>
>>>>>> What do you think?  Gaetan?
>>>>>>
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> I've been looking into this issue, you are right that the offload threads don't
>>>>> actually need the pmd pointer, only the dp one.
>>>>>
>>>>> The problem I see with a proper reference taking on dp to protect against this
>>>>> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
>>>>> a global lock.
>>>>>
>>>>> I have tried to find an alternative (the flush op could be tagged to mark that
>>>>> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
>>>>> even if the window is reduced.
>>>> Hmm.  Is it with flow API enabled or disabled?
>>>> With flow api enabled we should have a flush operation that should
>>>> in theory wait for all the previous offloading operations to be
>>>> completed.  If it's disabled though, flow deletion will initialize
>>>> the offload thread and enqueue the operation while flush will not
>>>> actually be requested.
>>>
>>> Yes, on your second point, with the API disabled the deletion path relied on
>>> the test flow->mark != INVALID to know whether a flow was offload or not, and did not
>>> care then to check the API status.
>>> The API state should now be checked as well for the deletion path.
>>>
>>>>
>>>> I missed before that we have flush on that path.  Maybe it's enough
>>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>>> Or am I missing something else?
>>>>
>>>
>>> After the flush, there will be remaining ops in the queue that would trigger the UAF.
>>
>> What are these ops?
>>
> 
> Any add/mod/del can be found there.
> 
> Flushes are scheduled by the main thread in response to user commands.
> While it executes, PMDs et revalidators are running, potentially enqueuing
> offload ops. The only restriction on those ops is that it's impossible to have multiple
> flush in the queue: only one can exist at a time.
> 

I think revalidators should already be stopped at the point of datapath
deletion.  Aren't they?

For the PMDs, I see that they are still running, and that sounds like a
bug, because we're still offloading things after the flush.  So some
flows are still in hardware after the port removal, and some userspace
structures still exist for flows of the deleted datapath.

Can we do something like this (just an untested concept):

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e28e0b554..4b51ac652 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2313,13 +2313,14 @@ static void
 do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
     OVS_REQ_WRLOCK(dp->port_rwlock)
 {
-    dp_netdev_offload_flush(dp, port);
-    netdev_uninit_flow_api(port->netdev);
     hmap_remove(&dp->ports, &port->node);
     seq_change(dp->port_seq);
 
     reconfigure_datapath(dp);
 
+    dp_netdev_offload_flush(dp, port);
+    netdev_uninit_flow_api(port->netdev);
+
     port_destroy(port);
 }
 
---
?

In other words, we should flush flows after removing the port from
a PMD thread.  This way, after removing the last port of a datapath,
there should be no more offloading operations enqueued after the
last flush.  Of course, assuming that revalidators are stopped at
this point, which is what I hope for.
Does that make sense?

Best regards, Ilya Maximets.
Gaetan Rivet Feb. 1, 2022, 3:07 p.m. UTC | #12
On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
> On 2/1/22 14:38, Gaëtan Rivet wrote:
>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
>>> On 2/1/22 12:38, Gaëtan Rivet wrote:
>>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
>>>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
>>>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
>>>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
>>>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
>>>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
>>>>>>>> this occurs in the window in which the offload thread completes
>>>>>>>> offloading the flow and assigns a mark to the flow, then we miss
>>>>>>>> deleting the flow. This problem has been observed while adding
>>>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
>>>>>>>> deletion regardless of the flow->mark being set.
>>>>>>>>
>>>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
>>>>>>>> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
>>>>>>>> ---
>>>>>>>>  lib/dpif-netdev.c | 4 +---
>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>> index e28e0b554..22c5f19a8 100644
>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>>>>>>>>      dp_netdev_simple_match_remove(pmd, flow);
>>>>>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>>>>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
>>>>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
>>>>>>>> -        queue_netdev_flow_del(pmd, flow);
>>>>>>>> -    }
>>>>>>>> +    queue_netdev_flow_del(pmd, flow);
>>>>>>>>      flow->dead = true;
>>>>>>>>  
>>>>>>>>      dp_netdev_flow_unref(flow);
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the patch, but it looks like that it's not that simple...
>>>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
>>>>>>> on flow disassociation if the dpif already destroyed:
>>>>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>>>>>>>
>>>>>>> I think that the problem is that offload thread holds the ref count
>>>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
>>>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
>>>>>>> will be used while PMD thread is destroyed.  I guess, we should fix
>>>>>>> that.  OTOH, I'm not sure if offload thread actually needs a reference
>>>>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
>>>>>>> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
>>>>>>> should hold and ref the dp pointer instead?
>>>>>>>
>>>>>>> What do you think?  Gaetan?
>>>>>>>
>>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>> I've been looking into this issue, you are right that the offload threads don't
>>>>>> actually need the pmd pointer, only the dp one.
>>>>>>
>>>>>> The problem I see with a proper reference taking on dp to protect against this
>>>>>> UAF, is that it is prohibitively slow to manage those dp references, and it relies on
>>>>>> a global lock.
>>>>>>
>>>>>> I have tried to find an alternative (the flush op could be tagged to mark that
>>>>>> it was issued during a dp_netdev_free), but I always end-up with a race-condition,
>>>>>> even if the window is reduced.
>>>>> Hmm.  Is it with flow API enabled or disabled?
>>>>> With the flow API enabled, we have a flush operation that should
>>>>> in theory wait for all the previous offloading operations to be
>>>>> completed.  If it's disabled, though, flow deletion will initialize
>>>>> the offload thread and enqueue the operation, while a flush will not
>>>>> actually be requested.
>>>>
>>>> Yes, on your second point: with the API disabled, the deletion path relied on
>>>> the test flow->mark != INVALID to know whether a flow was offloaded or not,
>>>> and did not then check the API status.
>>>> The API state should now be checked as well for the deletion path.
>>>>
>>>>>
>>>>> I missed before that we have flush on that path.  Maybe it's enough
>>>>> to just add a !netdev_is_flow_api_enabled() check to the deletion path?
>>>>> Or am I missing something else?
>>>>>
>>>>
>>>> After the flush, there may still be ops in the queue that would trigger the UAF.
>>>
>>> What are these ops?
>>>
>> 
>> Any add/mod/del can be found there.
>> 
>> Flushes are scheduled by the main thread in response to user commands.
>> While a flush executes, PMDs and revalidators are still running, potentially
>> enqueuing offload ops.  The only restriction on those ops is that multiple
>> flushes cannot be in the queue at once: only one can exist at a time.
>> 
>
> I think revalidators should already be stopped at the point of datapath
> deletion.  Aren't they?
>
> For the PMDs, I see that they are still running and that sounds like a
> bug, because we're still offloading things after the flush.  So, some
> of the flows are still in hardware after the port removal / some userspace
> structures still exist for flows of the deleted datapath.
>
> Can we do something like this (just an untested concept):
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..4b51ac652 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2313,13 +2313,14 @@ static void
>  do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
>      OVS_REQ_WRLOCK(dp->port_rwlock)
>  {
> -    dp_netdev_offload_flush(dp, port);
> -    netdev_uninit_flow_api(port->netdev);
>      hmap_remove(&dp->ports, &port->node);
>      seq_change(dp->port_seq);
> 
>      reconfigure_datapath(dp);
> 
> +    dp_netdev_offload_flush(dp, port);
> +    netdev_uninit_flow_api(port->netdev);
> +
>      port_destroy(port);
>  }
> 
> ---
> ?
>
> In other words, we should flush flows after removing the port from
> a PMD thread.  This way, after removing the last port of a datapath,
> there should be no more offloading operations enqueued after the
> last flush.  Of course, assuming that revalidators are stopped at
> this point, which is what I hope for.
> Does that make sense?
>

It makes sense, and it's much better than what I was proposing.
I think it should work.

The only remaining edge case is the deletion of a single port.
In that case revalidators will still be running, so after a flush some
offload ops might still exist in the queue.
The offload thread will fail to get a netdev ref and will exit early,
so it's fine, but it's something to keep in mind.

I can propose the patches if everyone agrees with this fix.
Ilya Maximets Feb. 2, 2022, 3:52 p.m. UTC | #13
On 2/1/22 16:07, Gaëtan Rivet wrote:
> On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
>> [...]
>> In other words, we should flush flows after removing the port from
>> a PMD thread.  This way, after removing the last port of a datapath,
>> there should be no more offloading operations enqueued after the
>> last flush.  Of course, assuming that revalidators are stopped at
>> this point, which is what I hope for.
>> Does that make sense?
>>
> 
> It makes sense, and it's much better than what I was proposing.
> I think it should work.
> 
> The only remaining edge-case is with the deletion of a single port.
> In that case revalidators will still be running, so after a flush some
> offloads might still exist in the queue.
> The offload thread will fail to get a netdev ref and will exit early,
> so it's fine, but it's something to keep in mind.
> 
> I can propose the patches if everyone agrees with this fix.
> 

Sounds OK to me.  Harsha, what do you think?

Best regards, Ilya Maximets.
Sriharsha Basavapatna Feb. 2, 2022, 5:19 p.m. UTC | #14
On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 2/1/22 16:07, Gaëtan Rivet wrote:
> > [...]
> > I can propose the patches if everyone agrees with this fix.
> >
>
> Sounds OK to me.  Harsha, what do you think?
>
> Best regards, Ilya Maximets.

Thanks Ilya and Gaetan. It might be useful to add some inline comments
on why we do the flush after reconfigure_datapath(), just to highlight
the race condition that is addressed (along the lines of the above
comments by Ilya)?
Otherwise it looks good.

Thanks,
-Harsha
Ilya Maximets Feb. 2, 2022, 6:22 p.m. UTC | #15
On 2/2/22 18:19, Sriharsha Basavapatna wrote:
> On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>> [...]
> 
> Thanks Ilya and Gaetan. It might be useful to add some inline comments
> on why we do the flush after reconfigure_datapath(), just to highlight
> the race condition that is addressed (along the lines of the above
> comments by Ilya)?

That's a good thing to do.  I agree.

> Otherwise it looks good.
> 
> Thanks,
> -Harsha
>
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e28e0b554..22c5f19a8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3029,9 +3029,7 @@  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
     dp_netdev_simple_match_remove(pmd, flow);
     cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
     ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
-    if (flow->mark != INVALID_FLOW_MARK) {
-        queue_netdev_flow_del(pmd, flow);
-    }
+    queue_netdev_flow_del(pmd, flow);
     flow->dead = true;
 
     dp_netdev_flow_unref(flow);