diff mbox series

[ovs-dev,v2] ovs rcu: update rcu pointer first

Message ID 4099DE2E54AFAD489356C6C9161D53339735B3E8@dggeml522-mbs.china.huawei.com
State Superseded
Headers show
Series [ovs-dev,v2] ovs rcu: update rcu pointer first | expand

Commit Message

Linhaifeng June 2, 2020, 7:27 a.m. UTC
We should update rcu pointer first then use ovsrcu_postpone to free
otherwise maybe cause use-after-free.
e.g.,reader indicates momentary quiescent and access old pointer after
writer postpone free old pointer and before setting new pointer.

Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
---
 lib/classifier.c              |  4 ++--
 lib/ovs-rcu.h                 |  4 ++--
 lib/pvector.c                 | 15 ++++++++-------
 ofproto/ofproto-dpif-mirror.c |  4 ++--
 ofproto/ofproto-dpif-upcall.c |  3 +--
 5 files changed, 15 insertions(+), 15 deletions(-)

Comments

Ben Pfaff June 2, 2020, 5:27 p.m. UTC | #1
On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> We should update rcu pointer first then use ovsrcu_postpone to free
> otherwise maybe cause use-after-free.
> e.g.,reader indicates momentary quiescent and access old pointer after
> writer postpone free old pointer and before setting new pointer.
> 
> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>

I don't see how that's possible, since the writer hasn't quiesced.
Yanqin Wei June 2, 2020, 11:22 p.m. UTC | #2
Hi Ben,

If my understanding is correct, the writer could not be a rcu thread because it does not need report holding or not holding pointers.
So old memory will be freed after all rcu thread report quiesce.

Best Regards,
Wei Yanqin

> -----Original Message-----
> From: Ben Pfaff <blp@ovn.org>
> Sent: Wednesday, June 3, 2020 1:28 AM
> To: Linhaifeng <haifeng.lin@huawei.com>
> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > We should update rcu pointer first then use ovsrcu_postpone to free
> > otherwise maybe cause use-after-free.
> > e.g.,reader indicates momentary quiescent and access old pointer after
> > writer postpone free old pointer and before setting new pointer.
> >
> > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> 
> I don't see how that's possible, since the writer hasn't quiesced.
Ben Pfaff June 3, 2020, 12:34 a.m. UTC | #3
This is not how RCU works in OVS.  Every thread is by default considered
active.  They rarely quiesce except implicitly inside poll_block().
Please read the large comment at the top of ovs-rcu.h.

Is your patch based on actual bugs that you have found, or is it just
some kind of precaution?  If it is the latter, then it is not needed.

On Tue, Jun 02, 2020 at 11:22:57PM +0000, Yanqin Wei wrote:
> Hi Ben,
> 
> If my understanding is correct, the writer could not be a rcu thread because it does not need report holding or not holding pointers.
> So old memory will be freed after all rcu thread report quiesce.
> 
> Best Regards,
> Wei Yanqin
> 
> > -----Original Message-----
> > From: Ben Pfaff <blp@ovn.org>
> > Sent: Wednesday, June 3, 2020 1:28 AM
> > To: Linhaifeng <haifeng.lin@huawei.com>
> > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> > <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > 
> > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > We should update rcu pointer first then use ovsrcu_postpone to free
> > > otherwise maybe cause use-after-free.
> > > e.g.,reader indicates momentary quiescent and access old pointer after
> > > writer postpone free old pointer and before setting new pointer.
> > >
> > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > 
> > I don't see how that's possible, since the writer hasn't quiesced.
Yanqin Wei June 3, 2020, 12:37 a.m. UTC | #4
Hi Ben,

This patch is from Linhai, but I have the same concern about this.  I will read ovs-rcu comments and feedback.
Thanks for your time.

Best Regards,
Wei Yanqin

> -----Original Message-----
> From: Ben Pfaff <blp@ovn.org>
> Sent: Wednesday, June 3, 2020 8:35 AM
> To: Yanqin Wei <Yanqin.Wei@arm.com>
> Cc: Linhaifeng <haifeng.lin@huawei.com>; dev@openvswitch.org; nd
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> This is not how RCU works in OVS.  Every thread is by default considered active.
> They rarely quiesce except implicitly inside poll_block().
> Please read the large comment at the top of ovs-rcu.h.
> 
> Is your patch based on actual bugs that you have found, or is it just some kind
> of precaution?  If it is the latter, then it is not needed.
> 
> On Tue, Jun 02, 2020 at 11:22:57PM +0000, Yanqin Wei wrote:
> > Hi Ben,
> >
> > If my understanding is correct, the writer could not be a rcu thread because it
> does not need report holding or not holding pointers.
> > So old memory will be freed after all rcu thread report quiesce.
> >
> > Best Regards,
> > Wei Yanqin
> >
> > > -----Original Message-----
> > > From: Ben Pfaff <blp@ovn.org>
> > > Sent: Wednesday, June 3, 2020 1:28 AM
> > > To: Linhaifeng <haifeng.lin@huawei.com>
> > > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > > chenchanghu <chenchanghu@huawei.com>; Lichunhe
> <lichunhe@huawei.com>
> > > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >
> > > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > > We should update rcu pointer first then use ovsrcu_postpone to
> > > > free otherwise maybe cause use-after-free.
> > > > e.g.,reader indicates momentary quiescent and access old pointer
> > > > after writer postpone free old pointer and before setting new pointer.
> > > >
> > > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > >
> > > I don't see how that's possible, since the writer hasn't quiesced.
Ben Pfaff June 3, 2020, 12:47 a.m. UTC | #5
Oh, I apologize that I made a mistake about the author.

I appreciate feedback from anyone.

On Wed, Jun 03, 2020 at 12:37:27AM +0000, Yanqin Wei wrote:
> Hi Ben,
> 
> This patch is from Linhai, but I have the same concern about this.  I will read ovs-rcu comments and feedback.
> Thanks for your time.
> 
> Best Regards,
> Wei Yanqin
> 
> > -----Original Message-----
> > From: Ben Pfaff <blp@ovn.org>
> > Sent: Wednesday, June 3, 2020 8:35 AM
> > To: Yanqin Wei <Yanqin.Wei@arm.com>
> > Cc: Linhaifeng <haifeng.lin@huawei.com>; dev@openvswitch.org; nd
> > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> > <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > 
> > This is not how RCU works in OVS.  Every thread is by default considered active.
> > They rarely quiesce except implicitly inside poll_block().
> > Please read the large comment at the top of ovs-rcu.h.
> > 
> > Is your patch based on actual bugs that you have found, or is it just some kind
> > of precaution?  If it is the latter, then it is not needed.
> > 
> > On Tue, Jun 02, 2020 at 11:22:57PM +0000, Yanqin Wei wrote:
> > > Hi Ben,
> > >
> > > If my understanding is correct, the writer could not be a rcu thread because it
> > does not need report holding or not holding pointers.
> > > So old memory will be freed after all rcu thread report quiesce.
> > >
> > > Best Regards,
> > > Wei Yanqin
> > >
> > > > -----Original Message-----
> > > > From: Ben Pfaff <blp@ovn.org>
> > > > Sent: Wednesday, June 3, 2020 1:28 AM
> > > > To: Linhaifeng <haifeng.lin@huawei.com>
> > > > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > > > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > > > chenchanghu <chenchanghu@huawei.com>; Lichunhe
> > <lichunhe@huawei.com>
> > > > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > > >
> > > > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > > > We should update rcu pointer first then use ovsrcu_postpone to
> > > > > free otherwise maybe cause use-after-free.
> > > > > e.g.,reader indicates momentary quiescent and access old pointer
> > > > > after writer postpone free old pointer and before setting new pointer.
> > > > >
> > > > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > > >
> > > > I don't see how that's possible, since the writer hasn't quiesced.
Linhaifeng June 3, 2020, 1:22 a.m. UTC | #6
-----Original Message-----
From: Ben Pfaff [mailto:blp@ovn.org] 
Sent: Wednesday, June 3, 2020 1:28 AM
To: Linhaifeng <haifeng.lin@huawei.com>
Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first

On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> We should update rcu pointer first then use ovsrcu_postpone to free 
> otherwise maybe cause use-after-free.
> e.g.,reader indicates momentary quiescent and access old pointer after 
> writer postpone free old pointer and before setting new pointer.
> 
> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>

I don't see how that's possible, since the writer hasn't quiesced.

I think the logic is as follow, Could you help me find out where is incorrect?

1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 -> 3.3 -> 2.2(use after free)

wirter:
1.1 use postone to free old pointer
1.2 flush cbsets to flushed_cbsets
1.3 update new pointer
1.4 quiesced

Read:
2.1. read pointer
2.2. use pointer
2.3. quiesced

Rcu:
3.1 pop flushed_cbsets
3.2 ovsrcu_synchronize
3.3 call all cb to free
Linhaifeng June 3, 2020, 1:27 a.m. UTC | #7
-----Original Message-----
From: Yanqin Wei [mailto:Yanqin.Wei@arm.com] 
Sent: Wednesday, June 3, 2020 7:23 AM
To: Ben Pfaff <blp@ovn.org>; Linhaifeng <haifeng.lin@huawei.com>
Cc: dev@openvswitch.org; nd <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
Subject: RE: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first

Hi Ben,

If my understanding is correct, the writer could not be a rcu thread because it does not need report holding or not holding pointers.
So old memory will be freed after all rcu thread report quiesce.

The write also is rcu thread. If not first update pointer the reader can also get the old pointer after call ovsrcu_quiesced.

Best Regards,
Wei Yanqin

> -----Original Message-----
> From: Ben Pfaff <blp@ovn.org>
> Sent: Wednesday, June 3, 2020 1:28 AM
> To: Linhaifeng <haifeng.lin@huawei.com>
> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd 
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu 
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > We should update rcu pointer first then use ovsrcu_postpone to free 
> > otherwise maybe cause use-after-free.
> > e.g.,reader indicates momentary quiescent and access old pointer 
> > after writer postpone free old pointer and before setting new pointer.
> >
> > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> 
> I don't see how that's possible, since the writer hasn't quiesced.
Linhaifeng June 3, 2020, 2:15 a.m. UTC | #8
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Wednesday, June 3, 2020 8:35 AM
> To: Yanqin Wei <Yanqin.Wei@arm.com>
> Cc: Linhaifeng <haifeng.lin@huawei.com>; dev@openvswitch.org; nd
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> This is not how RCU works in OVS.  Every thread is by default considered
> active.  They rarely quiesce except implicitly inside poll_block().
> Please read the large comment at the top of ovs-rcu.h.
> 
> Is your patch based on actual bugs that you have found, or is it just some kind
> of precaution?  If it is the latter, then it is not needed.
> 
Is an actual bug for old version bug it's also suitable for the other codes in ovs.

Here is the debug info:
linux-mNuKFc:/Images/linhf/830/Euler_compile_env # gdb -p `pidof ovs-vswitchd`
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-3.h2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-Huawei-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 102706
[New LWP 109133]
[New LWP 109134]
[New LWP 109297]
[New LWP 109298]
[New LWP 109299]
[New LWP 109300]
[New LWP 109303]
[New LWP 109304]
[New LWP 109308]
[New LWP 109309]
[New LWP 109310]
[New LWP 109311]
[New LWP 109522]
[New LWP 109523]
[New LWP 109603]
[New LWP 109615]
[New LWP 109619]
[New LWP 109655]
[New LWP 109673]
[New LWP 109794]
[New LWP 109795]
[New LWP 113953]
[New LWP 114362]
[New LWP 114364]
[New LWP 114368]
[New LWP 114370]
[New LWP 114373]
[New LWP 114377]
[New LWP 115594]
[New LWP 115595]
[New LWP 115596]
[New LWP 115597]
[New LWP 115598]
[New LWP 115600]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x0000ffff879981ac in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glib2-2.54.2-2.h1.aarch64 glibc-2.28-9.h17.aarch64 keyutils-libs-1.5.8-3.aarch64 krb5-libs-1.15.1-34.h2.aarch64 libcgroup-0.41-15.h3.aarch64 libcom_err-1.44.3-1.h4.aarch64 libgcc-7.3.0-20190804.h18.aarch64 libselinux-2.5-12.aarch64 numactl-libs-2.0.9-7.h1.aarch64 openssl-libs-1.0.2k-16.h6.aarch64 pcre-8.32-17.h9.aarch64 uvpkmc-1.0.1-807.aarch64 zlib-1.2.7-17.aarch64
(gdb) b dpcls_destroy_subtable
Breakpoint 1 at 0x508bcc: file lib/dpif-netdev.c, line 6919.
(gdb) b ovsrcu_call_postponed
Breakpoint 2 at 0x5b7d34: file lib/ovs-rcu.c, line 336.
(gdb) c
Continuing.
[Switching to Thread 0xffff83b97860 (LWP 109304)]

Thread 9 "urcu2" hit Breakpoint 2, ovsrcu_call_postponed () at lib/ovs-rcu.c:336
warning: Source file is more recent than executable.
336	{
(gdb) n
339	    int wait_del = 0;
(gdb) 
340	    while(wait_del);
(gdb) set wait_del = 1
(gdb) c
Continuing.
[Switching to Thread 0xffff51748860 (LWP 115598)]

Thread 34 "revalidator19" hit Breakpoint 1, dpcls_destroy_subtable (cls=0xffff1c00a420, subtable=0xffff3c009250) at lib/dpif-netdev.c:6919
6919	    int wait_get = 0;
(gdb) n
6920	    VLOG_DBG("Destroying subtable %p for in_port %d", subtable, cls->in_port);
(gdb) 
6921	    pvector_remove(&cls->subtables, subtable);
(gdb) 
6922	    cmap_remove(&cls->subtables_map, &subtable->cmap_node,
(gdb) set wait_get = 1
(gdb) n
6924	    cmap_destroy(&subtable->rules);
(gdb) p subtable->rules
$1 = {impl = {p = 0xffff30008940}}
(gdb) s
cmap_destroy (cmap=0xffff3c009258) at lib/cmap.c:288
288	    if (cmap) {
(gdb) n
289	        struct cmap_impl *impl = cmap_get_impl(cmap);
(gdb) 
290	        if (impl != &empty_cmap) {
(gdb) 
291	            ovsrcu_postpone(free_cacheline, impl);
(gdb) s
ovsrcu_postpone__ (function=0x6029b0 <free_cacheline>, aux=0xffff30008940) at lib/ovs-rcu.c:315
315	    struct ovsrcu_perthread *perthread = ovsrcu_perthread_get();
(gdb) n
318	    int size = ARRAY_SIZE(cbset->cbs);
(gdb) 
319	    cbset = perthread->cbset;
(gdb) 
320	    if (!cbset) {
(gdb) 
325	    cb = &cbset->cbs[cbset->n_cbs++];
(gdb) 
326	    cb->function = function;
(gdb) 
327	    cb->aux = aux;
(gdb) 
329	    if (cbset->n_cbs >= size) {
(gdb) set size = cbset->n_cbs
(gdb) n
330	        ovsrcu_flush_cbset(perthread);
(gdb) s
ovsrcu_flush_cbset (perthread=0xffff30001210) at lib/ovs-rcu.c:397
397	    ovsrcu_flush_cbset__(perthread, false);
(gdb) s
ovsrcu_flush_cbset__ (perthread=0xffff30001210, protected=false) at lib/ovs-rcu.c:380
380	    struct ovsrcu_cbset *cbset = perthread->cbset;
(gdb) n
382	    if (cbset) {
(gdb) 
383	        guarded_list_push_back(&flushed_cbsets, &cbset->list_node, SIZE_MAX);
(gdb) 
384	        perthread->cbset = NULL;
(gdb) 
386	        if (protected) {
(gdb) 
389	            seq_change(flushed_cbsets_seq);
(gdb) 
392	}
(gdb) 
ovsrcu_flush_cbset (perthread=0xffff30001210) at lib/ovs-rcu.c:398
398	}
(gdb) 
ovsrcu_postpone__ (function=0x6029b0 <free_cacheline>, aux=0xffff30008940) at lib/ovs-rcu.c:332
332	}
(gdb) 
cmap_destroy (cmap=0xffff3c009258) at lib/cmap.c:294
294	}
(gdb) 
dpcls_destroy_subtable (cls=0xffff1c00a420, subtable=0xffff3c009250) at lib/dpif-netdev.c:6925
6925	    while(wait_get);
(gdb) p wait_get
$2 = 1
(gdb) thir 9
Undefined command: "thir".  Try "help".
(gdb) thr 9
[Switching to thread 9 (Thread 0xffff83b97860 (LWP 109304))]
#0  ovsrcu_call_postponed () at lib/ovs-rcu.c:340
340	    while(wait_del);
(gdb) set wait_del = 0
(gdb) n
342	    guarded_list_pop_all(&flushed_cbsets, &cbsets);
(gdb) 
343	    if (ovs_list_is_empty(&cbsets)) {
(gdb) pcbsets cbsets
$173 = {list_node = {prev = 0xfffefc041660, next = 0xffff83b96f98}, cbs = {{function = 0x6029b0 <free_cacheline>, aux = 0xffff3c0094c0}, {function = 0x6029b0 <free_cacheline>, aux = 0xffff30008940}, {function = 0x508b9c <subtable_free>, aux = 0xffff3c009c10}, {function = 0xffff87947448 <free>, aux = 0xffff30001390}, {function = 0x4f9b74 <dp_netdev_flow_free>, aux = 0xffff34009b80}, {function = 0x0, aux = 0x0}, {function = 0x0, aux = 0x0}, {function = 0x90, aux = 0x74}, {function = 0x0, aux = 0xabc7c8 <ovsrcu_threads>}, {function = 0x0, aux = 0x0}, {function = 0xffffffff, aux = 0x0}, {function = 0x0, aux = 0x0}, {function = 0x0, aux = 0x2263e3}, {function = 0x0, aux = 0x3275637275}, {function = 0x0, aux = 0x21}, {function = 0xffff3000ea30, aux = 0xffff300013f0}}, n_cbs = 2}
(gdb) prcus ovsrcu_threads
$174 = {list_node = {prev = 0xabc7c8 <ovsrcu_threads>, next = 0xffff30001390}, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 30 times>, __align = 0}, where = 0x840bc0 "<unlocked>"}, seqno = 3301106, cbset = 0x0, name = "pmd14\000\000\000\000\000\000\000\000\000\000"}
$175 = {list_node = {prev = 0xffff38003970, next = 0xffff30001210}, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 30 times>, __align = 0}, where = 0x840bc0 "<unlocked>"}, seqno = 3297445, cbset = 0x0, name = "urcu2\000\000\000\000\000\000\000\000\000\000"}
$176 = {list_node = {prev = 0xffff30001390, next = 0xabc7c8 <ovsrcu_threads>}, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 30 times>, __align = 0}, where = 0x840bc0 "<unlocked>"}, seqno = 3300573, cbset = 0x0, name = "revalidator19\000\000"}
(gdb) n
347	    ovsrcu_synchronize();
(gdb) s
ovsrcu_synchronize () at lib/ovs-rcu.c:225
225	{
(gdb) n
226	    unsigned int warning_threshold = INIT_WARNING_THRESHOLD_MS;
(gdb) 
227	    unsigned int block_report_frequent = BLOCK_REPORT_FREQUENT_MS;
(gdb) 
230	    int wait_round = 0;
(gdb) 
232	    if (single_threaded()) {
(gdb) 
236	    target_seqno = seq_read(global_seqno);
(gdb) 
237	    ovsrcu_quiesce_start();
(gdb) p target_seqno
$177 = 3301107
(gdb) thr 23    //切换到pmd一直next,直到执行ovsrcu_try_quiesce
(gdb) 
4975	                if (!ovsrcu_try_quiesce()) {
(gdb) 
4976	                    emc_cache_slow_sweep(&pmd->flow_cache);
(gdb) 
4980	                if (OVS_UNLIKELY(pmd->pmd_xinfo.flow_table_on_dp != flow_table_on_dp)) {
(gdb) prcus ovsrcu_threads
$194 = {list_node = {prev = 0xabc7c8 <ovsrcu_threads>, next = 0xffff30001210}, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 30 times>, __align = 0}, where = 0x840bc0 "<unlocked>"}, seqno = 3301109, cbset = 0x0, name = "pmd14\000\000\000\000\000\000\000\000\000\000"}
$195 = {list_node = {prev = 0xffff38003970, next = 0xabc7c8 <ovsrcu_threads>}, mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 30 times>, __align = 0}, where = 0x840bc0 "<unlocked>"}, seqno = 3300573, cbset = 0x0, name = "revalidator19\000\000"}
(gdb) b util.c:261 if (p == 0xffff30008940)
Breakpoint 3 at 0x6029bc: file lib/util.c, line 261.
(gdb) b dpif-netdev.c:7129 if (subtable->rules->impl->p == 0xffff30008940)
Breakpoint 4 at 0x509444: file lib/dpif-netdev.c, line 7129.
(gdb) thr 34
[Switching to thread 34 (Thread 0xffff51748860 (LWP 115598))]
(gdb) set wait_get = 0
(gdb) c
Continuing.
[Switching to Thread 0xffff43ffe860 (LWP 113953)]

Thread 23 "pmd14" hit Breakpoint 4, dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7129
7129	        while(wait_free);
(gdb) p wait_free
$200 = 0
(gdb) set wait_free = 1
(gdb) c
Continuing.
[Switching to Thread 0xffff43ffe860 (LWP 113953)]

Thread 23 "pmd14" hit Breakpoint 4, dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7129
7129	        while(wait_free);
(gdb) c
Continuing.

Thread 23 "pmd14" hit Breakpoint 4, dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7129
7129	        while(wait_free);
(gdb) c
Continuing.

Thread 23 "pmd14" hit Breakpoint 4, dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7129
7129	        while(wait_free);
(gdb) c
Continuing.
[Switching to Thread 0xffff83b97860 (LWP 109304)]

Thread 9 "urcu2" hit Breakpoint 3, free_cacheline (p=0xffff30008940) at lib/util.c:261
261	    free(p);
(gdb) c
(gdb) c
Continuing.
[Switching to Thread 0xffff43ffe860 (LWP 113953)]

Thread 23 "pmd14" hit Breakpoint 4, dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7129
7129	        while(wait_free);
(gdb) set wait_free = 0
(gdb) c
Continuing.

Thread 23 "pmd14" received signal SIGSEGV, Segmentation fault.
0x00000000004e1e50 in read_counter (bucket_=0xffff3003bfc0) at lib/cmap.c:333
333	    atomic_read_explicit(&bucket->counter, &counter, memory_order_acquire);
(gdb) bt
#0  0x00000000004e1e50 in read_counter (bucket_=0xffff3003bfc0) at lib/cmap.c:333
#1  0x00000000004e1e9c in read_even_counter (bucket=0xffff3003bfc0) at lib/cmap.c:344
#2  0x00000000004e2270 in cmap_find_batch (cmap=0xffff3c009258, map=1, hashes=0xffff43ffcb18, nodes=0xffff43ffcb98) at lib/cmap.c:459
#3  0x0000000000509468 in dpcls_lookup (cls=0xffff1c00a420, keys=0xffff43ffd3c0, rules=0xffff43ffcce0, cnt=1, num_lookups_p=0xffff43ffcd84) at lib/dpif-netdev.c:7131
#4  0x00000000004fabdc in dp_netdev_lookup_flow (classifiers=0xffff5237b0c8, key=0xffff43ffd3c0, lookup_num_p=0xffff43ffcd84) at lib/dpif-netdev.c:2111
#5  0x000000000050748c in fast_path_processing (pmd=0xffff51f4b000, packets_=0xffff43ffddb8, keys=0xffff43ffd3c0, batches=0xffff43ffd270, n_batches=0xffff43ffd6c0, in_port=11, now=862863019) at lib/dpif-netdev.c:6027
#6  0x0000000000507bb0 in dp_netdev_input__ (pmd=0xffff51f4b000, packets=0xffff43ffddb8, md_is_valid=false, port_no=11) at lib/dpif-netdev.c:6138
#7  0x0000000000507fa0 in dp_netdev_input (pmd=0xffff51f4b000, packets=0xffff43ffddb8, port_no=11) at lib/dpif-netdev.c:6222
#8  0x000000000067d130 in dp_uevs_forward (evs_ctx=0xffff43ffdd60, port_no=11, md_is_valid=false) at lib/evs/evs-dpdk.c:2717
#9  0x00000000005035e4 in evs_netdev_process_rxq_port (enable_blc=0, port_args=0xffff43ffdd40, pmd_args=0xffff43ffdd88) at lib/evs/evs-dpdk-inline.h:162
#10 evs_forward_loop (lacp_only_mode=false, ts=0xffff43ffdd30, port_args=0xffff43ffdd40, pmd_args=0xffff43ffdd88, poll_list=0xffff38009030, poll_cnt=24) at lib/dpif-netdev.c:4788
#11 pmd_thread_main (f_=0xffff51f4b000) at lib/dpif-netdev.c:4905
#12 0x00000000005bb0fc in ovsthread_wrapper (aux_=0x282d2280) at lib/ovs-thread.c:715
#13 0x0000ffff87f5c8bc in start_thread () from /lib64/libpthread.so.0
#14 0x0000ffff879a1e7c in thread_start () from /lib64/libc.so.6
> On Tue, Jun 02, 2020 at 11:22:57PM +0000, Yanqin Wei wrote:
> > Hi Ben,
> >
> > If my understanding is correct, the writer could not be a rcu thread because it
> does not need report holding or not holding pointers.
> > So old memory will be freed after all rcu thread report quiesce.
> >
> > Best Regards,
> > Wei Yanqin
> >
> > > -----Original Message-----
> > > From: Ben Pfaff <blp@ovn.org>
> > > Sent: Wednesday, June 3, 2020 1:28 AM
> > > To: Linhaifeng <haifeng.lin@huawei.com>
> > > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > > chenchanghu <chenchanghu@huawei.com>; Lichunhe
> <lichunhe@huawei.com>
> > > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >
> > > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > > We should update rcu pointer first then use ovsrcu_postpone to
> > > > free otherwise maybe cause use-after-free.
> > > > e.g.,reader indicates momentary quiescent and access old pointer
> > > > after writer postpone free old pointer and before setting new pointer.
> > > >
> > > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > >
> > > I don't see how that's possible, since the writer hasn't quiesced.
Ben Pfaff June 3, 2020, 5:26 a.m. UTC | #9
On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> 
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org] 
> Sent: Wednesday, June 3, 2020 1:28 AM
> To: Linhaifeng <haifeng.lin@huawei.com>
> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > We should update rcu pointer first then use ovsrcu_postpone to free 
> > otherwise maybe cause use-after-free.
> > e.g.,reader indicates momentary quiescent and access old pointer after 
> > writer postpone free old pointer and before setting new pointer.
> > 
> > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> 
> I don't see how that's possible, since the writer hasn't quiesced.
> 
> I think the logic is as follow, Could you help me find out where is incorrect?
> 
> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 -> 3.3 -> 2.2(use after free)
> 
> wirter:
> 1.1 use postone to free old pointer
> 1.2 flush cbsets to flushed_cbsets
> 1.3 update new pointer
> 1.4 quiesced
> 
> Read:
> 2.1. read pointer
> 2.2. use pointer
> 2.3. quiesced
> 
> Rcu:
> 3.1 pop flushed_cbsets
> 3.2 ovsrcu_synchronize
> 3.3 call all cb to free

So you're saying this:

    1.1 use postone to free old pointer (A)
    1.2 flush cbsets to flushed_cbsets

    3.1 pop flushed_cbsets
    3.2 ovsrcu_synchronize

    2.1. read pointer (A)
    2.2. use pointer (A)
    2.3. quiesced

    2.1. read pointer (A)

    1.3 update new pointer (B)
    1.4 quiesced

    3.3 call all cb to free (A)

    2.2. use pointer (A)

Wow, you are absolutely right.  This had never occurred to me.  Thank
you!  I'll review your patch.
Linhaifeng June 3, 2020, 7:04 a.m. UTC | #10
> -----Original Message-----
> From: Ben Pfaff [mailto:blp@ovn.org]
> Sent: Wednesday, June 3, 2020 1:26 PM
> To: Linhaifeng <haifeng.lin@huawei.com>
> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> >
> >
> > -----Original Message-----
> > From: Ben Pfaff [mailto:blp@ovn.org]
> > Sent: Wednesday, June 3, 2020 1:28 AM
> > To: Linhaifeng <haifeng.lin@huawei.com>
> > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> > <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >
> > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > We should update rcu pointer first then use ovsrcu_postpone to free
> > > otherwise maybe cause use-after-free.
> > > e.g.,reader indicates momentary quiescent and access old pointer
> > > after writer postpone free old pointer and before setting new pointer.
> > >
> > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> >
> > I don't see how that's possible, since the writer hasn't quiesced.
> >
> > I think the logic is as follow, Could you help me find out where is incorrect?
> >
> > 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 ->
> > 3.3 -> 2.2(use after free)
> >
> > wirter:
> > 1.1 use postone to free old pointer
> > 1.2 flush cbsets to flushed_cbsets
> > 1.3 update new pointer
> > 1.4 quiesced
> >
> > Read:
> > 2.1. read pointer
> > 2.2. use pointer
> > 2.3. quiesced
> >
> > Rcu:
> > 3.1 pop flushed_cbsets
> > 3.2 ovsrcu_synchronize
> > 3.3 call all cb to free
> 
> So you're saying this:
> 
>     1.1 use postone to free old pointer (A)
>     1.2 flush cbsets to flushed_cbsets
> 
>     3.1 pop flushed_cbsets
>     3.2 ovsrcu_synchronize
> 
>     2.1. read pointer (A)
>     2.2. use pointer (A)
>     2.3. quiesced
> 
>     2.1. read pointer (A)
> 
>     1.3 update new pointer (B)
>     1.4 quiesced
> 
>     3.3 call all cb to free (A)
> 
>     2.2. use pointer (A)
> 
> Wow, you are absolutely right.  This had never occurred to me.  Thank you!
> I'll review your patch.

Yes, it's really hard to happen. If it happened it's also hard to find the reason so I suggest it can be a rule for using rcu.
Yanqin Wei June 3, 2020, 7:42 a.m. UTC | #11
Hi Ben,

I read the comments and codes about ovs-rcu. Each thread is added into ovsrcu_threads in the thread start routine.

And most of threads quiesce in the poll_block/ xsleep/ ovsrcu_quiesce/ ovsrcu_quiesce_start+ ovsrcu_quiesce_end. 
Only one thread "afsync_thread" may not be in the ovsrcu_threads list, because it invokes ovsrcu_quiesce_start but does not call ovsrcu_quiesce_end. But this thread does not update/free rcu memory, so no writer is in the quiescent state.

I am sorry to make an incorrect comment for Haifeng's patch.  He has a correct description in original patch.  Please help to correct.
https://patchwork.ozlabs.org/project/openvswitch/patch/4099DE2E54AFAD489356C6C9161D53339735AB4A@dggeml522-mbs.china.huawei.com/

Best Regards,
Wei Yanqin

> -----Original Message-----
> From: Ben Pfaff <blp@ovn.org>
> Sent: Wednesday, June 3, 2020 8:48 AM
> To: Yanqin Wei <Yanqin.Wei@arm.com>
> Cc: Linhaifeng <haifeng.lin@huawei.com>; dev@openvswitch.org; nd
> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> Oh, I apologize that I made a mistake about the author.
> 
> I appreciate feedback from anyone.
> 
> On Wed, Jun 03, 2020 at 12:37:27AM +0000, Yanqin Wei wrote:
> > Hi Ben,
> >
> > This patch is from Linhai, but I have the same concern about this.  I will read
> ovs-rcu comments and feedback.
> > Thanks for your time.
> >
> > Best Regards,
> > Wei Yanqin
> >
> > > -----Original Message-----
> > > From: Ben Pfaff <blp@ovn.org>
> > > Sent: Wednesday, June 3, 2020 8:35 AM
> > > To: Yanqin Wei <Yanqin.Wei@arm.com>
> > > Cc: Linhaifeng <haifeng.lin@huawei.com>; dev@openvswitch.org; nd
> > > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > > chenchanghu <chenchanghu@huawei.com>; Lichunhe
> <lichunhe@huawei.com>
> > > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >
> > > This is not how RCU works in OVS.  Every thread is by default considered
> active.
> > > They rarely quiesce except implicitly inside poll_block().
> > > Please read the large comment at the top of ovs-rcu.h.
> > >
> > > Is your patch based on actual bugs that you have found, or is it
> > > just some kind of precaution?  If it is the latter, then it is not needed.
> > >
> > > On Tue, Jun 02, 2020 at 11:22:57PM +0000, Yanqin Wei wrote:
> > > > Hi Ben,
> > > >
> > > > If my understanding is correct, the writer could not be a rcu
> > > > thread because it
> > > does not need report holding or not holding pointers.
> > > > So old memory will be freed after all rcu thread report quiesce.
> > > >
> > > > Best Regards,
> > > > Wei Yanqin
> > > >
> > > > > -----Original Message-----
> > > > > From: Ben Pfaff <blp@ovn.org>
> > > > > Sent: Wednesday, June 3, 2020 1:28 AM
> > > > > To: Linhaifeng <haifeng.lin@huawei.com>
> > > > > Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > > > > <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > > > > chenchanghu <chenchanghu@huawei.com>; Lichunhe
> > > <lichunhe@huawei.com>
> > > > > Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer
> > > > > first
> > > > >
> > > > > On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > > > > > We should update rcu pointer first then use ovsrcu_postpone to
> > > > > > free otherwise maybe cause use-after-free.
> > > > > > e.g.,reader indicates momentary quiescent and access old
> > > > > > pointer after writer postpone free old pointer and before setting new
> pointer.
> > > > > >
> > > > > > Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > > > >
> > > > > I don't see how that's possible, since the writer hasn't quiesced.
Ilya Maximets June 3, 2020, 10:50 a.m. UTC | #12
On 6/3/20 9:04 AM, Linhaifeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ben Pfaff [mailto:blp@ovn.org]
>> Sent: Wednesday, June 3, 2020 1:26 PM
>> To: Linhaifeng <haifeng.lin@huawei.com>
>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>
>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Ben Pfaff [mailto:blp@ovn.org]
>>> Sent: Wednesday, June 3, 2020 1:28 AM
>>> To: Linhaifeng <haifeng.lin@huawei.com>
>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
>>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>
>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>>> We should update rcu pointer first then use ovsrcu_postpone to free
>>>> otherwise maybe cause use-after-free.
>>>> e.g.,reader indicates momentary quiescent and access old pointer
>>>> after writer postpone free old pointer and before setting new pointer.
>>>>
>>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
>>>
>>> I don't see how that's possible, since the writer hasn't quiesced.
>>>
>>> I think the logic is as follow, Could you help me find out where is incorrect?
>>>
>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 ->
>>> 3.3 -> 2.2(use after free)
>>>
>>> wirter:
>>> 1.1 use postone to free old pointer
>>> 1.2 flush cbsets to flushed_cbsets
>>> 1.3 update new pointer
>>> 1.4 quiesced
>>>
>>> Read:
>>> 2.1. read pointer
>>> 2.2. use pointer
>>> 2.3. quiesced
>>>
>>> Rcu:
>>> 3.1 pop flushed_cbsets
>>> 3.2 ovsrcu_synchronize
>>> 3.3 call all cb to free
>>
>> So you're saying this:
>>
>>     1.1 use postone to free old pointer (A)
>>     1.2 flush cbsets to flushed_cbsets
>>
>>     3.1 pop flushed_cbsets
>>     3.2 ovsrcu_synchronize
>>
>>     2.1. read pointer (A)
>>     2.2. use pointer (A)
>>     2.3. quiesced
>>
>>     2.1. read pointer (A)
>>
>>     1.3 update new pointer (B)
>>     1.4 quiesced
>>
>>     3.3 call all cb to free (A)
>>
>>     2.2. use pointer (A)
>>
>> Wow, you are absolutely right.  This had never occurred to me.  Thank you!
>> I'll review your patch.
> 
> Yes, it's really hard to happen. If it happened it's also hard to find the reason so I suggest it can be a rule for using rcu.

I agree that there is an issue here, but I think that we should not force
users to call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't
do anything illegal since pointer must not be freed before the next grace
period from their point of view.

For me it looks like the main issue is existence of point 1.2, i.e. flushing
cbsets while writer is not quiesced yet.  And we need to fix this inside
rcu library itself.  For example, we could avoid flushing inside
ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
x2nrealloc instead of flushing.

Thoughts?

Regarding the patch itself:

> diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
> index ecc4c9201..98c238aea 100644
> --- a/lib/ovs-rcu.h
> +++ b/lib/ovs-rcu.h
> @@ -118,10 +118,10 @@
>   *     void
>   *     change_flow(struct flow *new_flow)
>   *     {
> + *         struct flow *old_flow = ovsrcu_get_protected(struct flow *, &flowp)

ovsrcu_get_protected() can not be used outside of the critical section.

>   *         ovs_mutex_lock(&mutex);
> - *         ovsrcu_postpone(free,
> - *                         ovsrcu_get_protected(struct flow *, &flowp));
>   *         ovsrcu_set(&flowp, new_flow);
> + *         ovsrcu_postpone(free, old_flow);
>   *         ovs_mutex_unlock(&mutex);
>   *     }
>   
Best regards, Ilya Maximets.
Linhaifeng June 3, 2020, 11:08 a.m. UTC | #13
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> Sent: Wednesday, June 3, 2020 6:50 PM
> To: Linhaifeng <haifeng.lin@huawei.com>; Ben Pfaff <blp@ovn.org>
> Cc: i.maximets@ovn.org; dev@openvswitch.org; Lilijun (Jerry)
> <jerry.lilijun@huawei.com>; Lichunhe <lichunhe@huawei.com>; nd
> <nd@arm.com>; chenchanghu <chenchanghu@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On 6/3/20 9:04 AM, Linhaifeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ben Pfaff [mailto:blp@ovn.org]
> >> Sent: Wednesday, June 3, 2020 1:26 PM
> >> To: Linhaifeng <haifeng.lin@huawei.com>
> >> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> >> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> >> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> >> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>
> >> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Ben Pfaff [mailto:blp@ovn.org]
> >>> Sent: Wednesday, June 3, 2020 1:28 AM
> >>> To: Linhaifeng <haifeng.lin@huawei.com>
> >>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> >>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> >>> chenchanghu <chenchanghu@huawei.com>; Lichunhe
> <lichunhe@huawei.com>
> >>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>>
> >>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> >>>> We should update rcu pointer first then use ovsrcu_postpone to free
> >>>> otherwise maybe cause use-after-free.
> >>>> e.g.,reader indicates momentary quiescent and access old pointer
> >>>> after writer postpone free old pointer and before setting new pointer.
> >>>>
> >>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> >>>
> >>> I don't see how that's possible, since the writer hasn't quiesced.
> >>>
> >>> I think the logic is as follow, Could you help me find out where is incorrect?
> >>>
> >>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
> >>> ->
> >>> 3.3 -> 2.2(use after free)
> >>>
> >>> wirter:
> >>> 1.1 use postone to free old pointer
> >>> 1.2 flush cbsets to flushed_cbsets
> >>> 1.3 update new pointer
> >>> 1.4 quiesced
> >>>
> >>> Read:
> >>> 2.1. read pointer
> >>> 2.2. use pointer
> >>> 2.3. quiesced
> >>>
> >>> Rcu:
> >>> 3.1 pop flushed_cbsets
> >>> 3.2 ovsrcu_synchronize
> >>> 3.3 call all cb to free
> >>
> >> So you're saying this:
> >>
> >>     1.1 use postone to free old pointer (A)
> >>     1.2 flush cbsets to flushed_cbsets
> >>
> >>     3.1 pop flushed_cbsets
> >>     3.2 ovsrcu_synchronize
> >>
> >>     2.1. read pointer (A)
> >>     2.2. use pointer (A)
> >>     2.3. quiesced
> >>
> >>     2.1. read pointer (A)
> >>
> >>     1.3 update new pointer (B)
> >>     1.4 quiesced
> >>
> >>     3.3 call all cb to free (A)
> >>
> >>     2.2. use pointer (A)
> >>
> >> Wow, you are absolutely right.  This had never occurred to me.  Thank
> you!
> >> I'll review your patch.
> >
> > Yes, it's really hard to happen. If it happened it's also hard to find the reason
> so I suggest it can be a rule for using rcu.
> 
> I agree that there is an issue here, but I think that we should not force users to
> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
> anything illegal since pointer must not be freed before the next grace period
> from their point of view.
> 
> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
> For example, we could avoid flushing inside
> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
> x2nrealloc instead of flushing.
> 
> Thoughts?
> 
Hi, Ilya Maximets

May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
replace of my patches.
.

> 
> Regarding the patch itself:
> 
> > diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h index ecc4c9201..98c238aea
> > 100644
> > --- a/lib/ovs-rcu.h
> > +++ b/lib/ovs-rcu.h
> > @@ -118,10 +118,10 @@
> >   *     void
> >   *     change_flow(struct flow *new_flow)
> >   *     {
> > + *         struct flow *old_flow = ovsrcu_get_protected(struct flow *,
> &flowp)
> 
> ovsrcu_get_protected() can not be used outside of the critical section.
> 
> >   *         ovs_mutex_lock(&mutex);
> > - *         ovsrcu_postpone(free,
> > - *                         ovsrcu_get_protected(struct flow *,
> &flowp));
> >   *         ovsrcu_set(&flowp, new_flow);
> > + *         ovsrcu_postpone(free, old_flow);
> >   *         ovs_mutex_unlock(&mutex);
> >   *     }
> >
> Best regards, Ilya Maximets.
Ilya Maximets June 3, 2020, 2:33 p.m. UTC | #14
On 6/3/20 1:08 PM, Linhaifeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets@ovn.org]
>> Sent: Wednesday, June 3, 2020 6:50 PM
>> To: Linhaifeng <haifeng.lin@huawei.com>; Ben Pfaff <blp@ovn.org>
>> Cc: i.maximets@ovn.org; dev@openvswitch.org; Lilijun (Jerry)
>> <jerry.lilijun@huawei.com>; Lichunhe <lichunhe@huawei.com>; nd
>> <nd@arm.com>; chenchanghu <chenchanghu@huawei.com>
>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>
>> On 6/3/20 9:04 AM, Linhaifeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ben Pfaff [mailto:blp@ovn.org]
>>>> Sent: Wednesday, June 3, 2020 1:26 PM
>>>> To: Linhaifeng <haifeng.lin@huawei.com>
>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
>>>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>
>>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Ben Pfaff [mailto:blp@ovn.org]
>>>>> Sent: Wednesday, June 3, 2020 1:28 AM
>>>>> To: Linhaifeng <haifeng.lin@huawei.com>
>>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
>>>>> chenchanghu <chenchanghu@huawei.com>; Lichunhe
>> <lichunhe@huawei.com>
>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>
>>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
>>>>>> otherwise maybe cause use-after-free.
>>>>>> e.g.,reader indicates momentary quiescent and access old pointer
>>>>>> after writer postpone free old pointer and before setting new pointer.
>>>>>>
>>>>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
>>>>>
>>>>> I don't see how that's possible, since the writer hasn't quiesced.
>>>>>
>>>>> I think the logic is as follow, Could you help me find out where is incorrect?
>>>>>
>>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
>>>>> ->
>>>>> 3.3 -> 2.2(use after free)
>>>>>
>>>>> wirter:
>>>>> 1.1 use postone to free old pointer
>>>>> 1.2 flush cbsets to flushed_cbsets
>>>>> 1.3 update new pointer
>>>>> 1.4 quiesced
>>>>>
>>>>> Read:
>>>>> 2.1. read pointer
>>>>> 2.2. use pointer
>>>>> 2.3. quiesced
>>>>>
>>>>> Rcu:
>>>>> 3.1 pop flushed_cbsets
>>>>> 3.2 ovsrcu_synchronize
>>>>> 3.3 call all cb to free
>>>>
>>>> So you're saying this:
>>>>
>>>>     1.1 use postone to free old pointer (A)
>>>>     1.2 flush cbsets to flushed_cbsets
>>>>
>>>>     3.1 pop flushed_cbsets
>>>>     3.2 ovsrcu_synchronize
>>>>
>>>>     2.1. read pointer (A)
>>>>     2.2. use pointer (A)
>>>>     2.3. quiesced
>>>>
>>>>     2.1. read pointer (A)
>>>>
>>>>     1.3 update new pointer (B)
>>>>     1.4 quiesced
>>>>
>>>>     3.3 call all cb to free (A)
>>>>
>>>>     2.2. use pointer (A)
>>>>
>>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
>> you!
>>>> I'll review your patch.
>>>
>>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
>> so I suggest it can be a rule for using rcu.
>>
>> I agree that there is an issue here, but I think that we should not force users to
>> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
>> anything illegal since pointer must not be freed before the next grace period
>> from their point of view.
>>
>> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
>> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
>> For example, we could avoid flushing inside
>> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
>> x2nrealloc instead of flushing.
>>
>> Thoughts?
>>
> Hi, Ilya Maximets
> 
> May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
> How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
> replace of my patches.

The change could look like this:

diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index ebc8120f0..cde1e925b 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -30,6 +30,8 @@
 
 VLOG_DEFINE_THIS_MODULE(ovs_rcu);
 
+#define MIN_CBS 16
+
 struct ovsrcu_cb {
     void (*function)(void *aux);
     void *aux;
@@ -37,7 +39,8 @@ struct ovsrcu_cb {
 
 struct ovsrcu_cbset {
     struct ovs_list list_node;
-    struct ovsrcu_cb cbs[16];
+    struct ovsrcu_cb *cbs;
+    size_t n_allocated;
     int n_cbs;
 };
 
@@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
     cbset = perthread->cbset;
     if (!cbset) {
         cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
+        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
+        cbset->n_allocated = MIN_CBS;
         cbset->n_cbs = 0;
     }
 
+    if (cbset->n_cbs == cbset->n_allocated) {
+        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
+                                sizeof *cbset->cbs);
+    }
+
     cb = &cbset->cbs[cbset->n_cbs++];
     cb->function = function;
     cb->aux = aux;
-
-    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
-        ovsrcu_flush_cbset(perthread);
-    }
 }
 
 static bool
@@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
         for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
             cb->function(cb->aux);
         }
+        free(cbset->cbs);
         free(cbset);
     }
 
---


I could submit a formal patch, if agreed.

Ben, what do you think?
Ilya Maximets June 3, 2020, 4:01 p.m. UTC | #15
On 6/3/20 7:26 AM, Ben Pfaff wrote:
> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>
>>
>> -----Original Message-----
>> From: Ben Pfaff [mailto:blp@ovn.org] 
>> Sent: Wednesday, June 3, 2020 1:28 AM
>> To: Linhaifeng <haifeng.lin@huawei.com>
>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>
>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>> We should update rcu pointer first then use ovsrcu_postpone to free 
>>> otherwise maybe cause use-after-free.
>>> e.g.,reader indicates momentary quiescent and access old pointer after 
>>> writer postpone free old pointer and before setting new pointer.
>>>
>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
>>
>> I don't see how that's possible, since the writer hasn't quiesced.
>>
>> I think the logic is as follow, Could you help me find out where is incorrect?
>>
>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 -> 3.3 -> 2.2(use after free)
>>
>> wirter:
>> 1.1 use postone to free old pointer
>> 1.2 flush cbsets to flushed_cbsets
>> 1.3 update new pointer
>> 1.4 quiesced
>>
>> Read:
>> 2.1. read pointer
>> 2.2. use pointer
>> 2.3. quiesced
>>
>> Rcu:
>> 3.1 pop flushed_cbsets
>> 3.2 ovsrcu_synchronize
>> 3.3 call all cb to free
> 
> So you're saying this:
> 
>     1.1 use postone to free old pointer (A)
>     1.2 flush cbsets to flushed_cbsets
> 
>     3.1 pop flushed_cbsets
>     3.2 ovsrcu_synchronize
> 
>     2.1. read pointer (A)
>     2.2. use pointer (A)
>     2.3. quiesced
> 
>     2.1. read pointer (A)

IIUC, 'Read' thread should enter quiescent state here at least
one more time in order to trigger the issue (or, maybe, some other
thread could advance global_seqno before step 2.3 and after
reading target_seqno by RCU thread on step 3.2).  Otherwise,
target_seqno will be equal to thread->seqno of the 'Read' thread
and RCU thread will wait for the next quiescent state before
calling callbacks.

> 
>     1.3 update new pointer (B)
>     1.4 quiesced
> 
>     3.3 call all cb to free (A)
> 
>     2.2. use pointer (A)
> 
> Wow, you are absolutely right.  This had never occurred to me.  Thank
> you!  I'll review your patch.

Sharing my more detailed diagram of the execution flow that should
trigger the issue here:

     ------------------  ------------------  -------------------
     Thread 1            Thread 2            RCU Thread
     ------------------  ------------------  -------------------
     pointer = A
    
     ovsrcu_quiesce():
      thread->seqno = 30
      global_seqno = 31
      quiesced
    
     read pointer A
     postpone(free(A)):
       flush cbset
                                             pop flushed_cbsets
                                             ovsrcu_synchronize:
                                               target_seqno = 31
                         ovsrcu_quiesce():
                          thread->seqno = 31
                          global_seqno = 32
                          quiesced
    
                         read pointer A
                         use pointer A
    
                         ovsrcu_quiesce():
                          thread->seqno = 32
                          global_seqno = 33
                          quiesced
    
                         read pointer A
     pointer = B
    
     ovsrcu_quiesce():
      thread->seqno = 33
      global_seqno = 34
      quiesced
    
                                             target_seqno exceeded
                                             by all threads
                                             call cbs to free A
                         use pointer A
                         (use after free)
     -----------------------------------------------------------

Best regards, Ilya Maximets.
Linhaifeng June 4, 2020, 1:25 a.m. UTC | #16
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> Sent: Thursday, June 4, 2020 12:01 AM
> To: Ben Pfaff <blp@ovn.org>; Linhaifeng <haifeng.lin@huawei.com>
> Cc: dev@openvswitch.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Lichunhe
> <lichunhe@huawei.com>; nd <nd@arm.com>; chenchanghu
> <chenchanghu@huawei.com>; i.maximets@ovn.org
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> On 6/3/20 7:26 AM, Ben Pfaff wrote:
> > On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> >>
> >>
> >> -----Original Message-----
> >> From: Ben Pfaff [mailto:blp@ovn.org]
> >> Sent: Wednesday, June 3, 2020 1:28 AM
> >> To: Linhaifeng <haifeng.lin@huawei.com>
> >> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> >> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> >> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> >> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>
> >> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> >>> We should update rcu pointer first then use ovsrcu_postpone to free
> >>> otherwise maybe cause use-after-free.
> >>> e.g.,reader indicates momentary quiescent and access old pointer
> >>> after writer postpone free old pointer and before setting new pointer.
> >>>
> >>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> >>
> >> I don't see how that's possible, since the writer hasn't quiesced.
> >>
> >> I think the logic is as follow, Could you help me find out where is incorrect?
> >>
> >> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4 ->
> >> 3.3 -> 2.2(use after free)
> >>
> >> wirter:
> >> 1.1 use postone to free old pointer
> >> 1.2 flush cbsets to flushed_cbsets
> >> 1.3 update new pointer
> >> 1.4 quiesced
> >>
> >> Read:
> >> 2.1. read pointer
> >> 2.2. use pointer
> >> 2.3. quiesced
> >>
> >> Rcu:
> >> 3.1 pop flushed_cbsets
> >> 3.2 ovsrcu_synchronize
> >> 3.3 call all cb to free
> >
> > So you're saying this:
> >
> >     1.1 use postone to free old pointer (A)
> >     1.2 flush cbsets to flushed_cbsets
> >
> >     3.1 pop flushed_cbsets
> >     3.2 ovsrcu_synchronize
> >
> >     2.1. read pointer (A)
> >     2.2. use pointer (A)
> >     2.3. quiesced
> >
> >     2.1. read pointer (A)
> 
> IIUC, 'Read' thread should enter quiescent state here at least one more time in
> order to trigger the issue (or, maybe, some other thread could advance
> global_seqno before step 2.3 and after reading target_seqno by RCU thread on
> step 3.2).  Otherwise, target_seqno will be equal to thread->seqno of the
> 'Read' thread and RCU thread will wait for the next quiescent state before
> calling callbacks.
> 

Yes, you are right. The reader's thread->seqno should bigger than global_seqno so it maybe have run one more times before someone's thread->seqno bigger than global_seqno.

> >
> >     1.3 update new pointer (B)
> >     1.4 quiesced
> >
> >     3.3 call all cb to free (A)
> >
> >     2.2. use pointer (A)
> >
> > Wow, you are absolutely right.  This had never occurred to me.  Thank
> > you!  I'll review your patch.
> 
> Sharing my more detailed diagram of the execution flow that should trigger the
> issue here:
> 
>      ------------------  ------------------  -------------------
>      Thread 1            Thread 2            RCU Thread
>      ------------------  ------------------  -------------------
>      pointer = A
> 
>      ovsrcu_quiesce():
>       thread->seqno = 30
>       global_seqno = 31
>       quiesced
> 
>      read pointer A
>      postpone(free(A)):
>        flush cbset
>                                              pop flushed_cbsets
>                                              ovsrcu_synchronize:
>                                                target_seqno = 31
>                          ovsrcu_quiesce():
>                           thread->seqno = 31
>                           global_seqno = 32
>                           quiesced
> 
>                          read pointer A
>                          use pointer A
> 
>                          ovsrcu_quiesce():
>                           thread->seqno = 32
>                           global_seqno = 33
>                           quiesced
> 
>                          read pointer A
>      pointer = B
> 
>      ovsrcu_quiesce():
>       thread->seqno = 33
>       global_seqno = 34
>       quiesced
> 
>                                              target_seqno exceeded
>                                              by all threads
>                                              call cbs to free A
>                          use pointer A
>                          (use after free)
>      -----------------------------------------------------------
> 
> Best regards, Ilya Maximets.
Linhaifeng June 8, 2020, 1:29 a.m. UTC | #17
Hi, Ben

Will this patch apply ?

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> Sent: Wednesday, June 3, 2020 10:33 PM
> To: Linhaifeng <haifeng.lin@huawei.com>; Ilya Maximets
> <i.maximets@ovn.org>; Ben Pfaff <blp@ovn.org>
> Cc: dev@openvswitch.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Lichunhe
> <lichunhe@huawei.com>; nd <nd@arm.com>; chenchanghu
> <chenchanghu@huawei.com>
> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> 
> The change could look like this:
> 
> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c index ebc8120f0..cde1e925b 100644
> --- a/lib/ovs-rcu.c
> +++ b/lib/ovs-rcu.c
> @@ -30,6 +30,8 @@
> 
>  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
> 
> +#define MIN_CBS 16
> +
>  struct ovsrcu_cb {
>      void (*function)(void *aux);
>      void *aux;
> @@ -37,7 +39,8 @@ struct ovsrcu_cb {
> 
>  struct ovsrcu_cbset {
>      struct ovs_list list_node;
> -    struct ovsrcu_cb cbs[16];
> +    struct ovsrcu_cb *cbs;
> +    size_t n_allocated;
>      int n_cbs;
>  };
> 
> @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void
> *aux)
>      cbset = perthread->cbset;
>      if (!cbset) {
>          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
> +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
> +        cbset->n_allocated = MIN_CBS;
>          cbset->n_cbs = 0;
>      }
> 
> +    if (cbset->n_cbs == cbset->n_allocated) {
> +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
> +                                sizeof *cbset->cbs);
> +    }
> +
>      cb = &cbset->cbs[cbset->n_cbs++];
>      cb->function = function;
>      cb->aux = aux;
> -
> -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
> -        ovsrcu_flush_cbset(perthread);
> -    }
>  }
> 
>  static bool
> @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
>          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
>              cb->function(cb->aux);
>          }
> +        free(cbset->cbs);
>          free(cbset);
>      }
> 
> ---
> 
> 
> I could submit a formal patch, if agreed.
> 
> Ben, what do you think?
Ben Pfaff June 10, 2020, 6:40 p.m. UTC | #18
On Wed, Jun 03, 2020 at 04:33:28PM +0200, Ilya Maximets wrote:
> On 6/3/20 1:08 PM, Linhaifeng wrote:
> > 
> > 
> >> -----Original Message-----
> >> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> >> Sent: Wednesday, June 3, 2020 6:50 PM
> >> To: Linhaifeng <haifeng.lin@huawei.com>; Ben Pfaff <blp@ovn.org>
> >> Cc: i.maximets@ovn.org; dev@openvswitch.org; Lilijun (Jerry)
> >> <jerry.lilijun@huawei.com>; Lichunhe <lichunhe@huawei.com>; nd
> >> <nd@arm.com>; chenchanghu <chenchanghu@huawei.com>
> >> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>
> >> On 6/3/20 9:04 AM, Linhaifeng wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ben Pfaff [mailto:blp@ovn.org]
> >>>> Sent: Wednesday, June 3, 2020 1:26 PM
> >>>> To: Linhaifeng <haifeng.lin@huawei.com>
> >>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> >>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> >>>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> >>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>>>
> >>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Ben Pfaff [mailto:blp@ovn.org]
> >>>>> Sent: Wednesday, June 3, 2020 1:28 AM
> >>>>> To: Linhaifeng <haifeng.lin@huawei.com>
> >>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> >>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> >>>>> chenchanghu <chenchanghu@huawei.com>; Lichunhe
> >> <lichunhe@huawei.com>
> >>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>>>>
> >>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> >>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
> >>>>>> otherwise maybe cause use-after-free.
> >>>>>> e.g.,reader indicates momentary quiescent and access old pointer
> >>>>>> after writer postpone free old pointer and before setting new pointer.
> >>>>>>
> >>>>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> >>>>>
> >>>>> I don't see how that's possible, since the writer hasn't quiesced.
> >>>>>
> >>>>> I think the logic is as follow, Could you help me find out where is incorrect?
> >>>>>
> >>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
> >>>>> ->
> >>>>> 3.3 -> 2.2(use after free)
> >>>>>
> >>>>> wirter:
> >>>>> 1.1 use postone to free old pointer
> >>>>> 1.2 flush cbsets to flushed_cbsets
> >>>>> 1.3 update new pointer
> >>>>> 1.4 quiesced
> >>>>>
> >>>>> Read:
> >>>>> 2.1. read pointer
> >>>>> 2.2. use pointer
> >>>>> 2.3. quiesced
> >>>>>
> >>>>> Rcu:
> >>>>> 3.1 pop flushed_cbsets
> >>>>> 3.2 ovsrcu_synchronize
> >>>>> 3.3 call all cb to free
> >>>>
> >>>> So you're saying this:
> >>>>
> >>>>     1.1 use postone to free old pointer (A)
> >>>>     1.2 flush cbsets to flushed_cbsets
> >>>>
> >>>>     3.1 pop flushed_cbsets
> >>>>     3.2 ovsrcu_synchronize
> >>>>
> >>>>     2.1. read pointer (A)
> >>>>     2.2. use pointer (A)
> >>>>     2.3. quiesced
> >>>>
> >>>>     2.1. read pointer (A)
> >>>>
> >>>>     1.3 update new pointer (B)
> >>>>     1.4 quiesced
> >>>>
> >>>>     3.3 call all cb to free (A)
> >>>>
> >>>>     2.2. use pointer (A)
> >>>>
> >>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
> >> you!
> >>>> I'll review your patch.
> >>>
> >>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
> >> so I suggest it can be a rule for using rcu.
> >>
> >> I agree that there is an issue here, but I think that we should not force users to
> >> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
> >> anything illegal since pointer must not be freed before the next grace period
> >> from their point of view.
> >>
> >> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
> >> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
> >> For example, we could avoid flushing inside
> >> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
> >> x2nrealloc instead of flushing.
> >>
> >> Thoughts?
> >>
> > Hi, Ilya Maximets
> > 
> > May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
> > How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
> > replace of my patches.
> 
> The change could look like this:
> 
> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
> index ebc8120f0..cde1e925b 100644
> --- a/lib/ovs-rcu.c
> +++ b/lib/ovs-rcu.c
> @@ -30,6 +30,8 @@
>  
>  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
>  
> +#define MIN_CBS 16
> +
>  struct ovsrcu_cb {
>      void (*function)(void *aux);
>      void *aux;
> @@ -37,7 +39,8 @@ struct ovsrcu_cb {
>  
>  struct ovsrcu_cbset {
>      struct ovs_list list_node;
> -    struct ovsrcu_cb cbs[16];
> +    struct ovsrcu_cb *cbs;
> +    size_t n_allocated;
>      int n_cbs;
>  };
>  
> @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>      cbset = perthread->cbset;
>      if (!cbset) {
>          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
> +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
> +        cbset->n_allocated = MIN_CBS;
>          cbset->n_cbs = 0;
>      }
>  
> +    if (cbset->n_cbs == cbset->n_allocated) {
> +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
> +                                sizeof *cbset->cbs);
> +    }
> +
>      cb = &cbset->cbs[cbset->n_cbs++];
>      cb->function = function;
>      cb->aux = aux;
> -
> -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
> -        ovsrcu_flush_cbset(perthread);
> -    }
>  }
>  
>  static bool
> @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
>          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
>              cb->function(cb->aux);
>          }
> +        free(cbset->cbs);
>          free(cbset);
>      }
>  
> ---
> 
> 
> I could submit a formal patch, if agreed.
> 
> Ben, what do you think?

I was pretty resigned to the idea that we'd have to change the callers
and the expectations.  This seems better and the code looks fine to me.
Thank you so much for thinking of this approach!

This approach also has the significant advantage that it should be easy
to backport.  I was worried that we would either have to laboriously
backport and check lots of old versions, or just declare that old
versions wouldn't get fixed.

As long as you tested this:

Acked-by: Ben Pfaff <blp@ovn.org>
Ben Pfaff June 10, 2020, 6:41 p.m. UTC | #19
On Wed, Jun 10, 2020 at 11:40:14AM -0700, Ben Pfaff wrote:
> On Wed, Jun 03, 2020 at 04:33:28PM +0200, Ilya Maximets wrote:
> > On 6/3/20 1:08 PM, Linhaifeng wrote:
> > > 
> > > 
> > >> -----Original Message-----
> > >> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> > >> Sent: Wednesday, June 3, 2020 6:50 PM
> > >> To: Linhaifeng <haifeng.lin@huawei.com>; Ben Pfaff <blp@ovn.org>
> > >> Cc: i.maximets@ovn.org; dev@openvswitch.org; Lilijun (Jerry)
> > >> <jerry.lilijun@huawei.com>; Lichunhe <lichunhe@huawei.com>; nd
> > >> <nd@arm.com>; chenchanghu <chenchanghu@huawei.com>
> > >> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >>
> > >> On 6/3/20 9:04 AM, Linhaifeng wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Ben Pfaff [mailto:blp@ovn.org]
> > >>>> Sent: Wednesday, June 3, 2020 1:26 PM
> > >>>> To: Linhaifeng <haifeng.lin@huawei.com>
> > >>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > >>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> > >>>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
> > >>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >>>>
> > >>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> > >>>>>
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Ben Pfaff [mailto:blp@ovn.org]
> > >>>>> Sent: Wednesday, June 3, 2020 1:28 AM
> > >>>>> To: Linhaifeng <haifeng.lin@huawei.com>
> > >>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
> > >>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> > >>>>> chenchanghu <chenchanghu@huawei.com>; Lichunhe
> > >> <lichunhe@huawei.com>
> > >>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> > >>>>>
> > >>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> > >>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
> > >>>>>> otherwise maybe cause use-after-free.
> > >>>>>> e.g.,reader indicates momentary quiescent and access old pointer
> > >>>>>> after writer postpone free old pointer and before setting new pointer.
> > >>>>>>
> > >>>>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
> > >>>>>
> > >>>>> I don't see how that's possible, since the writer hasn't quiesced.
> > >>>>>
> > >>>>> I think the logic is as follow, Could you help me find out where is incorrect?
> > >>>>>
> > >>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
> > >>>>> ->
> > >>>>> 3.3 -> 2.2(use after free)
> > >>>>>
> > >>>>> wirter:
> > >>>>> 1.1 use postone to free old pointer
> > >>>>> 1.2 flush cbsets to flushed_cbsets
> > >>>>> 1.3 update new pointer
> > >>>>> 1.4 quiesced
> > >>>>>
> > >>>>> Read:
> > >>>>> 2.1. read pointer
> > >>>>> 2.2. use pointer
> > >>>>> 2.3. quiesced
> > >>>>>
> > >>>>> Rcu:
> > >>>>> 3.1 pop flushed_cbsets
> > >>>>> 3.2 ovsrcu_synchronize
> > >>>>> 3.3 call all cb to free
> > >>>>
> > >>>> So you're saying this:
> > >>>>
> > >>>>     1.1 use postone to free old pointer (A)
> > >>>>     1.2 flush cbsets to flushed_cbsets
> > >>>>
> > >>>>     3.1 pop flushed_cbsets
> > >>>>     3.2 ovsrcu_synchronize
> > >>>>
> > >>>>     2.1. read pointer (A)
> > >>>>     2.2. use pointer (A)
> > >>>>     2.3. quiesced
> > >>>>
> > >>>>     2.1. read pointer (A)
> > >>>>
> > >>>>     1.3 update new pointer (B)
> > >>>>     1.4 quiesced
> > >>>>
> > >>>>     3.3 call all cb to free (A)
> > >>>>
> > >>>>     2.2. use pointer (A)
> > >>>>
> > >>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
> > >> you!
> > >>>> I'll review your patch.
> > >>>
> > >>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
> > >> so I suggest it can be a rule for using rcu.
> > >>
> > >> I agree that there is an issue here, but I think that we should not force users to
> > >> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
> > >> anything illegal since pointer must not be freed before the next grace period
> > >> from their point of view.
> > >>
> > >> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
> > >> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
> > >> For example, we could avoid flushing inside
> > >> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
> > >> x2nrealloc instead of flushing.
> > >>
> > >> Thoughts?
> > >>
> > > Hi, Ilya Maximets
> > > 
> > > May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
> > > How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
> > > replace of my patches.
> > 
> > The change could look like this:
> > 
> > diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
> > index ebc8120f0..cde1e925b 100644
> > --- a/lib/ovs-rcu.c
> > +++ b/lib/ovs-rcu.c
> > @@ -30,6 +30,8 @@
> >  
> >  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
> >  
> > +#define MIN_CBS 16
> > +
> >  struct ovsrcu_cb {
> >      void (*function)(void *aux);
> >      void *aux;
> > @@ -37,7 +39,8 @@ struct ovsrcu_cb {
> >  
> >  struct ovsrcu_cbset {
> >      struct ovs_list list_node;
> > -    struct ovsrcu_cb cbs[16];
> > +    struct ovsrcu_cb *cbs;
> > +    size_t n_allocated;
> >      int n_cbs;
> >  };
> >  
> > @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
> >      cbset = perthread->cbset;
> >      if (!cbset) {
> >          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
> > +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
> > +        cbset->n_allocated = MIN_CBS;
> >          cbset->n_cbs = 0;
> >      }
> >  
> > +    if (cbset->n_cbs == cbset->n_allocated) {
> > +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
> > +                                sizeof *cbset->cbs);
> > +    }
> > +
> >      cb = &cbset->cbs[cbset->n_cbs++];
> >      cb->function = function;
> >      cb->aux = aux;
> > -
> > -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
> > -        ovsrcu_flush_cbset(perthread);
> > -    }
> >  }
> >  
> >  static bool
> > @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
> >          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
> >              cb->function(cb->aux);
> >          }
> > +        free(cbset->cbs);
> >          free(cbset);
> >      }
> >  
> > ---
> > 
> > 
> > I could submit a formal patch, if agreed.
> > 
> > Ben, what do you think?
> 
> I was pretty resigned to the idea that we'd have to change the callers
> and the expectations.  This seems better and the code looks fine to me.
> Thank you so much for thinking of this approach!
> 
> This approach also has the significant advantage that it should be easy
> to backport.  I was worried that we would either have to laboriously
> backport and check lots of old versions, or just declare that old
> versions wouldn't get fixed.
> 
> As long as you tested this:
> 
> Acked-by: Ben Pfaff <blp@ovn.org>

Oh, and please credit Linhaifeng for reporting the problem.  This is
subtle and we've had years to notice and Linhaifeng is the first one to
actually do that.

Thanks,

Ben.
Ilya Maximets June 10, 2020, 6:53 p.m. UTC | #20
On 6/10/20 8:41 PM, Ben Pfaff wrote:
> On Wed, Jun 10, 2020 at 11:40:14AM -0700, Ben Pfaff wrote:
>> On Wed, Jun 03, 2020 at 04:33:28PM +0200, Ilya Maximets wrote:
>>> On 6/3/20 1:08 PM, Linhaifeng wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ilya Maximets [mailto:i.maximets@ovn.org]
>>>>> Sent: Wednesday, June 3, 2020 6:50 PM
>>>>> To: Linhaifeng <haifeng.lin@huawei.com>; Ben Pfaff <blp@ovn.org>
>>>>> Cc: i.maximets@ovn.org; dev@openvswitch.org; Lilijun (Jerry)
>>>>> <jerry.lilijun@huawei.com>; Lichunhe <lichunhe@huawei.com>; nd
>>>>> <nd@arm.com>; chenchanghu <chenchanghu@huawei.com>
>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>
>>>>> On 6/3/20 9:04 AM, Linhaifeng wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ben Pfaff [mailto:blp@ovn.org]
>>>>>>> Sent: Wednesday, June 3, 2020 1:26 PM
>>>>>>> To: Linhaifeng <haifeng.lin@huawei.com>
>>>>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>>>>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
>>>>>>> <chenchanghu@huawei.com>; Lichunhe <lichunhe@huawei.com>
>>>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>>>
>>>>>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ben Pfaff [mailto:blp@ovn.org]
>>>>>>>> Sent: Wednesday, June 3, 2020 1:28 AM
>>>>>>>> To: Linhaifeng <haifeng.lin@huawei.com>
>>>>>>>> Cc: Yanqin Wei <Yanqin.Wei@arm.com>; dev@openvswitch.org; nd
>>>>>>>> <nd@arm.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
>>>>>>>> chenchanghu <chenchanghu@huawei.com>; Lichunhe
>>>>> <lichunhe@huawei.com>
>>>>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>>>>
>>>>>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>>>>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
>>>>>>>>> otherwise maybe cause use-after-free.
>>>>>>>>> e.g.,reader indicates momentary quiescent and access old pointer
>>>>>>>>> after writer postpone free old pointer and before setting new pointer.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Linhaifeng <haifeng.lin@huawei.com>
>>>>>>>>
>>>>>>>> I don't see how that's possible, since the writer hasn't quiesced.
>>>>>>>>
>>>>>>>> I think the logic is as follow, Could you help me find out where is incorrect?
>>>>>>>>
>>>>>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
>>>>>>>> ->
>>>>>>>> 3.3 -> 2.2(use after free)
>>>>>>>>
>>>>>>>> wirter:
>>>>>>>> 1.1 use postone to free old pointer
>>>>>>>> 1.2 flush cbsets to flushed_cbsets
>>>>>>>> 1.3 update new pointer
>>>>>>>> 1.4 quiesced
>>>>>>>>
>>>>>>>> Read:
>>>>>>>> 2.1. read pointer
>>>>>>>> 2.2. use pointer
>>>>>>>> 2.3. quiesced
>>>>>>>>
>>>>>>>> Rcu:
>>>>>>>> 3.1 pop flushed_cbsets
>>>>>>>> 3.2 ovsrcu_synchronize
>>>>>>>> 3.3 call all cb to free
>>>>>>>
>>>>>>> So you're saying this:
>>>>>>>
>>>>>>>     1.1 use postone to free old pointer (A)
>>>>>>>     1.2 flush cbsets to flushed_cbsets
>>>>>>>
>>>>>>>     3.1 pop flushed_cbsets
>>>>>>>     3.2 ovsrcu_synchronize
>>>>>>>
>>>>>>>     2.1. read pointer (A)
>>>>>>>     2.2. use pointer (A)
>>>>>>>     2.3. quiesced
>>>>>>>
>>>>>>>     2.1. read pointer (A)
>>>>>>>
>>>>>>>     1.3 update new pointer (B)
>>>>>>>     1.4 quiesced
>>>>>>>
>>>>>>>     3.3 call all cb to free (A)
>>>>>>>
>>>>>>>     2.2. use pointer (A)
>>>>>>>
>>>>>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
>>>>> you!
>>>>>>> I'll review your patch.
>>>>>>
>>>>>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
>>>>> so I suggest it can be a rule for using rcu.
>>>>>
>>>>> I agree that there is an issue here, but I think that we should not force users to
>>>>> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
>>>>> anything illegal since pointer must not be freed before the next grace period
>>>>> from their point of view.
>>>>>
>>>>> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
>>>>> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
>>>>> For example, we could avoid flushing inside
>>>>> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
>>>>> x2nrealloc instead of flushing.
>>>>>
>>>>> Thoughts?
>>>>>
>>>> Hi, Ilya Maximets
>>>>
>>>> May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
>>>> How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
>>>> replace of my patches.
>>>
>>> The change could look like this:
>>>
>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>> index ebc8120f0..cde1e925b 100644
>>> --- a/lib/ovs-rcu.c
>>> +++ b/lib/ovs-rcu.c
>>> @@ -30,6 +30,8 @@
>>>  
>>>  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
>>>  
>>> +#define MIN_CBS 16
>>> +
>>>  struct ovsrcu_cb {
>>>      void (*function)(void *aux);
>>>      void *aux;
>>> @@ -37,7 +39,8 @@ struct ovsrcu_cb {
>>>  
>>>  struct ovsrcu_cbset {
>>>      struct ovs_list list_node;
>>> -    struct ovsrcu_cb cbs[16];
>>> +    struct ovsrcu_cb *cbs;
>>> +    size_t n_allocated;
>>>      int n_cbs;
>>>  };
>>>  
>>> @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>>>      cbset = perthread->cbset;
>>>      if (!cbset) {
>>>          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
>>> +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
>>> +        cbset->n_allocated = MIN_CBS;
>>>          cbset->n_cbs = 0;
>>>      }
>>>  
>>> +    if (cbset->n_cbs == cbset->n_allocated) {
>>> +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
>>> +                                sizeof *cbset->cbs);
>>> +    }
>>> +
>>>      cb = &cbset->cbs[cbset->n_cbs++];
>>>      cb->function = function;
>>>      cb->aux = aux;
>>> -
>>> -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
>>> -        ovsrcu_flush_cbset(perthread);
>>> -    }
>>>  }
>>>  
>>>  static bool
>>> @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
>>>          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
>>>              cb->function(cb->aux);
>>>          }
>>> +        free(cbset->cbs);
>>>          free(cbset);
>>>      }
>>>  
>>> ---
>>>
>>>
>>> I could submit a formal patch, if agreed.
>>>
>>> Ben, what do you think?
>>
>> I was pretty resigned to the idea that we'd have to change the callers
>> and the expectations.  This seems better and the code looks fine to me.
>> Thank you so much for thinking of this approach!
>>
>> This approach also has the significant advantage that it should be easy
>> to backport.  I was worried that we would either have to laboriously
>> backport and check lots of old versions, or just declare that old
>> versions wouldn't get fixed.
>>
>> As long as you tested this:
>>
>> Acked-by: Ben Pfaff <blp@ovn.org>
> 
> Oh, and please credit Linhaifeng for reporting the problem.  This is
> subtle and we've had years to notice and Linhaifeng is the first one to
> actually do that.

Sure.  Thanks for review!

I'll send patch formally, give it one more test run and apply.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/classifier.c b/lib/classifier.c
index f2c3497c2..6bff76e07 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -249,11 +249,11 @@  cls_rule_set_conjunctions(struct cls_rule *cr,
     unsigned int old_n = old ? old->n : 0;
 
     if (old_n != n || (n && memcmp(old_conj, conj, n * sizeof *conj))) {
+        ovsrcu_set(&match->conj_set,
+                   cls_conjunction_set_alloc(match, conj, n));
         if (old) {
             ovsrcu_postpone(free, old);
         }
-        ovsrcu_set(&match->conj_set,
-                   cls_conjunction_set_alloc(match, conj, n));
     }
 }
 
diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
index ecc4c9201..98c238aea 100644
--- a/lib/ovs-rcu.h
+++ b/lib/ovs-rcu.h
@@ -118,10 +118,10 @@ 
  *     void
  *     change_flow(struct flow *new_flow)
  *     {
+ *         struct flow *old_flow = ovsrcu_get_protected(struct flow *, &flowp)
  *         ovs_mutex_lock(&mutex);
- *         ovsrcu_postpone(free,
- *                         ovsrcu_get_protected(struct flow *, &flowp));
  *         ovsrcu_set(&flowp, new_flow);
+ *         ovsrcu_postpone(free, old_flow);
  *         ovs_mutex_unlock(&mutex);
  *     }
  *
diff --git a/lib/pvector.c b/lib/pvector.c
index cc527fdc4..aa8c6cb24 100644
--- a/lib/pvector.c
+++ b/lib/pvector.c
@@ -67,10 +67,11 @@  pvector_init(struct pvector *pvec)
 void
 pvector_destroy(struct pvector *pvec)
 {
+    struct pvector_impl *old = pvector_impl_get(pvec);
     free(pvec->temp);
     pvec->temp = NULL;
-    ovsrcu_postpone(free, pvector_impl_get(pvec));
     ovsrcu_set(&pvec->impl, NULL); /* Poison. */
+    ovsrcu_postpone(free, old);
 }
 
 /* Iterators for callers that need the 'index' afterward. */
@@ -205,11 +206,11 @@  pvector_change_priority(struct pvector *pvec, void *ptr, int priority)
 /* Make the modified pvector available for iteration. */
 void pvector_publish__(struct pvector *pvec)
 {
-    struct pvector_impl *temp = pvec->temp;
-
+    struct pvector_impl *new = pvec->temp;
+    struct pvector_impl *old = ovsrcu_get_protected(struct pvector_impl *,
+                                                   &pvec->impl);
     pvec->temp = NULL;
-    pvector_impl_sort(temp); /* Also removes gaps. */
-    ovsrcu_postpone(free, ovsrcu_get_protected(struct pvector_impl *,
-                                               &pvec->impl));
-    ovsrcu_set(&pvec->impl, temp);
+    pvector_impl_sort(new); /* Also removes gaps. */
+    ovsrcu_set(&pvec->impl, new);
+    ovsrcu_postpone(free, old);
 }
diff --git a/ofproto/ofproto-dpif-mirror.c b/ofproto/ofproto-dpif-mirror.c
index 343b75f0e..343100c08 100644
--- a/ofproto/ofproto-dpif-mirror.c
+++ b/ofproto/ofproto-dpif-mirror.c
@@ -276,9 +276,9 @@  mirror_set(struct mbridge *mbridge, void *aux, const char *name,
     hmapx_destroy(&dsts_map);
 
     if (vlans || src_vlans) {
+        unsigned long *new_vlans = vlan_bitmap_clone(src_vlans);
+        ovsrcu_set(&mirror->vlans, new_vlans);
         ovsrcu_postpone(free, vlans);
-        vlans = vlan_bitmap_clone(src_vlans);
-        ovsrcu_set(&mirror->vlans, vlans);
     }
 
     mirror->out = out;
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 5e08ef10d..be6dafb78 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1658,11 +1658,10 @@  ukey_set_actions(struct udpif_key *ukey, const struct ofpbuf *actions)
     struct ofpbuf *old_actions = ovsrcu_get_protected(struct ofpbuf *,
                                                       &ukey->actions);
 
+    ovsrcu_set(&ukey->actions, ofpbuf_clone(actions));
     if (old_actions) {
         ovsrcu_postpone(ofpbuf_delete, old_actions);
     }
-
-    ovsrcu_set(&ukey->actions, ofpbuf_clone(actions));
 }
 
 static struct udpif_key *