diff mbox series

[ovs-dev,ovs-dev,v1,2/2] ofproto-dpif: fix meter use-after-free

Message ID 20220326024306.27661-2-hepeng.0320@bytedance.com
State Superseded
Headers show
Series [ovs-dev,ovs-dev,v1,1/2] ovs-rcu: add rcu_barrier | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Peng He March 26, 2022, 2:43 a.m. UTC
add a rcu_barrier before close_dpif_backer to ensure that
all meters has been freed before id_pool_destory meter's
id-pool.

Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 ofproto/ofproto-dpif.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

0-day Robot March 26, 2022, 4 a.m. UTC | #1
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
ERROR: Improper whitespace around control block
#27 FILE: ofproto/ofproto-dpif.c:1804:
    while(!ovsrcu_barrier(seq, 1000)) {

Lines checked: 49, Warnings: 1, Errors: 2


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
David Marchand April 6, 2022, 6:54 p.m. UTC | #2
On Sat, Mar 26, 2022 at 3:43 AM Peng He <xnhp0320@gmail.com> wrote:
>
> add a rcu_barrier before close_dpif_backer to ensure that
> all meters has been freed before id_pool_destory meter's
> id-pool.
>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>

On the principle, the issue seems fixed.
And I have been running successfully 'make check -C build-asan
TESTSUITEFLAGS="-k meter -d"' in a loop for some time tonight.

This usually fails after 1h on origin/master.


I'll let a simplified version of the patch (see below) run for the
whole night after sending this mail.

Why a simplified patch? Because I don't think we need a new rcu api.
This issue only occurs with ofproto dpif implementation.
Adding a new api might encourage people to use it even if unneeded in
more simpler places in OVS.

I squashed your series into:

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 6601f2346..402cdcdf5 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
     struct rule_dpif *rule;
     struct oftable *table;
     struct ovs_list ams;
+    struct seq *seq;

     ofproto->backer->need_revalidate = REV_RECONFIGURE;
     xlate_txn_start();
@@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)

     seq_destroy(ofproto->ams_seq);

+    /* Wait for all the meter rules to be destroyed. */
+    seq = seq_create();
+    ovsrcu_synchronize();
+    seq_wait(seq, seq_read(seq));
+    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
+    poll_block();
+    seq_destroy(seq);
+
     close_dpif_backer(ofproto->backer, del);
 }

Let's hope it passes the night tests :-).
Peng He April 7, 2022, 2:12 a.m. UTC | #3
I understand there is no need to introduce a rcu api,
but at least put these codes into a separate function ? :-)

a minor issue, see below:

David Marchand <david.marchand@redhat.com> 于2022年4月7日周四 02:55写道:

> On Sat, Mar 26, 2022 at 3:43 AM Peng He <xnhp0320@gmail.com> wrote:
> >
> > add a rcu_barrier before close_dpif_backer to ensure that
> > all meters has been freed before id_pool_destory meter's
> > id-pool.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>
> On the principle, the issue seems fixed.
> And I have been running successfully 'make check -C build-asan
> TESTSUITEFLAGS="-k meter -d"' in a loop for some time tonight.
>
> This usually fails after 1h on origin/master.
>
>
> I'll let a simplified version of the patch (see below) run for the
> whole night after sending this mail.
>
> Why a simplified patch? Because I don't think we need a new rcu api.
> This issue only occurs with ofproto dpif implementation.
> Adding a new api might encourage people to use it even if unneeded in
> more simpler places in OVS.
>
> I squashed your series into:
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 6601f2346..402cdcdf5 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
>      struct rule_dpif *rule;
>      struct oftable *table;
>      struct ovs_list ams;
> +    struct seq *seq;
>
>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>      xlate_txn_start();
> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
>
>      seq_destroy(ofproto->ams_seq);
>
> +    /* Wait for all the meter rules to be destroyed. */
> +    seq = seq_create();
> +    ovsrcu_synchronize();
> +    seq_wait(seq, seq_read(seq));
> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);

+    poll_block();
>

after poll block, we should check if seq is changed or not, other fds will
wake up block too. so we need a while loop to do the seq_wait + postpone +
check
stuff.

+    seq_destroy(seq);
> +
>      close_dpif_backer(ofproto->backer, del);
>  }
>
> Let's hope it passes the night tests :-).
>
>
> --
> David Marchand
>
>
Peng He April 12, 2022, 2:17 a.m. UTC | #4
Hi,

with make check -C build-asan TESTSUITEFLAGS="-k meter -d"
the ovs is build with asan enabled, is this feature already in current
building system?

I did not find it. Can anyone give some hint?
Thanks !


David Marchand <david.marchand@redhat.com> 于2022年4月7日周四 02:55写道:

> On Sat, Mar 26, 2022 at 3:43 AM Peng He <xnhp0320@gmail.com> wrote:
> >
> > add a rcu_barrier before close_dpif_backer to ensure that
> > all meters has been freed before id_pool_destory meter's
> > id-pool.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>
> On the principle, the issue seems fixed.
> And I have been running successfully 'make check -C build-asan
> TESTSUITEFLAGS="-k meter -d"' in a loop for some time tonight.
>
> This usually fails after 1h on origin/master.
>
>
> I'll let a simplified version of the patch (see below) run for the
> whole night after sending this mail.
>
> Why a simplified patch? Because I don't think we need a new rcu api.
> This issue only occurs with ofproto dpif implementation.
> Adding a new api might encourage people to use it even if unneeded in
> more simpler places in OVS.
>
> I squashed your series into:
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 6601f2346..402cdcdf5 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
>      struct rule_dpif *rule;
>      struct oftable *table;
>      struct ovs_list ams;
> +    struct seq *seq;
>
>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>      xlate_txn_start();
> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
>
>      seq_destroy(ofproto->ams_seq);
>
> +    /* Wait for all the meter rules to be destroyed. */
> +    seq = seq_create();
> +    ovsrcu_synchronize();
> +    seq_wait(seq, seq_read(seq));
> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
> +    poll_block();
> +    seq_destroy(seq);
> +
>      close_dpif_backer(ofproto->backer, del);
>  }
>
> Let's hope it passes the night tests :-).
>
>
> --
> David Marchand
>
>
David Marchand April 12, 2022, 8:39 a.m. UTC | #5
On Thu, Apr 7, 2022 at 4:13 AM Peng He <xnhp0320@gmail.com> wrote:
>
> I understand there is no need to introduce a rcu api,
> but at least put these codes into a separate function ? :-)

No strong opinion.


>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index 6601f2346..402cdcdf5 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
>>      struct rule_dpif *rule;
>>      struct oftable *table;
>>      struct ovs_list ams;
>> +    struct seq *seq;
>>
>>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>>      xlate_txn_start();
>> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
>>
>>      seq_destroy(ofproto->ams_seq);
>>
>> +    /* Wait for all the meter rules to be destroyed. */
>> +    seq = seq_create();
>> +    ovsrcu_synchronize();
>> +    seq_wait(seq, seq_read(seq));
>> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
>>
>> +    poll_block();
>
>
> after poll block, we should check if seq is changed or not, other fds will
> wake up block too. so we need a while loop to do the seq_wait + postpone + check
> stuff.
>
>> +    seq_destroy(seq);
>> +

If there can be other event nodes registered at this point, once you
leave poll_block() no event node is kept.
Adding to this, poll_block -> timepoll -> poll can fail (though I am
not clear it is something that is supposed to happen).

Once a first poll_block() returns, the only event that this code can
wait for is the seq changing.
There should be no need to wake up with a timeout, since there is no
useful work to do.

Am I missing something else?


In the end, that would translate to:

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 6601f23464..dd1b9e5138 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1805,6 +1805,8 @@ destruct(struct ofproto *ofproto_, bool del)
     struct rule_dpif *rule;
     struct oftable *table;
     struct ovs_list ams;
+    struct seq *seq;
+    uint64_t seqno;

     ofproto->backer->need_revalidate = REV_RECONFIGURE;
     xlate_txn_start();
@@ -1848,6 +1850,17 @@ destruct(struct ofproto *ofproto_, bool del)

     seq_destroy(ofproto->ams_seq);

+    /* Wait for all the meter rules to be destroyed. */
+    seq = seq_create();
+    seqno = seq_read();
+    ovsrcu_synchronize();
+    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
+    do {
+        seq_wait(seq, seqno);
+        poll_block();
+    } while (seqno == seq_read(seq));
+    seq_destroy(seq);
+
     close_dpif_backer(ofproto->backer, del);
 }
David Marchand April 12, 2022, 8:48 a.m. UTC | #6
On Tue, Apr 12, 2022 at 4:17 AM Peng He <xnhp0320@gmail.com> wrote:
>
> Hi,
>
> with make check -C build-asan TESTSUITEFLAGS="-k meter -d"
> the ovs is build with asan enabled, is this feature already in current
> building system?
>
> I did not find it. Can anyone give some hint?
> Thanks !

I used a similar configuration to what is in run in GHA.

See .ci/linux-build.sh:

    # This will override default option configured in tests/atlocal.in.
    export ASAN_OPTIONS='detect_leaks=1'
    # -O2 generates few false-positive memory leak reports in test-ovsdb
    # application, so lowering optimizations to -O1 here.
    CFLAGS_ASAN="-O1 -fno-omit-frame-pointer -fno-common -fsanitize=address"
    CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} ${CFLAGS_ASAN}"

Then, those CFLAGS can be passed at ./configure time.
Peng He April 12, 2022, 9:09 a.m. UTC | #7
David Marchand <david.marchand@redhat.com> 于2022年4月12日周二 16:40写道:

> On Thu, Apr 7, 2022 at 4:13 AM Peng He <xnhp0320@gmail.com> wrote:
> >
> > I understand there is no need to introduce a rcu api,
> > but at least put these codes into a separate function ? :-)
>
> No strong opinion.
>
>
> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> >> index 6601f2346..402cdcdf5 100644
> >> --- a/ofproto/ofproto-dpif.c
> >> +++ b/ofproto/ofproto-dpif.c
> >> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
> >>      struct rule_dpif *rule;
> >>      struct oftable *table;
> >>      struct ovs_list ams;
> >> +    struct seq *seq;
> >>
> >>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
> >>      xlate_txn_start();
> >> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
> >>
> >>      seq_destroy(ofproto->ams_seq);
> >>
> >> +    /* Wait for all the meter rules to be destroyed. */
> >> +    seq = seq_create();
> >> +    ovsrcu_synchronize();
> >> +    seq_wait(seq, seq_read(seq));
> >> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
> >>
> >> +    poll_block();
> >
> >
> > after poll block, we should check if seq is changed or not, other fds
> will
> > wake up block too. so we need a while loop to do the seq_wait + postpone
> + check
> > stuff.
> >
> >> +    seq_destroy(seq);
> >> +
>
> If there can be other event nodes registered at this point, once you
> leave poll_block() no event node is kept.
>

IMHO, this is how OVS event loop works.

even you lose all other event nodes after the first poll_block returns,
eventually (after you got seq changes) you are not blocking and running to
the next
poll_block point, and every modules in between will check fd in its XX_run
function,
if event is ready, do the work, when EAGAIN happens, the node is added back.

all the previous lost events will be added back eventually.

So it's ok to call poll_block anywhere. I think.

Adding to this, poll_block -> timepoll -> poll can fail (though I am
> not clear it is something that is supposed to happen).


why fail?

Once a first poll_block() returns, the only event that this code can
> wait for is the seq changing.
> There should be no need to wake up with a timeout, since there is no
> useful work to do.
>
> Am I missing something else?
>

yes, timeout is for the RCU api, if used here, I agree no need to add
timeout.

>
>
> In the end, that would translate to:
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 6601f23464..dd1b9e5138 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -1805,6 +1805,8 @@ destruct(struct ofproto *ofproto_, bool del)
>      struct rule_dpif *rule;
>      struct oftable *table;
>      struct ovs_list ams;
> +    struct seq *seq;
> +    uint64_t seqno;
>
>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>      xlate_txn_start();
> @@ -1848,6 +1850,17 @@ destruct(struct ofproto *ofproto_, bool del)
>
>      seq_destroy(ofproto->ams_seq);
>
> +    /* Wait for all the meter rules to be destroyed. */
> +    seq = seq_create();
> +    seqno = seq_read();
> +    ovsrcu_synchronize();
> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
> +    do {
> +        seq_wait(seq, seqno);
> +        poll_block();
> +    } while (seqno == seq_read(seq));
> +    seq_destroy(seq);
> +
>      close_dpif_backer(ofproto->backer, del);
>  }
>
>
> --
> David Marchand
>
>
Peng He April 12, 2022, 9:19 a.m. UTC | #8
Thanks!

David Marchand <david.marchand@redhat.com> 于2022年4月12日周二 16:49写道:

> On Tue, Apr 12, 2022 at 4:17 AM Peng He <xnhp0320@gmail.com> wrote:
> >
> > Hi,
> >
> > with make check -C build-asan TESTSUITEFLAGS="-k meter -d"
> > the ovs is build with asan enabled, is this feature already in current
> > building system?
> >
> > I did not find it. Can anyone give some hint?
> > Thanks !
>
> I used a similar configuration to what is in run in GHA.
>
> See .ci/linux-build.sh:
>
>     # This will override default option configured in tests/atlocal.in.
>     export ASAN_OPTIONS='detect_leaks=1'
>     # -O2 generates few false-positive memory leak reports in test-ovsdb
>     # application, so lowering optimizations to -O1 here.
>     CFLAGS_ASAN="-O1 -fno-omit-frame-pointer -fno-common
> -fsanitize=address"
>     CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} ${CFLAGS_ASAN}"
>
> Then, those CFLAGS can be passed at ./configure time.
>
>
> --
> David Marchand
>
>
Eelco Chaudron April 29, 2022, 10:54 a.m. UTC | #9
On 12 Apr 2022, at 11:09, Peng He wrote:

> David Marchand <david.marchand@redhat.com> 于2022年4月12日周二 16:40写道:
>
>> On Thu, Apr 7, 2022 at 4:13 AM Peng He <xnhp0320@gmail.com> wrote:
>>>
>>> I understand there is no need to introduce a rcu api,
>>> but at least put these codes into a separate function ? :-)
>>
>> No strong opinion.
>>
>>
>>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>>>> index 6601f2346..402cdcdf5 100644
>>>> --- a/ofproto/ofproto-dpif.c
>>>> +++ b/ofproto/ofproto-dpif.c
>>>> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
>>>>      struct rule_dpif *rule;
>>>>      struct oftable *table;
>>>>      struct ovs_list ams;
>>>> +    struct seq *seq;
>>>>
>>>>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>>>>      xlate_txn_start();
>>>> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
>>>>
>>>>      seq_destroy(ofproto->ams_seq);
>>>>
>>>> +    /* Wait for all the meter rules to be destroyed. */
>>>> +    seq = seq_create();
>>>> +    ovsrcu_synchronize();
>>>> +    seq_wait(seq, seq_read(seq));
>>>> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
>>>>
>>>> +    poll_block();
>>>
>>>
>>> after poll block, we should check if seq is changed or not, other fds
>> will
>>> wake up block too. so we need a while loop to do the seq_wait + postpone
>> + check
>>> stuff.
>>>
>>>> +    seq_destroy(seq);
>>>> +
>>
>> If there can be other event nodes registered at this point, once you
>> leave poll_block() no event node is kept.
>>
>
> IMHO, this is how OVS event loop works.
>
> even you lose all other event nodes after the first poll_block returns,
> eventually (after you got seq changes) you are not blocking and running to
> the next
> poll_block point, and every modules in between will check fd in its XX_run
> function,
> if event is ready, do the work, when EAGAIN happens, the node is added back.
>
> all the previous lost events will be added back eventually.
>
> So it's ok to call poll_block anywhere. I think.
>
> Adding to this, poll_block -> timepoll -> poll can fail (though I am
>> not clear it is something that is supposed to happen).
>
>
> why fail?
>
> Once a first poll_block() returns, the only event that this code can
>> wait for is the seq changing.
>> There should be no need to wake up with a timeout, since there is no
>> useful work to do.
>>
>> Am I missing something else?
>>
>
> yes, timeout is for the RCU api, if used here, I agree no need to add
> timeout.


Sorry for joining in late in the conversation, but I do not see any harm in introducing the new rcu API. It will be way cleaner, and if we add some good documentation (and maybe a checkpatch warning), I think we should be ok.

However looking at the implementation from patch 1, I see two problems (well they are is actually one):

+
+static void
+ovsrcu_barrier_func(void *seq_)
+{
+    struct seq *seq = (struct seq*)seq_;
+    seq_change(seq);
+}
+
+bool
+ovsrcu_barrier(struct seq *seq, long long int timeout)
+{
+    /* first let all threads flush their cbset */
+    ovsrcu_synchronize();
+
+    /* then register a new cbset, ensure this cbset
+     * is at the tail of global listi
+     */
+    uint64_t seqno = seq_read(seq);
+    ovsrcu_postpone__(ovsrcu_barrier_func, (void*)seq);
+    long long int now = time_msec();
+    long long int deadline = now + timeout;
+
+    do {
+        seq_wait(seq, seqno);
+        poll_timer_wait_until(deadline);
+        poll_block();
+        now = time_msec(); /* update now */
+    } while (seqno == seq_read(seq) && now < deadline);
+
+    return !(seqno == seq_read(seq));
+}

1) I do not like that you have to supply the seq as a parameter. The API should not take any parameters if you ask me.
2) The timeout is a problem here, if the timeout happens you will call seq_destroy(), so the later invocation of ovsrcu_barrier_func() would access invalid memory.

So I guess the API should look something like this (not tested, but also changed comments ;):

+void
+ovsrcu_barrier()
+{
+    struct seq *seq = seq_create();
+
+    /* First let all threads flush their cbset’s. */
+    ovsrcu_synchronize();
+
+    /* Then register a new cbset, ensure this cbset
+     * is at the tail of the global list.
+     */
+    uint64_t seqno = seq_read(seq);
+    ovsrcu_postpone__(ovsrcu_barrier_func, (void*)seq);
+
+    do {
+        seq_wait(seq, seqno);
+        poll_block();
+    } while (seqno == seq_read(seq));
+
+    seq_destroy(seq);
+}

Which you can then just use as (see comment changes ;):

+    /* Wait for all the meter destroy work to finish. */
+    ovsrcu_barrier();
     close_dpif_backer(ofproto->backer, del);
 }


>>
>>
>> In the end, that would translate to:
>>
>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
>> index 6601f23464..dd1b9e5138 100644
>> --- a/ofproto/ofproto-dpif.c
>> +++ b/ofproto/ofproto-dpif.c
>> @@ -1805,6 +1805,8 @@ destruct(struct ofproto *ofproto_, bool del)
>>      struct rule_dpif *rule;
>>      struct oftable *table;
>>      struct ovs_list ams;
>> +    struct seq *seq;
>> +    uint64_t seqno;
>>
>>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
>>      xlate_txn_start();
>> @@ -1848,6 +1850,17 @@ destruct(struct ofproto *ofproto_, bool del)
>>
>>      seq_destroy(ofproto->ams_seq);
>>
>> +    /* Wait for all the meter rules to be destroyed. */
>> +    seq = seq_create();
>> +    seqno = seq_read();
>> +    ovsrcu_synchronize();
>> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
>> +    do {
>> +        seq_wait(seq, seqno);
>> +        poll_block();
>> +    } while (seqno == seq_read(seq));
>> +    seq_destroy(seq);
>> +
>>      close_dpif_backer(ofproto->backer, del);
>>  }
>>
>>
>> --
>> David Marchand
>>
>>
>
> -- 
> hepeng
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Peng He April 30, 2022, 2:47 a.m. UTC | #10
Hi, Eelco
Thanks for your comments.

Eelco Chaudron <echaudro@redhat.com> 于2022年4月29日周五 18:54写道:

>
>
> On 12 Apr 2022, at 11:09, Peng He wrote:
>
> > David Marchand <david.marchand@redhat.com> 于2022年4月12日周二 16:40写道:
> >
> >> On Thu, Apr 7, 2022 at 4:13 AM Peng He <xnhp0320@gmail.com> wrote:
> >>>
> >>> I understand there is no need to introduce a rcu api,
> >>> but at least put these codes into a separate function ? :-)
> >>
> >> No strong opinion.
> >>
> >>
> >>>> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> >>>> index 6601f2346..402cdcdf5 100644
> >>>> --- a/ofproto/ofproto-dpif.c
> >>>> +++ b/ofproto/ofproto-dpif.c
> >>>> @@ -1805,6 +1805,7 @@ destruct(struct ofproto *ofproto_, bool del)
> >>>>      struct rule_dpif *rule;
> >>>>      struct oftable *table;
> >>>>      struct ovs_list ams;
> >>>> +    struct seq *seq;
> >>>>
> >>>>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
> >>>>      xlate_txn_start();
> >>>> @@ -1848,6 +1849,14 @@ destruct(struct ofproto *ofproto_, bool del)
> >>>>
> >>>>      seq_destroy(ofproto->ams_seq);
> >>>>
> >>>> +    /* Wait for all the meter rules to be destroyed. */
> >>>> +    seq = seq_create();
> >>>> +    ovsrcu_synchronize();
> >>>> +    seq_wait(seq, seq_read(seq));
> >>>> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
> >>>>
> >>>> +    poll_block();
> >>>
> >>>
> >>> after poll block, we should check if seq is changed or not, other fds
> >> will
> >>> wake up block too. so we need a while loop to do the seq_wait +
> postpone
> >> + check
> >>> stuff.
> >>>
> >>>> +    seq_destroy(seq);
> >>>> +
> >>
> >> If there can be other event nodes registered at this point, once you
> >> leave poll_block() no event node is kept.
> >>
> >
> > IMHO, this is how OVS event loop works.
> >
> > even you lose all other event nodes after the first poll_block returns,
> > eventually (after you got seq changes) you are not blocking and running
> to
> > the next
> > poll_block point, and every modules in between will check fd in its
> XX_run
> > function,
> > if event is ready, do the work, when EAGAIN happens, the node is added
> back.
> >
> > all the previous lost events will be added back eventually.
> >
> > So it's ok to call poll_block anywhere. I think.
> >
> > Adding to this, poll_block -> timepoll -> poll can fail (though I am
> >> not clear it is something that is supposed to happen).
> >
> >
> > why fail?
> >
> > Once a first poll_block() returns, the only event that this code can
> >> wait for is the seq changing.
> >> There should be no need to wake up with a timeout, since there is no
> >> useful work to do.
> >>
> >> Am I missing something else?
> >>
> >
> > yes, timeout is for the RCU api, if used here, I agree no need to add
> > timeout.
>
>
> Sorry for joining in late in the conversation, but I do not see any harm
> in introducing the new rcu API. It will be way cleaner, and if we add some
> good documentation (and maybe a checkpatch warning), I think we should be
> ok.


the checkpatch warning is about ?
an experimental API for RCU?



>
However looking at the implementation from patch 1, I see two problems
> (well they are is actually one):
>
> +
> +static void
> +ovsrcu_barrier_func(void *seq_)
> +{
> +    struct seq *seq = (struct seq*)seq_;
> +    seq_change(seq);
> +}
> +
> +bool
> +ovsrcu_barrier(struct seq *seq, long long int timeout)
> +{
> +    /* first let all threads flush their cbset */
> +    ovsrcu_synchronize();
> +
> +    /* then register a new cbset, ensure this cbset
> +     * is at the tail of global listi
> +     */
> +    uint64_t seqno = seq_read(seq);
> +    ovsrcu_postpone__(ovsrcu_barrier_func, (void*)seq);
> +    long long int now = time_msec();
> +    long long int deadline = now + timeout;
> +
> +    do {
> +        seq_wait(seq, seqno);
> +        poll_timer_wait_until(deadline);
> +        poll_block();
> +        now = time_msec(); /* update now */
> +    } while (seqno == seq_read(seq) && now < deadline);
> +
> +    return !(seqno == seq_read(seq));
> +}
>
> 1) I do not like that you have to supply the seq as a parameter. The API
> should not take any parameters if you ask me.
> 2) The timeout is a problem here, if the timeout happens you will call
> seq_destroy(), so the later invocation of ovsrcu_barrier_func() would
> access invalid memory.
>
Agree, will send a new version.



>
> So I guess the API should look something like this (not tested, but also
> changed comments ;):
>
> +void
> +ovsrcu_barrier()
> +{
> +    struct seq *seq = seq_create();
> +
> +    /* First let all threads flush their cbset’s. */
> +    ovsrcu_synchronize();
> +
> +    /* Then register a new cbset, ensure this cbset
> +     * is at the tail of the global list.
> +     */
> +    uint64_t seqno = seq_read(seq);
> +    ovsrcu_postpone__(ovsrcu_barrier_func, (void*)seq);
> +
> +    do {
> +        seq_wait(seq, seqno);
> +        poll_block();
> +    } while (seqno == seq_read(seq));
> +
> +    seq_destroy(seq);
> +}
>
> Which you can then just use as (see comment changes ;):
>
> +    /* Wait for all the meter destroy work to finish. */
> +    ovsrcu_barrier();
>      close_dpif_backer(ofproto->backer, del);
>  }
>
>
> >>
> >>
> >> In the end, that would translate to:
> >>
> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> >> index 6601f23464..dd1b9e5138 100644
> >> --- a/ofproto/ofproto-dpif.c
> >> +++ b/ofproto/ofproto-dpif.c
> >> @@ -1805,6 +1805,8 @@ destruct(struct ofproto *ofproto_, bool del)
> >>      struct rule_dpif *rule;
> >>      struct oftable *table;
> >>      struct ovs_list ams;
> >> +    struct seq *seq;
> >> +    uint64_t seqno;
> >>
> >>      ofproto->backer->need_revalidate = REV_RECONFIGURE;
> >>      xlate_txn_start();
> >> @@ -1848,6 +1850,17 @@ destruct(struct ofproto *ofproto_, bool del)
> >>
> >>      seq_destroy(ofproto->ams_seq);
> >>
> >> +    /* Wait for all the meter rules to be destroyed. */
> >> +    seq = seq_create();
> >> +    seqno = seq_read();
> >> +    ovsrcu_synchronize();
> >> +    ovsrcu_postpone__((void (*)(void *))seq_change, seq);
> >> +    do {
> >> +        seq_wait(seq, seqno);
> >> +        poll_block();
> >> +    } while (seqno == seq_read(seq));
> >> +    seq_destroy(seq);
> >> +
> >>      close_dpif_backer(ofproto->backer, del);
> >>  }
> >>
> >>
> >> --
> >> David Marchand
> >>
> >>
> >
> > --
> > hepeng
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index a4c44052d..1739544e5 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -1797,6 +1797,16 @@  add_internal_flows(struct ofproto_dpif *ofproto)
     return error;
 }
 
+static void
+ofproto_rcu_barrier(void)
+{
+    struct seq *seq = seq_create();
+    while(!ovsrcu_barrier(seq, 1000)) {
+        /* do nothing */
+    }
+    seq_destroy(seq);
+}
+
 static void
 destruct(struct ofproto *ofproto_, bool del)
 {
@@ -1848,6 +1858,9 @@  destruct(struct ofproto *ofproto_, bool del)
 
     seq_destroy(ofproto->ams_seq);
 
+    /* wait for all the meter destroy work finished
+     */
+    ofproto_rcu_barrier();
     close_dpif_backer(ofproto->backer, del);
 }