diff mbox series

[ovs-dev,PATCHv2] connmgr: Check nullptr inside ofmonitor_report()

Message ID 1613596145-3110-1-git-send-email-pkusunyifeng@gmail.com
State Accepted
Headers show
Series [ovs-dev,PATCHv2] connmgr: Check nullptr inside ofmonitor_report() | expand

Commit Message

Yifeng Sun Feb. 17, 2021, 9:09 p.m. UTC
ovs-vswitchd could crash under these circumstances:
1. When one bridge is being destroyed, ofproto_destroy() is called and
connmgr pointer of its ofproto struct is nullified. This ofproto struct is
deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
2. Before RCU enters quiesce state to actually free this ofproto struct,
revalidator thread calls udpif_revalidator(), which could handle
a learn flow and calls ofproto_flow_mod_learn(), it later calls
ofmonitor_report() and ofproto struct's connmgr pointer is accessed.

The crash stack trace is shown below:

0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
    reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
    at ofproto/connmgr.c:2160
1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
    at ofproto/ofproto.c:5221
2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
    at ofproto/ofproto.c:5823
3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
    at ofproto/ofproto.c:8088
4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
    orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
    limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
    stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
    reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
    at ofproto/ofproto-dpif-upcall.c:2294
9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6

At the time of crash, the involved ofproto was already deallocated:

(gdb) print *ofproto
$1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
    one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...

This patch fixes it.

VMware-BZ: #2700626
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
---
v1->v2: Add check for ofmonitor_flush, thanks William.

 ofproto/connmgr.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

William Tu Feb. 19, 2021, 12:09 a.m. UTC | #1
On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> ovs-vswitchd could crash under these circumstances:
> 1. When one bridge is being destroyed, ofproto_destroy() is called and
> connmgr pointer of its ofproto struct is nullified. This ofproto struct is
> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> 2. Before RCU enters quiesce state to actually free this ofproto struct,
> revalidator thread calls udpif_revalidator(), which could handle
> a learn flow and calls ofproto_flow_mod_learn(), it later calls
> ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
>
> The crash stack trace is shown below:
>
> 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
>     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
>     at ofproto/connmgr.c:2160
> 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
>     at ofproto/ofproto.c:5221
> 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
>     at ofproto/ofproto.c:5823
> 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
>     at ofproto/ofproto.c:8088
> 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
>     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
>     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
> 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
> 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
> 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
>     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
>     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
>     at ofproto/ofproto-dpif-upcall.c:2294
> 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
>
> At the time of crash, the involved ofproto was already deallocated:
>
> (gdb) print *ofproto
> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
>     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
>
> This patch fixes it.
>
> VMware-BZ: #2700626
> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> ---
> v1->v2: Add check for ofmonitor_flush, thanks William.
>

LGTM, thanks.
Acked-by: William Tu < u9012063@gmail.com>

CC Ilya and Ben to see if any comments.

I feel this kind of RCU issue is hard to find out, and
existing tools such as addressSanitizer are usually
not helpful.

William

>  ofproto/connmgr.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> index 9c5c633b4171..fa8f6cd0e83a 100644
> --- a/ofproto/connmgr.c
> +++ b/ofproto/connmgr.c
> @@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
>                   const struct rule_actions *old_actions)
>      OVS_REQUIRES(ofproto_mutex)
>  {
> -    if (rule_is_hidden(rule)) {
> +    if (!mgr || rule_is_hidden(rule)) {
>          return;
>      }
>
> @@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
>  {
>      struct ofconn *ofconn;
>
> +    if (!mgr) {
> +        return;
> +    }
> +
>      LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
>          struct rconn_packet_counter *counter = ofconn->monitor_counter;
>
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
贺鹏 Feb. 19, 2021, 2:12 a.m. UTC | #2
Hi,

Looks like this bug is caused by violating the fact that if a rule is
referenced, the related ofproto should not be destroyed.

If so, I have a patch that also fixes the problem, not sure if this helps.

http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/

William Tu <u9012063@gmail.com> 于2021年2月19日周五 上午8:10写道:
>
> On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> >
> > ovs-vswitchd could crash under these circumstances:
> > 1. When one bridge is being destroyed, ofproto_destroy() is called and
> > connmgr pointer of its ofproto struct is nullified. This ofproto struct is
> > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> > 2. Before RCU enters quiesce state to actually free this ofproto struct,
> > revalidator thread calls udpif_revalidator(), which could handle
> > a learn flow and calls ofproto_flow_mod_learn(), it later calls
> > ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
> >
> > The crash stack trace is shown below:
> >
> > 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
> >     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
> >     at ofproto/connmgr.c:2160
> > 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
> >     at ofproto/ofproto.c:5221
> > 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
> >     at ofproto/ofproto.c:5823
> > 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
> >     at ofproto/ofproto.c:8088
> > 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
> >     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> > 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
> >     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
> > 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
> >     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
> > 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
> >     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
> > 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
> >     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
> >     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
> >     at ofproto/ofproto-dpif-upcall.c:2294
> > 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
> > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
> > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
> > 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
> > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
> >
> > At the time of crash, the involved ofproto was already deallocated:
> >
> > (gdb) print *ofproto
> > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
> >     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
> >
> > This patch fixes it.
> >
> > VMware-BZ: #2700626
> > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> > ---
> > v1->v2: Add check for ofmonitor_flush, thanks William.
> >
>
> LGTM, thanks.
> Acked-by: William Tu < u9012063@gmail.com>
>
> CC Ilya and Ben to see if any comments.
>
> I feel this kind of RCU issue is hard to find out, and
> existing tools such as addressSanitizer are usually
> not helpful.
>
> William
>
> >  ofproto/connmgr.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> > index 9c5c633b4171..fa8f6cd0e83a 100644
> > --- a/ofproto/connmgr.c
> > +++ b/ofproto/connmgr.c
> > @@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
> >                   const struct rule_actions *old_actions)
> >      OVS_REQUIRES(ofproto_mutex)
> >  {
> > -    if (rule_is_hidden(rule)) {
> > +    if (!mgr || rule_is_hidden(rule)) {
> >          return;
> >      }
> >
> > @@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
> >  {
> >      struct ofconn *ofconn;
> >
> > +    if (!mgr) {
> > +        return;
> > +    }
> > +
> >      LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
> >          struct rconn_packet_counter *counter = ofconn->monitor_counter;
> >
> > --
> > 2.7.4
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ilya Maximets Feb. 19, 2021, 11:19 a.m. UTC | #3
On 2/19/21 3:12 AM, 贺鹏 wrote:
> Hi,
> 
> Looks like this bug is caused by violating the fact that if a rule is
> referenced, the related ofproto should not be destroyed.
> 
> If so, I have a patch that also fixes the problem, not sure if this helps.
> 
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/

There is at least one more problem that is not strictly related but
in more or less the same part of the code:
  https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html

> 
> William Tu <u9012063@gmail.com> 于2021年2月19日周五 上午8:10写道:
>>
>> On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>>>
>>> ovs-vswitchd could crash under these circumstances:
>>> 1. When one bridge is being destroyed, ofproto_destroy() is called and
>>> connmgr pointer of its ofproto struct is nullified. This ofproto struct is
>>> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
>>> 2. Before RCU enters quiesce state to actually free this ofproto struct,
>>> revalidator thread calls udpif_revalidator(), which could handle
>>> a learn flow and calls ofproto_flow_mod_learn(), it later calls
>>> ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
>>>
>>> The crash stack trace is shown below:
>>>
>>> 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
>>>     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
>>>     at ofproto/connmgr.c:2160
>>> 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
>>>     at ofproto/ofproto.c:5221
>>> 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
>>>     at ofproto/ofproto.c:5823
>>> 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
>>>     at ofproto/ofproto.c:8088
>>> 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
>>>     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
>>> 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
>>>     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
>>> 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
>>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
>>> 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
>>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
>>> 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
>>>     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
>>>     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
>>>     at ofproto/ofproto-dpif-upcall.c:2294
>>> 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
>>> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
>>> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
>>> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
>>> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
>>>
>>> At the time of crash, the involved ofproto was already deallocated:
>>>
>>> (gdb) print *ofproto
>>> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
>>>     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
>>>
>>> This patch fixes it.
>>>
>>> VMware-BZ: #2700626
>>> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
>>> ---
>>> v1->v2: Add check for ofmonitor_flush, thanks William.
>>>
>>
>> LGTM, thanks.
>> Acked-by: William Tu < u9012063@gmail.com>
>>
>> CC Ilya and Ben to see if any comments.
>>
>> I feel this kind of RCU issue is hard to find out, and
>> existing tools such as addressSanitizer are usually
>> not helpful.
>>
>> William
>>
>>>  ofproto/connmgr.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
>>> index 9c5c633b4171..fa8f6cd0e83a 100644
>>> --- a/ofproto/connmgr.c
>>> +++ b/ofproto/connmgr.c
>>> @@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
>>>                   const struct rule_actions *old_actions)
>>>      OVS_REQUIRES(ofproto_mutex)
>>>  {
>>> -    if (rule_is_hidden(rule)) {
>>> +    if (!mgr || rule_is_hidden(rule)) {
>>>          return;
>>>      }
>>>
>>> @@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
>>>  {
>>>      struct ofconn *ofconn;
>>>
>>> +    if (!mgr) {
>>> +        return;
>>> +    }
>>> +
>>>      LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
>>>          struct rconn_packet_counter *counter = ofconn->monitor_counter;
>>>
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
>
贺鹏 Feb. 20, 2021, 2:29 a.m. UTC | #4
Hi, Ilya

Ilya Maximets <i.maximets@ovn.org> 于2021年2月19日周五 下午7:19写道:
>
> On 2/19/21 3:12 AM, 贺鹏 wrote:
> > Hi,
> >
> > Looks like this bug is caused by violating the fact that if a rule is
> > referenced, the related ofproto should not be destroyed.
> >
> > If so, I have a patch that also fixes the problem, not sure if this helps.
> >
> > http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
>
> There is at least one more problem that is not strictly related but
> in more or less the same part of the code:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html

So maybe before add it into *ovsrcu_postpone*, we should add refcount
of ofproto also?


>
> >
> > William Tu <u9012063@gmail.com> 于2021年2月19日周五 上午8:10写道:
> >>
> >> On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> >>>
> >>> ovs-vswitchd could crash under these circumstances:
> >>> 1. When one bridge is being destroyed, ofproto_destroy() is called and
> >>> connmgr pointer of its ofproto struct is nullified. This ofproto struct is
> >>> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> >>> 2. Before RCU enters quiesce state to actually free this ofproto struct,
> >>> revalidator thread calls udpif_revalidator(), which could handle
> >>> a learn flow and calls ofproto_flow_mod_learn(), it later calls
> >>> ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
> >>>
> >>> The crash stack trace is shown below:
> >>>
> >>> 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
> >>>     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
> >>>     at ofproto/connmgr.c:2160
> >>> 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
> >>>     at ofproto/ofproto.c:5221
> >>> 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
> >>>     at ofproto/ofproto.c:5823
> >>> 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
> >>>     at ofproto/ofproto.c:8088
> >>> 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
> >>>     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> >>> 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
> >>>     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
> >>> 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
> >>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
> >>> 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
> >>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
> >>> 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
> >>>     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
> >>>     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
> >>>     at ofproto/ofproto-dpif-upcall.c:2294
> >>> 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
> >>> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
> >>> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
> >>> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
> >>> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
> >>>
> >>> At the time of crash, the involved ofproto was already deallocated:
> >>>
> >>> (gdb) print *ofproto
> >>> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
> >>>     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
> >>>
> >>> This patch fixes it.
> >>>
> >>> VMware-BZ: #2700626
> >>> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> >>> ---
> >>> v1->v2: Add check for ofmonitor_flush, thanks William.
> >>>
> >>
> >> LGTM, thanks.
> >> Acked-by: William Tu < u9012063@gmail.com>
> >>
> >> CC Ilya and Ben to see if any comments.
> >>
> >> I feel this kind of RCU issue is hard to find out, and
> >> existing tools such as addressSanitizer are usually
> >> not helpful.
> >>
> >> William
> >>
> >>>  ofproto/connmgr.c | 6 +++++-
> >>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> >>> index 9c5c633b4171..fa8f6cd0e83a 100644
> >>> --- a/ofproto/connmgr.c
> >>> +++ b/ofproto/connmgr.c
> >>> @@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule,
> >>>                   const struct rule_actions *old_actions)
> >>>      OVS_REQUIRES(ofproto_mutex)
> >>>  {
> >>> -    if (rule_is_hidden(rule)) {
> >>> +    if (!mgr || rule_is_hidden(rule)) {
> >>>          return;
> >>>      }
> >>>
> >>> @@ -2244,6 +2244,10 @@ ofmonitor_flush(struct connmgr *mgr)
> >>>  {
> >>>      struct ofconn *ofconn;
> >>>
> >>> +    if (!mgr) {
> >>> +        return;
> >>> +    }
> >>> +
> >>>      LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
> >>>          struct rconn_packet_counter *counter = ofconn->monitor_counter;
> >>>
> >>> --
> >>> 2.7.4
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> >
>
William Tu Feb. 21, 2021, 4:11 p.m. UTC | #5
On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <xnhp0320@gmail.com> wrote:
>
> Hi, Ilya
>
> Ilya Maximets <i.maximets@ovn.org> 于2021年2月19日周五 下午7:19写道:
> >
> > On 2/19/21 3:12 AM, 贺鹏 wrote:
> > > Hi,
> > >
> > > Looks like this bug is caused by violating the fact that if a rule is
> > > referenced, the related ofproto should not be destroyed.
> > >
> > > If so, I have a patch that also fixes the problem, not sure if this helps.
> > >
> > > http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
> >
> > There is at least one more problem that is not strictly related but
> > in more or less the same part of the code:
> >   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>
> So maybe before add it into *ovsrcu_postpone*, we should add refcount
> of ofproto also?
>
Yes, I think that will fix the issue. But are we able to find out all the
places that we need to add refcount of ofproto?

Looks like we might have multiple rcu postponed function that might
access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.

Hepeng's patch fixes two places.
  http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
Ilya pointed out another place
  https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
yifeng's case is a little different (not due to ofproto = NULL, but
due to setting
the p->connmgr = NULL before postponed)

Regards,
William
Ilya Maximets Feb. 22, 2021, 12:12 p.m. UTC | #6
On 2/21/21 5:11 PM, William Tu wrote:
> On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <xnhp0320@gmail.com> wrote:
>>
>> Hi, Ilya
>>
>> Ilya Maximets <i.maximets@ovn.org> 于2021年2月19日周五 下午7:19写道:
>>>
>>> On 2/19/21 3:12 AM, 贺鹏 wrote:
>>>> Hi,
>>>>
>>>> Looks like this bug is caused by violating the fact that if a rule is
>>>> referenced, the related ofproto should not be destroyed.
>>>>
>>>> If so, I have a patch that also fixes the problem, not sure if this helps.
>>>>
>>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
>>>
>>> There is at least one more problem that is not strictly related but
>>> in more or less the same part of the code:
>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>>
>> So maybe before add it into *ovsrcu_postpone*, we should add refcount
>> of ofproto also?
>>
> Yes, I think that will fix the issue. But are we able to find out all the
> places that we need to add refcount of ofproto?

destruct() is called for ofproto_dpif unconditionally without any deferring,
so this will not help unless we're delaying the the destruct() itself, and
I'm not sure if we can do that.

> 
> Looks like we might have multiple rcu postponed function that might
> access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.
> 
> Hepeng's patch fixes two places.
>   http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
> Ilya pointed out another place
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> yifeng's case is a little different (not due to ofproto = NULL, but
> due to setting
> the p->connmgr = NULL before postponed)

In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
And in case of meter_id we don't really need to refcount the ofproto itself.
'baker' already has a refcount, so we could increase it and pass 'baker'
pointer to free_meter_id instead of 'ofproto'.  This function doesn't
actually need an 'ofproto' pointer.

Regarding this connmgr related patch... It's still valid even with refcounts
because destruction of connmgr is explicitly not deferred for a reason.  It's
to free the listening socket that users might want to reuse.  Refcounts will
not help here.
Also for the meters related issue.. destruct() is called for ofproto_dpif
without any postponing and I'm not sure if we need or actually able to
postpone it without consequences.  So, baker->refcount might be a better
solution keeping the ofproto->refcount only for rules.  This will reduce
the scope of changes, otherwise we will have to do a lot more work tracking
down all the users that holds 'ofproto' pointer and will miss some of them
with high probability.

I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
ofproto refcount patch.

Best regards, Ilya Maximets.
贺鹏 Feb. 22, 2021, 1:05 p.m. UTC | #7
Hi, Ilya and William,

Ilya Maximets <i.maximets@ovn.org> 于2021年2月22日周一 下午8:12写道:
>
> On 2/21/21 5:11 PM, William Tu wrote:
> > On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <xnhp0320@gmail.com> wrote:
> >>
> >> Hi, Ilya
> >>
> >> Ilya Maximets <i.maximets@ovn.org> 于2021年2月19日周五 下午7:19写道:
> >>>
> >>> On 2/19/21 3:12 AM, 贺鹏 wrote:
> >>>> Hi,
> >>>>
> >>>> Looks like this bug is caused by violating the fact that if a rule is
> >>>> referenced, the related ofproto should not be destroyed.
> >>>>
> >>>> If so, I have a patch that also fixes the problem, not sure if this helps.
> >>>>
> >>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
> >>>
> >>> There is at least one more problem that is not strictly related but
> >>> in more or less the same part of the code:
> >>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> >>
> >> So maybe before add it into *ovsrcu_postpone*, we should add refcount
> >> of ofproto also?
> >>
> > Yes, I think that will fix the issue. But are we able to find out all the
> > places that we need to add refcount of ofproto?

Yes, this could be difficult.

>
> destruct() is called for ofproto_dpif unconditionally without any deferring,
> so this will not help unless we're delaying the the destruct() itself, and
> I'm not sure if we can do that.
>
> >
> > Looks like we might have multiple rcu postponed function that might
> > access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.
> >
> > Hepeng's patch fixes two places.
> >   http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
> > Ilya pointed out another place
> >   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> > yifeng's case is a little different (not due to ofproto = NULL, but
> > due to setting
> > the p->connmgr = NULL before postponed)
>
> In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
> And in case of meter_id we don't really need to refcount the ofproto itself.
> 'baker' already has a refcount, so we could increase it and pass 'baker'
> pointer to free_meter_id instead of 'ofproto'.  This function doesn't
> actually need an 'ofproto' pointer.
>
> Regarding this connmgr related patch... It's still valid even with refcounts
> because destruction of connmgr is explicitly not deferred for a reason.  It's
> to free the listening socket that users might want to reuse.  Refcounts will
> not help here.
> Also for the meters related issue.. destruct() is called for ofproto_dpif
> without any postponing and I'm not sure if we need or actually able to
> postpone it without consequences.  So, baker->refcount might be a better
> solution keeping the ofproto->refcount only for rules.  This will reduce
> the scope of changes, otherwise we will have to do a lot more work tracking
> down all the users that holds 'ofproto' pointer and will miss some of them
> with high probability.

Agree.
I recheck the patch. It's a better idea to use ofproto->backer's refcount.
But in the rule case, looks like the rule->ofproto needs to be alive
to access both ofproto->vl_mff_map and ofproto->ofproto_class.

 static void
 rule_destroy_cb(struct rule *rule)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     /* Send rule removed if needed. */
     if (rule->flags & OFPUTIL_FF_SEND_FLOW_REM
         && rule->removed_reason != OVS_OFPRR_NONE
         && !rule_is_hidden(rule)) {
         ofproto_rule_send_removed(rule);
     }
     rule->ofproto->ofproto_class->rule_destruct(rule);
     mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->match_tlv_bitmap);
     mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->ofpacts_tlv_bitmap);
     ofproto_rule_destroy__(rule);
 }

Maybe add refcount into vl_mff_map and ofproto_class?

>
> I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
> ofproto refcount patch.
>
> Best regards, Ilya Maximets.
Ilya Maximets Feb. 23, 2021, 1:02 p.m. UTC | #8
On 2/19/21 1:09 AM, William Tu wrote:
> On Wed, Feb 17, 2021 at 1:09 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>>
>> ovs-vswitchd could crash under these circumstances:
>> 1. When one bridge is being destroyed, ofproto_destroy() is called and
>> connmgr pointer of its ofproto struct is nullified. This ofproto struct is
>> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
>> 2. Before RCU enters quiesce state to actually free this ofproto struct,
>> revalidator thread calls udpif_revalidator(), which could handle
>> a learn flow and calls ofproto_flow_mod_learn(), it later calls
>> ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
>>
>> The crash stack trace is shown below:
>>
>> 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
>>     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
>>     at ofproto/connmgr.c:2160
>> 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
>>     at ofproto/ofproto.c:5221
>> 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
>>     at ofproto/ofproto.c:5823
>> 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
>>     at ofproto/ofproto.c:8088
>> 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
>>     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
>> 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
>>     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
>> 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
>> 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
>>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
>> 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
>>     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
>>     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
>>     at ofproto/ofproto-dpif-upcall.c:2294
>> 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
>> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
>> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
>> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
>> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
>>
>> At the time of crash, the involved ofproto was already deallocated:
>>
>> (gdb) print *ofproto
>> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
>>     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
>>
>> This patch fixes it.
>>
>> VMware-BZ: #2700626
>> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
>> ---
>> v1->v2: Add check for ofmonitor_flush, thanks William.
>>
> 
> LGTM, thanks.
> Acked-by: William Tu < u9012063@gmail.com>

Thanks!
Applied to master and backported down to 2.12.

Best regards, Ilya Maximets.
Ilya Maximets Feb. 23, 2021, 1:04 p.m. UTC | #9
On 2/22/21 2:05 PM, 贺鹏 wrote:
> Hi, Ilya and William,
> 
> Ilya Maximets <i.maximets@ovn.org> 于2021年2月22日周一 下午8:12写道:
>>
>> On 2/21/21 5:11 PM, William Tu wrote:
>>> On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <xnhp0320@gmail.com> wrote:
>>>>
>>>> Hi, Ilya
>>>>
>>>> Ilya Maximets <i.maximets@ovn.org> 于2021年2月19日周五 下午7:19写道:
>>>>>
>>>>> On 2/19/21 3:12 AM, 贺鹏 wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Looks like this bug is caused by violating the fact that if a rule is
>>>>>> referenced, the related ofproto should not be destroyed.
>>>>>>
>>>>>> If so, I have a patch that also fixes the problem, not sure if this helps.
>>>>>>
>>>>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
>>>>>
>>>>> There is at least one more problem that is not strictly related but
>>>>> in more or less the same part of the code:
>>>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>>>>
>>>> So maybe before add it into *ovsrcu_postpone*, we should add refcount
>>>> of ofproto also?
>>>>
>>> Yes, I think that will fix the issue. But are we able to find out all the
>>> places that we need to add refcount of ofproto?
> 
> Yes, this could be difficult.
> 
>>
>> destruct() is called for ofproto_dpif unconditionally without any deferring,
>> so this will not help unless we're delaying the the destruct() itself, and
>> I'm not sure if we can do that.
>>
>>>
>>> Looks like we might have multiple rcu postponed function that might
>>> access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.
>>>
>>> Hepeng's patch fixes two places.
>>>   http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
>>> Ilya pointed out another place
>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>>> yifeng's case is a little different (not due to ofproto = NULL, but
>>> due to setting
>>> the p->connmgr = NULL before postponed)
>>
>> In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
>> And in case of meter_id we don't really need to refcount the ofproto itself.
>> 'baker' already has a refcount, so we could increase it and pass 'baker'
>> pointer to free_meter_id instead of 'ofproto'.  This function doesn't
>> actually need an 'ofproto' pointer.
>>
>> Regarding this connmgr related patch... It's still valid even with refcounts
>> because destruction of connmgr is explicitly not deferred for a reason.  It's
>> to free the listening socket that users might want to reuse.  Refcounts will
>> not help here.
>> Also for the meters related issue.. destruct() is called for ofproto_dpif
>> without any postponing and I'm not sure if we need or actually able to
>> postpone it without consequences.  So, baker->refcount might be a better
>> solution keeping the ofproto->refcount only for rules.  This will reduce
>> the scope of changes, otherwise we will have to do a lot more work tracking
>> down all the users that holds 'ofproto' pointer and will miss some of them
>> with high probability.
> 
> Agree.
> I recheck the patch. It's a better idea to use ofproto->backer's refcount.
> But in the rule case, looks like the rule->ofproto needs to be alive
> to access both ofproto->vl_mff_map and ofproto->ofproto_class.
> 
>  static void
>  rule_destroy_cb(struct rule *rule)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      /* Send rule removed if needed. */
>      if (rule->flags & OFPUTIL_FF_SEND_FLOW_REM
>          && rule->removed_reason != OVS_OFPRR_NONE
>          && !rule_is_hidden(rule)) {
>          ofproto_rule_send_removed(rule);
>      }
>      rule->ofproto->ofproto_class->rule_destruct(rule);
>      mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->match_tlv_bitmap);
>      mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->ofpacts_tlv_bitmap);
>      ofproto_rule_destroy__(rule);
>  }
> 
> Maybe add refcount into vl_mff_map and ofproto_class?

IMHO, too many refcounts.  It's better to have one global for ofproto.
'baker' could be handled by its own refcount.

> 
>>
>> I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
>> ofproto refcount patch.
>>
>> Best regards, Ilya Maximets.
> 
> 
>
diff mbox series

Patch

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 9c5c633b4171..fa8f6cd0e83a 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -2140,7 +2140,7 @@  ofmonitor_report(struct connmgr *mgr, struct rule *rule,
                  const struct rule_actions *old_actions)
     OVS_REQUIRES(ofproto_mutex)
 {
-    if (rule_is_hidden(rule)) {
+    if (!mgr || rule_is_hidden(rule)) {
         return;
     }
 
@@ -2244,6 +2244,10 @@  ofmonitor_flush(struct connmgr *mgr)
 {
     struct ofconn *ofconn;
 
+    if (!mgr) {
+        return;
+    }
+
     LIST_FOR_EACH (ofconn, connmgr_node, &mgr->conns) {
         struct rconn_packet_counter *counter = ofconn->monitor_counter;