diff mbox series

[ovs-dev] connmgr: Check nullptr inside ofmonitor_report()

Message ID 1613509593-32473-1-git-send-email-pkusunyifeng@gmail.com
State Changes Requested
Headers show
Series [ovs-dev] connmgr: Check nullptr inside ofmonitor_report() | expand

Commit Message

Yifeng Sun Feb. 16, 2021, 9:06 p.m. UTC
ovs-vswitchd could crash under these circumstances:
1. When one bridge is being destroyed, ofproto_destroy() is called and
connmgr pointer of its ofproto struct is nullified. This ofproto struct is
deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
2. Before RCU enters quiesce state to actually free this ofproto struct,
revalidator thread calls udpif_revalidator(), which could handle
a learn flow and calls ofproto_flow_mod_learn(), it later calls
ofmonitor_report() and ofproto struct's connmgr pointer is accessed.

The crash stack trace is shown below:

0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
    reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
    at ofproto/connmgr.c:2160
1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
    at ofproto/ofproto.c:5221
2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
    at ofproto/ofproto.c:5823
3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
    at ofproto/ofproto.c:8088
4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
    orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
    limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
    offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
    stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
    reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
    at ofproto/ofproto-dpif-upcall.c:2294
9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6

At the time of crash, the involved ofproto was already deallocated:

(gdb) print *ofproto
$1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
    one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...

This patch fixes it.

VMware-BZ: #2700626
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
---
 ofproto/connmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yi-Hung Wei Feb. 16, 2021, 9:39 p.m. UTC | #1
On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> ovs-vswitchd could crash under these circumstances:
> 1. When one bridge is being destroyed, ofproto_destroy() is called and
> connmgr pointer of its ofproto struct is nullified. This ofproto struct is
> deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> 2. Before RCU enters quiesce state to actually free this ofproto struct,
> revalidator thread calls udpif_revalidator(), which could handle
> a learn flow and calls ofproto_flow_mod_learn(), it later calls
> ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
>
> The crash stack trace is shown below:
>
> 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
>     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
>     at ofproto/connmgr.c:2160
> 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
>     at ofproto/ofproto.c:5221
> 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
>     at ofproto/ofproto.c:5823
> 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
>     at ofproto/ofproto.c:8088
> 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
>     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
>     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
> 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
> 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
>     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
> 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
>     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
>     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
>     at ofproto/ofproto-dpif-upcall.c:2294
> 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
> 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
> 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
> 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
> 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
>
> At the time of crash, the involved ofproto was already deallocated:
>
> (gdb) print *ofproto
> $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
>     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
>
> This patch fixes it.
>
> VMware-BZ: #2700626
> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> ---

LGTM.

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
William Tu Feb. 17, 2021, 4:45 a.m. UTC | #2
On Tue, Feb 16, 2021 at 1:40 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> >
> > ovs-vswitchd could crash under these circumstances:
> > 1. When one bridge is being destroyed, ofproto_destroy() is called and
> > connmgr pointer of its ofproto struct is nullified. This ofproto struct is
> > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> > 2. Before RCU enters quiesce state to actually free this ofproto struct,
> > revalidator thread calls udpif_revalidator(), which could handle
> > a learn flow and calls ofproto_flow_mod_learn(), it later calls
> > ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
> >

LGTM, thanks. I guess this is hard to reproduce or create a test.
Do we need to worry about other places that use 'ofproto->connmgr'?
ex: there are a couple of places calling ofmonitor_flush(ofproto->connmgr);

> > The crash stack trace is shown below:
> >
> > 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED,
> >     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0)
> >     at ofproto/connmgr.c:2160
> > 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0)
> >     at ofproto/ofproto.c:5221
> > 2  0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
> >     at ofproto/ofproto.c:5823
> > 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0)
> >     at ofproto/ofproto.c:8088
> > 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0,
> >     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> > 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true,
> >     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499
> > 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10,
> >     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127
> > 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10,
> >     offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181
> > 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660,
> >     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50,
> >     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false)
> >     at ofproto/ofproto-dpif-upcall.c:2294
> > 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683
> > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936
> > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423
> > 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0
> > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
> >
> > At the time of crash, the involved ofproto was already deallocated:
> >
> > (gdb) print *ofproto
> > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
> >     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
> >
> > This patch fixes it.
> >
> > VMware-BZ: #2700626
> > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> > ---
>
> LGTM.
>
> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Yifeng Sun Feb. 17, 2021, 9:09 p.m. UTC | #3
Thanks William and YiHung for review, I sent a new version.

On Tue, Feb 16, 2021 at 8:45 PM William Tu <u9012063@gmail.com> wrote:

> On Tue, Feb 16, 2021 at 1:40 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com>
> wrote:
> > >
> > > ovs-vswitchd could crash under these circumstances:
> > > 1. When one bridge is being destroyed, ofproto_destroy() is called and
> > > connmgr pointer of its ofproto struct is nullified. This ofproto
> struct is
> > > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'.
> > > 2. Before RCU enters quiesce state to actually free this ofproto
> struct,
> > > revalidator thread calls udpif_revalidator(), which could handle
> > > a learn flow and calls ofproto_flow_mod_learn(), it later calls
> > > ofmonitor_report() and ofproto struct's connmgr pointer is accessed.
> > >
>
> LGTM, thanks. I guess this is hard to reproduce or create a test.
> Do we need to worry about other places that use 'ofproto->connmgr'?
> ex: there are a couple of places calling ofmonitor_flush(ofproto->connmgr);
>
> > > The crash stack trace is shown below:
> > >
> > > 0  ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30,
> event=event@entry=NXFME_ADDED,
> > >     reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0,
> abbrev_xid=0, old_actions=old_actions@entry=0x0)
> > >     at ofproto/connmgr.c:2160
> > > 1  0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0,
> ofm=<optimized out>, req=req@entry=0x0)
> > >     at ofproto/ofproto.c:5221
> > > 2  0x00007fa4d68036af in modify_flows_finish (req=0x0,
> ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0)
> > >     at ofproto/ofproto.c:5823
> > > 3  ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0,
> req=req@entry=0x0)
> > >     at ofproto/ofproto.c:8088
> > > 4  0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry
> =0x7fa4980753f0,
> > >     orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439
> > > 5  0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0,
> keep_ref=keep_ref@entry=true,
> > >     limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at
> ofproto/ofproto.c:5499
> > > 6  0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448,
> stats=stats@entry=0x7fa4d2701a10,
> > >     offloaded=offloaded@entry=false) at
> ofproto/ofproto-dpif-xlate-cache.c:127
> > > 7  0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>,
> stats=stats@entry=0x7fa4d2701a10,
> > >     offloaded=offloaded@entry=false) at
> ofproto/ofproto-dpif-xlate-cache.c:181
> > > 8  0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240,
> ukey=ukey@entry=0x7fa4b0191660,
> > >     stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry
> =0x7fa4d2701b50,
> > >     reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40,
> offloaded=false)
> > >     at ofproto/ofproto-dpif-upcall.c:2294
> > > 9  0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at
> ofproto/ofproto-dpif-upcall.c:2683
> > > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at
> ofproto/ofproto-dpif-upcall.c:936
> > > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at
> lib/ovs-thread.c:423
> > > 12 0x00007fa4d582cea5 in start_thread () from
> /usr/lib64/libpthread.so.0
> > > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6
> > >
> > > At the time of crash, the involved ofproto was already deallocated:
> > >
> > > (gdb) print *ofproto
> > > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {...,
> > >     one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ...
> > >
> > > This patch fixes it.
> > >
> > > VMware-BZ: #2700626
> > > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> > > ---
> >
> > LGTM.
> >
> > Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 9c5c633b4171..ee07df7a8bc0 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -2140,7 +2140,7 @@  ofmonitor_report(struct connmgr *mgr, struct rule *rule,
                  const struct rule_actions *old_actions)
     OVS_REQUIRES(ofproto_mutex)
 {
-    if (rule_is_hidden(rule)) {
+    if (!mgr || rule_is_hidden(rule)) {
         return;
     }