Message ID | 1613509593-32473-1-git-send-email-pkusunyifeng@gmail.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] connmgr: Check nullptr inside ofmonitor_report() | expand |
On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote: > > ovs-vswitchd could crash under these circumstances: > 1. When one bridge is being destroyed, ofproto_destroy() is called and > connmgr pointer of its ofproto struct is nullified. This ofproto struct is > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. > 2. Before RCU enters quiesce state to actually free this ofproto struct, > revalidator thread calls udpif_revalidator(), which could handle > a learn flow and calls ofproto_flow_mod_learn(), it later calls > ofmonitor_report() and ofproto struct's connmgr pointer is accessed. > > The crash stack trace is shown below: > > 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED, > reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0) > at ofproto/connmgr.c:2160 > 1 0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0) > at ofproto/ofproto.c:5221 > 2 0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0) > at ofproto/ofproto.c:5823 > 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0) > at ofproto/ofproto.c:8088 > 4 0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0, > orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 > 5 0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true, > limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499 > 6 0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10, > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127 > 7 0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10, > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181 > 8 0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660, > stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50, > reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false) > at ofproto/ofproto-dpif-upcall.c:2294 > 9 0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683 > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936 > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423 > 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0 > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6 > > At the time of crash, the involved ofproto was already deallocated: > > (gdb) print *ofproto > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., > one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... > > This patch fixes it. > > VMware-BZ: #2700626 > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> > --- LGTM. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
On Tue, Feb 16, 2021 at 1:40 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote: > > On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote: > > > > ovs-vswitchd could crash under these circumstances: > > 1. When one bridge is being destroyed, ofproto_destroy() is called and > > connmgr pointer of its ofproto struct is nullified. This ofproto struct is > > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. > > 2. Before RCU enters quiesce state to actually free this ofproto struct, > > revalidator thread calls udpif_revalidator(), which could handle > > a learn flow and calls ofproto_flow_mod_learn(), it later calls > > ofmonitor_report() and ofproto struct's connmgr pointer is accessed. > > LGTM, thanks. I guess this is hard to reproduce or create a test. Do we need to worry about other places that use 'ofproto->connmgr'? ex: there are a couple of places calling ofmonitor_flush(ofproto->connmgr); > > The crash stack trace is shown below: > > > > 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED, > > reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0) > > at ofproto/connmgr.c:2160 > > 1 0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0) > > at ofproto/ofproto.c:5221 > > 2 0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0) > > at ofproto/ofproto.c:5823 > > 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0) > > at ofproto/ofproto.c:8088 > > 4 0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0, > > orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 > > 5 0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true, > > limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499 > > 6 0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10, > > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127 > > 7 0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10, > > offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181 > > 8 0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660, > > stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50, > > reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false) > > at ofproto/ofproto-dpif-upcall.c:2294 > > 9 0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683 > > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936 > > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423 > > 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0 > > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6 > > > > At the time of crash, the involved ofproto was already deallocated: > > > > (gdb) print *ofproto > > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., > > one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... > > > > This patch fixes it. > > > > VMware-BZ: #2700626 > > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> > > --- > > LGTM. > > Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Thanks William and YiHung for review, I sent a new version. On Tue, Feb 16, 2021 at 8:45 PM William Tu <u9012063@gmail.com> wrote: > On Tue, Feb 16, 2021 at 1:40 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote: > > > > On Tue, Feb 16, 2021 at 1:06 PM Yifeng Sun <pkusunyifeng@gmail.com> > wrote: > > > > > > ovs-vswitchd could crash under these circumstances: > > > 1. When one bridge is being destroyed, ofproto_destroy() is called and > > > connmgr pointer of its ofproto struct is nullified. This ofproto > struct is > > > deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. > > > 2. Before RCU enters quiesce state to actually free this ofproto > struct, > > > revalidator thread calls udpif_revalidator(), which could handle > > > a learn flow and calls ofproto_flow_mod_learn(), it later calls > > > ofmonitor_report() and ofproto struct's connmgr pointer is accessed. > > > > > LGTM, thanks. I guess this is hard to reproduce or create a test. > Do we need to worry about other places that use 'ofproto->connmgr'? > ex: there are a couple of places calling ofmonitor_flush(ofproto->connmgr); > > > > The crash stack trace is shown below: > > > > > > 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, > event=event@entry=NXFME_ADDED, > > > reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, > abbrev_xid=0, old_actions=old_actions@entry=0x0) > > > at ofproto/connmgr.c:2160 > > > 1 0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, > ofm=<optimized out>, req=req@entry=0x0) > > > at ofproto/ofproto.c:5221 > > > 2 0x00007fa4d68036af in modify_flows_finish (req=0x0, > ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0) > > > at ofproto/ofproto.c:5823 > > > 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, > req=req@entry=0x0) > > > at ofproto/ofproto.c:8088 > > > 4 0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry > =0x7fa4980753f0, > > > orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 > > > 5 0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, > keep_ref=keep_ref@entry=true, > > > limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at > ofproto/ofproto.c:5499 > > > 6 0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, > stats=stats@entry=0x7fa4d2701a10, > > > offloaded=offloaded@entry=false) at > ofproto/ofproto-dpif-xlate-cache.c:127 > > > 7 0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, > stats=stats@entry=0x7fa4d2701a10, > > > offloaded=offloaded@entry=false) at > ofproto/ofproto-dpif-xlate-cache.c:181 > > > 8 0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, > ukey=ukey@entry=0x7fa4b0191660, > > > stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry > =0x7fa4d2701b50, > > > reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, > offloaded=false) > > > at ofproto/ofproto-dpif-upcall.c:2294 > > > 9 0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at > ofproto/ofproto-dpif-upcall.c:2683 > > > 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at > ofproto/ofproto-dpif-upcall.c:936 > > > 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at > lib/ovs-thread.c:423 > > > 12 0x00007fa4d582cea5 in start_thread () from > /usr/lib64/libpthread.so.0 > > > 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6 > > > > > > At the time of crash, the involved ofproto was already deallocated: > > > > > > (gdb) print *ofproto > > > $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., > > > one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... > > > > > > This patch fixes it. > > > > > > VMware-BZ: #2700626 > > > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> > > > --- > > > > LGTM. > > > > Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index 9c5c633b4171..ee07df7a8bc0 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -2140,7 +2140,7 @@ ofmonitor_report(struct connmgr *mgr, struct rule *rule, const struct rule_actions *old_actions) OVS_REQUIRES(ofproto_mutex) { - if (rule_is_hidden(rule)) { + if (!mgr || rule_is_hidden(rule)) { return; }
ovs-vswitchd could crash under these circumstances: 1. When one bridge is being destroyed, ofproto_destroy() is called and connmgr pointer of its ofproto struct is nullified. This ofproto struct is deallocated through 'ovsrcu_postpone(ofproto_destroy_defer__, p);'. 2. Before RCU enters quiesce state to actually free this ofproto struct, revalidator thread calls udpif_revalidator(), which could handle a learn flow and calls ofproto_flow_mod_learn(), it later calls ofmonitor_report() and ofproto struct's connmgr pointer is accessed. The crash stack trace is shown below: 0 ofmonitor_report (mgr=0x0, rule=rule@entry=0x7fa4ac067c30, event=event@entry=NXFME_ADDED, reason=reason@entry=OFPRR_IDLE_TIMEOUT, abbrev_ofconn=0x0, abbrev_xid=0, old_actions=old_actions@entry=0x0) at ofproto/connmgr.c:2160 1 0x00007fa4d6803495 in add_flow_finish (ofproto=0x55d9075d4ab0, ofm=<optimized out>, req=req@entry=0x0) at ofproto/ofproto.c:5221 2 0x00007fa4d68036af in modify_flows_finish (req=0x0, ofm=0x7fa4980753f0, ofproto=0x55d9075d4ab0) at ofproto/ofproto.c:5823 3 ofproto_flow_mod_finish (ofproto=0x55d9075d4ab0, ofm=ofm@entry=0x7fa4980753f0, req=req@entry=0x0) at ofproto/ofproto.c:8088 4 0x00007fa4d680372d in ofproto_flow_mod_learn_finish (ofm=ofm@entry=0x7fa4980753f0, orig_ofproto=orig_ofproto@entry=0x0) at ofproto/ofproto.c:5439 5 0x00007fa4d68072f9 in ofproto_flow_mod_learn (ofm=0x7fa4980753f0, keep_ref=keep_ref@entry=true, limit=<optimized out>, below_limitp=below_limitp@entry=0x0) at ofproto/ofproto.c:5499 6 0x00007fa4d6835d33 in xlate_push_stats_entry (entry=0x7fa498012448, stats=stats@entry=0x7fa4d2701a10, offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:127 7 0x00007fa4d6835e3a in xlate_push_stats (xcache=<optimized out>, stats=stats@entry=0x7fa4d2701a10, offloaded=offloaded@entry=false) at ofproto/ofproto-dpif-xlate-cache.c:181 8 0x00007fa4d6822046 in revalidate_ukey (udpif=udpif@entry=0x55d90760b240, ukey=ukey@entry=0x7fa4b0191660, stats=stats@entry=0x7fa4d2705118, odp_actions=odp_actions@entry=0x7fa4d2701b50, reval_seq=reval_seq@entry=5655486242, recircs=recircs@entry=0x7fa4d2701b40, offloaded=false) at ofproto/ofproto-dpif-upcall.c:2294 9 0x00007fa4d6825aee in revalidate (revalidator=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:2683 10 0x00007fa4d6825cf3 in udpif_revalidator (arg=0x55d90769dd00) at ofproto/ofproto-dpif-upcall.c:936 11 0x00007fa4d6259c9f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:423 12 0x00007fa4d582cea5 in start_thread () from /usr/lib64/libpthread.so.0 13 0x00007fa4d504b96d in clone () from /usr/lib64/libc.so.6 At the time of crash, the involved ofproto was already deallocated: (gdb) print *ofproto $1 = ..., name = 0x55d907602820 "nsx-managed", ..., ports = {..., one = 0x0, mask = 63, n = 0}, ..., connmgr = 0x0, ... This patch fixes it. VMware-BZ: #2700626 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> --- ofproto/connmgr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)