Message ID | 20181018111705.30801-1-nusiddiq@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,v2] connmgr: Fix vswitchd abort when a port is added and the controller is down | expand |
On 18 Oct 2018, at 13:17, nusiddiq@redhat.com wrote: > From: Numan Siddique <nusiddiq@redhat.com> > > We see the below trace when a port is added to a bridge and the > configured > controller is down > > 0x00007fb002f8b207 in raise () from /lib64/libc.so.6 > 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6 > 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from > /lib64/libopenvswitch-2.10.so.0 > 0x00007fb00494e38e in ofputil_encode_port_status () from > /lib64/libopenvswitch-2.10.so.0 > 0x00007fb004ef1c5b in connmgr_send_port_status () from > /lib64/libofproto-2.10.so.0 > 0x00007fb004efa9f4 in ofport_install () from > /lib64/libofproto-2.10.so.0 > 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0 > 0x00007fb004efc7f9 in ofproto_port_add () from > /lib64/libofproto-2.10.so.0 > 0x0000556d540a3f95 in bridge_add_ports__ () > 0x0000556d540a5a47 in bridge_reconfigure () > 0x0000556d540a9199 in bridge_run () > 0x0000556d540a02a5 in main () > > The abort is because of ofputil_protocol_to_ofp_version() is called > with invalid > protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar > aborts are > seen as reported in [2]. > > The commit [3] changed the behavior of the function > rconn_get_version(). > Before the commit [3], the function ofconn_receives_async_msg() would > always > return false if the connection to the controller was down, since > rconn_get_version() used to return -1. This patch now checks the rconn > connection status in ofconn_receives_async_msg() and returns false if > not > connected. This would avoid the aborts seen in the above stack trace. > > The issue can be reproduced by running the test added in this patch > without the fix. > > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1640045 > [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1637926 > > [3] - 476d2551ab ("rconn: Introduce new invariant to fix assertion > failure in corner case.") > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com> > --- Change looks good to me. Acked-by: Eelco Chaudron <echaudro@redhat.com>
On Thu, Oct 18, 2018 at 04:47:05PM +0530, nusiddiq@redhat.com wrote: > From: Numan Siddique <nusiddiq@redhat.com> > > We see the below trace when a port is added to a bridge and the configured > controller is down > > 0x00007fb002f8b207 in raise () from /lib64/libc.so.6 > 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6 > 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0 > 0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0 > 0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0 > 0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0 > 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0 > 0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0 > 0x0000556d540a3f95 in bridge_add_ports__ () > 0x0000556d540a5a47 in bridge_reconfigure () > 0x0000556d540a9199 in bridge_run () > 0x0000556d540a02a5 in main () > > The abort is because of ofputil_protocol_to_ofp_version() is called with invalid > protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar aborts are > seen as reported in [2]. > > The commit [3] changed the behavior of the function rconn_get_version(). > Before the commit [3], the function ofconn_receives_async_msg() would always > return false if the connection to the controller was down, since > rconn_get_version() used to return -1. This patch now checks the rconn > connection status in ofconn_receives_async_msg() and returns false if not > connected. This would avoid the aborts seen in the above stack trace. > > The issue can be reproduced by running the test added in this patch > without the fix. > > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1640045 > [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1637926 > > [3] - 476d2551ab ("rconn: Introduce new invariant to fix assertion failure in corner case.") > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Thanks, applied and backported.
diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index f78b4c5ff..008df9eb8 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -1538,6 +1538,10 @@ ofconn_receives_async_msg(const struct ofconn *ofconn, ovs_assert(reason < 32); ovs_assert((unsigned int) type < OAM_N_TYPES); + if (!rconn_is_connected(ofconn->rconn)) { + return false; + } + /* Keep the following code in sync with the documentation in the * "Asynchronous Messages" section in 'topics/design' */ diff --git a/tests/bridge.at b/tests/bridge.at index 1c3618563..ee398bdb1 100644 --- a/tests/bridge.at +++ b/tests/bridge.at @@ -79,3 +79,24 @@ AT_CHECK([ovs-vsctl --columns=status list controller | dnl OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP + +AT_SETUP([bridge - add port after stopping controller]) +OVS_VSWITCHD_START + +dnl Start ovs-testcontroller +ovs-testcontroller --detach punix:controller --pidfile=ovs-testcontroller.pid +OVS_WAIT_UNTIL([test -e controller]) + +AT_CHECK([ovs-vsctl set-controller br0 unix:controller]) +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=internal], [0], [ignore]) +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore]) + +# Now kill the ovs-testcontroller +kill `cat ovs-testcontroller.pid` +OVS_WAIT_UNTIL([! test -e controller]) +AT_CHECK([ovs-vsctl --no-wait add-port br0 p2 -- set Interface p2 type=internal], [0], [ignore]) +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore]) + +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) +AT_CLEANUP