diff mbox series

[ovs-dev,v2] connmgr: Fix vswitchd abort when a port is added and the controller is down

Message ID 20181018111705.30801-1-nusiddiq@redhat.com
State Accepted
Headers show
Series [ovs-dev,v2] connmgr: Fix vswitchd abort when a port is added and the controller is down | expand

Commit Message

Numan Siddique Oct. 18, 2018, 11:17 a.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

We see the below trace when a port is added to a bridge and the configured
controller is down

0x00007fb002f8b207 in raise () from /lib64/libc.so.6
0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
0x0000556d540a3f95 in bridge_add_ports__ ()
0x0000556d540a5a47 in bridge_reconfigure ()
0x0000556d540a9199 in bridge_run ()
0x0000556d540a02a5 in main ()

The abort is because of ofputil_protocol_to_ofp_version() is called with invalid
protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar aborts are
seen as reported in [2].

The commit [3] changed the behavior of the function rconn_get_version().
Before the commit [3], the function ofconn_receives_async_msg() would always
return false if the connection to the controller was down, since
rconn_get_version() used to return -1. This patch now checks the rconn
connection status in ofconn_receives_async_msg() and returns false if not
connected. This would avoid the aborts seen in the above stack trace.

The issue can be reproduced by running the test added in this patch
without the fix.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1640045
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=1637926

[3] - 476d2551ab ("rconn: Introduce new invariant to fix assertion failure in corner case.")

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---

v1 -> v2
-------
 * Addressed the review comment by adding the check for connection
 * status in ofconn_receives_async_msg()

 ofproto/connmgr.c |  4 ++++
 tests/bridge.at   | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

Comments

Eelco Chaudron Oct. 19, 2018, 7:45 a.m. UTC | #1
On 18 Oct 2018, at 13:17, nusiddiq@redhat.com wrote:

> From: Numan Siddique <nusiddiq@redhat.com>
>
> We see the below trace when a port is added to a bridge and the 
> configured
> controller is down
>
> 0x00007fb002f8b207 in raise () from /lib64/libc.so.6
> 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
> 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from 
> /lib64/libopenvswitch-2.10.so.0
> 0x00007fb00494e38e in ofputil_encode_port_status () from 
> /lib64/libopenvswitch-2.10.so.0
> 0x00007fb004ef1c5b in connmgr_send_port_status () from 
> /lib64/libofproto-2.10.so.0
> 0x00007fb004efa9f4 in ofport_install () from 
> /lib64/libofproto-2.10.so.0
> 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
> 0x00007fb004efc7f9 in ofproto_port_add () from 
> /lib64/libofproto-2.10.so.0
> 0x0000556d540a3f95 in bridge_add_ports__ ()
> 0x0000556d540a5a47 in bridge_reconfigure ()
> 0x0000556d540a9199 in bridge_run ()
> 0x0000556d540a02a5 in main ()
>
> The abort is because of ofputil_protocol_to_ofp_version() is called 
> with invalid
> protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar 
> aborts are
> seen as reported in [2].
>
> The commit [3] changed the behavior of the function 
> rconn_get_version().
> Before the commit [3], the function ofconn_receives_async_msg() would 
> always
> return false if the connection to the controller was down, since
> rconn_get_version() used to return -1. This patch now checks the rconn
> connection status in ofconn_receives_async_msg() and returns false if 
> not
> connected. This would avoid the aborts seen in the above stack trace.
>
> The issue can be reproduced by running the test added in this patch
> without the fix.
>
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1640045
> [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1637926
>
> [3] - 476d2551ab ("rconn: Introduce new invariant to fix assertion 
> failure in corner case.")
>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> ---

Change looks good to me.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ben Pfaff Oct. 23, 2018, 5 p.m. UTC | #2
On Thu, Oct 18, 2018 at 04:47:05PM +0530, nusiddiq@redhat.com wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
> 
> We see the below trace when a port is added to a bridge and the configured
> controller is down
> 
> 0x00007fb002f8b207 in raise () from /lib64/libc.so.6
> 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
> 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
> 0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
> 0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
> 0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
> 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
> 0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
> 0x0000556d540a3f95 in bridge_add_ports__ ()
> 0x0000556d540a5a47 in bridge_reconfigure ()
> 0x0000556d540a9199 in bridge_run ()
> 0x0000556d540a02a5 in main ()
> 
> The abort is because of ofputil_protocol_to_ofp_version() is called with invalid
> protocol - OFPUTIL_P_NONE. Please see [1] for more details. Similar aborts are
> seen as reported in [2].
> 
> The commit [3] changed the behavior of the function rconn_get_version().
> Before the commit [3], the function ofconn_receives_async_msg() would always
> return false if the connection to the controller was down, since
> rconn_get_version() used to return -1. This patch now checks the rconn
> connection status in ofconn_receives_async_msg() and returns false if not
> connected. This would avoid the aborts seen in the above stack trace.
> 
> The issue can be reproduced by running the test added in this patch
> without the fix.
> 
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1640045
> [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1637926
> 
> [3] - 476d2551ab ("rconn: Introduce new invariant to fix assertion failure in corner case.")
> 
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>

Thanks, applied and backported.
diff mbox series

Patch

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index f78b4c5ff..008df9eb8 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1538,6 +1538,10 @@  ofconn_receives_async_msg(const struct ofconn *ofconn,
     ovs_assert(reason < 32);
     ovs_assert((unsigned int) type < OAM_N_TYPES);
 
+    if (!rconn_is_connected(ofconn->rconn)) {
+        return false;
+    }
+
     /* Keep the following code in sync with the documentation in the
      * "Asynchronous Messages" section in 'topics/design' */
 
diff --git a/tests/bridge.at b/tests/bridge.at
index 1c3618563..ee398bdb1 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -79,3 +79,24 @@  AT_CHECK([ovs-vsctl --columns=status list controller | dnl
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 AT_CLEANUP
+
+AT_SETUP([bridge - add port after stopping controller])
+OVS_VSWITCHD_START
+
+dnl Start ovs-testcontroller
+ovs-testcontroller --detach punix:controller --pidfile=ovs-testcontroller.pid
+OVS_WAIT_UNTIL([test -e controller])
+
+AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=internal], [0], [ignore])
+AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
+
+# Now kill the ovs-testcontroller
+kill `cat ovs-testcontroller.pid`
+OVS_WAIT_UNTIL([! test -e controller])
+AT_CHECK([ovs-vsctl --no-wait add-port br0 p2 -- set Interface p2 type=internal], [0], [ignore])
+AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
+
+OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+AT_CLEANUP