Message ID | 20161223172343.4405-1-blp@ovn.org |
---|---|
State | Accepted |
Headers | show |
> On Dec 23, 2016, at 9:23 AM, Ben Pfaff <blp@ovn.org> wrote: > > If an rconn peer fails to send a hello message, the version number doesn't > get set. Later, if the peer delays long enough, the rconn attempts to send > an echo request but assert-fails instead because it doesn't know what > version to use. This fixes the problem. > > To reproduce this problem: > > make sandbox > ovs-vsctl add-br br0 > ovs-vsctl set-controller br0 ptcp:12345 > nc 127.0.0.1 12345 > > and wait 10 seconds for ovs-vswitchd to die. (Then exit the sandbox.) > > Reported-by: 张东亚 <fortitude.zhang@gmail.com> > Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> --Justin
On Fri, Dec 23, 2016 at 01:02:00PM -0800, Justin Pettit wrote: > > > On Dec 23, 2016, at 9:23 AM, Ben Pfaff <blp@ovn.org> wrote: > > > > If an rconn peer fails to send a hello message, the version number doesn't > > get set. Later, if the peer delays long enough, the rconn attempts to send > > an echo request but assert-fails instead because it doesn't know what > > version to use. This fixes the problem. > > > > To reproduce this problem: > > > > make sandbox > > ovs-vsctl add-br br0 > > ovs-vsctl set-controller br0 ptcp:12345 > > nc 127.0.0.1 12345 > > > > and wait 10 seconds for ovs-vswitchd to die. (Then exit the sandbox.) > > > > Reported-by: 张东亚 <fortitude.zhang@gmail.com> > > Signed-off-by: Ben Pfaff <blp@ovn.org> > > Acked-by: Justin Pettit <jpettit@ovn.org> Thanks. I applied this to master and backported it as far as branch-2.0.
diff --git a/lib/rconn.c b/lib/rconn.c index 0c1812a..8a29864 100644 --- a/lib/rconn.c +++ b/lib/rconn.c @@ -558,19 +558,23 @@ run_ACTIVE(struct rconn *rc) { if (timed_out(rc)) { unsigned int base = MAX(rc->last_activity, rc->state_entered); - int version; - VLOG_DBG("%s: idle %u seconds, sending inactivity probe", rc->name, (unsigned int) (time_now() - base)); - version = rconn_get_version__(rc); - ovs_assert(version >= 0 && version <= 0xff); - /* Ordering is important here: rconn_send() can transition to BACKOFF, * and we don't want to transition back to IDLE if so, because then we * can end up queuing a packet with vconn == NULL and then *boom*. */ state_transition(rc, S_IDLE); - rconn_send__(rc, make_echo_request(version), NULL); + + /* Send an echo request if we can. (If version negotiation is not + * complete, that is, if we did not yet receive a "hello" message from + * the peer, we do not know the version to use, so we don't send + * anything.) */ + int version = rconn_get_version__(rc); + if (version >= 0 && version <= 0xff) { + rconn_send__(rc, make_echo_request(version), NULL); + } + return; }
If an rconn peer fails to send a hello message, the version number doesn't get set. Later, if the peer delays long enough, the rconn attempts to send an echo request but assert-fails instead because it doesn't know what version to use. This fixes the problem. To reproduce this problem: make sandbox ovs-vsctl add-br br0 ovs-vsctl set-controller br0 ptcp:12345 nc 127.0.0.1 12345 and wait 10 seconds for ovs-vswitchd to die. (Then exit the sandbox.) Reported-by: 张东亚 <fortitude.zhang@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> --- lib/rconn.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)