Message ID | 20211025143632.1912999-1-msantana@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] TCP Stream: Use TCP keepalive by default | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
On Mon, Oct 25, 2021 at 10:36:32AM -0400, Michael Santana wrote: > In the case that a client disables jsonrpc probes the client would fail > to detect if the connection to the server has dropped. To workaround > such case TCP keepalive is enabled. > > Signed-off-by: Michael Santana <msantana@redhat.com> > --- Patch looks good and makes sense to me. It builds fine on Windows as well. Acked-by: Flavio Leitner <fbl@sysclose.org> fbl
On 10/25/21 16:36, Michael Santana wrote: > In the case that a client disables jsonrpc probes the client would fail > to detect if the connection to the server has dropped. To workaround > such case TCP keepalive is enabled. > > Signed-off-by: Michael Santana <msantana@redhat.com> > --- Hi, Michael. Thanks for the patch. But I'm not sure why we need this, at least in current form. Standard keepalive configuration on modern systems is set to something around 2 hours most of the time. So, the user might have 2 hours of downtime and not even notice. TCP keepalives might be useful for the case where user knows that application may not reply for a long time, so they have to set the inactivity probe to a higher value. In this case, we could detect connection failure with TCP keepalive using shorter time interval. Having TCP keepalive configured to a very long interval (which is system default), IMHO, doesn't make a lot of sense. I would also argue that inactivity probes should never be disabled, but set to a higher value instead, because TCP keepalive will not be able to detect hanged application, e.g. deadlock. Best regards, Ilya Maximets.
On Tue, Nov 16, 2021 at 09:54:54PM +0100, Ilya Maximets wrote: > On 10/25/21 16:36, Michael Santana wrote: > > In the case that a client disables jsonrpc probes the client would fail > > to detect if the connection to the server has dropped. To workaround > > such case TCP keepalive is enabled. > > > > Signed-off-by: Michael Santana <msantana@redhat.com> > > --- > > Hi, Michael. Thanks for the patch. But I'm not sure why we need this, > at least in current form. > > Standard keepalive configuration on modern systems is set to something > around 2 hours most of the time. So, the user might have 2 hours of > downtime and not even notice. > > TCP keepalives might be useful for the case where user knows that > application may not reply for a long time, so they have to set the > inactivity probe to a higher value. In this case, we could detect > connection failure with TCP keepalive using shorter time interval. > > Having TCP keepalive configured to a very long interval (which is system > default), IMHO, doesn't make a lot of sense. > > I would also argue that inactivity probes should never be disabled, but > set to a higher value instead, because TCP keepalive will not be able > to detect hanged application, e.g. deadlock. That's a good point. My concern is that either we will need to stop allowing setting to 0 and that can break upgrades, or we need to silently set a big interval instead, which is not a good program behavior IMHO. Also, it sounds like not allowing to disable actually prevents the user to shoot himself in the foot, but I don't know for sure if that has a valid use-case. Even if we decide to improve the built-in inactivity probes, it still seems like a good idea to enable TCP keepalive as a second layer of fault detection. BTW, the patch missed the tag below Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1988461
diff --git a/lib/socket-util.c b/lib/socket-util.c index 4f1ffecf5..dbf35f9bb 100644 --- a/lib/socket-util.c +++ b/lib/socket-util.c @@ -114,6 +114,19 @@ setsockopt_tcp_nodelay(int fd) } } +void +setsockopt_tcp_keepalive(int fd) +{ + int on = 1; + int retval; + + retval = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on); + if (retval) { + retval = sock_errno(); + VLOG_ERR("setsockopt(SO_KEEPALIVE): %s", sock_strerror(retval)); + } +} + /* Sets the DSCP value of socket 'fd' to 'dscp', which must be 63 or less. * 'family' must indicate the socket's address family (AF_INET or AF_INET6, to * do anything useful). */ diff --git a/lib/socket-util.h b/lib/socket-util.h index 9ccb7d4cc..f6cf00a32 100644 --- a/lib/socket-util.h +++ b/lib/socket-util.h @@ -33,6 +33,7 @@ struct ds; int set_nonblocking(int fd); void xset_nonblocking(int fd); void setsockopt_tcp_nodelay(int fd); +void setsockopt_tcp_keepalive(int fd); int set_dscp(int fd, int family, uint8_t dscp); bool addr_is_ipv6(const char *host_name); diff --git a/lib/stream-fd.c b/lib/stream-fd.c index 46ee7ae27..df609579c 100644 --- a/lib/stream-fd.c +++ b/lib/stream-fd.c @@ -93,6 +93,7 @@ fd_connect(struct stream *stream) int retval = check_connection_completion(s->fd); if (retval == 0 && s->fd_type == AF_INET) { setsockopt_tcp_nodelay(s->fd); + setsockopt_tcp_keepalive(s->fd); } return retval; } diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c index 0ea3f2c08..0a47208da 100644 --- a/lib/stream-ssl.c +++ b/lib/stream-ssl.c @@ -267,6 +267,7 @@ new_ssl_stream(char *name, char *server_name, int fd, enum session_type type, */ if (state == STATE_SSL_CONNECTING) { setsockopt_tcp_nodelay(fd); + setsockopt_tcp_keepalive(fd); } /* Create and configure OpenSSL stream. */ @@ -521,6 +522,7 @@ ssl_connect(struct stream *stream) } sslv->state = STATE_SSL_CONNECTING; setsockopt_tcp_nodelay(sslv->fd); + setsockopt_tcp_keepalive(sslv->fd); /* Fall through. */ case STATE_SSL_CONNECTING: diff --git a/lib/stream-tcp.c b/lib/stream-tcp.c index e8dc2bfaa..48443b9be 100644 --- a/lib/stream-tcp.c +++ b/lib/stream-tcp.c @@ -43,6 +43,7 @@ new_tcp_stream(char *name, int fd, int connect_status, struct stream **streamp) { if (connect_status == 0) { setsockopt_tcp_nodelay(fd); + setsockopt_tcp_keepalive(fd); } return new_fd_stream(name, fd, connect_status, AF_INET, streamp);
In the case that a client disables jsonrpc probes the client would fail to detect if the connection to the server has dropped. To workaround such case TCP keepalive is enabled. Signed-off-by: Michael Santana <msantana@redhat.com> --- lib/socket-util.c | 13 +++++++++++++ lib/socket-util.h | 1 + lib/stream-fd.c | 1 + lib/stream-ssl.c | 2 ++ lib/stream-tcp.c | 1 + 5 files changed, 18 insertions(+)