Message ID | 20200327161936.2225989-4-eblake@redhat.com |
---|---|
State | New |
Headers | show |
Series | nbd: Try for cleaner TLS shutdown | expand |
On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote: > Although the remote end should always be tolerant of a socket being > arbitrarily closed, there are situations where it is a lot easier if > the remote end can be guaranteed to read EOF even before the socket > has closed. In particular, when using gnutls, if we fail to inform > the remote end about an impending teardown, the remote end cannot > distinguish between our closing the socket as intended vs. a malicious > intermediary interrupting things, and may result in spurious error > messages. Does this actually matter in the NBD case ? It has an explicit NBD command for requesting shutdown, and once that's processed, it is fine to just close the socket abruptly - I don't see a benefit to a TLS shutdown sequence on top. AFAIK, the TLS level clean shutdown is only required if the application protocol does not have any way to determine an unexpected shutdown itself. This is relevant for HTTP where the connection data stream may not have a well defined end condition. In the NBD case though, we have an explicit NBD_CMD_DISC to trigger the disconnect. After processing that message, an EOF is acceptable regardless of whether , before processing that message, any EOF is a unexpected. > Or, we can end up with a deadlock where both ends are stuck > on a read() from the other end but neither gets an EOF. If the socket has been closed abruptly why would it get stuck in read() - it should see EOF surely ? > Thus, after > any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has > finished replying (where appropriate) to such a request, it is worth > informing the channel that we will not be transmitting anything else. Regards, Daniel
On 3/27/20 11:35 AM, Daniel P. Berrangé wrote: > On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote: >> Although the remote end should always be tolerant of a socket being >> arbitrarily closed, there are situations where it is a lot easier if >> the remote end can be guaranteed to read EOF even before the socket >> has closed. In particular, when using gnutls, if we fail to inform >> the remote end about an impending teardown, the remote end cannot >> distinguish between our closing the socket as intended vs. a malicious >> intermediary interrupting things, and may result in spurious error >> messages. > > Does this actually matter in the NBD case ? > > It has an explicit NBD command for requesting shutdown, and once > that's processed, it is fine to just close the socket abruptly - I > don't see a benefit to a TLS shutdown sequence on top. You're right that the NBD protocol has ways for the client to advertise it will be shutting down, AND documents that the server must be robust to clients that just abruptly disconnect after that point. But we don't have control over all such servers, and there may very well be a server that logs an error on abrupt closure, where it would be silent if we did a proper gnutls_bye. Which is more important: maximum speed in disconnecting after we expressed intent, or maximum attempt at catering to all sorts of remote implementations that might not be as tolerant as qemu is of an abrupt termination? > AFAIK, the TLS level clean shutdown is only required if the > application protocol does not have any way to determine an > unexpected shutdown itself. 'man gnutls_bye' states: Note that not all implementations will properly terminate a TLS connec‐ tion. Some of them, usually for performance reasons, will terminate only the underlying transport layer, and thus not distinguishing between a malicious party prematurely terminating the connection and normal termination. You're right that because the protocol has an explicit message, we can reliably distinguish any early termination prior to NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it matters is if we have a premature termination after we asked for clean shutdown, at which point a malicious termination didn't lose any data. So on that front, I guess you are right that not using gnutls_bye isn't going to have much impact. > > This is relevant for HTTP where the connection data stream may not > have a well defined end condition. > > In the NBD case though, we have an explicit NBD_CMD_DISC to trigger > the disconnect. After processing that message, an EOF is acceptable > regardless of whether , > before processing that message, any EOF is a unexpected. > >> Or, we can end up with a deadlock where both ends are stuck >> on a read() from the other end but neither gets an EOF. > > If the socket has been closed abruptly why would it get stuck in > read() - it should see EOF surely ? That's what I'm trying to figure out: the nbdkit testsuite definitely hung even though 'qemu-nbd --list' exited, but I haven't yet figured out whether the bug lies in nbdkit proper or in libnbd, nor whether a cleaner tls shutdown would have prevented the hang in a more reliable manner. https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html
On Fri, Mar 27, 2020 at 12:42:21PM -0500, Eric Blake wrote: > On 3/27/20 11:35 AM, Daniel P. Berrangé wrote: > > On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote: > > > Although the remote end should always be tolerant of a socket being > > > arbitrarily closed, there are situations where it is a lot easier if > > > the remote end can be guaranteed to read EOF even before the socket > > > has closed. In particular, when using gnutls, if we fail to inform > > > the remote end about an impending teardown, the remote end cannot > > > distinguish between our closing the socket as intended vs. a malicious > > > intermediary interrupting things, and may result in spurious error > > > messages. > > > > Does this actually matter in the NBD case ? > > > > It has an explicit NBD command for requesting shutdown, and once > > that's processed, it is fine to just close the socket abruptly - I > > don't see a benefit to a TLS shutdown sequence on top. > > You're right that the NBD protocol has ways for the client to advertise it > will be shutting down, AND documents that the server must be robust to > clients that just abruptly disconnect after that point. But we don't have > control over all such servers, and there may very well be a server that logs > an error on abrupt closure, where it would be silent if we did a proper > gnutls_bye. Which is more important: maximum speed in disconnecting after > we expressed intent, or maximum attempt at catering to all sorts of remote > implementations that might not be as tolerant as qemu is of an abrupt > termination? It is the cost / benefit tradeoff here that matters. Correctly using gnutls_bye(), in contexts which aren't expected to block is non-trivial bringing notable extra code complexity. It isn't an obvious win to me for something that just changes an error message for a scenario that can already be cleanly handled at the application protocol level. > > > AFAIK, the TLS level clean shutdown is only required if the > > application protocol does not have any way to determine an > > unexpected shutdown itself. > > 'man gnutls_bye' states: > > Note that not all implementations will properly terminate a TLS > connec‐ > tion. Some of them, usually for performance reasons, will > terminate > only the underlying transport layer, and thus not > distinguishing > between a malicious party prematurely terminating the connection > and > normal termination. > > You're right that because the protocol has an explicit message, we can > reliably distinguish any early termination prior to > NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it > matters is if we have a premature termination after we asked for clean > shutdown, at which point a malicious termination didn't lose any data. So on > that front, I guess you are right that not using gnutls_bye isn't going to > have much impact. > > > > > This is relevant for HTTP where the connection data stream may not > > have a well defined end condition. > > > > In the NBD case though, we have an explicit NBD_CMD_DISC to trigger > > the disconnect. After processing that message, an EOF is acceptable > > regardless of whether , > > before processing that message, any EOF is a unexpected. > > > > > Or, we can end up with a deadlock where both ends are stuck > > > on a read() from the other end but neither gets an EOF. > > > > If the socket has been closed abruptly why would it get stuck in > > read() - it should see EOF surely ? > > That's what I'm trying to figure out: the nbdkit testsuite definitely hung > even though 'qemu-nbd --list' exited, but I haven't yet figured out whether > the bug lies in nbdkit proper or in libnbd, nor whether a cleaner tls > shutdown would have prevented the hang in a more reliable manner. > https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html Regards, Daniel
diff --git a/block/nbd.c b/block/nbd.c index 2160859f6499..2906484390f9 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -1402,6 +1402,7 @@ static void nbd_client_close(BlockDriverState *bs) if (s->ioc) { nbd_send_request(s->ioc, &request); + qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL); } nbd_teardown_connection(bs); diff --git a/nbd/client.c b/nbd/client.c index ba173108baab..1b8b3a9ae3bd 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016-2019 Red Hat, Inc. + * Copyright (C) 2016-2020 Red Hat, Inc. * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws> * * Network Block Device Client Side @@ -103,6 +103,7 @@ static void nbd_send_opt_abort(QIOChannel *ioc) * even care if the request makes it to the server, let alone * waiting around for whether the server replies. */ nbd_send_option_request(ioc, NBD_OPT_ABORT, 0, NULL, NULL); + qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL); } diff --git a/nbd/server.c b/nbd/server.c index 02b1ed080145..e21a1f662cc2 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1168,6 +1168,8 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) "Option 0x%" PRIx32 " not permitted before TLS", option); if (option == NBD_OPT_ABORT) { + qio_channel_shutdown(client->ioc, + QIO_CHANNEL_SHUTDOWN_WRITE, NULL); return 1; } break; @@ -1187,6 +1189,8 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp) * disconnecting, but that we must also tolerate * guests that don't wait for our reply. */ nbd_negotiate_send_rep(client, NBD_REP_ACK, NULL); + qio_channel_shutdown(client->ioc, + QIO_CHANNEL_SHUTDOWN_WRITE, NULL); return 1; case NBD_OPT_EXPORT_NAME:
Although the remote end should always be tolerant of a socket being arbitrarily closed, there are situations where it is a lot easier if the remote end can be guaranteed to read EOF even before the socket has closed. In particular, when using gnutls, if we fail to inform the remote end about an impending teardown, the remote end cannot distinguish between our closing the socket as intended vs. a malicious intermediary interrupting things, and may result in spurious error messages. Or, we can end up with a deadlock where both ends are stuck on a read() from the other end but neither gets an EOF. Thus, after any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has finished replying (where appropriate) to such a request, it is worth informing the channel that we will not be transmitting anything else. Signed-off-by: Eric Blake <eblake@redhat.com> --- block/nbd.c | 1 + nbd/client.c | 3 ++- nbd/server.c | 4 ++++ 3 files changed, 7 insertions(+), 1 deletion(-)