Message ID | 20200327161936.2225989-3-eblake@redhat.com |
---|---|
State | New |
Headers | show |
Series | nbd: Try for cleaner TLS shutdown | expand |
On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote: > Gnutls documents that while many apps simply yank out the underlying > transport at the end of communication in the name of efficiency, this > is indistinguishable from a malicious actor terminating the connection > prematurely. Since our channel I/O code already supports the notion of > a graceful shutdown request, it is time to plumb that through to the > TLS layer, and wait for TLS to give the all clear before then > terminating traffic on the underlying channel. > > Note that channel-tls now always advertises shutdown support, > regardless of whether the underlying channel also has that support. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > io/channel-tls.c | 27 ++++++++++++++++++++++++++- > 1 file changed, 26 insertions(+), 1 deletion(-) > > diff --git a/io/channel-tls.c b/io/channel-tls.c > index 7ec8ceff2f01..f90905823e1d 100644 > --- a/io/channel-tls.c > +++ b/io/channel-tls.c > @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc, > Error **errp) > { > QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); > + int ret = 0; > > tioc->shutdown |= how; > > - return qio_channel_shutdown(tioc->master, how, errp); > + do { > + switch (how) { > + case QIO_CHANNEL_SHUTDOWN_READ: > + /* No TLS counterpart */ > + break; > + case QIO_CHANNEL_SHUTDOWN_WRITE: > + ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR); > + break; > + case QIO_CHANNEL_SHUTDOWN_BOTH: > + ret = qcrypto_tls_session_shutdown(tioc->session, > + QCRYPTO_SHUT_RDWR); > + break; > + default: > + abort(); > + } > + } while (ret == -EAGAIN); I don't think it is acceptable to do this loop here. The gnutls_bye() function triggers several I/O operations which could block. Looping like this means we busy-wait, blocking this thread for as long as I/O is blocking on the socket. If we must call gnutls_bye(), then it needs to be done in a way that can integrate with the main loop so it poll()'s / unblocks the current coroutine/thread. This makes the whole thing significantly more complex to deal with, especially if the shutdown is being done in cleanup paths which ordinarily are expected to execute without blocking on I/O. This is the big reason why i never made any attempt to use gnutls_bye(). Regards, Daniel
On 3/27/20 11:40 AM, Daniel P. Berrangé wrote: > On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote: >> Gnutls documents that while many apps simply yank out the underlying >> transport at the end of communication in the name of efficiency, this >> is indistinguishable from a malicious actor terminating the connection >> prematurely. Since our channel I/O code already supports the notion of >> a graceful shutdown request, it is time to plumb that through to the >> TLS layer, and wait for TLS to give the all clear before then >> terminating traffic on the underlying channel. >> >> Note that channel-tls now always advertises shutdown support, >> regardless of whether the underlying channel also has that support. >> >> Signed-off-by: Eric Blake <eblake@redhat.com> >> --- >> io/channel-tls.c | 27 ++++++++++++++++++++++++++- >> 1 file changed, 26 insertions(+), 1 deletion(-) >> >> diff --git a/io/channel-tls.c b/io/channel-tls.c >> index 7ec8ceff2f01..f90905823e1d 100644 >> --- a/io/channel-tls.c >> +++ b/io/channel-tls.c >> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc, >> Error **errp) >> { >> QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); >> + int ret = 0; >> >> tioc->shutdown |= how; >> >> - return qio_channel_shutdown(tioc->master, how, errp); >> + do { >> + switch (how) { >> + case QIO_CHANNEL_SHUTDOWN_READ: >> + /* No TLS counterpart */ >> + break; >> + case QIO_CHANNEL_SHUTDOWN_WRITE: >> + ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR); >> + break; >> + case QIO_CHANNEL_SHUTDOWN_BOTH: >> + ret = qcrypto_tls_session_shutdown(tioc->session, >> + QCRYPTO_SHUT_RDWR); >> + break; >> + default: >> + abort(); >> + } >> + } while (ret == -EAGAIN); > > I don't think it is acceptable to do this loop here. The gnutls_bye() > function triggers several I/O operations which could block. Looping > like this means we busy-wait, blocking this thread for as long as I/O > is blocking on the socket. Hmm, good point. Should we give qio_channel_tls_shutdown a bool parameter that says whether it should wait (good for use when we are being run in a coroutine and can deal with the additional I/O) or just fail with -EAGAIN (which the caller can ignore if it is not worried)? > > If we must call gnutls_bye(), then it needs to be done in a way that > can integrate with the main loop so it poll()'s / unblocks the current > coroutine/thread. This makes the whole thing significantly more > complex to deal with, especially if the shutdown is being done in > cleanup paths which ordinarily are expected to execute without > blocking on I/O. This is the big reason why i never made any attempt > to use gnutls_bye(). We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is indeed a cleanup path where not blocking is worthwhile) even without this patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the normal data path, where we are already using coroutines to manage callbacks, can benefit the remote endpoint, giving them a chance to see clean shutdown instead of abrupt termination.
On Fri, Mar 27, 2020 at 12:29:39PM -0500, Eric Blake wrote: > On 3/27/20 11:40 AM, Daniel P. Berrangé wrote: > > On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote: > > > Gnutls documents that while many apps simply yank out the underlying > > > transport at the end of communication in the name of efficiency, this > > > is indistinguishable from a malicious actor terminating the connection > > > prematurely. Since our channel I/O code already supports the notion of > > > a graceful shutdown request, it is time to plumb that through to the > > > TLS layer, and wait for TLS to give the all clear before then > > > terminating traffic on the underlying channel. > > > > > > Note that channel-tls now always advertises shutdown support, > > > regardless of whether the underlying channel also has that support. > > > > > > Signed-off-by: Eric Blake <eblake@redhat.com> > > > --- > > > io/channel-tls.c | 27 ++++++++++++++++++++++++++- > > > 1 file changed, 26 insertions(+), 1 deletion(-) > > > > > > diff --git a/io/channel-tls.c b/io/channel-tls.c > > > index 7ec8ceff2f01..f90905823e1d 100644 > > > --- a/io/channel-tls.c > > > +++ b/io/channel-tls.c > > > @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc, > > > Error **errp) > > > { > > > QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); > > > + int ret = 0; > > > > > > tioc->shutdown |= how; > > > > > > - return qio_channel_shutdown(tioc->master, how, errp); > > > + do { > > > + switch (how) { > > > + case QIO_CHANNEL_SHUTDOWN_READ: > > > + /* No TLS counterpart */ > > > + break; > > > + case QIO_CHANNEL_SHUTDOWN_WRITE: > > > + ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR); > > > + break; > > > + case QIO_CHANNEL_SHUTDOWN_BOTH: > > > + ret = qcrypto_tls_session_shutdown(tioc->session, > > > + QCRYPTO_SHUT_RDWR); > > > + break; > > > + default: > > > + abort(); > > > + } > > > + } while (ret == -EAGAIN); > > > > I don't think it is acceptable to do this loop here. The gnutls_bye() > > function triggers several I/O operations which could block. Looping > > like this means we busy-wait, blocking this thread for as long as I/O > > is blocking on the socket. > > Hmm, good point. Should we give qio_channel_tls_shutdown a bool parameter > that says whether it should wait (good for use when we are being run in a > coroutine and can deal with the additional I/O) or just fail with -EAGAIN > (which the caller can ignore if it is not worried)? A bool would not be sufficient, because the caller would need to know which direction to wait for I/O on. Looking at the code it does a flush of outstanding data, then sends some bytes, and then receives some bytes. The write will probably work most of the time, but the receive is almost always going to return -EAGAIN. So I don't think that failing with EGAIN is going to be much of a benefit. > > If we must call gnutls_bye(), then it needs to be done in a way that > > can integrate with the main loop so it poll()'s / unblocks the current > > coroutine/thread. This makes the whole thing significantly more > > complex to deal with, especially if the shutdown is being done in > > cleanup paths which ordinarily are expected to execute without > > blocking on I/O. This is the big reason why i never made any attempt > > to use gnutls_bye(). > > We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is Are you sure ? AFAIK there is no existing usage of gnutls_bye() at all in QEMU. > indeed a cleanup path where not blocking is worthwhile) even without this > patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the > normal data path, where we are already using coroutines to manage callbacks, > can benefit the remote endpoint, giving them a chance to see clean shutdown > instead of abrupt termination. I'm not convinced the clean shutdown at the TLS level does anything important given that we already have done a clean shutdown at the NBD level. Regards, Daniel
On 3/27/20 12:43 PM, Daniel P. Berrangé wrote: >>> I don't think it is acceptable to do this loop here. The gnutls_bye() >>> function triggers several I/O operations which could block. Looping >>> like this means we busy-wait, blocking this thread for as long as I/O >>> is blocking on the socket. >> >> Hmm, good point. Should we give qio_channel_tls_shutdown a bool parameter >> that says whether it should wait (good for use when we are being run in a >> coroutine and can deal with the additional I/O) or just fail with -EAGAIN >> (which the caller can ignore if it is not worried)? > > A bool would not be sufficient, because the caller would need to know > which direction to wait for I/O on. > > Looking at the code it does a flush of outstanding data, then sends > some bytes, and then receives some bytes. The write will probably > work most of the time, but the receive is almost always going to > return -EAGAIN. So I don't think that failing with EGAIN is going > to be much of a benefit. Here, I'm guessing you're talking about gnutls lib/record.c. But note: for GNUTLS_SHUT_WR, it only does _gnutls_io_write_flush and gnutls_alert_send; the use of _gnutls_recv_int is reserved for GNUTLS_SHUT_RDWR. When informing the other end that you are not going to write any more, you don't have to wait for a reply. > >>> If we must call gnutls_bye(), then it needs to be done in a way that >>> can integrate with the main loop so it poll()'s / unblocks the current >>> coroutine/thread. This makes the whole thing significantly more >>> complex to deal with, especially if the shutdown is being done in >>> cleanup paths which ordinarily are expected to execute without >>> blocking on I/O. This is the big reason why i never made any attempt >>> to use gnutls_bye(). >> >> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is > > Are you sure ? AFAIK there is no existing usage of gnutls_bye() at all > in QEMU. Oh, I'm confusing that with nbdkit, which does use gnutls_bye(GNUTLS_SHUT_RDWR) but does not wait for a response (at least, not yet). > >> indeed a cleanup path where not blocking is worthwhile) even without this >> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the >> normal data path, where we are already using coroutines to manage callbacks, >> can benefit the remote endpoint, giving them a chance to see clean shutdown >> instead of abrupt termination. > > I'm not convinced the clean shutdown at the TLS level does anything important > given that we already have done a clean shutdown at the NBD level. Okay, then for now, I'll still try and see if I can fix the libnbd/nbdkit hang without relying on qemu sending a clean gnutls_bye(GNUTLS_SHUT_WR).
diff --git a/io/channel-tls.c b/io/channel-tls.c index 7ec8ceff2f01..f90905823e1d 100644 --- a/io/channel-tls.c +++ b/io/channel-tls.c @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc, Error **errp) { QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); + int ret = 0; tioc->shutdown |= how; - return qio_channel_shutdown(tioc->master, how, errp); + do { + switch (how) { + case QIO_CHANNEL_SHUTDOWN_READ: + /* No TLS counterpart */ + break; + case QIO_CHANNEL_SHUTDOWN_WRITE: + ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR); + break; + case QIO_CHANNEL_SHUTDOWN_BOTH: + ret = qcrypto_tls_session_shutdown(tioc->session, + QCRYPTO_SHUT_RDWR); + break; + default: + abort(); + } + } while (ret == -EAGAIN); + if (ret < 0) { + error_setg_errno(errp, -ret, "Cannot shut down TLS channel"); + return -1; + } + + if (qio_channel_has_feature(tioc->master, QIO_CHANNEL_FEATURE_SHUTDOWN)) { + return qio_channel_shutdown(tioc->master, how, errp); + } + return 0; } static int qio_channel_tls_close(QIOChannel *ioc,
Gnutls documents that while many apps simply yank out the underlying transport at the end of communication in the name of efficiency, this is indistinguishable from a malicious actor terminating the connection prematurely. Since our channel I/O code already supports the notion of a graceful shutdown request, it is time to plumb that through to the TLS layer, and wait for TLS to give the all clear before then terminating traffic on the underlying channel. Note that channel-tls now always advertises shutdown support, regardless of whether the underlying channel also has that support. Signed-off-by: Eric Blake <eblake@redhat.com> --- io/channel-tls.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)