diff mbox series

[2/3] io: Support shutdown of TLS channel

Message ID 20200327161936.2225989-3-eblake@redhat.com
State New
Headers show
Series nbd: Try for cleaner TLS shutdown | expand

Commit Message

Eric Blake March 27, 2020, 4:19 p.m. UTC
Gnutls documents that while many apps simply yank out the underlying
transport at the end of communication in the name of efficiency, this
is indistinguishable from a malicious actor terminating the connection
prematurely.  Since our channel I/O code already supports the notion of
a graceful shutdown request, it is time to plumb that through to the
TLS layer, and wait for TLS to give the all clear before then
terminating traffic on the underlying channel.

Note that channel-tls now always advertises shutdown support,
regardless of whether the underlying channel also has that support.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 io/channel-tls.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

Comments

Daniel P. Berrangé March 27, 2020, 4:40 p.m. UTC | #1
On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> Gnutls documents that while many apps simply yank out the underlying
> transport at the end of communication in the name of efficiency, this
> is indistinguishable from a malicious actor terminating the connection
> prematurely.  Since our channel I/O code already supports the notion of
> a graceful shutdown request, it is time to plumb that through to the
> TLS layer, and wait for TLS to give the all clear before then
> terminating traffic on the underlying channel.
> 
> Note that channel-tls now always advertises shutdown support,
> regardless of whether the underlying channel also has that support.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  io/channel-tls.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 7ec8ceff2f01..f90905823e1d 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
>                                      Error **errp)
>  {
>      QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> +    int ret = 0;
> 
>      tioc->shutdown |= how;
> 
> -    return qio_channel_shutdown(tioc->master, how, errp);
> +    do {
> +        switch (how) {
> +        case QIO_CHANNEL_SHUTDOWN_READ:
> +            /* No TLS counterpart */
> +            break;
> +        case QIO_CHANNEL_SHUTDOWN_WRITE:
> +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
> +            break;
> +        case QIO_CHANNEL_SHUTDOWN_BOTH:
> +            ret = qcrypto_tls_session_shutdown(tioc->session,
> +                                               QCRYPTO_SHUT_RDWR);
> +            break;
> +        default:
> +            abort();
> +        }
> +    } while (ret == -EAGAIN);

I don't think it is acceptable to do this loop here. The gnutls_bye()
function triggers several I/O operations which could block. Looping
like this means we busy-wait, blocking this thread for as long as I/O
is blocking on the socket.

If we must call gnutls_bye(), then it needs to be done in a way that
can integrate with the main loop so it poll()'s / unblocks the current
coroutine/thread.  This makes the whole thing significantly more
complex to deal with, especially if the shutdown is being done in
cleanup paths which ordinarily are expected to execute without
blocking on I/O.  This is the big reason why i never made any attempt
to use gnutls_bye().


Regards,
Daniel
Eric Blake March 27, 2020, 5:29 p.m. UTC | #2
On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
>> Gnutls documents that while many apps simply yank out the underlying
>> transport at the end of communication in the name of efficiency, this
>> is indistinguishable from a malicious actor terminating the connection
>> prematurely.  Since our channel I/O code already supports the notion of
>> a graceful shutdown request, it is time to plumb that through to the
>> TLS layer, and wait for TLS to give the all clear before then
>> terminating traffic on the underlying channel.
>>
>> Note that channel-tls now always advertises shutdown support,
>> regardless of whether the underlying channel also has that support.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>   io/channel-tls.c | 27 ++++++++++++++++++++++++++-
>>   1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/io/channel-tls.c b/io/channel-tls.c
>> index 7ec8ceff2f01..f90905823e1d 100644
>> --- a/io/channel-tls.c
>> +++ b/io/channel-tls.c
>> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
>>                                       Error **errp)
>>   {
>>       QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
>> +    int ret = 0;
>>
>>       tioc->shutdown |= how;
>>
>> -    return qio_channel_shutdown(tioc->master, how, errp);
>> +    do {
>> +        switch (how) {
>> +        case QIO_CHANNEL_SHUTDOWN_READ:
>> +            /* No TLS counterpart */
>> +            break;
>> +        case QIO_CHANNEL_SHUTDOWN_WRITE:
>> +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
>> +            break;
>> +        case QIO_CHANNEL_SHUTDOWN_BOTH:
>> +            ret = qcrypto_tls_session_shutdown(tioc->session,
>> +                                               QCRYPTO_SHUT_RDWR);
>> +            break;
>> +        default:
>> +            abort();
>> +        }
>> +    } while (ret == -EAGAIN);
> 
> I don't think it is acceptable to do this loop here. The gnutls_bye()
> function triggers several I/O operations which could block. Looping
> like this means we busy-wait, blocking this thread for as long as I/O
> is blocking on the socket.

Hmm, good point.  Should we give qio_channel_tls_shutdown a bool 
parameter that says whether it should wait (good for use when we are 
being run in a coroutine and can deal with the additional I/O) or just 
fail with -EAGAIN (which the caller can ignore if it is not worried)?

> 
> If we must call gnutls_bye(), then it needs to be done in a way that
> can integrate with the main loop so it poll()'s / unblocks the current
> coroutine/thread.  This makes the whole thing significantly more
> complex to deal with, especially if the shutdown is being done in
> cleanup paths which ordinarily are expected to execute without
> blocking on I/O.  This is the big reason why i never made any attempt
> to use gnutls_bye().

We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which 
is indeed a cleanup path where not blocking is worthwhile) even without 
this patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) 
in the normal data path, where we are already using coroutines to manage 
callbacks, can benefit the remote endpoint, giving them a chance to see 
clean shutdown instead of abrupt termination.
Daniel P. Berrangé March 27, 2020, 5:43 p.m. UTC | #3
On Fri, Mar 27, 2020 at 12:29:39PM -0500, Eric Blake wrote:
> On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> > > Gnutls documents that while many apps simply yank out the underlying
> > > transport at the end of communication in the name of efficiency, this
> > > is indistinguishable from a malicious actor terminating the connection
> > > prematurely.  Since our channel I/O code already supports the notion of
> > > a graceful shutdown request, it is time to plumb that through to the
> > > TLS layer, and wait for TLS to give the all clear before then
> > > terminating traffic on the underlying channel.
> > > 
> > > Note that channel-tls now always advertises shutdown support,
> > > regardless of whether the underlying channel also has that support.
> > > 
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > >   io/channel-tls.c | 27 ++++++++++++++++++++++++++-
> > >   1 file changed, 26 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > index 7ec8ceff2f01..f90905823e1d 100644
> > > --- a/io/channel-tls.c
> > > +++ b/io/channel-tls.c
> > > @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
> > >                                       Error **errp)
> > >   {
> > >       QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> > > +    int ret = 0;
> > > 
> > >       tioc->shutdown |= how;
> > > 
> > > -    return qio_channel_shutdown(tioc->master, how, errp);
> > > +    do {
> > > +        switch (how) {
> > > +        case QIO_CHANNEL_SHUTDOWN_READ:
> > > +            /* No TLS counterpart */
> > > +            break;
> > > +        case QIO_CHANNEL_SHUTDOWN_WRITE:
> > > +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
> > > +            break;
> > > +        case QIO_CHANNEL_SHUTDOWN_BOTH:
> > > +            ret = qcrypto_tls_session_shutdown(tioc->session,
> > > +                                               QCRYPTO_SHUT_RDWR);
> > > +            break;
> > > +        default:
> > > +            abort();
> > > +        }
> > > +    } while (ret == -EAGAIN);
> > 
> > I don't think it is acceptable to do this loop here. The gnutls_bye()
> > function triggers several I/O operations which could block. Looping
> > like this means we busy-wait, blocking this thread for as long as I/O
> > is blocking on the socket.
> 
> Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
> that says whether it should wait (good for use when we are being run in a
> coroutine and can deal with the additional I/O) or just fail with -EAGAIN
> (which the caller can ignore if it is not worried)?

A bool would not be sufficient, because the caller would need to know
which direction to wait for I/O on.

Looking at the code it does a flush of outstanding data, then sends
some bytes, and then receives some bytes. The write will probably
work most of the time, but the receive is almost always going to
return -EAGAIN. So I don't think that failing with EGAIN is going
to be much of a benefit.

> > If we must call gnutls_bye(), then it needs to be done in a way that
> > can integrate with the main loop so it poll()'s / unblocks the current
> > coroutine/thread.  This makes the whole thing significantly more
> > complex to deal with, especially if the shutdown is being done in
> > cleanup paths which ordinarily are expected to execute without
> > blocking on I/O.  This is the big reason why i never made any attempt
> > to use gnutls_bye().
> 
> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is

Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
in QEMU.

> indeed a cleanup path where not blocking is worthwhile) even without this
> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
> normal data path, where we are already using coroutines to manage callbacks,
> can benefit the remote endpoint, giving them a chance to see clean shutdown
> instead of abrupt termination.

I'm not convinced the clean shutdown at the TLS level does anything important
given that we already have done a clean shutdown at the NBD level.


Regards,
Daniel
Eric Blake March 27, 2020, 6:46 p.m. UTC | #4
On 3/27/20 12:43 PM, Daniel P. Berrangé wrote:

>>> I don't think it is acceptable to do this loop here. The gnutls_bye()
>>> function triggers several I/O operations which could block. Looping
>>> like this means we busy-wait, blocking this thread for as long as I/O
>>> is blocking on the socket.
>>
>> Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
>> that says whether it should wait (good for use when we are being run in a
>> coroutine and can deal with the additional I/O) or just fail with -EAGAIN
>> (which the caller can ignore if it is not worried)?
> 
> A bool would not be sufficient, because the caller would need to know
> which direction to wait for I/O on.
> 
> Looking at the code it does a flush of outstanding data, then sends
> some bytes, and then receives some bytes. The write will probably
> work most of the time, but the receive is almost always going to
> return -EAGAIN. So I don't think that failing with EGAIN is going
> to be much of a benefit.

Here, I'm guessing you're talking about gnutls lib/record.c.  But note: 
for GNUTLS_SHUT_WR, it only does _gnutls_io_write_flush and 
gnutls_alert_send; the use of _gnutls_recv_int is reserved for 
GNUTLS_SHUT_RDWR.  When informing the other end that you are not going 
to write any more, you don't have to wait for a reply.

> 
>>> If we must call gnutls_bye(), then it needs to be done in a way that
>>> can integrate with the main loop so it poll()'s / unblocks the current
>>> coroutine/thread.  This makes the whole thing significantly more
>>> complex to deal with, especially if the shutdown is being done in
>>> cleanup paths which ordinarily are expected to execute without
>>> blocking on I/O.  This is the big reason why i never made any attempt
>>> to use gnutls_bye().
>>
>> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is
> 
> Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
> in QEMU.

Oh, I'm confusing that with nbdkit, which does use 
gnutls_bye(GNUTLS_SHUT_RDWR) but does not wait for a response (at least, 
not yet).

> 
>> indeed a cleanup path where not blocking is worthwhile) even without this
>> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
>> normal data path, where we are already using coroutines to manage callbacks,
>> can benefit the remote endpoint, giving them a chance to see clean shutdown
>> instead of abrupt termination.
> 
> I'm not convinced the clean shutdown at the TLS level does anything important
> given that we already have done a clean shutdown at the NBD level.

Okay, then for now, I'll still try and see if I can fix the 
libnbd/nbdkit hang without relying on qemu sending a clean 
gnutls_bye(GNUTLS_SHUT_WR).
diff mbox series

Patch

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f01..f90905823e1d 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -360,10 +360,35 @@  static int qio_channel_tls_shutdown(QIOChannel *ioc,
                                     Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+    int ret = 0;

     tioc->shutdown |= how;

-    return qio_channel_shutdown(tioc->master, how, errp);
+    do {
+        switch (how) {
+        case QIO_CHANNEL_SHUTDOWN_READ:
+            /* No TLS counterpart */
+            break;
+        case QIO_CHANNEL_SHUTDOWN_WRITE:
+            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
+            break;
+        case QIO_CHANNEL_SHUTDOWN_BOTH:
+            ret = qcrypto_tls_session_shutdown(tioc->session,
+                                               QCRYPTO_SHUT_RDWR);
+            break;
+        default:
+            abort();
+        }
+    } while (ret == -EAGAIN);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Cannot shut down TLS channel");
+        return -1;
+    }
+
+    if (qio_channel_has_feature(tioc->master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
+        return qio_channel_shutdown(tioc->master, how, errp);
+    }
+    return 0;
 }

 static int qio_channel_tls_close(QIOChannel *ioc,