
[2/2] nbd-client: enable TCP keepalive

Message ID 20190605100913.34972-3-vsementsov@virtuozzo.com
State New
Series nbd: enable keepalive

Commit Message

Vladimir Sementsov-Ogievskiy June 5, 2019, 10:09 a.m. UTC
Enable keepalive option to track server availablity.

Requested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Eric Blake June 5, 2019, 2:39 p.m. UTC | #1
On 6/5/19 5:09 AM, Vladimir Sementsov-Ogievskiy wrote:
> Enable keepalive option to track server availablity.

s/availablity/availability/

Do we want this unconditionally, or should it be an option (and hence
exposed over QMP)?

> 
> Requested-by: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 790ecc1ee1..b57cea8482 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -1137,6 +1137,7 @@ static int nbd_client_connect(BlockDriverState *bs,
>  
>      /* NBD handshake */
>      logout("session init %s\n", export);
> +    qio_channel_set_keepalive(QIO_CHANNEL(sioc), true, NULL);
>      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>  
>      client->info.request_sizes = true;
>
Denis V. Lunev June 5, 2019, 2:43 p.m. UTC | #2
On 6/5/19 5:39 PM, Eric Blake wrote:
> On 6/5/19 5:09 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Enable keepalive option to track server availablity.
> s/availablity/availability/
>
> Do we want this unconditionally, or should it be an option (and hence
> exposed over QMP)?
That is a good question: if we expose it, we should also specify the
timeout duration as an option. Though IMHO it would be safe
to make this unconditional.


>> Requested-by: Denis V. Lunev <den@openvz.org>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  block/nbd-client.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index 790ecc1ee1..b57cea8482 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -1137,6 +1137,7 @@ static int nbd_client_connect(BlockDriverState *bs,
>>  
>>      /* NBD handshake */
>>      logout("session init %s\n", export);
>> +    qio_channel_set_keepalive(QIO_CHANNEL(sioc), true, NULL);
>>      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>>  
>>      client->info.request_sizes = true;
>>
Daniel P. Berrangé June 5, 2019, 4:37 p.m. UTC | #3
On Wed, Jun 05, 2019 at 09:39:10AM -0500, Eric Blake wrote:
> On 6/5/19 5:09 AM, Vladimir Sementsov-Ogievskiy wrote:
> > Enable keepalive option to track server availablity.
> 
> s/availablity/availability/
> 
> Do we want this unconditionally, or should it be an option (and hence
> exposed over QMP)?

I guess this is really a question about what our intended connection
reliability policy should be.

By enabling TCP keepalives we are explicitly making the connection
less reliable by forcing it to be terminated when keepalive
threshold triggers, instead of waiting longer for TCP to recover.

The rationale is that once a connection has been in a hung state for
so long that keepalive triggers, it's (hopefully) not useful to the
mgmt app to carry on waiting anyway.

If the connection is terminated by keepalive & the mgmt app then
spawns a new client to carry on with the work, what are the risks
involved? E.g. could packets from the stuck, terminated, connection
suddenly arrive later and trigger I/O with an outdated data payload?

I guess this is no different a situation from an app explicitly
killing the QEMU NBD client process instead & spawning a new one.

I'm still feeling a little uneasy about enabling it unconditionally
though, since pretty much everything I know which supports keepalives
has a way to turn them on/off at least, even if you can't tune the
individual timer settings.

> > Requested-by: Denis V. Lunev <den@openvz.org>
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >  block/nbd-client.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/block/nbd-client.c b/block/nbd-client.c
> > index 790ecc1ee1..b57cea8482 100644
> > --- a/block/nbd-client.c
> > +++ b/block/nbd-client.c
> > @@ -1137,6 +1137,7 @@ static int nbd_client_connect(BlockDriverState *bs,
> >  
> >      /* NBD handshake */
> >      logout("session init %s\n", export);
> > +    qio_channel_set_keepalive(QIO_CHANNEL(sioc), true, NULL);
> >      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
> >  
> >      client->info.request_sizes = true;
> > 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 




Regards,
Daniel
Vladimir Sementsov-Ogievskiy June 5, 2019, 5:05 p.m. UTC | #4
05.06.2019 19:37, Daniel P. Berrangé wrote:
> On Wed, Jun 05, 2019 at 09:39:10AM -0500, Eric Blake wrote:
>> On 6/5/19 5:09 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Enable keepalive option to track server availablity.
>>
>> s/availablity/availability/
>>
>> Do we want this unconditionally, or should it be an option (and hence
>> exposed over QMP)?
> 
> I guess this is really a question about what our intended connection
> reliability policy should be.
> 
> By enabling TCP keepalives we are explicitly making the connection
> less reliable by forcing it to be terminated when keepalive
> threshold triggers, instead of waiting longer for TCP to recover.
> 
> The rationale s that once a connection has been in a hung state for
> so long that keepalive triggers, its (hopefully) not useful to the
> mgmt app to carry on waiting anyway.
> 
> If the connection is terminated by keepalive & the mgmt app then
> spawns a new client to carry on with the work, what are the risks
> involved ? eg Could packets from the stuck, terminated, connection
> suddenly arrive later and trigger I/O with outdated data payload ?

Hmm, I believe that TCP guarantees isolation between different connections.

> 
> I guess this is no different a situation from an app explicitly
> killing the QEMU NBD client process instead & spawning a new one.
> 
> I'm still feeling a little uneasy about enabling it unconditionally
> though, since pretty much everything I know which supports keepalives
> has a way to turn them on/off at least, even if you can't tune the
> individual timer settings.

Hm. So, I can add a bool keepalive parameter for the nbd format, defaulting to true.
And if needed, it may later be extended to a qapi 'alternate' of bool or struct with
three numeric parameters, corresponding to TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL.

Opinions?
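
For concreteness, a rough sketch of that extension path could look like the
following (type and field names here are pure illustration, not an existing
QEMU schema):

{ 'struct': 'TcpKeepAliveSettings',
  'data': { '*count': 'uint32',        # would map to TCP_KEEPCNT
            '*idle': 'uint32',         # would map to TCP_KEEPIDLE (seconds)
            '*interval': 'uint32' } }  # would map to TCP_KEEPINTVL (seconds)

{ 'alternate': 'NbdKeepAlive',
  'data': { 'enabled': 'bool',
            'settings': 'TcpKeepAliveSettings' } }

i.e. a plain bool keeps working as-is, while a JSON object would later select
the tuning struct without breaking existing users.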


> 
>>> Requested-by: Denis V. Lunev <den@openvz.org>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/nbd-client.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>>> index 790ecc1ee1..b57cea8482 100644
>>> --- a/block/nbd-client.c
>>> +++ b/block/nbd-client.c
>>> @@ -1137,6 +1137,7 @@ static int nbd_client_connect(BlockDriverState *bs,
>>>   
>>>       /* NBD handshake */
>>>       logout("session init %s\n", export);
>>> +    qio_channel_set_keepalive(QIO_CHANNEL(sioc), true, NULL);
>>>       qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>>>   
>>>       client->info.request_sizes = true;
>>>
>>
>> -- 
>> Eric Blake, Principal Software Engineer
>> Red Hat, Inc.           +1-919-301-3226
>> Virtualization:  qemu.org | libvirt.org
>>
> 
> 
> 
> 
> Regards,
> Daniel
>
Eric Blake June 5, 2019, 5:12 p.m. UTC | #5
On 6/5/19 12:05 PM, Vladimir Sementsov-Ogievskiy wrote:

>> By enabling TCP keepalives we are explicitly making the connection
>> less reliable by forcing it to be terminated when keepalive
>> threshold triggers, instead of waiting longer for TCP to recover.
>>
>> The rationale s that once a connection has been in a hung state for
>> so long that keepalive triggers, its (hopefully) not useful to the
>> mgmt app to carry on waiting anyway.
>>
>> If the connection is terminated by keepalive & the mgmt app then
>> spawns a new client to carry on with the work, what are the risks
>> involved ? eg Could packets from the stuck, terminated, connection
>> suddenly arrive later and trigger I/O with outdated data payload ?
> 
> Hmm, I believe that tcp guarantees isolation between different connections
> 
>>
>> I guess this is no different a situation from an app explicitly
>> killing the QEMU NBD client process instead & spawning a new one.
>>
>> I'm still feeling a little uneasy about enabling it unconditionally
>> though, since pretty much everything I know which supports keepalives
>> has a way to turn them on/off at least, even if you can't tune the
>> individual timer settings.
> 
> Hm. So, I can add bool keepalive parameter for nbd format with default to true.
> And if needed, it may be later extended to be qapi 'alternate' of bool or struct with
> three numeric parameters, corresponding to TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL .
> 
> Opinions?

Adding a bool that could later turn into a qapi 'alternate' for
fine-tuning seems reasonable. Defaulting the bool to true is not
backwards-compatible; better would be defaulting it to false and letting
users opt-in; introspection will also work to let you know whether the
feature is present.
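
As a usage illustration, opting in over QMP might then look something like
this (the 'keep-alive' name and its placement are hypothetical, other
arguments abridged):

{ "execute": "blockdev-add", "arguments": {
    "driver": "nbd", "node-name": "nbd0",
    "server": { "type": "inet", "host": "localhost", "port": "10809" },
    "export": "drive0",
    "keep-alive": true } }

Omitting "keep-alive" would keep today's behaviour, matching the opt-in
default of false.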
Vladimir Sementsov-Ogievskiy June 5, 2019, 5:28 p.m. UTC | #6
05.06.2019 20:12, Eric Blake wrote:
> On 6/5/19 12:05 PM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>> By enabling TCP keepalives we are explicitly making the connection
>>> less reliable by forcing it to be terminated when keepalive
>>> threshold triggers, instead of waiting longer for TCP to recover.
>>>
>>> The rationale s that once a connection has been in a hung state for
>>> so long that keepalive triggers, its (hopefully) not useful to the
>>> mgmt app to carry on waiting anyway.
>>>
>>> If the connection is terminated by keepalive & the mgmt app then
>>> spawns a new client to carry on with the work, what are the risks
>>> involved ? eg Could packets from the stuck, terminated, connection
>>> suddenly arrive later and trigger I/O with outdated data payload ?
>>
>> Hmm, I believe that tcp guarantees isolation between different connections
>>
>>>
>>> I guess this is no different a situation from an app explicitly
>>> killing the QEMU NBD client process instead & spawning a new one.
>>>
>>> I'm still feeling a little uneasy about enabling it unconditionally
>>> though, since pretty much everything I know which supports keepalives
>>> has a way to turn them on/off at least, even if you can't tune the
>>> individual timer settings.
>>
>> Hm. So, I can add bool keepalive parameter for nbd format with default to true.
>> And if needed, it may be later extended to be qapi 'alternate' of bool or struct with
>> three numeric parameters, corresponding to TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL .
>>
>> Opinions?
> 
> Adding a bool that could later turn into a qapi 'alternate' for
> fine-tuning seems reasonable. Defaulting the bool to true is not
> backwards-compatible; better would be defaulting it to false and letting
> users opt-in; introspection will also work to let you know whether the
> feature is present.
> 

Ok.

One more thing to discuss then. Should I add keepalive directly to BlockdevOptionsNbd?

It seems more useful to put it into SocketAddress, to be reused by other socket users.
But "SocketAddress" sounds like an address, not like address+connection-options. On
the other hand, structure names are not part of the API. So, finally, is InetSocketAddress
a good place for such a thing?
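
For comparison, the NBD-only placement would be roughly the following
(existing fields abridged from memory, '*keep-alive' being the proposed knob):

{ 'struct': 'BlockdevOptionsNbd',
  'data': { 'server': 'SocketAddress',
            '*export': 'str',
            '*tls-creds': 'str',
            '*keep-alive': 'bool' } }

versus a field on the socket address type itself, which every user of that
type would then pick up automatically.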
Daniel P. Berrangé June 5, 2019, 5:36 p.m. UTC | #7
On Wed, Jun 05, 2019 at 05:28:05PM +0000, Vladimir Sementsov-Ogievskiy wrote:
> 05.06.2019 20:12, Eric Blake wrote:
> > On 6/5/19 12:05 PM, Vladimir Sementsov-Ogievskiy wrote:
> > 
> >>> By enabling TCP keepalives we are explicitly making the connection
> >>> less reliable by forcing it to be terminated when keepalive
> >>> threshold triggers, instead of waiting longer for TCP to recover.
> >>>
> >>> The rationale s that once a connection has been in a hung state for
> >>> so long that keepalive triggers, its (hopefully) not useful to the
> >>> mgmt app to carry on waiting anyway.
> >>>
> >>> If the connection is terminated by keepalive & the mgmt app then
> >>> spawns a new client to carry on with the work, what are the risks
> >>> involved ? eg Could packets from the stuck, terminated, connection
> >>> suddenly arrive later and trigger I/O with outdated data payload ?
> >>
> >> Hmm, I believe that tcp guarantees isolation between different connections
> >>
> >>>
> >>> I guess this is no different a situation from an app explicitly
> >>> killing the QEMU NBD client process instead & spawning a new one.
> >>>
> >>> I'm still feeling a little uneasy about enabling it unconditionally
> >>> though, since pretty much everything I know which supports keepalives
> >>> has a way to turn them on/off at least, even if you can't tune the
> >>> individual timer settings.
> >>
> >> Hm. So, I can add bool keepalive parameter for nbd format with default to true.
> >> And if needed, it may be later extended to be qapi 'alternate' of bool or struct with
> >> three numeric parameters, corresponding to TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL .
> >>
> >> Opinions?
> > 
> > Adding a bool that could later turn into a qapi 'alternate' for
> > fine-tuning seems reasonable. Defaulting the bool to true is not
> > backwards-compatible; better would be defaulting it to false and letting
> > users opt-in; introspection will also work to let you know whether the
> > feature is present.
> > 
> 
> Ok.
> 
> One more thing to discuss then. Should I add keepalive directly to BlockdevOptionsNbd?
> 
> Seems more useful to put it into SocketAddress, to be reused by other socket users..
> But "SocketAddress" sounds like address, not like address+connection-options. On
> the other hand, structure names are not part of API. So, finally, is InetSocketAddress
> a good place for such thing?

That's an interesting idea. Using InetSocketAddress would mean that we could
get support for this enabled "for free" everywhere in QEMU that uses an
InetSocketAddress as its master config format.

Of course there are plenty of places not using InetSocketAddress that would
still require some glue to wire up the code which converts the custom
format into InetSocketAddress.


Regards,
Daniel
Eric Blake June 5, 2019, 7:48 p.m. UTC | #8
On 6/5/19 12:36 PM, Daniel P. Berrangé wrote:

>>
>> Ok.
>>
>> One more thing to discuss then. Should I add keepalive directly to BlockdevOptionsNbd?
>>
>> Seems more useful to put it into SocketAddress, to be reused by other socket users..
>> But "SocketAddress" sounds like address, not like address+connection-options. On
>> the other hand, structure names are not part of API. So, finally, is InetSocketAddress
>> a good place for such thing?
> 
> That's an interesting idea. Using InetSocketAddress would mean that we could
> get support for this enabled "for free" everywhere in QEMU that uses an
> InetSocketAddress as its master config format.

I like the idea as well.

> 
> Of course there's plenty of places not using InetSocketAddress that would
> still require some glue to wire up the code which converts the custom
> format into InetSocketAddress

Hmm - how many places are we using InetSocketAddress (which allows an
optional 'to' port value) when we really meant InetSocketAddressBase?
There may be some interesting hierarchy decisions to consider on where
we stick a keepalive option.
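
(For reference, the split in qapi/sockets.json being referred to is roughly
the following, reconstructed here as a sketch rather than quoted from the
schema:

{ 'struct': 'InetSocketAddressBase',
  'data': { 'host': 'str', 'port': 'str' } }

{ 'struct': 'InetSocketAddress',
  'base': 'InetSocketAddressBase',
  'data': { '*numeric': 'bool',
            '*to': 'uint16',
            '*ipv4': 'bool',
            '*ipv6': 'bool' } }

so a '*keep-alive' flag could attach at either level, which is exactly the
hierarchy decision in question.)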

This also made me wonder if we should start a deprecation clock to
improve the nbd-server-start command to use SocketAddress instead of
SocketAddressLegacy.  If we revive Max's work on implementing a default
branch for a union discriminator
(https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01682.html),
we could have something like:

{ 'enum': 'NbdSocketAddressHack',
  'data': [ 'legacy', 'inet', 'unix' ] }
{ 'struct': 'NbdServerAddrLegacy',
  'data': { 'addr': 'SocketAddressLegacy' } }
{ 'union': 'NbdServerAddr',
  'base': { 'type': 'NbdSocketAddressHack',
            '*tls-creds': 'str',
            '*tls-authz': 'str' },
  'discriminator': 'type',
  'default-variant': 'legacy',
  'data': { 'legacy': 'NbdServerAddrLegacy',
            'inet': 'InetSocketAddress',
            'unix': 'UnixSocketAddress' } }
{ 'command': 'nbd-server-start', 'data': 'NbdServerAddr' }

which should be backwards compatible with the existing:

{ "execute": "nbd-server-start", "arguments": {
    "tls-authz": "authz0",
    "addr": { "type": "inet", "data": {
      "host": "localhost", "port": "10809" } } } }

by relying on the discriminator's default expansion to:

{ "execute": "nbd-server-start", "arguments": {
    "tls-authz": "authz0",
    "type": "legacy",
    "addr": { "type": "inet", "data": {
      "host": "localhost", "port": "10809" } } } }

but also permit the flatter:

{ "execute": "nbd-server-start", "arguments": {
    "tls-authz": "authz0",
    "type": "inet", "host": "localhost", "port": "10809" } }

and let us start a deprecation clock to get rid of the "legacy" branch
(especially if coupled with Kevin's work on adding introspectable
deprecation annotations in QAPI).
Eric Blake June 5, 2019, 8:11 p.m. UTC | #9
On 6/5/19 2:48 PM, Eric Blake wrote:

> This also made me wonder if we should start a deprecation clock to
> improve the nbd-server-start command to use SocketAddress instead of
> SocketAddressLegacy.  If we revive Max's work on implementing a default
> branch for a union discriminator
> (https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01682.html),
> we could have something like:

Re-reading that thread, I see that Markus was arguing for a slightly
different QAPI syntax than Max's proposal, basically:

> 
> { 'enum': 'NbdSocketAddressHack',
>   'data': [ 'legacy', 'inet', 'unix' ] }
> { 'struct': 'NbdServerAddrLegacy',
>   'data': { 'addr', 'SocketAddressLegacy' } }
> { 'union': 'NbdServerAddr',
>   'base': { 'type': 'NbdSocketAddressHack',
>             '*tls-creds': 'str',
>             '*tls-authz': 'str' },
>   'discriminator': 'type',
>   'default-variant': 'legacy',
>   'data': { 'legacy': 'NbdServerAddrLegacy',
>             'inet', 'InetSocketAddress',
>             'unix', 'UnixSocketAddress' } }
> { 'command', 'nbd-server-start', 'data': 'NbdServerAddr' }

{ 'union': 'NbdServerAddr',
  'base': { '*type': { 'type': 'NbdSocketAddressHack',
                       'default': 'legacy' },
            '*tls-creds': 'str', '*tls-authz': 'str' },
  'discriminator': 'type',
  'data': { 'legacy': 'NbdServerAddrLegacy',
            'inet': 'InetSocketAddress',
            'unix': 'UnixSocketAddress' } }

> 
> which should be backwards compatible with the existing:
> 
> { "execute": "nbd-server-start", "arguments": {
>     "tls-authz": "authz0",
>     "addr": { "type": "inet", "data": {
>       "host": "localhost", "port": "10809" } } } }
> 
> by relying on the discriminator's default expansion to:
> 
> { "execute": "nbd-server-start", "arguments": {
>     "tls-authz": "authz0",
>     "type": "legacy",
>     "addr": { "type": "inet", "data": {
>       "host": "localhost", "port": "10809" } } } }

But this part remains true - if a flat union has an optional
discriminator, then the discriminator must include a default value,
where omitting the discriminator then results in sane expansion, and
where a careful choice of discriminator default allows legacy syntax to
co-exist with new preferred syntax.

Patch

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 790ecc1ee1..b57cea8482 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1137,6 +1137,7 @@ static int nbd_client_connect(BlockDriverState *bs,
 
     /* NBD handshake */
     logout("session init %s\n", export);
+    qio_channel_set_keepalive(QIO_CHANNEL(sioc), true, NULL);
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
 
     client->info.request_sizes = true;