Patchwork [1/3] nbd: Only try to send flush/discard commands if connected to the NBD server

Submitter Nicholas Thomas
Date Oct. 22, 2012, 11:09 a.m.
Message ID <cf837995a28824a17f65060e9515b9a8acf936fe.1350901963.git.nick@bytemark.co.uk>
Permalink /patch/193230/
State New

Comments

Nicholas Thomas - Oct. 22, 2012, 11:09 a.m.
This is unlikely to come up now, but is a necessary prerequisite for reconnection
behaviour.

Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
---
 block/nbd.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)
Kevin Wolf - Oct. 23, 2012, 10:33 a.m.
Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
> 
> This is unlikely to come up now, but is a necessary prerequisite for reconnection
> behaviour.
> 
> Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
> ---
>  block/nbd.c |   13 +++++++++++--
>  1 files changed, 11 insertions(+), 2 deletions(-)

What's the real requirement here? Silently ignoring a flush and
returning success for it feels wrong. Why is it correct?

Kevin
Nicholas Thomas - Oct. 23, 2012, 11:08 a.m.
On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
> Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
> > 
> > This is unlikely to come up now, but is a necessary prerequisite for reconnection
> > behaviour.
> > 
> > Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
> > ---
> >  block/nbd.c |   13 +++++++++++--
> >  1 files changed, 11 insertions(+), 2 deletions(-)
> 
> What's the real requirement here? Silently ignoring a flush and
> returning success for it feels wrong. Why is it correct?
> 
> Kevin

I just needed to avoid socket operations while s->sock == -1, and
extending the existing case of "can't do the command, so pretend I did
it" to "can't do the command right now, so pretend..." seemed like an
easy way out. 

In the Bytemark case, the NBD server always opens the file O_SYNC, so
nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
but I'd be surprised if that's true for all NBD servers. Should we be
returning 1 here for both "not supported" and "can't do it right now",
instead? 


/Nick
Kevin Wolf - Oct. 23, 2012, 11:26 a.m.
Am 23.10.2012 13:08, schrieb Nicholas Thomas:
> On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
>> Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
>>>
>>> This is unlikely to come up now, but is a necessary prerequisite for reconnection
>>> behaviour.
>>>
>>> Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
>>> ---
>>>  block/nbd.c |   13 +++++++++++--
>>>  1 files changed, 11 insertions(+), 2 deletions(-)
>>
>> What's the real requirement here? Silently ignoring a flush and
>> returning success for it feels wrong. Why is it correct?
>>
>> Kevin
> 
> I just needed to avoid socket operations while s->sock == -1, and
> extending the existing case of "can't do the command, so pretend I did
> it" to "can't do the command right now, so pretend..." seemed like an
> easy way out.

The difference is that NBD_FLAG_SEND_FLUSH is only clear if the server
is configured like that, i.e. it's completely in the control of the
user, and it ignores flushes consistently (for example because it
already uses a writethrough mode so that flushes aren't required).

Network outage usually isn't in the control of the user and ignoring a
flush request when the server actually asked for flushes is completely
surprising and dangerous.

> In the Bytemark case, the NBD server always opens the file O_SYNC, so
> nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
> but I'd be surprised if that's true for all NBD servers. Should we be
> returning 1 here for both "not supported" and "can't do it right now",
> instead? 

The best return value is probably -ENOTCONN or -EIO if you must return
an error. But maybe using the same logic as for writes would be
appropriate, i.e. try to reestablish the connection before you return an
error.

Kevin
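Kevin's distinction between the two cases can be sketched in a few lines (a hypothetical, simplified stand-in for BDRVNBDState with the request send stubbed out; the flag value matches NBD_FLAG_SEND_FLUSH in qemu's nbd.h but is hard-coded here): a server that never advertised flush support may be silently ignored, while a dead socket must surface an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for BDRVNBDState: only the
 * fields the decision logic needs. */
struct nbd_state {
    int sock;       /* -1 while disconnected */
    int nbdflags;
};

#define NBD_FLAG_SEND_FLUSH  (1 << 2)

/* Sketch of the behaviour Kevin suggests: a missing NBD_FLAG_SEND_FLUSH
 * is a deliberate server configuration, so returning 0 is fine; a dead
 * socket is not, so report -ENOTCONN rather than pretending the flush
 * succeeded. The actual NBD_CMD_FLUSH request is stubbed out. */
static int nbd_co_flush_sketch(struct nbd_state *s)
{
    if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
        return 0;               /* server never wanted flushes */
    }
    if (s->sock < 0) {
        return -ENOTCONN;       /* connection lost: surface the error */
    }
    return 0;                   /* would send NBD_CMD_FLUSH here */
}
```

A reconnecting driver would try to re-establish the connection before falling back to the -ENOTCONN return, as Kevin suggests for writes.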
Jamie Lokier - Oct. 23, 2012, 3:02 p.m.
Nicholas Thomas wrote:
> On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
> > Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
> > > 
> > > This is unlikely to come up now, but is a necessary prerequisite for reconnection
> > > behaviour.
> > > 
> > > Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
> > > ---
> > >  block/nbd.c |   13 +++++++++++--
> > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > 
> > What's the real requirement here? Silently ignoring a flush and
> > returning success for it feels wrong. Why is it correct?
> > 
> > Kevin
> 
> I just needed to avoid socket operations while s->sock == -1, and
> extending the existing case of "can't do the command, so pretend I did
> it" to "can't do the command right now, so pretend..." seemed like an
> easy way out. 

Hi Nicholas,

Ignoring a flush is another way of saying "corrupt my data" in some
circumstances.  We have options in QEMU already to say whether flushes
are ignored on normal discs, but if someone's chosen the "I really
care about my database/filesystem" option, and verified that their NBD
setup really performs them (in normal circumstances), silently
dropping flushes from time to time isn't nice.

I would much rather the guest is forced to wait until reconnection and
then get a successful flush, if the problem is just that the server
was down briefly.  Or, if that is too hard, that the flush is 

Since the I/O _order_ before, and sometimes after, flush, is important
for data integrity, this needs to be maintained when I/Os are queued in
the disconnected state -- including those which were inflight at the
time disconnect was detected and then retried on reconnect.

Ignoring a discard is not too bad.  However, if discard is retried,
then I/O order is important in relation to those as well.

> In the Bytemark case, the NBD server always opens the file O_SYNC, so
> nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
> but I'd be surprised if that's true for all NBD servers. Should we be
> returning 1 here for both "not supported" and "can't do it right now",
> instead?

When the server is opening the file O_SYNC, wouldn't it make sense to
tell QEMU -- and the guest -- that there's no need to send flushes at
all, as it's equivalent to a disk with no write-cache (or disabled)?

Best,
-- Jamie
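The ordering requirement Jamie describes could be sketched as a FIFO of pending requests replayed strictly in submission order on reconnect (hypothetical structures, not QEMU code; real code would resubmit each request over the socket instead of recording its type):

```c
#include <assert.h>
#include <stddef.h>

/* Requests that were in flight, or newly issued, while the connection
 * is down go into a FIFO; on reconnect they are replayed in submission
 * order, so a flush never overtakes the writes it is meant to cover. */
enum req_type { REQ_WRITE, REQ_FLUSH, REQ_DISCARD };

struct pending_req {
    enum req_type type;
    struct pending_req *next;
};

struct pending_queue {
    struct pending_req *head, *tail;
};

static void queue_push(struct pending_queue *q, struct pending_req *r)
{
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
}

/* Replay in FIFO order; returns the number of requests resubmitted. */
static int queue_replay(struct pending_queue *q, enum req_type *order, int max)
{
    int n = 0;
    for (struct pending_req *r = q->head; r && n < max; r = r->next) {
        order[n++] = r->type;   /* real code would resubmit here */
    }
    q->head = q->tail = NULL;
    return n;
}
```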
Nicholas Thomas - Oct. 24, 2012, 12:16 p.m.
On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> Nicholas Thomas wrote:
> > On Tue, 2012-10-23 at 12:33 +0200, Kevin Wolf wrote:
> > > Am 22.10.2012 13:09, schrieb nick@bytemark.co.uk:
> > > > 
> > > > This is unlikely to come up now, but is a necessary prerequisite for reconnection
> > > > behaviour.
> > > > 
> > > > Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
> > > > ---
> > > >  block/nbd.c |   13 +++++++++++--
> > > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > What's the real requirement here? Silently ignoring a flush and
> > > returning success for it feels wrong. Why is it correct?
> > > 
> > > Kevin
> > 
> > I just needed to avoid socket operations while s->sock == -1, and
> > extending the existing case of "can't do the command, so pretend I did
> > it" to "can't do the command right now, so pretend..." seemed like an
> > easy way out. 
> 
> Hi Nicholas,
> 
> Ignoring a flush is another way of saying "corrupt my data" in some
> circumstances.  We have options in QEMU already to say whether flushes
> are ignored on normal discs, but if someone's chosen the "I really
> care about my database/filesystem" option, and verified that their NBD
> setup really performs them (in normal circumstances), silently
> dropping flushes from time to time isn't nice.
> 
> I would much rather the guest is forced to wait until reconnection and
> then get a successful flush, if the problem is just that the server
> was down briefly.  Or, if that is too hard, that the flush is 

OK, that's certainly doable. Revised patches hopefully showing up this
week or (at a push) next. 

> Since the I/O _order_ before, and sometimes after, flush, is important
> for data integrity, this needs to be maintained when I/Os are queued in
> the disconnected state -- including those which were inflight at the
> time disconnect was detected and then retried on reconnect.

Hmm, discussing this on IRC I was told that it wasn't necessary to
preserve order - although I forget the fine detail. Depending on the
implementation of qemu's coroutine mutexes, operations may not actually
be performed in order right now - it's not too easy to work out what's
happening. 

I've also just noticed that flush & discard don't take the send_mutex
before writing to the socket. That can't be intentional, surely? Paolo?

I've got a proof-of-concept that makes flush, discard, readv_1 and
writev_1 call a common nbd_co_handle_request function. This retries I/Os
when a connection goes away and comes back, and lets me make this patch
unnecessary. It doesn't preserve I/O ordering, but it won't be too hard
to add that if it is actually necessary.

> Ignoring a discard is not too bad.  However, if discard is retried,
> then I/O order is important in relation to those as well.
> 
> > In the Bytemark case, the NBD server always opens the file O_SYNC, so
> > nbd_co_flush could check in_flight == 0 and return 0/1 based on that;
> > but I'd be surprised if that's true for all NBD servers. Should we be
> > returning 1 here for both "not supported" and "can't do it right now",
> > instead?
> 
> When the server is opening the file O_SYNC, wouldn't it make sense to
> tell QEMU -- and the guest -- that there's no need to send flushes at
> all, as it's equivalent to a disk with no write-cache (or disabled)?

Probably; our NBD server still implements the old-style handshakes, as
far as I recall, and doesn't actually implement the flush or discard
commands as a result.
Kevin Wolf - Oct. 24, 2012, 12:57 p.m.
Am 24.10.2012 14:16, schrieb Nicholas Thomas:
> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
>> Since the I/O _order_ before, and sometimes after, flush, is important
>> for data integrity, this needs to be maintained when I/Os are queued in
>> the disconnected state -- including those which were inflight at the
>> time disconnect was detected and then retried on reconnect.
> 
> Hmm, discussing this on IRC I was told that it wasn't necessary to
> preserve order - although I forget the fine detail. Depending on the
> implementation of qemu's coroutine mutexes, operations may not actually
> be performed in order right now - it's not too easy to work out what's
> happening.

It's possible to reorder, but it must be consistent with the order in
which completion is signalled to the guest. The semantics of flush is
that at the point that the flush completes, all writes to the disk that
already have completed successfully are stable. It doesn't say anything
about writes that are still in flight, they may or may not be flushed to
disk.

Kevin
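Kevin's statement of the flush contract can be modelled in a few lines (a toy model, not QEMU code): at the moment a flush completes, exactly the writes that had already completed are guaranteed stable; writes still in flight may or may not be covered.

```c
#include <assert.h>

/* Illustrative model of the flush semantics: a flush, on completion,
 * covers every write whose completion was already signalled, and says
 * nothing about writes still in flight. */
struct flush_model {
    int writes_completed;   /* writes the guest has seen complete */
    int guaranteed_stable;  /* writes known to be on stable storage */
};

static void model_write_completes(struct flush_model *m)
{
    m->writes_completed++;
}

static void model_flush_completes(struct flush_model *m)
{
    /* The flush covers exactly the writes completed before this point. */
    m->guaranteed_stable = m->writes_completed;
}
```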
Paolo Bonzini - Oct. 24, 2012, 2:03 p.m.
Il 24/10/2012 14:16, Nicholas Thomas ha scritto:
> 
> I've also just noticed that flush & discard don't take the send_mutex
> before writing to the socket. That can't be intentional, surely? Paolo?

No, it's a bug.

Paolo
Paolo Bonzini - Oct. 24, 2012, 2:10 p.m.
Il 24/10/2012 16:03, Paolo Bonzini ha scritto:
> Il 24/10/2012 14:16, Nicholas Thomas ha scritto:
>>
>> I've also just noticed that flush & discard don't take the send_mutex
>> before writing to the socket. That can't be intentional, surely? Paolo?
> 
> No, it's a bug.

Hmm, why do you say it doesn't take the mutex?  nbd_co_flush and
nbd_co_discard all call nbd_co_send_request, which takes the mutex.

Paolo
Nicholas Thomas - Oct. 24, 2012, 2:12 p.m.
On Wed, 2012-10-24 at 16:10 +0200, Paolo Bonzini wrote:
> Il 24/10/2012 16:03, Paolo Bonzini ha scritto:
> > Il 24/10/2012 14:16, Nicholas Thomas ha scritto:
> >>
> >> I've also just noticed that flush & discard don't take the send_mutex
> >> before writing to the socket. That can't be intentional, surely? Paolo?
> > 
> > No, it's a bug.
> 
> Hmm, why do you say it doesn't take the mutex?  nbd_co_flush and
> nbd_co_discard all call nbd_co_send_request, which takes the mutex.
> 
> Paolo

Sorry, brain fail. You're right: no bug :)

/Nick
Jamie Lokier - Oct. 24, 2012, 2:32 p.m.
Kevin Wolf wrote:
> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
> > On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> >> Since the I/O _order_ before, and sometimes after, flush, is important
> >> for data integrity, this needs to be maintained when I/Os are queued in
> >> the disconnected state -- including those which were inflight at the
> >> time disconnect was detected and then retried on reconnect.
> > 
> > Hmm, discussing this on IRC I was told that it wasn't necessary to
> > preserve order - although I forget the fine detail. Depending on the
> > implementation of qemu's coroutine mutexes, operations may not actually
> > be performed in order right now - it's not too easy to work out what's
> > happening.
> 
> It's possible to reorder, but it must be consistent with the order in
> which completion is signalled to the guest. The semantics of flush is
> that at the point that the flush completes, all writes to the disk that
> already have completed successfully are stable. It doesn't say anything
> about writes that are still in flight, they may or may not be flushed to
> disk.

I admit I wasn't thinking clearly how much ordering NBD actually
guarantees (or if there's ordering the guest depends on implicitly
even if it's not guaranteed in specification), and how that is related
within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
that the guest expects for various emulated devices and their settings.

The ordering (if any) needed from the NBD driver (or any backend) is
going to depend on the assumptions baked into the interface between
QEMU device emulation <-> backend.

E.g. if every device emulation waited for all outstanding writes to
complete before sending a flush, then it wouldn't matter how the
backend reordered its requests, even getting the completions out of
order.

Is that relationship documented (and conformed to)?

-- Jamie
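The drain-before-flush discipline Jamie hypothesises ("if every device emulation waited for all outstanding writes to complete before sending a flush") can be sketched with a simple in-flight counter (illustrative only; how QEMU's device emulations actually track this is exactly the question being asked):

```c
#include <assert.h>

/* With this discipline the backend may complete requests in any order:
 * the flush is only ever submitted after every write it must cover has
 * already completed. */
struct emu_state {
    int writes_in_flight;
};

static void write_submitted(struct emu_state *s)
{
    s->writes_in_flight++;
}

static void write_completed(struct emu_state *s)
{
    s->writes_in_flight--;
}

/* Returns 1 if the flush may be sent now, 0 if it must wait. */
static int flush_may_submit(const struct emu_state *s)
{
    return s->writes_in_flight == 0;
}
```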
Paolo Bonzini - Oct. 24, 2012, 3:16 p.m.
Il 24/10/2012 16:32, Jamie Lokier ha scritto:
> Kevin Wolf wrote:
>> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
>>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
>>>> Since the I/O _order_ before, and sometimes after, flush, is important
>>>> for data integrity, this needs to be maintained when I/Os are queued in
>>>> the disconnected state -- including those which were inflight at the
>>>> time disconnect was detected and then retried on reconnect.
>>>
>>> Hmm, discussing this on IRC I was told that it wasn't necessary to
>>> preserve order - although I forget the fine detail. Depending on the
>>> implementation of qemu's coroutine mutexes, operations may not actually
>>> be performed in order right now - it's not too easy to work out what's
>>> happening.
>>
>> It's possible to reorder, but it must be consistent with the order in
>> which completion is signalled to the guest. The semantics of flush is
>> that at the point that the flush completes, all writes to the disk that
>> already have completed successfully are stable. It doesn't say anything
>> about writes that are still in flight, they may or may not be flushed to
>> disk.
> 
> I admit I wasn't thinking clearly how much ordering NBD actually
> guarantees (or if there's ordering the guest depends on implicitly
> even if it's not guaranteed in specification),

Here NBD is used just as a transport on the host and totally invisible
to the guest.  So NBD pretty much has to implement whatever ordering
guarantees the guest needs.

> E.g. if every device emulation waited for all outstanding writes to
> complete before sending a flush, then it wouldn't matter how the
> backend reordered its requests, even getting the completions out of
> order.

The NBD implementation in QEMU (which Nick is using) is completely
asynchronous.

Paolo

> Is that relationship documented (and conformed to)?
> 
> -- Jamie
>
Kevin Wolf - Oct. 25, 2012, 6:36 a.m.
Am 24.10.2012 16:32, schrieb Jamie Lokier:
> Kevin Wolf wrote:
>> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
>>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
>>>> Since the I/O _order_ before, and sometimes after, flush, is important
>>>> for data integrity, this needs to be maintained when I/Os are queued in
>>>> the disconnected state -- including those which were inflight at the
>>>> time disconnect was detected and then retried on reconnect.
>>>
>>> Hmm, discussing this on IRC I was told that it wasn't necessary to
>>> preserve order - although I forget the fine detail. Depending on the
>>> implementation of qemu's coroutine mutexes, operations may not actually
>>> be performed in order right now - it's not too easy to work out what's
>>> happening.
>>
>> It's possible to reorder, but it must be consistent with the order in
>> which completion is signalled to the guest. The semantics of flush is
>> that at the point that the flush completes, all writes to the disk that
>> already have completed successfully are stable. It doesn't say anything
>> about writes that are still in flight, they may or may not be flushed to
>> disk.
> 
> I admit I wasn't thinking clearly how much ordering NBD actually
> guarantees (or if there's ordering the guest depends on implicitly
> even if it's not guaranteed in specification), and how that is related
> within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
> that the guest expects for various emulated devices and their settings.
> 
> The ordering (if any) needed from the NBD driver (or any backend) is
> going to depend on the assumptions baked into the interface between
> QEMU device emulation <-> backend.
> 
> E.g. if every device emulation waited for all outstanding writes to
> complete before sending a flush, then it wouldn't matter how the
> backend reordered its requests, even getting the completions out of
> order.
> 
> Is that relationship documented (and conformed to)?

No, like so many other things in qemu it's not spelt out explicitly.
However, as I understand it, it's the same behaviour as real hardware
has, so device emulation, at least for the common devices, doesn't have
to implement anything special for it. That assumes the hardware even
supports parallel requests; otherwise it would automatically only have
a single request in flight (like IDE).

Kevin
Jamie Lokier - Oct. 25, 2012, 5:09 p.m.
Kevin Wolf wrote:
> Am 24.10.2012 16:32, schrieb Jamie Lokier:
> > Kevin Wolf wrote:
> >> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
> >>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> >>>> Since the I/O _order_ before, and sometimes after, flush, is important
> >>>> for data integrity, this needs to be maintained when I/Os are queued in
> >>>> the disconnected state -- including those which were inflight at the
> >>>> time disconnect was detected and then retried on reconnect.
> >>>
> >>> Hmm, discussing this on IRC I was told that it wasn't necessary to
> >>> preserve order - although I forget the fine detail. Depending on the
> >>> implementation of qemu's coroutine mutexes, operations may not actually
> >>> be performed in order right now - it's not too easy to work out what's
> >>> happening.
> >>
> >> It's possible to reorder, but it must be consistent with the order in
> >> which completion is signalled to the guest. The semantics of flush is
> >> that at the point that the flush completes, all writes to the disk that
> >> already have completed successfully are stable. It doesn't say anything
> >> about writes that are still in flight, they may or may not be flushed to
> >> disk.
> > 
> > I admit I wasn't thinking clearly how much ordering NBD actually
> > guarantees (or if there's ordering the guest depends on implicitly
> > even if it's not guaranteed in specification), and how that is related
> > within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
> > that the guest expects for various emulated devices and their settings.
> > 
> > The ordering (if any) needed from the NBD driver (or any backend) is
> > going to depend on the assumptions baked into the interface between
> > QEMU device emulation <-> backend.
> > 
> > E.g. if every device emulation waited for all outstanding writes to
> > complete before sending a flush, then it wouldn't matter how the
> > backend reordered its requests, even getting the completions out of
> > order.
> > 
> > Is that relationship documented (and conformed to)?
> 
> No, like so many other things in qemu it's not spelt out explicitly.
> However, as I understand it it's the same behaviour as real hardware
> has, so device emulation at least for the common devices doesn't have to
> implement anything special for it. If the hardware even supports
> parallel requests, otherwise it would automatically only have a single
> request in flight (like IDE).

That's why I mention virtio/FUA/NCQ/TCQ/SCSI-ORDERED, which are quite
common.

They are features of devices which support multiple parallel requests,
but with certain ordering constraints conveyed by or expected by the
guest, which has to be ensured when it's mapped onto a QEMU fully
asynchronous backend.

That means they are features of the hardware which device emulations
_do_ have to implement.  If they don't, the storage is unreliable on
things like host power removal and virtual power removal.

If the backends are allowed to explicitly have no coupling between
different request types (even flush/discard and write), and ordering
constraints are being enforced by the order in which device emulations
submit and wait, that's fine.

I mention this, because POSIX aio_fsync() is _not_ fully decoupled
according to its specification.

So it might be that some device emulations are depending on the
semantics of aio_fsync() or the QEMU equivalent by now; and randomly
reordering in the NBD driver in unusual circumstances (or any other
backend), would break those semantics.

-- Jamie
Kevin Wolf - Oct. 26, 2012, 7:59 a.m.
Am 25.10.2012 19:09, schrieb Jamie Lokier:
> Kevin Wolf wrote:
>> Am 24.10.2012 16:32, schrieb Jamie Lokier:
>>> Kevin Wolf wrote:
>>>> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
>>>>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
>>>>>> Since the I/O _order_ before, and sometimes after, flush, is important
>>>>>> for data integrity, this needs to be maintained when I/Os are queued in
>>>>>> the disconnected state -- including those which were inflight at the
>>>>>> time disconnect was detected and then retried on reconnect.
>>>>>
>>>>> Hmm, discussing this on IRC I was told that it wasn't necessary to
>>>>> preserve order - although I forget the fine detail. Depending on the
>>>>> implementation of qemu's coroutine mutexes, operations may not actually
>>>>> be performed in order right now - it's not too easy to work out what's
>>>>> happening.
>>>>
>>>> It's possible to reorder, but it must be consistent with the order in
>>>> which completion is signalled to the guest. The semantics of flush is
>>>> that at the point that the flush completes, all writes to the disk that
>>>> already have completed successfully are stable. It doesn't say anything
>>>> about writes that are still in flight, they may or may not be flushed to
>>>> disk.
>>>
>>> I admit I wasn't thinking clearly how much ordering NBD actually
>>> guarantees (or if there's ordering the guest depends on implicitly
>>> even if it's not guaranteed in specification), and how that is related
>>> within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
>>> that the guest expects for various emulated devices and their settings.
>>>
>>> The ordering (if any) needed from the NBD driver (or any backend) is
>>> going to depend on the assumptions baked into the interface between
>>> QEMU device emulation <-> backend.
>>>
>>> E.g. if every device emulation waited for all outstanding writes to
>>> complete before sending a flush, then it wouldn't matter how the
>>> backend reordered its requests, even getting the completions out of
>>> order.
>>>
>>> Is that relationship documented (and conformed to)?
>>
>> No, like so many other things in qemu it's not spelt out explicitly.
>> However, as I understand it it's the same behaviour as real hardware
>> has, so device emulation at least for the common devices doesn't have to
>> implement anything special for it. If the hardware even supports
>> parallel requests, otherwise it would automatically only have a single
>> request in flight (like IDE).
> 
> That's why I mention virtio/FUA/NCQ/TCQ/SCSI-ORDERED, which are quite
> common.
> 
> They are features of devices which support multiple parallel requests,
> but with certain ordering constraints conveyed by or expected by the
> guest, which has to be ensured when it's mapped onto a QEMU fully
> asynchronous backend.
> 
> That means they are features of the hardware which device emulations
> _do_ have to implement.  If they don't, the storage is unreliable on
> things like host power removal and virtual power removal.

Yes, device emulations that need to maintain a given order must pay
attention to wait for completion of the previous requests.

> If the backends are allowed to explicitly have no coupling between
> different request types (even flush/discard and write), and ordering
> constraints are being enforced by the order in which device emulations
> submit and wait, that's fine.
> 
> I mention this, because POSIX aio_fsync() is _not_ fully decoupled
> according to its specification.
> 
> So it might be that some device emulations are depending on the
> semantics of aio_fsync() or the QEMU equivalent by now; and randomly
> reordering in the NBD driver in unusual circumstances (or any other
> backend), would break those semantics.

qemu AIO has always had this semantics since bdrv_aio_flush() was
introduced. It behaves the same way for image files. So I don't see any
problem with NBD making use of the same.

Kevin

Patch

diff --git a/block/nbd.c b/block/nbd.c
index 2bce47b..9536408 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -71,6 +71,11 @@  typedef struct BDRVNBDState {
     char *host_spec;
 } BDRVNBDState;
 
+static bool nbd_is_connected(BDRVNBDState *s)
+{
+    return s->sock >= 0;
+}
+
 static int nbd_config(BDRVNBDState *s, const char *filename, int flags)
 {
     char *file;
@@ -309,6 +314,7 @@  static void nbd_teardown_connection(BlockDriverState *bs)
 
     qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL, NULL);
     closesocket(s->sock);
+    s->sock = -1;
 }
 
 static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
@@ -316,6 +322,8 @@  static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     BDRVNBDState *s = bs->opaque;
     int result;
 
+    s->sock = -1;
+
     qemu_co_mutex_init(&s->send_mutex);
     qemu_co_mutex_init(&s->free_sema);
 
@@ -431,7 +439,7 @@  static int nbd_co_flush(BlockDriverState *bs)
     struct nbd_reply reply;
     ssize_t ret;
 
-    if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+    if (!nbd_is_connected(s) || !(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
         return 0;
     }
 
@@ -462,7 +470,7 @@  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
     struct nbd_reply reply;
     ssize_t ret;
 
-    if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
+    if (!nbd_is_connected(s) || !(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
         return 0;
     }
     request.type = NBD_CMD_TRIM;
@@ -515,3 +523,4 @@  static void bdrv_nbd_init(void)
 }
 
 block_init(bdrv_nbd_init);
+