block: extend BLOCK_IO_ERROR event with nospace indicator
diff mbox

Message ID 20140829160727.69f66ecd@redhat.com
State New
Headers show

Commit Message

Luiz Capitulino Aug. 29, 2014, 8:07 p.m. UTC
Management software, such as RHEV's vdsm, want to be able to allocate
disk space on demand. The basic use case is to start a VM with a small
disk and then the disk is enlarged when QEMU hits a ENOSPC condition.

To this end, the management software has to be notified when QEMU
encounters ENOSPC. The solution implemented by this commit is simple:
it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
when QEMU is stopped due to ENOSPC.

Note that support for querying this event is already present in
query-block by means of the 'io-status' key. Also, the new 'nospace'
BLOCK_IO_ERROR field shares the same semantics with 'io-status',
which basically means that werror= has to be set to either
'stop' or 'enospc' to enable 'nospace'.

Finally, this commit also updates the 'io-status' key doc in the
schema with a list of supported device models.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---

Three important observations:

 1. We've talked with oVirt and OpenStack folks. oVirt folks say that
    this implementation is enough for their use-case. OpenStack don't
    need this feature

 2. While testing this with a raw image on a (smaller) ext2 file mounted
    via the loopback device, I get half "Invalid argument" I/O errors and
    half  "No space" errors". This means that half of the BLOCK_IO_ERROR
    events that are emitted for this test-case will have nospace=false
    and the other half nospace=true. I don't know why I'm getting those
    "Invalid argument" errors, can anyone of the block layer comment
    on this? I don't get that with a qcow2 image (I get nospace=true for
    all events)

 3. I think this should go via block tree

 block.c              | 22 ++++++++++++++--------
 qapi/block-core.json |  8 +++++++-
 2 files changed, 21 insertions(+), 9 deletions(-)

Comments

Eric Blake Aug. 29, 2014, 8:33 p.m. UTC | #1
On 08/29/2014 02:07 PM, Luiz Capitulino wrote:
> Management software, such as RHEV's vdsm, want to be able to allocate
> disk space on demand. The basic use case is to start a VM with a small
> disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> 
> To this end, the management software has to be notified when QEMU
> encounters ENOSPC. The solution implemented by this commit is simple:
> it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> when QEMU is stopped due to ENOSPC.
> 
> Note that support for querying this event is already present in
> query-block by means of the 'io-status' key. Also, the new 'nospace'
> BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> which basically means that werror= has to be set to either
> 'stop' or 'enospc' to enable 'nospace'.
> 
> Finally, this commit also updates the 'io-status' key doc in the
> schema with a list of supported device models.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>
Luiz Capitulino Sept. 8, 2014, 2:42 p.m. UTC | #2
On Fri, 29 Aug 2014 16:07:27 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> Management software, such as RHEV's vdsm, want to be able to allocate
> disk space on demand. The basic use case is to start a VM with a small
> disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> 
> To this end, the management software has to be notified when QEMU
> encounters ENOSPC. The solution implemented by this commit is simple:
> it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> when QEMU is stopped due to ENOSPC.
> 
> Note that support for querying this event is already present in
> query-block by means of the 'io-status' key. Also, the new 'nospace'
> BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> which basically means that werror= has to be set to either
> 'stop' or 'enospc' to enable 'nospace'.
> 
> Finally, this commit also updates the 'io-status' key doc in the
> schema with a list of supported device models.
> 
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Kevin, are you going to take this via block layer tree?

> ---
> 
> Three important observations:
> 
>  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
>     this implementation is enough for their use-case. OpenStack don't
>     need this feature
> 
>  2. While testing this with a raw image on a (smaller) ext2 file mounted
>     via the loopback device, I get half "Invalid argument" I/O errors and
>     half  "No space" errors". This means that half of the BLOCK_IO_ERROR
>     events that are emitted for this test-case will have nospace=false
>     and the other half nospace=true. I don't know why I'm getting those
>     "Invalid argument" errors, can anyone of the block layer comment
>     on this? I don't get that with a qcow2 image (I get nospace=true for
>     all events)
> 
>  3. I think this should go via block tree
> 
>  block.c              | 22 ++++++++++++++--------
>  qapi/block-core.json |  8 +++++++-
>  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 1df13ac..b334e35 100644
> --- a/block.c
> +++ b/block.c
> @@ -3632,6 +3632,18 @@ BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e
>      }
>  }
>  
> +static void send_qmp_error_event(BlockDriverState *bs,
> +                                 BlockErrorAction action,
> +                                 bool is_read, int error)
> +{
> +    BlockErrorAction ac;
> +
> +    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
> +    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
> +                                   bdrv_iostatus_is_enabled(bs),
> +                                   error == ENOSPC, &error_abort);
> +}
> +
>  /* This is done by device models because, while the block layer knows
>   * about the error, it does not know whether an operation comes from
>   * the device or the block layer (from a job, for example).
> @@ -3657,16 +3669,10 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
>           * also ensures that the STOP/RESUME pair of events is emitted.
>           */
>          qemu_system_vmstop_request_prepare();
> -        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
> -                                       is_read ? IO_OPERATION_TYPE_READ :
> -                                       IO_OPERATION_TYPE_WRITE,
> -                                       action, &error_abort);
> +        send_qmp_error_event(bs, action, is_read, error);
>          qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
>      } else {
> -        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
> -                                       is_read ? IO_OPERATION_TYPE_READ :
> -                                       IO_OPERATION_TYPE_WRITE,
> -                                       action, &error_abort);
> +        send_qmp_error_event(bs, action, is_read, error);
>      }
>  }
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index fb74c56..567e0a6 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -336,6 +336,7 @@
>  #
>  # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
>  #             supports it and the VM is configured to stop on errors
> +#             (supported device models: virtio-blk, ide, scsi-disk)
>  #
>  # @inserted: #optional @BlockDeviceInfo describing the device if media is
>  #            present
> @@ -1569,6 +1570,11 @@
>  #
>  # @action: action that has been taken
>  #
> +# @nospace: #optional true if I/O error was caused due to a no-space
> +#           condition. This key is only present if query-block's
> +#           io-status is present, please see query-block documentation
> +#           for more information (since: 2.2)
> +#
>  # Note: If action is "stop", a STOP event will eventually follow the
>  # BLOCK_IO_ERROR event
>  #
> @@ -1576,7 +1582,7 @@
>  ##
>  { 'event': 'BLOCK_IO_ERROR',
>    'data': { 'device': 'str', 'operation': 'IoOperationType',
> -            'action': 'BlockErrorAction' } }
> +            'action': 'BlockErrorAction', '*nospace': 'bool' } }
>  
>  ##
>  # @BLOCK_JOB_COMPLETED
Kevin Wolf Sept. 8, 2014, 3:33 p.m. UTC | #3
Am 08.09.2014 um 16:42 hat Luiz Capitulino geschrieben:
> On Fri, 29 Aug 2014 16:07:27 -0400
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> 
> > Management software, such as RHEV's vdsm, want to be able to allocate
> > disk space on demand. The basic use case is to start a VM with a small
> > disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> > 
> > To this end, the management software has to be notified when QEMU
> > encounters ENOSPC. The solution implemented by this commit is simple:
> > it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> > when QEMU is stopped due to ENOSPC.
> > 
> > Note that support for querying this event is already present in
> > query-block by means of the 'io-status' key. Also, the new 'nospace'
> > BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> > which basically means that werror= has to be set to either
> > 'stop' or 'enospc' to enable 'nospace'.
> > 
> > Finally, this commit also updates the 'io-status' key doc in the
> > schema with a list of supported device models.
> > 
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> 
> Kevin, are you going to take this via block layer tree?

Yes, thanks, I've applied it now.

What was our conclusion wrt the human-readable strerror() string for
debugging? Didn't we want to add that as well?

> > ---
> > 
> > Three important observations:
> > 
> >  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
> >     this implementation is enough for their use-case. OpenStack don't
> >     need this feature
> > 
> >  2. While testing this with a raw image on a (smaller) ext2 file mounted
> >     via the loopback device, I get half "Invalid argument" I/O errors and
> >     half  "No space" errors". This means that half of the BLOCK_IO_ERROR
> >     events that are emitted for this test-case will have nospace=false
> >     and the other half nospace=true. I don't know why I'm getting those
> >     "Invalid argument" errors, can anyone of the block layer comment
> >     on this? I don't get that with a qcow2 image (I get nospace=true for
> >     all events)

Sounds familiar, but I never got around to debugging. Would probably be
worth some digging where the EINVAL comes from.

> >  3. I think this should go via block tree
> > 
> >  block.c              | 22 ++++++++++++++--------
> >  qapi/block-core.json |  8 +++++++-
> >  2 files changed, 21 insertions(+), 9 deletions(-)

Kevin
Luiz Capitulino Sept. 8, 2014, 4:57 p.m. UTC | #4
On Mon, 8 Sep 2014 17:33:18 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 08.09.2014 um 16:42 hat Luiz Capitulino geschrieben:
> > On Fri, 29 Aug 2014 16:07:27 -0400
> > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > 
> > > Management software, such as RHEV's vdsm, want to be able to allocate
> > > disk space on demand. The basic use case is to start a VM with a small
> > > disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> > > 
> > > To this end, the management software has to be notified when QEMU
> > > encounters ENOSPC. The solution implemented by this commit is simple:
> > > it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> > > when QEMU is stopped due to ENOSPC.
> > > 
> > > Note that support for querying this event is already present in
> > > query-block by means of the 'io-status' key. Also, the new 'nospace'
> > > BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> > > which basically means that werror= has to be set to either
> > > 'stop' or 'enospc' to enable 'nospace'.
> > > 
> > > Finally, this commit also updates the 'io-status' key doc in the
> > > schema with a list of supported device models.
> > > 
> > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > 
> > Kevin, are you going to take this via block layer tree?
> 
> Yes, thanks, I've applied it now.
> 
> What was our conclusion wrt the human-readable strerror() string for
> debugging? Didn't we want to add that as well?

I can do it on top of this patch. So, just adding a new field for this
is fine?

> 
> > > ---
> > > 
> > > Three important observations:
> > > 
> > >  1. We've talked with oVirt and OpenStack folks. oVirt folks say that
> > >     this implementation is enough for their use-case. OpenStack don't
> > >     need this feature
> > > 
> > >  2. While testing this with a raw image on a (smaller) ext2 file mounted
> > >     via the loopback device, I get half "Invalid argument" I/O errors and
> > >     half  "No space" errors". This means that half of the BLOCK_IO_ERROR
> > >     events that are emitted for this test-case will have nospace=false
> > >     and the other half nospace=true. I don't know why I'm getting those
> > >     "Invalid argument" errors, can anyone of the block layer comment
> > >     on this? I don't get that with a qcow2 image (I get nospace=true for
> > >     all events)
> 
> Sounds familiar, but I never got around to debugging. Would probably be
> worth some digging where the EINVAL comes from.
> 
> > >  3. I think this should go via block tree
> > > 
> > >  block.c              | 22 ++++++++++++++--------
> > >  qapi/block-core.json |  8 +++++++-
> > >  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> Kevin
>
Kevin Wolf Sept. 9, 2014, 8:27 a.m. UTC | #5
Am 08.09.2014 um 18:57 hat Luiz Capitulino geschrieben:
> On Mon, 8 Sep 2014 17:33:18 +0200
> Kevin Wolf <kwolf@redhat.com> wrote:
> 
> > Am 08.09.2014 um 16:42 hat Luiz Capitulino geschrieben:
> > > On Fri, 29 Aug 2014 16:07:27 -0400
> > > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > > 
> > > > Management software, such as RHEV's vdsm, want to be able to allocate
> > > > disk space on demand. The basic use case is to start a VM with a small
> > > > disk and then the disk is enlarged when QEMU hits a ENOSPC condition.
> > > > 
> > > > To this end, the management software has to be notified when QEMU
> > > > encounters ENOSPC. The solution implemented by this commit is simple:
> > > > it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
> > > > when QEMU is stopped due to ENOSPC.
> > > > 
> > > > Note that support for querying this event is already present in
> > > > query-block by means of the 'io-status' key. Also, the new 'nospace'
> > > > BLOCK_IO_ERROR field shares the same semantics with 'io-status',
> > > > which basically means that werror= has to be set to either
> > > > 'stop' or 'enospc' to enable 'nospace'.
> > > > 
> > > > Finally, this commit also updates the 'io-status' key doc in the
> > > > schema with a list of supported device models.
> > > > 
> > > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > > 
> > > Kevin, are you going to take this via block layer tree?
> > 
> > Yes, thanks, I've applied it now.
> > 
> > What was our conclusion wrt the human-readable strerror() string for
> > debugging? Didn't we want to add that as well?
> 
> I can do it on top of this patch. So, just adding a new field for this
> is fine?

I think so. Perhaps we should give it an 'x-' name to make clear that
it's a debugging help and not supposed to be parsed by management tools.
Or would that be abuse of that namespace?

The alternative solution (or actually we could do both) would be to
store it somewhere in bs and put it into query-block.

Kevin
Eric Blake Sept. 9, 2014, 12:37 p.m. UTC | #6
On 09/09/2014 02:27 AM, Kevin Wolf wrote:

>>>
>>> What was our conclusion wrt the human-readable strerror() string for
>>> debugging? Didn't we want to add that as well?
>>
>> I can do it on top of this patch. So, just adding a new field for this
>> is fine?
> 
> I think so. Perhaps we should give it an 'x-' name to make clear that
> it's a debugging help and not supposed to be parsed by management tools.
> Or would that be abuse of that namespace?

I think using x- would be okay for the namespace, but am not sure we
need to go that far.  We already have other fields without x- that are
documented as human-readable only; for example, CommandLineParameterInfo
has:

# @help: #optional human readable text string, not suitable for parsing.

or in our events, QUORUM_REPORT_BAD has:

# @error: #optional, error message. Only present on failure. This field
#         contains a human-readable error message. There are no
semantics other
#         than that the block layer reported an error and clients should not
#         try to interpret the error string.

So I'd be fine with something as simple as 'message' or 'error'.

> 
> The alternative solution (or actually we could do both) would be to
> store it somewhere in bs and put it into query-block.

Enhancing query-block in addition to the event makes sense, if it is
easy enough to do.  At this point, we are talking about debugging aids,
so as long as they are documented appropriately, I won't be too fussy.
Luiz Capitulino Sept. 9, 2014, 12:43 p.m. UTC | #7
On Tue, 09 Sep 2014 06:37:31 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 09/09/2014 02:27 AM, Kevin Wolf wrote:
> 
> >>>
> >>> What was our conclusion wrt the human-readable strerror() string for
> >>> debugging? Didn't we want to add that as well?
> >>
> >> I can do it on top of this patch. So, just adding a new field for this
> >> is fine?
> > 
> > I think so. Perhaps we should give it an 'x-' name to make clear that
> > it's a debugging help and not supposed to be parsed by management tools.
> > Or would that be abuse of that namespace?
> 
> I think using x- would be okay for the namespace, but am not sure we
> need to go that far.  We already have other fields without x- that are
> documented as human-readable only; for example, CommandLineParameterInfo
> has:
> 
> # @help: #optional human readable text string, not suitable for parsing.
> 
> or in our events, QUORUM_REPORT_BAD has:
> 
> # @error: #optional, error message. Only present on failure. This field
> #         contains a human-readable error message. There are no
> semantics other
> #         than that the block layer reported an error and clients should not
> #         try to interpret the error string.
> 
> So I'd be fine with something as simple as 'message' or 'error'.
> 
> > 
> > The alternative solution (or actually we could do both) would be to
> > store it somewhere in bs and put it into query-block.
> 
> Enhancing query-block in addition to the event makes sense, if it is
> easy enough to do.  At this point, we are talking about debugging aids,
> so as long as they are documented appropriately, I won't be too fussy.

OK, but I'm wondering if we need to add the string field to both,
BLOCK_IO_ERROR and query-block, or only to one to the other.

In my opinion, we should only add it to BLOCK_IO_ERROR if libvirt is
going to consume. Otherwise, it makes more sense to add it to query-block
because that's where we'll meet the user.

Btw, by "consume" I mean read it and make it available to libvirt clients
so that they can print it to their users. If we don't want libvirt to
consume that field then I think we should only add it to query-block and
info block.
Eric Blake Sept. 9, 2014, 12:53 p.m. UTC | #8
On 09/09/2014 06:43 AM, Luiz Capitulino wrote:

>> Enhancing query-block in addition to the event makes sense, if it is
>> easy enough to do.  At this point, we are talking about debugging aids,
>> so as long as they are documented appropriately, I won't be too fussy.
> 
> OK, but I'm wondering if we need to add the string field to both,
> BLOCK_IO_ERROR and query-block, or only to one to the other.
> 
> In my opinion, we should only add it to BLOCK_IO_ERROR if libvirt is
> going to consume. Otherwise, it makes more sense to add it to query-block
> because that's where we'll meet the user.
> 
> Btw, by "consume" I mean read it and make it available to libvirt clients
> so that they can print it to their users. If we don't want libvirt to
> consume that field then I think we should only add it to query-block and
> info block.

[For those not aware, qemu built for downstream RHEL already has an
error string in the __com.redhat_ namespace; we're trying to figure out
what upstream should have so that downstream doesn't have to perpetually
maintain an extension]

Downstream libvirt does not currently consume any error string.  With
downstream qemu, the _only_ way to get the error string in the event is
to parse libvirt logs, or use upstream libvirt with its backdoor of
'virsh qemu-monitor-event' (through the explicitly unsupported
libvirt-qemu.so) to get at the raw event information.  Changing libvirt
to expose such an error string to the end user would require a new
libvirt event number (the existing libvirt event is not extensible), so
existing clients would not be able to get at the information without
being recompiled to a new libvirt.

Since the whole point of this field is for debugging, I think that it is
sufficient to add it to JUST query-block, and not to the event.  That
is, if the app on top of libvirt gets an error, in the common case, they
won't care about the message (it won't change how they act), and in the
debug case, a developer trying to learn more about what happened can do
their own query-block directly (via 'virsh qemu-monitor-command', also
in libvirt-qemu.so) rather than trying to wire up libvirt to pass
through an error string through a new event.
Kevin Wolf Sept. 9, 2014, 1:23 p.m. UTC | #9
Am 09.09.2014 um 14:53 hat Eric Blake geschrieben:
> On 09/09/2014 06:43 AM, Luiz Capitulino wrote:
> 
> >> Enhancing query-block in addition to the event makes sense, if it is
> >> easy enough to do.  At this point, we are talking about debugging aids,
> >> so as long as they are documented appropriately, I won't be too fussy.
> > 
> > OK, but I'm wondering if we need to add the string field to both,
> > BLOCK_IO_ERROR and query-block, or only to one to the other.
> > 
> > In my opinion, we should only add it to BLOCK_IO_ERROR if libvirt is
> > going to consume. Otherwise, it makes more sense to add it to query-block
> > because that's where we'll meet the user.
> > 
> > Btw, by "consume" I mean read it and make it available to libvirt clients
> > so that they can print it to their users. If we don't want libvirt to
> > consume that field then I think we should only add it to query-block and
> > info block.
> 
> [For those not aware, qemu built for downstream RHEL already has an
> error string in the __com.redhat_ namespace; we're trying to figure out
> what upstream should have so that downstream doesn't have to perpetually
> maintain an extension]
> 
> Downstream libvirt does not currently consume any error string.  With
> downstream qemu, the _only_ way to get the error string in the event is
> to parse libvirt logs, or use upstream libvirt with its backdoor of
> 'virsh qemu-monitor-event' (through the explicitly unsupported
> libvirt-qemu.so) to get at the raw event information.  Changing libvirt
> to expose such an error string to the end user would require a new
> libvirt event number (the existing libvirt event is not extensible), so
> existing clients would not be able to get at the information without
> being recompiled to a new libvirt.
> 
> Since the whole point of this field is for debugging, I think that it is
> sufficient to add it to JUST query-block, and not to the event.  That
> is, if the app on top of libvirt gets an error, in the common case, they
> won't care about the message (it won't change how they act), and in the
> debug case, a developer trying to learn more about what happened can do
> their own query-block directly (via 'virsh qemu-monitor-command', also
> in libvirt-qemu.so) rather than trying to wire up libvirt to pass
> through an error string through a new event.

You mention libvirt logs and I think that's an important point: If we
move it to query-block, will we still get a log entry so that we can
check this after the fact when only a log file is attached to a bug
report, but we don't have a running guest?

Kevin
Luiz Capitulino Sept. 9, 2014, 1:42 p.m. UTC | #10
On Tue, 9 Sep 2014 15:23:50 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 09.09.2014 um 14:53 hat Eric Blake geschrieben:
> > On 09/09/2014 06:43 AM, Luiz Capitulino wrote:
> > 
> > >> Enhancing query-block in addition to the event makes sense, if it is
> > >> easy enough to do.  At this point, we are talking about debugging aids,
> > >> so as long as they are documented appropriately, I won't be too fussy.
> > > 
> > > OK, but I'm wondering if we need to add the string field to both,
> > > BLOCK_IO_ERROR and query-block, or only to one to the other.
> > > 
> > > In my opinion, we should only add it to BLOCK_IO_ERROR if libvirt is
> > > going to consume. Otherwise, it makes more sense to add it to query-block
> > > because that's where we'll meet the user.
> > > 
> > > Btw, by "consume" I mean read it and make it available to libvirt clients
> > > so that they can print it to their users. If we don't want libvirt to
> > > consume that field then I think we should only add it to query-block and
> > > info block.
> > 
> > [For those not aware, qemu built for downstream RHEL already has an
> > error string in the __com.redhat_ namespace; we're trying to figure out
> > what upstream should have so that downstream doesn't have to perpetually
> > maintain an extension]
> > 
> > Downstream libvirt does not currently consume any error string.  With
> > downstream qemu, the _only_ way to get the error string in the event is
> > to parse libvirt logs, or use upstream libvirt with its backdoor of
> > 'virsh qemu-monitor-event' (through the explicitly unsupported
> > libvirt-qemu.so) to get at the raw event information.  Changing libvirt
> > to expose such an error string to the end user would require a new
> > libvirt event number (the existing libvirt event is not extensible), so
> > existing clients would not be able to get at the information without
> > being recompiled to a new libvirt.
> > 
> > Since the whole point of this field is for debugging, I think that it is
> > sufficient to add it to JUST query-block, and not to the event.  That
> > is, if the app on top of libvirt gets an error, in the common case, they
> > won't care about the message (it won't change how they act), and in the
> > debug case, a developer trying to learn more about what happened can do
> > their own query-block directly (via 'virsh qemu-monitor-command', also
> > in libvirt-qemu.so) rather than trying to wire up libvirt to pass
> > through an error string through a new event.
> 
> You mention libvirt logs and I think that's an important point: If we
> move it to query-block, will we still get a log entry so that we can
> check this after the fact when only a log file is attached to a bug
> report, but we don't have a running guest?

Good point. I think this is call to have it in the event or both.

Patch
diff mbox

diff --git a/block.c b/block.c
index 1df13ac..b334e35 100644
--- a/block.c
+++ b/block.c
@@ -3632,6 +3632,18 @@  BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e
     }
 }
 
+static void send_qmp_error_event(BlockDriverState *bs,
+                                 BlockErrorAction action,
+                                 bool is_read, int error)
+{
+    BlockErrorAction ac;
+
+    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
+    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
+                                   bdrv_iostatus_is_enabled(bs),
+                                   error == ENOSPC, &error_abort);
+}
+
 /* This is done by device models because, while the block layer knows
  * about the error, it does not know whether an operation comes from
  * the device or the block layer (from a job, for example).
@@ -3657,16 +3669,10 @@  void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
          * also ensures that the STOP/RESUME pair of events is emitted.
          */
         qemu_system_vmstop_request_prepare();
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
         qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
     } else {
-        qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-                                       is_read ? IO_OPERATION_TYPE_READ :
-                                       IO_OPERATION_TYPE_WRITE,
-                                       action, &error_abort);
+        send_qmp_error_event(bs, action, is_read, error);
     }
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fb74c56..567e0a6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -336,6 +336,7 @@ 
 #
 # @io-status: #optional @BlockDeviceIoStatus. Only present if the device
 #             supports it and the VM is configured to stop on errors
+#             (supported device models: virtio-blk, ide, scsi-disk)
 #
 # @inserted: #optional @BlockDeviceInfo describing the device if media is
 #            present
@@ -1569,6 +1570,11 @@ 
 #
 # @action: action that has been taken
 #
+# @nospace: #optional true if I/O error was caused due to a no-space
+#           condition. This key is only present if query-block's
+#           io-status is present, please see query-block documentation
+#           for more information (since: 2.2)
+#
 # Note: If action is "stop", a STOP event will eventually follow the
 # BLOCK_IO_ERROR event
 #
@@ -1576,7 +1582,7 @@ 
 ##
 { 'event': 'BLOCK_IO_ERROR',
   'data': { 'device': 'str', 'operation': 'IoOperationType',
-            'action': 'BlockErrorAction' } }
+            'action': 'BlockErrorAction', '*nospace': 'bool' } }
 
 ##
 # @BLOCK_JOB_COMPLETED