diff mbox

[13/47] block: introduce block job error

Message ID 1343127865-16608-14-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini July 24, 2012, 11:03 a.m. UTC
The following behaviors are possible:

'report': The behavior is the same as in 1.1.  An I/O error,
respectively during a read or a write, will complete the job immediately
with an error code.

'ignore': An I/O error, respectively during a read or a write, will be
ignored.  For streaming, the job will complete with an error and the
backing file will be left in place.  For mirroring, the sector will be
marked again as dirty and re-examined later.

'stop': The job will be paused and the job iostatus will be set to
failed or nospace, while the VM will keep running.  This can only be
specified if the block device has rerror=stop and werror=stop or enospc.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.

It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
This is not really avoidable since stopping the VM completes all pending
I/O requests.  In fact, it is already possible now that a series of
BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
calls bdrv_drain_all and this can generate further errors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 QMP/qmp-events.txt |   23 +++++++++++++++++++
 block.c            |    9 ++++----
 block_int.h        |    4 ++++
 blockjob.c         |   62 +++++++++++++++++++++++++++++++++++++++++++++++-----
 blockjob.h         |   17 ++++++++++++++
 monitor.c          |    1 +
 monitor.h          |    1 +
 qapi-schema.json   |    5 ++++-
 8 files changed, 111 insertions(+), 11 deletions(-)

Comments

Eric Blake July 25, 2012, 5:40 p.m. UTC | #1
On 07/24/2012 05:03 AM, Paolo Bonzini wrote:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.

You actually documented these early in qapi-schema.json in patch 10/47,
but accurate bisection of docs is not a show-stopper.

I've reviewed through this point in the series, and haven't found
anything that triggered any complaints from my end.
Kevin Wolf Aug. 1, 2012, 10:14 a.m. UTC | #2
Am 24.07.2012 13:03, schrieb Paolo Bonzini:
> The following behaviors are possible:
> 
> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.
> 
> 'stop': The job will be paused and the job iostatus will be set to
> failed or nospace, while the VM will keep running.  This can only be
> specified if the block device has rerror=stop and werror=stop or enospc.
> 
> 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.
> 
> In all cases, even for 'report', the I/O error is reported as a QMP
> event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
> 
> It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
> reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
> This is not really avoidable since stopping the VM completes all pending
> I/O requests.  In fact, it is already possible now that a series of
> BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
> calls bdrv_drain_all and this can generate further errors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

If we want to switch to named target block devices later, it would
probably make sense to use the io_status of that block device rather
than adding it to the job.

Maybe what results is a duplication that can be tolerated, though.

> +BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
> +                                        BlockdevOnError on_err,
> +                                        int is_read, int error)
> +{
> +    BlockErrorAction action;
> +
> +    switch (on_err) {
> +    case BLOCKDEV_ON_ERROR_ENOSPC:
> +        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_STOP:
> +        action = BDRV_ACTION_STOP;
> +        break;
> +    case BLOCKDEV_ON_ERROR_REPORT:
> +        action = BDRV_ACTION_REPORT;
> +        break;
> +    case BLOCKDEV_ON_ERROR_IGNORE:
> +        action = BDRV_ACTION_IGNORE;
> +        break;
> +    default:
> +        abort();
> +    }
> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
> +    if (action == BDRV_ACTION_STOP) {
> +        block_job_pause(job);
> +        if (bs == job->bs) {
> +            block_job_iostatus_set_err(job, error);
> +        } else {
> +            bdrv_iostatus_set_err(bs, error);
> +        }

However, so that everything just falls into place once we make the
target block device visible, I'd make the bdrv_iostatus_set_err() call
unconditional then.

Kevin
Paolo Bonzini Aug. 1, 2012, 11:17 a.m. UTC | #3
Il 01/08/2012 12:14, Kevin Wolf ha scritto:
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> If we want to switch to named target block devices later, it would
> probably make sense to use the io_status of that block device rather
> than adding it to the job.
> 
> Maybe what results is a duplication that can be tolerated, though.

We are probably thinking of two opposite implementations.

You are thinking:

- errors in streaming, or in the mirroring source go to the block device

- errors in the mirroring target go to the block job

What I implemented is:

- errors in streaming, or in the mirroring source go to the block job

- errors in the mirroring target go to the target block device (which as
of this series could be inspected with query-block-jobs).


The reason is that an error in streaming or in the mirroring source does
not stop the VM.  A hypothetical management that polled for errors with
"info block" would see a mismatch between the error state ("failed") and
the VM RunState ("running").

So this is already ready for making the target block device visible.

The real question is: if I remove the possibility to inspect the (so far
anonymous) target device with query-block-jobs, how do you read the
status of the target device?...

Paolo

>> +    }
>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
>> +    if (action == BDRV_ACTION_STOP) {
>> +        block_job_pause(job);
>> +        if (bs == job->bs) {
>> +            block_job_iostatus_set_err(job, error);
>> +        } else {
>> +            bdrv_iostatus_set_err(bs, error);
>> +        }
> 
> However, so that everything just falls into place once we make the
> target block device visible, I'd make the bdrv_iostatus_set_err() call
> unconditional then.


Paolo
Kevin Wolf Aug. 1, 2012, 11:49 a.m. UTC | #4
Am 01.08.2012 13:17, schrieb Paolo Bonzini:
> Il 01/08/2012 12:14, Kevin Wolf ha scritto:
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> If we want to switch to named target block devices later, it would
>> probably make sense to use the io_status of that block device rather
>> than adding it to the job.
>>
>> Maybe what results is a duplication that can be tolerated, though.
> 
> We are probably thinking of two opposite implementations.
> 
> You are thinking:
> 
> - errors in streaming, or in the mirroring source go to the block device
> 
> - errors in the mirroring target go to the block job
> 
> What I implemented is:
> 
> - errors in streaming, or in the mirroring source go to the block job
> 
> - errors in the mirroring target go to the target block device (which as
> of this series could be inspected with query-block-jobs).

Ah, yes, I misunderstood.

> The reason is that an error in streaming or in the mirroring source does
> not stop the VM.  A hypothetical management that polled for errors with
> "info block" would see a mismatch between the error state ("failed") and
> the VM RunState ("running").
> 
> So this is already ready for making the target block device visible.
> 
> The real question is: if I remove the possibility to inspect the (so far
> anonymous) target device with query-block-jobs, how do you read the
> status of the target device?...

You don't? :-)

Maybe we should use named block devices from the beginning.

Kevin
Paolo Bonzini Aug. 1, 2012, 12:09 p.m. UTC | #5
Il 01/08/2012 13:49, Kevin Wolf ha scritto:
> > The real question is: if I remove the possibility to inspect the (so far
> > anonymous) target device with query-block-jobs, how do you read the
> > status of the target device?...
> 
> You don't? :-)

That's a possibility. :)

You can just report it in the block job.  It's more consistent with the
fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
would do:

+    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
+                              action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        block_job_pause(job);
+        block_job_iostatus_set_err(job, error);
+        if (bs != job->bs) {
+            bdrv_iostatus_set_err(bs, error);
+        }
+    }

where the bdrv_iostatus_set_err is mostly to "prepare for the future"
usage of named block devices.

As you said for ENOSPC vs. EIO, management must be ready to retry
multiple times, if it has only the final state at its disposal.

On the other hand, if you see the exact sequence of BLOCK_IO_ERROR vs.
BLOCK_JOB_ERROR you know exactly how the error happened and you can fix it.

> Maybe we should use named block devices from the beginning.

Hmm, but I'm a bit wary of introducing such a big change now.  We know
what it makes nicer, but we don't know of anything irremediably broken
without them, and we haven't thought enough of any warts it introduces.

Paolo
Kevin Wolf Aug. 1, 2012, 12:23 p.m. UTC | #6
Am 01.08.2012 14:09, schrieb Paolo Bonzini:
> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>> The real question is: if I remove the possibility to inspect the (so far
>>> anonymous) target device with query-block-jobs, how do you read the
>>> status of the target device?...
>>
>> You don't? :-)
> 
> That's a possibility. :)
> 
> You can just report it in the block job.  It's more consistent with the
> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
> would do:
> 
> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
> +                              action, is_read);

Isn't job->bs the source?

Also, if you miss the event (e.g. libvirt crashed), can you still tell
which bs caused the error? Do we need another BlockJobInfo field for the
name (*cough* ;-)) of the failed bs?

If I understand it right, error reporting on the source and on the
target would be completely symmetrical then, which I think is a good thing.

job->bs makes one image special, which isn't really useful for a generic
interface. So we should keep the fact that it exists internal and get
rid of it sometime.

> +    if (action == BDRV_ACTION_STOP) {
> +        block_job_pause(job);
> +        block_job_iostatus_set_err(job, error);
> +        if (bs != job->bs) {
> +            bdrv_iostatus_set_err(bs, error);
> +        }
> +    }
> 
> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
> usage of named block devices.

Again, I'd make it unconditional to get symmetric behaviour.

> As you said for ENOSPC vs. EIO, management must be ready to retry
> multiple times, if it has only the final state at its disposal.
> 
> On the other hand, if you see the exact sequence of BLOCK_IO_ERROR vs.
> BLOCK_JOB_ERROR you know exactly how the error happened and you can fix it.
> 
>> Maybe we should use named block devices from the beginning.
> 
> Hmm, but I'm a bit wary of introducing such a big change now.  We know
> what it makes nicer, but we don't know of anything irremediably broken
> without them, and we haven't thought enough of any warts it introduces.

On the one hand, I can understand your concerns, but on the other hand,
introducing an API now and then making such a big change afterwards
scares me much more.

Kevin
Paolo Bonzini Aug. 1, 2012, 12:30 p.m. UTC | #7
Il 01/08/2012 14:23, Kevin Wolf ha scritto:
> Am 01.08.2012 14:09, schrieb Paolo Bonzini:
>> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>>> The real question is: if I remove the possibility to inspect the (so far
>>>> anonymous) target device with query-block-jobs, how do you read the
>>>> status of the target device?...
>>>
>>> You don't? :-)
>>
>> That's a possibility. :)
>>
>> You can just report it in the block job.  It's more consistent with the
>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
>> would do:
>>
>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
>> +                              action, is_read);
> 
> Isn't job->bs the source?

It's not about source vs. target, it's about what block device the _job_
is attached too.  The source is always going to be special because you
must do "block-job-resume source".

is_read tells you where the error was.

> Also, if you miss the event (e.g. libvirt crashed), can you still tell
> which bs caused the error?

No, but how is it different from "can you still tell which bs in the
chain caused the error"?  Which you cannot tell anyway even from the
event parameters we have had so far.

> Do we need another BlockJobInfo field for the
> name (*cough* ;-)) of the failed bs?
> 
> If I understand it right, error reporting on the source and on the
> target would be completely symmetrical then, which I think is a good thing.

Yeah, it would, but job->bs _is_ special anyway.

> job->bs makes one image special, which isn't really useful for a generic
> interface. So we should keep the fact that it exists internal and get
> rid of it sometime.
> 
>> +    if (action == BDRV_ACTION_STOP) {
>> +        block_job_pause(job);
>> +        block_job_iostatus_set_err(job, error);
>> +        if (bs != job->bs) {
>> +            bdrv_iostatus_set_err(bs, error);
>> +        }
>> +    }
>>
>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>> usage of named block devices.
> 
> Again, I'd make it unconditional to get symmetric behaviour.

Not possible, because existing clients may expect "iostatus:
{nospace,failed}" to imply a runstate of "not running (i/o error)".

>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>> what it makes nicer, but we don't know of anything irremediably broken
>> without them, and we haven't thought enough of any warts it introduces.
> 
> On the one hand, I can understand your concerns, but on the other hand,
> introducing an API now and then making such a big change afterwards
> scares me much more.

One example of the doubts I have: is iostatus a BlockBackend or a
BlockDriverState thing?  If it a BlockBackend thing, does the target
device have an iostatus at all?

Paolo
Kevin Wolf Aug. 1, 2012, 1:09 p.m. UTC | #8
Am 01.08.2012 14:30, schrieb Paolo Bonzini:
> Il 01/08/2012 14:23, Kevin Wolf ha scritto:
>> Am 01.08.2012 14:09, schrieb Paolo Bonzini:
>>> Il 01/08/2012 13:49, Kevin Wolf ha scritto:
>>>>> The real question is: if I remove the possibility to inspect the (so far
>>>>> anonymous) target device with query-block-jobs, how do you read the
>>>>> status of the target device?...
>>>>
>>>> You don't? :-)
>>>
>>> That's a possibility. :)
>>>
>>> You can just report it in the block job.  It's more consistent with the
>>> fact that you got a BLOCK_JOB_ERROR and not a BLOCK_IO_ERROR.  So I
>>> would do:
>>>
>>> +    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR,
>>> +                              action, is_read);
>>
>> Isn't job->bs the source?
> 
> It's not about source vs. target, it's about what block device the _job_
> is attached too.  The source is always going to be special because you
> must do "block-job-resume source".

This whole concept of a block job being attached to a single bs is
flawed. Because in the general case it isn't. And probably it should
have been a BackgroundJob rather than a BlockJob from the beginning,
because I'm sure there are use cases outside the block layer as well.

We really manage to get every single point messed up. :-(

> is_read tells you where the error was.

Rather indirect, but yes, for the mirror it works.

>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>> which bs caused the error?
> 
> No, but how is it different from "can you still tell which bs in the
> chain caused the error"?  Which you cannot tell anyway even from the
> event parameters we have had so far.

It isn't different. We really should report the exact BDS. In practice,
management tools care about ENOSPC which is always the top-level BDS,
and call the admin for help in other cases. The admin can try manually
which file is at fault, but I suppose he would be very glad to be told
by the error message which file it is.

>> Do we need another BlockJobInfo field for the
>> name (*cough* ;-)) of the failed bs?
>>
>> If I understand it right, error reporting on the source and on the
>> target would be completely symmetrical then, which I think is a good thing.
> 
> Yeah, it would, but job->bs _is_ special anyway.
> 
>> job->bs makes one image special, which isn't really useful for a generic
>> interface. So we should keep the fact that it exists internal and get
>> rid of it sometime.
>>
>>> +    if (action == BDRV_ACTION_STOP) {
>>> +        block_job_pause(job);
>>> +        block_job_iostatus_set_err(job, error);
>>> +        if (bs != job->bs) {
>>> +            bdrv_iostatus_set_err(bs, error);
>>> +        }
>>> +    }
>>>
>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>> usage of named block devices.
>>
>> Again, I'd make it unconditional to get symmetric behaviour.
> 
> Not possible, because existing clients may expect "iostatus:
> {nospace,failed}" to imply a runstate of "not running (i/o error)".

Did we make such guarantees? Does libvirt actually make use of it?

>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>> what it makes nicer, but we don't know of anything irremediably broken
>>> without them, and we haven't thought enough of any warts it introduces.
>>
>> On the one hand, I can understand your concerns, but on the other hand,
>> introducing an API now and then making such a big change afterwards
>> scares me much more.
> 
> One example of the doubts I have: is iostatus a BlockBackend or a
> BlockDriverState thing?  If it a BlockBackend thing, does the target
> device have an iostatus at all?

I think it's better to have it in BlockDriverState, but in my
imagination the target is a BlockBackend anyway.

Kevin
Paolo Bonzini Aug. 1, 2012, 1:21 p.m. UTC | #9
Il 01/08/2012 15:09, Kevin Wolf ha scritto:
>> It's not about source vs. target, it's about what block device the _job_
>> is attached too.  The source is always going to be special because you
>> must do "block-job-resume source".
> 
> This whole concept of a block job being attached to a single bs is
> flawed. Because in the general case it isn't. And probably it should
> have been a BackgroundJob rather than a BlockJob from the beginning,
> because I'm sure there are use cases outside the block layer as well.
> 
> We really manage to get every single point messed up. :-(

The balance between overengineering and messing up is unfortunately very
difficult to strike.  Yes, migration is a BackgroundJob in itself.  But
how far would we want to go specializing the monitor interface for
BackgroundJob subclasses, considering after all this is C?

If we had a tiny C core and everything above it written in {dynamic
language of the day} (I vote for Smalltalk, FYI) perhaps we could be
more cavalier in creating new classes and management abstractions, but
keeping it simple has its advantages.

No matter how much we messed things up, so far we decently managed to
refactor our way out (and screw up even more the next thing).

>> is_read tells you where the error was.
> 
> Rather indirect, but yes, for the mirror it works.
> 
>>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>>> which bs caused the error?
>>
>> No, but how is it different from "can you still tell which bs in the
>> chain caused the error"?  Which you cannot tell anyway even from the
>> event parameters we have had so far.
> 
> It isn't different. We really should report the exact BDS. In practice,
> management tools care about ENOSPC which is always the top-level BDS,
> and call the admin for help in other cases. The admin can try manually
> which file is at fault, but I suppose he would be very glad to be told
> by the error message which file it is.

Yes, and we can add it in the future as an additional argument to the
event and of query-{block,block-jobs} output.  At which point we are ok
with having the target iostatus in BlockJob.

>>> Do we need another BlockJobInfo field for the
>>> name (*cough* ;-)) of the failed bs?
>>>
>>> If I understand it right, error reporting on the source and on the
>>> target would be completely symmetrical then, which I think is a good thing.
>>
>> Yeah, it would, but job->bs _is_ special anyway.
>>
>>> job->bs makes one image special, which isn't really useful for a generic
>>> interface. So we should keep the fact that it exists internal and get
>>> rid of it sometime.
>>>
>>>> +    if (action == BDRV_ACTION_STOP) {
>>>> +        block_job_pause(job);
>>>> +        block_job_iostatus_set_err(job, error);
>>>> +        if (bs != job->bs) {
>>>> +            bdrv_iostatus_set_err(bs, error);
>>>> +        }
>>>> +    }
>>>>
>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>> usage of named block devices.
>>>
>>> Again, I'd make it unconditional to get symmetric behaviour.
>>
>> Not possible, because existing clients may expect "iostatus:
>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
> 
> Did we make such guarantees? Does libvirt actually make use of it?

I'm not sure libvirt relies on it, but I think it's a reasonable
expectation.

>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>> without them, and we haven't thought enough of any warts it introduces.
>>>
>>> On the one hand, I can understand your concerns, but on the other hand,
>>> introducing an API now and then making such a big change afterwards
>>> scares me much more.
>>
>> One example of the doubts I have: is iostatus a BlockBackend or a
>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>> device have an iostatus at all?
> 
> I think it's better to have it in BlockDriverState, but in my
> imagination the target is a BlockBackend anyway.

Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.

Paolo
Kevin Wolf Aug. 1, 2012, 2:01 p.m. UTC | #10
Am 01.08.2012 15:21, schrieb Paolo Bonzini:
> Il 01/08/2012 15:09, Kevin Wolf ha scritto:
>>> It's not about source vs. target, it's about what block device the _job_
>>> is attached too.  The source is always going to be special because you
>>> must do "block-job-resume source".
>>
>> This whole concept of a block job being attached to a single bs is
>> flawed. Because in the general case it isn't. And probably it should
>> have been a BackgroundJob rather than a BlockJob from the beginning,
>> because I'm sure there are use cases outside the block layer as well.
>>
>> We really manage to get every single point messed up. :-(
> 
> The balance between overengineering and messing up is unfortunately very
> difficult to strike.  Yes, migration is a BackgroundJob in itself.  But
> how far would we want to go specializing the monitor interface for
> BackgroundJob subclasses, considering after all this is C?
> 
> If we had a tiny C core and everything above it written in {dynamic
> language of the day} (I vote for Smalltalk, FYI) perhaps we could be
> more cavalier in creating new classes and management abstractions, but
> keeping it simple has its advantages.
> 
> No matter how much we messed things up, so far we decently managed to
> refactor our way out (and screw up even more the next thing).

Well, the thing that disturbs me is having a BlockJob attached to a
single block device. Jobs can and do operate on multiple images and
should probably affect both BDSs the same. For example, bs->in_use
should probably be set for both (Haven't checked in detail yet, do we
have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
starts to matter when we get named targets.)

So my wish was to get rid of the "main bs" in job->bs and move all
needed BDSs to the subclass. Which, I then noticed, leaves you with a
mere BackgroundJob.

>>> is_read tells you where the error was.
>>
>> Rather indirect, but yes, for the mirror it works.
>>
>>>> Also, if you miss the event (e.g. libvirt crashed), can you still tell
>>>> which bs caused the error?
>>>
>>> No, but how is it different from "can you still tell which bs in the
>>> chain caused the error"?  Which you cannot tell anyway even from the
>>> event parameters we have had so far.
>>
>> It isn't different. We really should report the exact BDS. In practice,
>> management tools care about ENOSPC which is always the top-level BDS,
>> and call the admin for help in other cases. The admin can try manually
>> which file is at fault, but I suppose he would be very glad to be told
>> by the error message which file it is.
> 
> Yes, and we can add it in the future as an additional argument to the
> event and of query-{block,block-jobs} output.  At which point we are ok
> with having the target iostatus in BlockJob.

Yup, target iostatus along with an ID of the BDS (QOM path? Or will all
of them be named eventually?)

>>>> Do we need another BlockJobInfo field for the
>>>> name (*cough* ;-)) of the failed bs?
>>>>
>>>> If I understand it right, error reporting on the source and on the
>>>> target would be completely symmetrical then, which I think is a good thing.
>>>
>>> Yeah, it would, but job->bs _is_ special anyway.
>>>
>>>> job->bs makes one image special, which isn't really useful for a generic
>>>> interface. So we should keep the fact that it exists internal and get
>>>> rid of it sometime.
>>>>
>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>> +        block_job_pause(job);
>>>>> +        block_job_iostatus_set_err(job, error);
>>>>> +        if (bs != job->bs) {
>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>> +        }
>>>>> +    }
>>>>>
>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>> usage of named block devices.
>>>>
>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>
>>> Not possible, because existing clients may expect "iostatus:
>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>
>> Did we make such guarantees? Does libvirt actually make use of it?
> 
> I'm not sure libvirt relies on it, but I think it's a reasonable
> expectation.

Possibly, but I don't think anyone should rely on it. You first get the
event, or notice that the VM is stopped, and then check query-block.
Doing it the other way round and then inferring the VM state from
query-block sounds highly unlikely.

>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>
>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>> introducing an API now and then making such a big change afterwards
>>>> scares me much more.
>>>
>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>> device have an iostatus at all?
>>
>> I think it's better to have it in BlockDriverState, but in my
>> imagination the target is a BlockBackend anyway.
> 
> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.

Heh. :-)

I was thinking that when you do it for backing files, you automatically
have to use the BDS, but the iostatus/pointer model works as well, so
yeah, maybe BB is better.

But why wouldn't you use a BlockBackend for the target?

Kevin
Paolo Bonzini Aug. 1, 2012, 2:34 p.m. UTC | #11
Il 01/08/2012 16:01, Kevin Wolf ha scritto:
> Well, the thing that disturbs me is having a BlockJob attached to a
> single block device. Jobs can and do operate on multiple images and
> should probably affect both BDSs the same. For example, bs->in_use
> should probably be set for both (Haven't checked in detail yet, do we
> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
> starts to matter when we get named targets.)

Makes sense, but what does bs->in_use mean if being a target is "just
another way" to use a named block device?  A target cannot be attached
to a guest disk, but a source that is used by the guest can become part
of a block job.  So, you could block in_use devices from being used in
the drive property.  But you need a separate mechanism to mark devices
as used by the guest... in_use does not cut it.

Stuff like this is what makes me nervous about naming the target
devices.  Name something, and people start using it in ways you haven't
anticipated.  It also makes me wonder if the target is indeed a
BlockBackend or there is a common superclass hiding.

>>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>>> +        block_job_pause(job);
>>>>>> +        block_job_iostatus_set_err(job, error);
>>>>>> +        if (bs != job->bs) {
>>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>>> +        }
>>>>>> +    }
>>>>>>
>>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>>> usage of named block devices.
>>>>>
>>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>>
>>>> Not possible, because existing clients may expect "iostatus:
>>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>>
>>> Did we make such guarantees? Does libvirt actually make use of it?
>>
>> I'm not sure libvirt relies on it, but I think it's a reasonable
>> expectation.
> 
> Possibly, but I don't think anyone should rely on it. You first get the
> event, or notice that the VM is stopped, and then check query-block.
> Doing it the other way round and then inferring the VM state from
> query-block sounds highly unlikely.

Yeah, makes sense.  The other thing is that the iostatus is very much
tied to the running state.  A stop+cont happening for unrelated reasons
would clear the iostatus for example.

>>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>>
>>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>>> introducing an API now and then making such a big change afterwards
>>>>> scares me much more.
>>>>
>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>> device have an iostatus at all?
>>>
>>> I think it's better to have it in BlockDriverState, but in my
>>> imagination the target is a BlockBackend anyway.
>>
>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
> 
> Heh. :-)
> 
> I was thinking that when you do it for backing files, you automatically
> have to use the BDS, but the iostatus/pointer model works as well, so
> yeah, maybe BB is better.
> 
> But why wouldn't you use a BlockBackend for the target?

Probably just because at the moment I'm not sure how much of BDS would
percolate there.  In_use is tricky, of course, but perhaps we can or
need to get rid of in_use altogether.

I'm not sure of iostatus because we currently do not have a way to reset
it.  "cont" and "block-job-resume" do that, but only implicitly and
that's too tied to the current usage of iostatus.  If the target device
is not a BB we solve the problem of "cont" resetting its iostatus...

Paolo
Kevin Wolf Aug. 1, 2012, 2:59 p.m. UTC | #12
Am 01.08.2012 16:34, schrieb Paolo Bonzini:
> Il 01/08/2012 16:01, Kevin Wolf ha scritto:
>> Well, the thing that disturbs me is having a BlockJob attached to a
>> single block device. Jobs can and do operate on multiple images and
>> should probably affect both BDSs the same. For example, bs->in_use
>> should probably be set for both (Haven't checked in detail yet, do we
>> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
>> starts to matter when we get named targets.)
> 
> Makes sense, but what does bs->in_use mean if being a target is "just
> another way" to use a named block device?  A target cannot be attached
> to a guest disk

For the mirror, probably yes. Live commit writes to a BDS that is part
of a BlockBackend used by a guest. So it depends on the exact job type
which involved images can be used in which other cases.

Maybe we should first get an overview of which uses exclude which other
uses of a BDS. Because honestly, at this point I have no idea how to
model it correctly.

> but a source that is used by the guest can become part
> of a block job.  So, you could block in_use devices from being used in
> the drive property.  But you need a separate mechanism to mark devices
> as used by the guest... in_use does not cut it.
> 
> Stuff like this is what makes me nervous about naming the target
> devices.  Name something, and people start using it in ways you haven't
> anticipated.  It also makes me wonder if the target is indeed a
> BlockBackend or there is a common superclass hiding.

Hm...Don't know, what would the difference be?

>>>>>>> +    if (action == BDRV_ACTION_STOP) {
>>>>>>> +        block_job_pause(job);
>>>>>>> +        block_job_iostatus_set_err(job, error);
>>>>>>> +        if (bs != job->bs) {
>>>>>>> +            bdrv_iostatus_set_err(bs, error);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>
>>>>>>> where the bdrv_iostatus_set_err is mostly to "prepare for the future"
>>>>>>> usage of named block devices.
>>>>>>
>>>>>> Again, I'd make it unconditional to get symmetric behaviour.
>>>>>
>>>>> Not possible, because existing clients may expect "iostatus:
>>>>> {nospace,failed}" to imply a runstate of "not running (i/o error)".
>>>>
>>>> Did we make such guarantees? Does libvirt actually make use of it?
>>>
>>> I'm not sure libvirt relies on it, but I think it's a reasonable
>>> expectation.
>>
>> Possibly, but I don't think anyone should rely on it. You first get the
>> event, or notice that the VM is stopped, and then check query-block.
>> Doing it the other way round and then inferring the VM state from
>> query-block sounds highly unlikely.
> 
> Yeah, makes sense.  The other thing is that the iostatus is very much
> tied to the running state.  A stop+cont happening for unrelated reasons
> would clear the iostatus for example.

Yes, clearing it automatically may be convenient in the common case, but
is probably not exactly the best idea we've ever had.

>>>>>>> Hmm, but I'm a bit wary of introducing such a big change now.  We know
>>>>>>> what it makes nicer, but we don't know of anything irremediably broken
>>>>>>> without them, and we haven't thought enough of any warts it introduces.
>>>>>>
>>>>>> On the one hand, I can understand your concerns, but on the other hand,
>>>>>> introducing an API now and then making such a big change afterwards
>>>>>> scares me much more.
>>>>>
>>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>>> device have an iostatus at all?
>>>>
>>>> I think it's better to have it in BlockDriverState, but in my
>>>> imagination the target is a BlockBackend anyway.
>>>
>>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
>>
>> Heh. :-)
>>
>> I was thinking that when you do it for backing files, you automatically
>> have to use the BDS, but the iostatus/pointer model works as well, so
>> yeah, maybe BB is better.
>>
>> But why wouldn't you use a BlockBackend for the target?
> 
> Probably just because at the moment I'm not sure how much of BDS would
> percolate there.  In_use is tricky, of course, but perhaps we can or
> need to get rid of in_use altogether.

in_use has never been properly designed, it is a band-aid hack that has
spread more or more. Can you explain its exact semantics? The best I can
come up with is something like: "Someone does something with the device
and thought that doing one specific other thing, that happens to use
in_use as well, at the same time might be a bad idea."

I'm pretty sure that in_use forbids more cases than is really necessary,
and probably many other cases that should be forbidden are missing.

> I'm not sure of iostatus because we currently do not have a way to reset
> it.  "cont" and "block-job-resume" do that, but only implicitly and
> that's too tied to the current usage of iostatus.  If the target device
> is not a BB we solve the problem of "cont" resetting its iostatus...

This is an awful reason. :-)

Block jobs aren't really different from guests in that respect. Maybe
the BB needs a second iostatus field that must explicitly be reset, and
the old one keeps doing the stupid thing for compatibility's sake.

Kevin
Paolo Bonzini Aug. 1, 2012, 3:15 p.m. UTC | #13
Il 01/08/2012 16:59, Kevin Wolf ha scritto:
> Am 01.08.2012 16:34, schrieb Paolo Bonzini:
>> Il 01/08/2012 16:01, Kevin Wolf ha scritto:
>>> Well, the thing that disturbs me is having a BlockJob attached to a
>>> single block device. Jobs can and do operate on multiple images and
>>> should probably affect both BDSs the same. For example, bs->in_use
>>> should probably be set for both (Haven't checked in detail yet, do we
>>> have bugs here? Patch 27 doesn't call bdrv_set_in_use at least. It
>>> starts to matter when we get named targets.)
>>
>> Makes sense, but what does bs->in_use mean if being a target is "just
>> another way" to use a named block device?  A target cannot be attached
>> to a guest disk
> 
> For the mirror, probably yes. Live commit writes to a BDS that is part
> of a BlockBackend used by a guest. So it depends on the exact job type
> which involved images can be used in which other cases.

Yes, but then it makes sense (at least initially) to just block things
more than needed.  In this respect, in_use is doing its job decently;
there is always time to relax things later.

> Maybe we should first get an overview of which uses exclude which other
> uses of a BDS. Because honestly, at this point I have no idea how to
> model it correctly.
> 
>> but a source that is used by the guest can become part
>> of a block job.  So, you could block in_use devices from being used in
>> the drive property.  But you need a separate mechanism to mark devices
>> as used by the guest... in_use does not cut it.
>>
>> Stuff like this is what makes me nervous about naming the target
>> devices.  Name something, and people start using it in ways you haven't
>> anticipated.  It also makes me wonder if the target is indeed a
>> BlockBackend or there is a common superclass hiding.
> 
> Hm...Don't know, what would the difference be?

What blocks what, for example (for which as you said we need a matrix in
order to model it correctly).  Or how the iostatus is reset if the
target has a iostatus at all; if not, that would be another difference.

>>>>>> One example of the doubts I have: is iostatus a BlockBackend or a
>>>>>> BlockDriverState thing?  If it a BlockBackend thing, does the target
>>>>>> device have an iostatus at all?
>>>>>
>>>>> I think it's better to have it in BlockDriverState, but in my
>>>>> imagination the target is a BlockBackend anyway.
>>>>
>>>> Great, 0/2. :)  My rhetorical questions didn't have the outcome I hoped for.
>>>
>>> Heh. :-)
>>>
>>> I was thinking that when you do it for backing files, you automatically
>>> have to use the BDS, but the iostatus/pointer model works as well, so
>>> yeah, maybe BB is better.
>>>
>>> But why wouldn't you use a BlockBackend for the target?
>>
>> Probably just because at the moment I'm not sure how much of BDS would
>> percolate there.  In_use is tricky, of course, but perhaps we can or
>> need to get rid of in_use altogether.
> 
> in_use has never been properly designed, it is a band-aid hack that has
> spread more or more. Can you explain its exact semantics? The best I can
> come up with is something like: "Someone does something with the device
> and thought that doing one specific other thing, that happens to use
> in_use as well, at the same time might be a bad idea."

Cool things you can do;
don't ask for two at a time
or QEMU eats your disk.

> I'm pretty sure that in_use forbids more cases than is really necessary,
> and probably many other cases that should be forbidden are missing.

So far we fared pretty well on the latter point though.

>> I'm not sure of iostatus because we currently do not have a way to reset
>> it.  "cont" and "block-job-resume" do that, but only implicitly and
>> that's too tied to the current usage of iostatus.  If the target device
>> is not a BB we solve the problem of "cont" resetting its iostatus...
> 
> This is an awful reason. :-)

It is, but it has the merit of decoupling new stuff from "not so
brilliant" old stuff.

> Block jobs aren't really different from guests in that respect. Maybe
> the BB needs a second iostatus field that must explicitly be reset, and
> the old one keeps doing the stupid thing for compatibility's sake.

Or the iostatus for the target can just reside in the BlockJob... :)

As much as I hate to invoke shortcuts, management may proceed without
human help only in the ENOSPC case, and ENOSPC can only happens on the
target.  Humans usually look at dmesg to find the source.

Paolo
Kevin Wolf Aug. 6, 2012, 9:29 a.m. UTC | #14
Am 01.08.2012 17:15, schrieb Paolo Bonzini:
> Il 01/08/2012 16:59, Kevin Wolf ha scritto:
>> Block jobs aren't really different from guests in that respect. Maybe
>> the BB needs a second iostatus field that must explicitly be reset, and
>> the old one keeps doing the stupid thing for compatibility's sake.
> 
> Or the iostatus for the target can just reside in the BlockJob... :)

That wouldn't fix the problem in more than a single instance...

> As much as I hate to invoke shortcuts, management may proceed without
> human help only in the ENOSPC case, and ENOSPC can only happens on the
> target.  Humans usually look at dmesg to find the source.

dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
error codes from the server, etc.

Kevin
Paolo Bonzini Aug. 6, 2012, 9:44 a.m. UTC | #15
Il 06/08/2012 11:29, Kevin Wolf ha scritto:
>>> >> Block jobs aren't really different from guests in that respect. Maybe
>>> >> the BB needs a second iostatus field that must explicitly be reset, and
>>> >> the old one keeps doing the stupid thing for compatibility's sake.
>> > 
>> > Or the iostatus for the target can just reside in the BlockJob... :)
> That wouldn't fix the problem in more than a single instance...

Even if you have problems in more than one device, you can still fix
them one at a time.

I think that we're fine with the current information.  In the long term
we will add the failing blockdev name to the blockjob iostatus.

>> > As much as I hate to invoke shortcuts, management may proceed without
>> > human help only in the ENOSPC case, and ENOSPC can only happens on the
>> > target.  Humans usually look at dmesg to find the source.
> dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
> error codes from the server, etc.

True, guest dmesg doesn't help for block jobs.  (But network glitches
could be in dmesg or other sources of monitoring information).

But for block jobs your margins are small because you cannot take the VM
offline.  So if you get an EIO you can just do three things: first,
retry and see if it goes away (transient glitch); second, throw away the
target and see if it goes away; third, raise white flag and ask the VM
admin to cooperate, because the problem is likely in the source.

Paolo
Kevin Wolf Aug. 6, 2012, 10:45 a.m. UTC | #16
Am 06.08.2012 11:44, schrieb Paolo Bonzini:
> Il 06/08/2012 11:29, Kevin Wolf ha scritto:
>>>>>> Block jobs aren't really different from guests in that respect. Maybe
>>>>>> the BB needs a second iostatus field that must explicitly be reset, and
>>>>>> the old one keeps doing the stupid thing for compatibility's sake.
>>>>
>>>> Or the iostatus for the target can just reside in the BlockJob... :)
>> That wouldn't fix the problem in more than a single instance...
> 
> Even if you have problems in more than one device, you can still fix
> them one at a time.
> 
> I think that we're fine with the current information.  In the long term
> we will add the failing blockdev name to the blockjob iostatus.

I think you misunderstood. What I was trying to say is that with the
same reasoning we'd need a field that doesn't automatically reset its
status on 'cont' not only for block jobs, but also for regular guest disks.

If you try fixing the problem by adding a field in BlockJob, it may well
be fixed for block jobs, but you still need to add it in the generic
place later so that regular disks are covered as well.

>>>> As much as I hate to invoke shortcuts, management may proceed without
>>>> human help only in the ENOSPC case, and ENOSPC can only happens on the
>>>> target.  Humans usually look at dmesg to find the source.
>> dmesg doesn't contain information about corrupted qcow2 images, Sheepdog
>> error codes from the server, etc.
> 
> True, guest dmesg doesn't help for block jobs.  (But network glitches
> could be in dmesg or other sources of monitoring information).

Guest or host dmesg? I thought we're talking about the host.

> But for block jobs your margins are small because you cannot take the VM
> offline.  So if you get an EIO you can just do three things: first,
> retry and see if it goes away (transient glitch); second, throw away the
> target and see if it goes away; third, raise white flag and ask the VM
> admin to cooperate, because the problem is likely in the source.

Yes, and the admin wants to know what happened, i.e. an accurate iostatus.

Kevin
Paolo Bonzini Aug. 6, 2012, 10:58 a.m. UTC | #17
Il 06/08/2012 12:45, Kevin Wolf ha scritto:
>> In the long term
>> we will add the failing blockdev name to the blockjob iostatus.
> I think you misunderstood. What I was trying to say is that with the
> same reasoning we'd need a field that doesn't automatically reset its
> status on 'cont' not only for block jobs, but also for regular guest disks.

I'm not sure why.  There cannot be changes to the guest-triggered
iostatus after you do a stop.

On the other hand, a block job still runs while the guest is stopped
(and we cannot change this because it'd be backwards-incompatible), so
you can have the following (> means command, < means event):

   > stop                 (no iostatus)
   < BLOCK_JOB_ERROR      (iostatus=failed on source)
   > cont                 (no iostatus)
     libvirtd restarts
   > query-block          (no iostatus)

Compare this with guest-initiated I/O:

   > stop                 (no iostatus)
   < BLOCK_IO_ERROR       (iostatus=failed)
   > cont                 (no iostatus)
     libvirtd restarts
     QEMU retries I/O, fails
   < BLOCK_IO_ERROR       (iostatus=failed)
   > query-block          (iostatus=failed)

> If you try fixing the problem by adding a field in BlockJob, it may well
> be fixed for block jobs, but you still need to add it in the generic
> place later so that regular disks are covered as well.

Regular disks are covered because each entry in query-block has its own
iostatus.  The only problematic case is now if two different backing
files fail for unrelated reasons.

Paolo
diff mbox

Patch

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 9ba7079..e910deb 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -353,3 +353,26 @@  Example:
 { "event": "BALLOON_CHANGE",
     "data": { "actual": 944766976 },
     "timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
+
+
+BLOCK_JOB_ERROR
+---------------
+
+Emitted when a block job encounters an error.
+
+Data:
+
+- "device": device name (json-string)
+- "operation": I/O operation (json-string, "read" or "write")
+- "action": action that has been taken, it's one of the following (json-string):
+    "ignore": error has been ignored, the job may fail later
+    "report": error will be reported and the job canceled
+    "stop": error caused job to be paused
+
+Example:
+
+{ "event": "BLOCK_JOB_ERROR",
+    "data": { "device": "ide0-hd1",
+              "operation": "write",
+              "action": "stop" },
+    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
diff --git a/block.c b/block.c
index dce07b3..44542e5 100644
--- a/block.c
+++ b/block.c
@@ -1153,8 +1153,9 @@  void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
     }
 }
 
-static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
-                                      BlockErrorAction action, int is_read)
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, int is_read)
 {
     QObject *data;
     const char *action_str;
@@ -1177,7 +1178,7 @@  static void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
                               bdrv->device_name,
                               action_str,
                               is_read ? "read" : "write");
-    monitor_protocol_event(QEVENT_BLOCK_IO_ERROR, data);
+    monitor_protocol_event(ev, data);
 
     qobject_decref(data);
 }
@@ -2174,7 +2175,7 @@  void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
                        int is_read, int error)
 {
     assert(error >= 0);
-    bdrv_emit_qmp_error_event(bs, action, is_read);
+    bdrv_emit_qmp_error_event(bs, QEVENT_BLOCK_IO_ERROR, action, is_read);
     if (action == BDRV_ACTION_STOP) {
         vm_stop(RUN_STATE_IO_ERROR);
         bdrv_iostatus_set_err(bs, error);
diff --git a/block_int.h b/block_int.h
index 4cc173d..92c106a 100644
--- a/block_int.h
+++ b/block_int.h
@@ -30,6 +30,7 @@ 
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
 #include "qapi-types.h"
+#include "monitor.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -276,6 +277,9 @@  void bdrv_set_io_limits(BlockDriverState *bs,
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
+void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
+                               enum MonitorEvent ev,
+                               BlockErrorAction action, int is_read);
 
 /**
  * stream_start:
diff --git a/blockjob.c b/blockjob.c
index a18da3f..562e0b5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -112,6 +112,7 @@  bool block_job_is_paused(BlockJob *job)
 void block_job_resume(BlockJob *job)
 {
     job->paused = false;
+    job->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
     if (job->co && !job->busy) {
         qemu_coroutine_enter(job->co, NULL);
     }
@@ -189,11 +190,60 @@  void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 BlockJobInfo *block_job_query(BlockJob *job)
 {
     BlockJobInfo *info = g_new(BlockJobInfo, 1);
-    info->type   = g_strdup(job->job_type->job_type);
-    info->device = g_strdup(bdrv_get_device_name(job->bs));
-    info->len    = job->len;
-    info->paused = job->paused;
-    info->offset = job->offset;
-    info->speed  = job->speed;
+    info->type      = g_strdup(job->job_type->job_type);
+    info->device    = g_strdup(bdrv_get_device_name(job->bs));
+    info->len       = job->len;
+    info->paused    = job->paused;
+    info->offset    = job->offset;
+    info->speed     = job->speed;
+    info->io_status = job->iostatus;
     return info;
 }
+
+static void block_job_iostatus_set_err(BlockJob *job, int error)
+{
+    BlockDeviceIoStatus new_status =
+        (error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
+                           BLOCK_DEVICE_IO_STATUS_FAILED);
+
+    /* iostatus values are sorted from less severe to most severe
+     * (ok, nospace, failed).  */
+    if (job->iostatus < new_status) {
+        job->iostatus = new_status;
+    }
+}
+
+
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error)
+{
+    BlockErrorAction action;
+
+    switch (on_err) {
+    case BLOCKDEV_ON_ERROR_ENOSPC:
+        action = (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_STOP:
+        action = BDRV_ACTION_STOP;
+        break;
+    case BLOCKDEV_ON_ERROR_REPORT:
+        action = BDRV_ACTION_REPORT;
+        break;
+    case BLOCKDEV_ON_ERROR_IGNORE:
+        action = BDRV_ACTION_IGNORE;
+        break;
+    default:
+        abort();
+    }
+    bdrv_emit_qmp_error_event(job->bs, QEVENT_BLOCK_JOB_ERROR, action, is_read);
+    if (action == BDRV_ACTION_STOP) {
+        block_job_pause(job);
+        if (bs == job->bs) {
+            block_job_iostatus_set_err(job, error);
+        } else {
+            bdrv_iostatus_set_err(bs, error);
+        }
+    }
+    return action;
+}
diff --git a/blockjob.h b/blockjob.h
index 2abbe13..b17ee2e 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -82,6 +82,9 @@  struct BlockJob {
      */
     bool busy;
 
+    /** Status that is published by the query-block-jobs QMP API */
+    BlockDeviceIoStatus iostatus;
+
     /** Offset that is published by the query-block-jobs QMP API */
     int64_t offset;
 
@@ -216,4 +219,18 @@  bool block_job_is_paused(BlockJob *job);
  */
 int block_job_cancel_sync(BlockJob *job);
 
+/**
+ * block_job_error_action:
+ * @job: The job to signal an error for.
+ * @bs: The block device on which to set an I/O error.
+ * @on_err: The error action setting.
+ * @is_read: Whether the operation was a read.
+ * @error: The error that was reported.
+ *
+ * Report an I/O error for a block job and possibly stop the VM.  Return the
+ * action that was selected based on @on_err and @error.
+ */
+BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
+                                        BlockdevOnError on_err,
+                                        int is_read, int error);
 #endif
diff --git a/monitor.c b/monitor.c
index 49dccfe..19da71d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -454,6 +454,7 @@  static const char *monitor_event_names[] = {
     [QEVENT_SPICE_DISCONNECTED] = "SPICE_DISCONNECTED",
     [QEVENT_BLOCK_JOB_COMPLETED] = "BLOCK_JOB_COMPLETED",
     [QEVENT_BLOCK_JOB_CANCELLED] = "BLOCK_JOB_CANCELLED",
+    [QEVENT_BLOCK_JOB_ERROR] = "BLOCK_JOB_ERROR",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_WAKEUP] = "WAKEUP",
diff --git a/monitor.h b/monitor.h
index 5f4de1b..f806962 100644
--- a/monitor.h
+++ b/monitor.h
@@ -38,6 +38,7 @@  typedef enum MonitorEvent {
     QEVENT_SPICE_DISCONNECTED,
     QEVENT_BLOCK_JOB_COMPLETED,
     QEVENT_BLOCK_JOB_CANCELLED,
+    QEVENT_BLOCK_JOB_ERROR,
     QEVENT_DEVICE_TRAY_MOVED,
     QEVENT_SUSPEND,
     QEVENT_WAKEUP,
diff --git a/qapi-schema.json b/qapi-schema.json
index 2dee7c3..d7191f3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -941,11 +941,14 @@ 
 #
 # @speed: the rate limit, bytes per second
 #
+# @io-status: the status of the job (since 1.2)
+#
 # Since: 1.1
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
-           'offset': 'int', 'paused': 'bool', 'speed': 'int'} }
+           'offset': 'int', 'paused': 'bool', 'speed': 'int',
+           'io-status': 'BlockDeviceIoStatus'} }
 
 ##
 # @query-block-jobs: