Message ID | 1409926039-29044-2-git-send-email-mreitz@redhat.com |
---|---|
State | New |
Headers | show |
On 09/05/2014 08:07 AM, Max Reitz wrote: > Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when > reading from an image, they should generally not be. Nonetheless, even > an image only read from may of course be corrupted and this can be > detected during normal operation. In this case, a non-fatal event should > be emitted, but the image should not be marked corrupt (in accordance to > "fatal" set to false). > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block/qcow2-refcount.c | 1 + > qapi/block-core.json | 9 +++++++-- > 2 files changed, 8 insertions(+), 2 deletions(-) As this is purely additive, it should not break any former clients. Older clients not knowing to look for the fatal flag will treat everything as fatal, but that's okay. Reviewed-by: Eric Blake <eblake@redhat.com>
On 09/05/2014 08:07 AM, Max Reitz wrote: > Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when > reading from an image, they should generally not be. Nonetheless, even > an image only read from may of course be corrupted and this can be > detected during normal operation. In this case, a non-fatal event should > be emitted, but the image should not be marked corrupt (in accordance to > "fatal" set to false). > Question - what happens if management misses the signal? For example, if libvirt opens qemu on a read-only image, then goes away, then corruption is detected, then libvirt reconnects. Does query-block need to also be updated to report whether a read-only BDS is currently detected as fatal, but that an event has already been delivered?
On 05.09.2014 16:40, Eric Blake wrote: > On 09/05/2014 08:07 AM, Max Reitz wrote: >> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when >> reading from an image, they should generally not be. Nonetheless, even >> an image only read from may of course be corrupted and this can be >> detected during normal operation. In this case, a non-fatal event should >> be emitted, but the image should not be marked corrupt (in accordance to >> "fatal" set to false). >> > Question - what happens if management misses the signal? For example, > if libvirt opens qemu on a read-only image, then goes away, then > corruption is detected, then libvirt reconnects. Does query-block need > to also be updated to report whether a read-only BDS is currently > detected as fatal, but that an event has already been delivered? Well, the obvious problem with that is that corruption currently is a strongly format-specific topic, and only reported for qcow2. So, to do this, we'd have to move the corruption signalling code into the common block layer functions and proceed from there. This actually probably isn't too bad of an idea, anyway. But then we'll need a global bdrv_mark_corrupt() function (which we probably don't want in case we get more flags in the future, so we'll rather want bdrv_set_flag() or something) and I don't really want to make these changes now... We could postpone this, though. Making this change later shouldn't break anything. The other solution would be simply to suppress the stderr message but to always deliver the QMP event. Max
On 09/05/2014 08:47 AM, Max Reitz wrote: > On 05.09.2014 16:40, Eric Blake wrote: >> On 09/05/2014 08:07 AM, Max Reitz wrote: >>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when >>> reading from an image, they should generally not be. Nonetheless, even >>> an image only read from may of course be corrupted and this can be >>> detected during normal operation. In this case, a non-fatal event should >>> be emitted, but the image should not be marked corrupt (in accordance to >>> "fatal" set to false). >>> >> Question - what happens if management misses the signal? For example, >> if libvirt opens qemu on a read-only image, then goes away, then >> corruption is detected, then libvirt reconnects. Does query-block need >> to also be updated to report whether a read-only BDS is currently >> detected as fatal, but that an event has already been delivered? > > Well, the obvious problem with that is that corruption currently is a > strongly format-specific topic, and only reported for qcow2. So, to do > this, we'd have to move the corruption signalling code into the common > block layer functions and proceed from there. This actually probably > isn't too bad of an idea, anyway. But then we'll need a global > bdrv_mark_corrupt() function (which we probably don't want in case we > get more flags in the future, so we'll rather want bdrv_set_flag() or > something) and I don't really want to make these changes now... We could > postpone this, though. Making this change later shouldn't break anything. > > The other solution would be simply to suppress the stderr message but to > always deliver the QMP event. That feels like it would flood the channel, if the image is being repeatedly read. The approach of warning exactly once is nice because it prevents flooding. We already have a way to report per-format details, is it sufficient to modify ImageInfoSpecificQCow2 to add a bool field that reports true if the image is currently known to be corrupted?
On 05.09.2014 16:51, Eric Blake wrote: > On 09/05/2014 08:47 AM, Max Reitz wrote: >> On 05.09.2014 16:40, Eric Blake wrote: >>> On 09/05/2014 08:07 AM, Max Reitz wrote: >>>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when >>>> reading from an image, they should generally not be. Nonetheless, even >>>> an image only read from may of course be corrupted and this can be >>>> detected during normal operation. In this case, a non-fatal event should >>>> be emitted, but the image should not be marked corrupt (in accordance to >>>> "fatal" set to false). >>>> >>> Question - what happens if management misses the signal? For example, >>> if libvirt opens qemu on a read-only image, then goes away, then >>> corruption is detected, then libvirt reconnects. Does query-block need >>> to also be updated to report whether a read-only BDS is currently >>> detected as fatal, but that an event has already been delivered? >> Well, the obvious problem with that is that corruption currently is a >> strongly format-specific topic, and only reported for qcow2. So, to do >> this, we'd have to move the corruption signalling code into the common >> block layer functions and proceed from there. This actually probably >> isn't too bad of an idea, anyway. But then we'll need a global >> bdrv_mark_corrupt() function (which we probably don't want in case we >> get more flags in the future, so we'll rather want bdrv_set_flag() or >> something) and I don't really want to make these changes now... We could >> postpone this, though. Making this change later shouldn't break anything. >> >> The other solution would be simply to suppress the stderr message but to >> always deliver the QMP event. > That feels like it would flood the channel, if the image is being > repeatedly read. The approach of warning exactly once is nice because > it prevents flooding. We already have a way to report per-format > details, is it sufficient to modify ImageInfoSpecificQCow2 to add a bool > field that reports true if the image is currently known to be corrupted? Actually, I'm asking myself right now why we don't already have such a field. :-) I'll add it in an independent patch. Max
The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote : > Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when > reading from an image, they should generally not be. Nonetheless, even > an image only read from may of course be corrupted and this can be > detected during normal operation. In this case, a non-fatal event should > be emitted, but the image should not be marked corrupt (in accordance to > "fatal" set to false). > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block/qcow2-refcount.c | 1 + > qapi/block-core.json | 9 +++++++-- > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c > index 43665b8..0bd75d2 100644 > --- a/block/qcow2-refcount.c > +++ b/block/qcow2-refcount.c > @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset, > offset, > true, > size, > + true, What is this line ? > &error_abort); > g_free(message); > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index a685d02..d23bcc2 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -1554,7 +1554,7 @@ > ## > # @BLOCK_IMAGE_CORRUPTED > # > -# Emitted when a disk image is being marked corrupt > +# Emitted when a corruption has been detected in a disk image > # > # @device: device name > # > @@ -1568,13 +1568,18 @@ > # @size: #optional, if the corruption resulted from an image access, this is > # the access size > # > +# fatal: if set, the image is marked corrupt and therefore unusable after this > +# event and must be repaired (Since 2.2; before, every > +# BLOCK_IMAGE_CORRUPTED event was fatal) > +# > # Since: 1.7 > ## > { 'event': 'BLOCK_IMAGE_CORRUPTED', > 'data': { 'device' : 'str', > 'msg' : 'str', > '*offset': 'int', > - '*size' : 'int' } } > + '*size' : 'int', > + 'fatal' : 'bool' } } > > ## > # @BLOCK_IO_ERROR > -- > 2.1.0 > >
On 08.09.2014 16:01, Benoît Canet wrote: > The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote : >> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when >> reading from an image, they should generally not be. Nonetheless, even >> an image only read from may of course be corrupted and this can be >> detected during normal operation. In this case, a non-fatal event should >> be emitted, but the image should not be marked corrupt (in accordance to >> "fatal" set to false). >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> block/qcow2-refcount.c | 1 + >> qapi/block-core.json | 9 +++++++-- >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c >> index 43665b8..0bd75d2 100644 >> --- a/block/qcow2-refcount.c >> +++ b/block/qcow2-refcount.c >> @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset, >> offset, >> true, >> size, >> + true, > What is this line ? It's the new "fatal" field in BLOCK_IMAGE_CORRUPTED (missing context: qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs), message, true, offset, true, size, +true, &error_abort)). Max >> &error_abort); >> g_free(message); >> >> diff --git a/qapi/block-core.json b/qapi/block-core.json >> index a685d02..d23bcc2 100644 >> --- a/qapi/block-core.json >> +++ b/qapi/block-core.json >> @@ -1554,7 +1554,7 @@ >> ## >> # @BLOCK_IMAGE_CORRUPTED >> # >> -# Emitted when a disk image is being marked corrupt >> +# Emitted when a corruption has been detected in a disk image >> # >> # @device: device name >> # >> @@ -1568,13 +1568,18 @@ >> # @size: #optional, if the corruption resulted from an image access, this is >> # the access size >> # >> +# fatal: if set, the image is marked corrupt and therefore unusable after this >> +# event and must be repaired (Since 2.2; before, every >> +# BLOCK_IMAGE_CORRUPTED event was fatal) >> +# >> # Since: 1.7 >> ## >> { 'event': 'BLOCK_IMAGE_CORRUPTED', >> 'data': { 'device' : 'str', >> 'msg' : 'str', >> '*offset': 'int', >> - '*size' : 'int' } } >> + '*size' : 'int', >> + 'fatal' : 'bool' } } >> >> ## >> # @BLOCK_IO_ERROR >> -- >> 2.1.0 >> >>
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 43665b8..0bd75d2 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset, offset, true, size, + true, &error_abort); g_free(message); diff --git a/qapi/block-core.json b/qapi/block-core.json index a685d02..d23bcc2 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1554,7 +1554,7 @@ ## # @BLOCK_IMAGE_CORRUPTED # -# Emitted when a disk image is being marked corrupt +# Emitted when a corruption has been detected in a disk image # # @device: device name # @@ -1568,13 +1568,18 @@ # @size: #optional, if the corruption resulted from an image access, this is # the access size # +# fatal: if set, the image is marked corrupt and therefore unusable after this +# event and must be repaired (Since 2.2; before, every +# BLOCK_IMAGE_CORRUPTED event was fatal) +# # Since: 1.7 ## { 'event': 'BLOCK_IMAGE_CORRUPTED', 'data': { 'device' : 'str', 'msg' : 'str', '*offset': 'int', - '*size' : 'int' } } + '*size' : 'int', + 'fatal' : 'bool' } } ## # @BLOCK_IO_ERROR
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when reading from an image, they should generally not be. Nonetheless, even an image only read from may of course be corrupted and this can be detected during normal operation. In this case, a non-fatal event should be emitted, but the image should not be marked corrupt (in accordance to "fatal" set to false). Signed-off-by: Max Reitz <mreitz@redhat.com> --- block/qcow2-refcount.c | 1 + qapi/block-core.json | 9 +++++++-- 2 files changed, 8 insertions(+), 2 deletions(-)