diff mbox

[v2,1/5] qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED

Message ID 1409926039-29044-2-git-send-email-mreitz@redhat.com
State New
Headers show

Commit Message

Max Reitz Sept. 5, 2014, 2:07 p.m. UTC
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 1 +
 qapi/block-core.json   | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

Comments

Eric Blake Sept. 5, 2014, 2:29 p.m. UTC | #1
On 09/05/2014 08:07 AM, Max Reitz wrote:
> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
> reading from an image, they should generally not be. Nonetheless, even
> an image only read from may of course be corrupted and this can be
> detected during normal operation. In this case, a non-fatal event should
> be emitted, but the image should not be marked corrupt (in accordance to
> "fatal" set to false).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 1 +
>  qapi/block-core.json   | 9 +++++++--
>  2 files changed, 8 insertions(+), 2 deletions(-)

As this is purely additive, it should not break any former clients.
Older clients not knowing to look for the fatal flag will treat
everything as fatal, but that's okay.

Reviewed-by: Eric Blake <eblake@redhat.com>
Eric Blake Sept. 5, 2014, 2:40 p.m. UTC | #2
On 09/05/2014 08:07 AM, Max Reitz wrote:
> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
> reading from an image, they should generally not be. Nonetheless, even
> an image only read from may of course be corrupted and this can be
> detected during normal operation. In this case, a non-fatal event should
> be emitted, but the image should not be marked corrupt (in accordance to
> "fatal" set to false).
> 

Question - what happens if management misses the signal?  For example,
if libvirt opens qemu on a read-only image, then goes away, then
corruption is detected, then libvirt reconnects.  Does query-block need
to also be updated to report whether a read-only BDS is currently
detected as fatal, but that an event has already been delivered?
Max Reitz Sept. 5, 2014, 2:47 p.m. UTC | #3
On 05.09.2014 16:40, Eric Blake wrote:
> On 09/05/2014 08:07 AM, Max Reitz wrote:
>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
>> reading from an image, they should generally not be. Nonetheless, even
>> an image only read from may of course be corrupted and this can be
>> detected during normal operation. In this case, a non-fatal event should
>> be emitted, but the image should not be marked corrupt (in accordance to
>> "fatal" set to false).
>>
> Question - what happens if management misses the signal?  For example,
> if libvirt opens qemu on a read-only image, then goes away, then
> corruption is detected, then libvirt reconnects.  Does query-block need
> to also be updated to report whether a read-only BDS is currently
> detected as fatal, but that an event has already been delivered?

Well, the obvious problem with that is that corruption currently is a 
strongly format-specific topic, and only reported for qcow2. So, to do 
this, we'd have to move the corruption signalling code into the common 
block layer functions and proceed from there. This actually probably 
isn't too bad of an idea, anyway. But then we'll need a global 
bdrv_mark_corrupt() function (which we probably don't want in case we 
get more flags in the future, so we'll rather want bdrv_set_flag() or 
something) and I don't really want to make these changes now... We could 
postpone this, though. Making this change later shouldn't break anything.

The other solution would be simply to suppress the stderr message but to 
always deliver the QMP event.

Max
Eric Blake Sept. 5, 2014, 2:51 p.m. UTC | #4
On 09/05/2014 08:47 AM, Max Reitz wrote:
> On 05.09.2014 16:40, Eric Blake wrote:
>> On 09/05/2014 08:07 AM, Max Reitz wrote:
>>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
>>> reading from an image, they should generally not be. Nonetheless, even
>>> an image only read from may of course be corrupted and this can be
>>> detected during normal operation. In this case, a non-fatal event should
>>> be emitted, but the image should not be marked corrupt (in accordance to
>>> "fatal" set to false).
>>>
>> Question - what happens if management misses the signal?  For example,
>> if libvirt opens qemu on a read-only image, then goes away, then
>> corruption is detected, then libvirt reconnects.  Does query-block need
>> to also be updated to report whether a read-only BDS is currently
>> detected as fatal, but that an event has already been delivered?
> 
> Well, the obvious problem with that is that corruption currently is a
> strongly format-specific topic, and only reported for qcow2. So, to do
> this, we'd have to move the corruption signalling code into the common
> block layer functions and proceed from there. This actually probably
> isn't too bad of an idea, anyway. But then we'll need a global
> bdrv_mark_corrupt() function (which we probably don't want in case we
> get more flags in the future, so we'll rather want bdrv_set_flag() or
> something) and I don't really want to make these changes now... We could
> postpone this, though. Making this change later shouldn't break anything.
> 
> The other solution would be simply to suppress the stderr message but to
> always deliver the QMP event.

That feels like it would flood the channel, if the image is being
repeatedly read.  The approach of warning exactly once is nice because
it prevents flooding.  We already have a way to report per-format
details, is it sufficient to modify ImageInfoSpecificQCow2 to add a bool
field that reports true if the image is currently known to be corrupted?
Max Reitz Sept. 5, 2014, 2:53 p.m. UTC | #5
On 05.09.2014 16:51, Eric Blake wrote:
> On 09/05/2014 08:47 AM, Max Reitz wrote:
>> On 05.09.2014 16:40, Eric Blake wrote:
>>> On 09/05/2014 08:07 AM, Max Reitz wrote:
>>>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
>>>> reading from an image, they should generally not be. Nonetheless, even
>>>> an image only read from may of course be corrupted and this can be
>>>> detected during normal operation. In this case, a non-fatal event should
>>>> be emitted, but the image should not be marked corrupt (in accordance to
>>>> "fatal" set to false).
>>>>
>>> Question - what happens if management misses the signal?  For example,
>>> if libvirt opens qemu on a read-only image, then goes away, then
>>> corruption is detected, then libvirt reconnects.  Does query-block need
>>> to also be updated to report whether a read-only BDS is currently
>>> detected as fatal, but that an event has already been delivered?
>> Well, the obvious problem with that is that corruption currently is a
>> strongly format-specific topic, and only reported for qcow2. So, to do
>> this, we'd have to move the corruption signalling code into the common
>> block layer functions and proceed from there. This actually probably
>> isn't too bad of an idea, anyway. But then we'll need a global
>> bdrv_mark_corrupt() function (which we probably don't want in case we
>> get more flags in the future, so we'll rather want bdrv_set_flag() or
>> something) and I don't really want to make these changes now... We could
>> postpone this, though. Making this change later shouldn't break anything.
>>
>> The other solution would be simply to suppress the stderr message but to
>> always deliver the QMP event.
> That feels like it would flood the channel, if the image is being
> repeatedly read.  The approach of warning exactly once is nice because
> it prevents flooding.  We already have a way to report per-format
> details, is it sufficient to modify ImageInfoSpecificQCow2 to add a bool
> field that reports true if the image is currently known to be corrupted?

Actually, I'm asking myself right now why we don't already have such a 
field. :-)

I'll add it in an independent patch.

Max
Benoît Canet Sept. 8, 2014, 2:01 p.m. UTC | #6
The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote :
> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
> reading from an image, they should generally not be. Nonetheless, even
> an image only read from may of course be corrupted and this can be
> detected during normal operation. In this case, a non-fatal event should
> be emitted, but the image should not be marked corrupt (in accordance to
> "fatal" set to false).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 1 +
>  qapi/block-core.json   | 9 +++++++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 43665b8..0bd75d2 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
>                                                offset,
>                                                true,
>                                                size,

> +                                              true,

What is this line ?

>                                                &error_abort);
>          g_free(message);
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index a685d02..d23bcc2 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1554,7 +1554,7 @@
>  ##
>  # @BLOCK_IMAGE_CORRUPTED
>  #
> -# Emitted when a disk image is being marked corrupt
> +# Emitted when a corruption has been detected in a disk image
>  #
>  # @device: device name
>  #
> @@ -1568,13 +1568,18 @@
>  # @size: #optional, if the corruption resulted from an image access, this is
>  #        the access size
>  #
> +# fatal: if set, the image is marked corrupt and therefore unusable after this
> +#        event and must be repaired (Since 2.2; before, every
> +#        BLOCK_IMAGE_CORRUPTED event was fatal)
> +#
>  # Since: 1.7
>  ##
>  { 'event': 'BLOCK_IMAGE_CORRUPTED',
>    'data': { 'device' : 'str',
>              'msg'    : 'str',
>              '*offset': 'int',
> -            '*size'  : 'int' } }
> +            '*size'  : 'int',
> +            'fatal'  : 'bool' } }
>  
>  ##
>  # @BLOCK_IO_ERROR
> -- 
> 2.1.0
> 
>
Max Reitz Sept. 8, 2014, 5:40 p.m. UTC | #7
On 08.09.2014 16:01, Benoît Canet wrote:
> The Friday 05 Sep 2014 à 16:07:15 (+0200), Max Reitz wrote :
>> Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
>> reading from an image, they should generally not be. Nonetheless, even
>> an image only read from may of course be corrupted and this can be
>> detected during normal operation. In this case, a non-fatal event should
>> be emitted, but the image should not be marked corrupt (in accordance to
>> "fatal" set to false).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 1 +
>>   qapi/block-core.json   | 9 +++++++--
>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 43665b8..0bd75d2 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1853,6 +1853,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
>>                                                 offset,
>>                                                 true,
>>                                                 size,
>> +                                              true,
> What is this line ?

It's the new "fatal" field in BLOCK_IMAGE_CORRUPTED (missing context: 
qapi_event_send_block_image_corrupted(bdrv_get_device_name(bs), message, 
true, offset, true, size, +true, &error_abort)).

Max

>>                                                 &error_abort);
>>           g_free(message);
>>   
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index a685d02..d23bcc2 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -1554,7 +1554,7 @@
>>   ##
>>   # @BLOCK_IMAGE_CORRUPTED
>>   #
>> -# Emitted when a disk image is being marked corrupt
>> +# Emitted when a corruption has been detected in a disk image
>>   #
>>   # @device: device name
>>   #
>> @@ -1568,13 +1568,18 @@
>>   # @size: #optional, if the corruption resulted from an image access, this is
>>   #        the access size
>>   #
>> +# fatal: if set, the image is marked corrupt and therefore unusable after this
>> +#        event and must be repaired (Since 2.2; before, every
>> +#        BLOCK_IMAGE_CORRUPTED event was fatal)
>> +#
>>   # Since: 1.7
>>   ##
>>   { 'event': 'BLOCK_IMAGE_CORRUPTED',
>>     'data': { 'device' : 'str',
>>               'msg'    : 'str',
>>               '*offset': 'int',
>> -            '*size'  : 'int' } }
>> +            '*size'  : 'int',
>> +            'fatal'  : 'bool' } }
>>   
>>   ##
>>   # @BLOCK_IO_ERROR
>> -- 
>> 2.1.0
>>
>>
diff mbox

Patch

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 43665b8..0bd75d2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1853,6 +1853,7 @@  int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
                                               offset,
                                               true,
                                               size,
+                                              true,
                                               &error_abort);
         g_free(message);
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a685d02..d23bcc2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1554,7 +1554,7 @@ 
 ##
 # @BLOCK_IMAGE_CORRUPTED
 #
-# Emitted when a disk image is being marked corrupt
+# Emitted when a corruption has been detected in a disk image
 #
 # @device: device name
 #
@@ -1568,13 +1568,18 @@ 
 # @size: #optional, if the corruption resulted from an image access, this is
 #        the access size
 #
+# fatal: if set, the image is marked corrupt and therefore unusable after this
+#        event and must be repaired (Since 2.2; before, every
+#        BLOCK_IMAGE_CORRUPTED event was fatal)
+#
 # Since: 1.7
 ##
 { 'event': 'BLOCK_IMAGE_CORRUPTED',
   'data': { 'device' : 'str',
             'msg'    : 'str',
             '*offset': 'int',
-            '*size'  : 'int' } }
+            '*size'  : 'int',
+            'fatal'  : 'bool' } }
 
 ##
 # @BLOCK_IO_ERROR