diff mbox

[1/3] qcow2: Check bs->drv in copy_sectors()

Message ID 1394491449-10897-2-git-send-email-mreitz@redhat.com
State New
Headers show

Commit Message

Max Reitz March 10, 2014, 10:44 p.m. UTC
Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
copy_sectors() should check whether that pointer is indeed valid, since
it may have been set to NULL by e.g. a concurrent write triggering the
corruption prevention mechanism.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
To be precise, this still is a race condition. If bs->drv is set to NULL
after the check and before the call to bdrv_co_readv(), QEMU will
obviously still crash. However, in order to circumvent this behavior, we
would probably have to re-lock s->lock, check bs->drv, take the function
pointer to bdrv_co_readv() and then unlock s->lock before the function
is called. I found this rather ugly and therefore this still has a very
small chance of running into a race condition.
Therefore, I'm asking for your opinion on this, whether we can really
take this chance or should rather "do it right". In fact, if I were a
reviewer, I'd probably reject this patch and request the solution with
the function pointer (if there is no better solution), but I was afraid
to send such an ugly patch.
---
 block/qcow2-cluster.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Laszlo Ersek March 10, 2014, 11:16 p.m. UTC | #1
On 03/10/14 23:44, Max Reitz wrote:
> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
> copy_sectors() should check whether that pointer is indeed valid, since
> it may have been set to NULL by e.g. a concurrent write triggering the
> corruption prevention mechanism.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> To be precise, this still is a race condition. If bs->drv is set to NULL
> after the check and before the call to bdrv_co_readv(), QEMU will
> obviously still crash. However, in order to circumvent this behavior, we
> would probably have to re-lock s->lock, check bs->drv, take the function
> pointer to bdrv_co_readv() and then unlock s->lock before the function
> is called. I found this rather ugly and therefore this still has a very
> small chance of running into a race condition.
> Therefore, I'm asking for your opinion on this, whether we can really
> take this chance or should rather "do it right". In fact, if I were a
> reviewer, I'd probably reject this patch and request the solution with
> the function pointer (if there is no better solution), but I was afraid
> to send such an ugly patch.
> ---
>  block/qcow2-cluster.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 36c1bed..9499df9 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
>  
>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>  
> +    if (!bs->drv) {
> +        return -ENOMEDIUM;
> +    }
> +
>      /* Call .bdrv_co_readv() directly instead of using the public block-layer
>       * interface.  This avoids double I/O throttling and request tracking,
>       * which can lead to deadlock when block layer copy-on-read is enabled.
> 

I can't answer your question nor review this patch -- instead, I have a
question of my own: when you say "set to NULL by [...] the corruption
prevention mechanism", do you mean qcow2_pre_write_overlap_check():

        bs->drv = NULL; /* make BDS unusable */

If so: I thought that it was quite a bold move, but also that we'd find
the SIGSEGVs sooner or later... :)

Thanks
Laszlo
Kevin Wolf March 11, 2014, 10:16 a.m. UTC | #2
Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
> On 03/10/14 23:44, Max Reitz wrote:
> > Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
> > copy_sectors() should check whether that pointer is indeed valid, since
> > it may have been set to NULL by e.g. a concurrent write triggering the
> > corruption prevention mechanism.
> > 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> > To be precise, this still is a race condition. If bs->drv is set to NULL
> > after the check and before the call to bdrv_co_readv(), QEMU will
> > obviously still crash. However, in order to circumvent this behavior, we
> > would probably have to re-lock s->lock, check bs->drv, take the function
> > pointer to bdrv_co_readv() and then unlock s->lock before the function
> > is called. I found this rather ugly and therefore this still has a very
> > small chance of running into a race condition.
> > Therefore, I'm asking for your opinion on this, whether we can really
> > take this chance or should rather "do it right". In fact, if I were a
> > reviewer, I'd probably reject this patch and request the solution with
> > the function pointer (if there is no better solution), but I was afraid
> > to send such an ugly patch.

No, the code is fine. Remember that qcow2 is not threaded, we're talking
about coroutines here. There is no way for the code to yield between
your check and the protected place.

> >  block/qcow2-cluster.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 36c1bed..9499df9 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
> >  
> >      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
> >  
> > +    if (!bs->drv) {
> > +        return -ENOMEDIUM;
> > +    }
> > +
> >      /* Call .bdrv_co_readv() directly instead of using the public block-layer
> >       * interface.  This avoids double I/O throttling and request tracking,
> >       * which can lead to deadlock when block layer copy-on-read is enabled.
> > 
> 
> I can't answer your question nor review this patch -- instead, I have a
> question of my own: when you say "set to NULL by [...] the corruption
> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
> 
>         bs->drv = NULL; /* make BDS unusable */

Yes, this is the place.

> If so: I thought that it was quite a bold move, but also that we'd find
> the SIGSEGVs sooner or later... :)

In fact, if you use the block layer API, most functions check for
bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
directly dereference the pointer without going through block.c (there's
a good reason for this, see the comment, but it still makes it somewhat
special).

Kevin
Paolo Bonzini March 11, 2014, 2:04 p.m. UTC | #3
Il 11/03/2014 11:16, Kevin Wolf ha scritto:
> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
>> On 03/10/14 23:44, Max Reitz wrote:
>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>> copy_sectors() should check whether that pointer is indeed valid, since
>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>> corruption prevention mechanism.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>> To be precise, this still is a race condition. If bs->drv is set to NULL
>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>> obviously still crash. However, in order to circumvent this behavior, we
>>> would probably have to re-lock s->lock, check bs->drv, take the function
>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>> is called. I found this rather ugly and therefore this still has a very
>>> small chance of running into a race condition.
>>> Therefore, I'm asking for your opinion on this, whether we can really
>>> take this chance or should rather "do it right". In fact, if I were a
>>> reviewer, I'd probably reject this patch and request the solution with
>>> the function pointer (if there is no better solution), but I was afraid
>>> to send such an ugly patch.
>
> No, the code is fine. Remember that qcow2 is not threaded, we're talking
> about coroutines here. There is no way for the code to yield between
> your check and the protected place.
>
>>>  block/qcow2-cluster.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 36c1bed..9499df9 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
>>>
>>>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>
>>> +    if (!bs->drv) {
>>> +        return -ENOMEDIUM;
>>> +    }
>>> +
>>>      /* Call .bdrv_co_readv() directly instead of using the public block-layer
>>>       * interface.  This avoids double I/O throttling and request tracking,
>>>       * which can lead to deadlock when block layer copy-on-read is enabled.
>>>
>>
>> I can't answer your question nor review this patch -- instead, I have a
>> question of my own: when you say "set to NULL by [...] the corruption
>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>
>>         bs->drv = NULL; /* make BDS unusable */
>
> Yes, this is the place.
>
>> If so: I thought that it was quite a bold move, but also that we'd find
>> the SIGSEGVs sooner or later... :)
>
> In fact, if you use the block layer API, most functions check for
> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
> directly dereference the pointer without going through block.c (there's
> a good reason for this, see the comment, but it still makes it somewhat
> special).

But why not call qcow2_co_readv directly?

Paolo
Max Reitz March 11, 2014, 6:09 p.m. UTC | #4
On 11.03.2014 11:16, Kevin Wolf wrote:
> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
>> On 03/10/14 23:44, Max Reitz wrote:
>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>> copy_sectors() should check whether that pointer is indeed valid, since
>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>> corruption prevention mechanism.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>> To be precise, this still is a race condition. If bs->drv is set to NULL
>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>> obviously still crash. However, in order to circumvent this behavior, we
>>> would probably have to re-lock s->lock, check bs->drv, take the function
>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>> is called. I found this rather ugly and therefore this still has a very
>>> small chance of running into a race condition.
>>> Therefore, I'm asking for your opinion on this, whether we can really
>>> take this chance or should rather "do it right". In fact, if I were a
>>> reviewer, I'd probably reject this patch and request the solution with
>>> the function pointer (if there is no better solution), but I was afraid
>>> to send such an ugly patch.
> No, the code is fine. Remember that qcow2 is not threaded, we're talking
> about coroutines here. There is no way for the code to yield between
> your check and the protected place.

Ah, okay, that makes sense. Thank you. :-)

Max

>>>   block/qcow2-cluster.c | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 36c1bed..9499df9 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
>>>   
>>>       BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>   
>>> +    if (!bs->drv) {
>>> +        return -ENOMEDIUM;
>>> +    }
>>> +
>>>       /* Call .bdrv_co_readv() directly instead of using the public block-layer
>>>        * interface.  This avoids double I/O throttling and request tracking,
>>>        * which can lead to deadlock when block layer copy-on-read is enabled.
>>>
>> I can't answer your question nor review this patch -- instead, I have a
>> question of my own: when you say "set to NULL by [...] the corruption
>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>
>>          bs->drv = NULL; /* make BDS unusable */
> Yes, this is the place.
>
>> If so: I thought that it was quite a bold move, but also that we'd find
>> the SIGSEGVs sooner or later... :)
> In fact, if you use the block layer API, most functions check for
> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
> directly dereference the pointer without going through block.c (there's
> a good reason for this, see the comment, but it still makes it somewhat
> special).
>
> Kevin
Max Reitz March 11, 2014, 6:19 p.m. UTC | #5
On 11.03.2014 15:04, Paolo Bonzini wrote:
> Il 11/03/2014 11:16, Kevin Wolf ha scritto:
>> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
>>> On 03/10/14 23:44, Max Reitz wrote:
>>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>>> copy_sectors() should check whether that pointer is indeed valid, 
>>>> since
>>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>>> corruption prevention mechanism.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>> To be precise, this still is a race condition. If bs->drv is set to 
>>>> NULL
>>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>>> obviously still crash. However, in order to circumvent this 
>>>> behavior, we
>>>> would probably have to re-lock s->lock, check bs->drv, take the 
>>>> function
>>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>>> is called. I found this rather ugly and therefore this still has a 
>>>> very
>>>> small chance of running into a race condition.
>>>> Therefore, I'm asking for your opinion on this, whether we can really
>>>> take this chance or should rather "do it right". In fact, if I were a
>>>> reviewer, I'd probably reject this patch and request the solution with
>>>> the function pointer (if there is no better solution), but I was 
>>>> afraid
>>>> to send such an ugly patch.
>>
>> No, the code is fine. Remember that qcow2 is not threaded, we're talking
>> about coroutines here. There is no way for the code to yield between
>> your check and the protected place.
>>
>>>>  block/qcow2-cluster.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>>> index 36c1bed..9499df9 100644
>>>> --- a/block/qcow2-cluster.c
>>>> +++ b/block/qcow2-cluster.c
>>>> @@ -380,6 +380,10 @@ static int coroutine_fn 
>>>> copy_sectors(BlockDriverState *bs,
>>>>
>>>>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>>
>>>> +    if (!bs->drv) {
>>>> +        return -ENOMEDIUM;
>>>> +    }
>>>> +
>>>>      /* Call .bdrv_co_readv() directly instead of using the public 
>>>> block-layer
>>>>       * interface.  This avoids double I/O throttling and request 
>>>> tracking,
>>>>       * which can lead to deadlock when block layer copy-on-read is 
>>>> enabled.
>>>>
>>>
>>> I can't answer your question nor review this patch -- instead, I have a
>>> question of my own: when you say "set to NULL by [...] the corruption
>>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>>
>>>         bs->drv = NULL; /* make BDS unusable */
>>
>> Yes, this is the place.
>>
>>> If so: I thought that it was quite a bold move, but also that we'd find
>>> the SIGSEGVs sooner or later... :)
>>
>> In fact, if you use the block layer API, most functions check for
>> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
>> directly dereference the pointer without going through block.c (there's
>> a good reason for this, see the comment, but it still makes it somewhat
>> special).
>
> But why not call qcow2_co_readv directly?

qcow2_co_readv() would probably actually perform the operation, as far 
as I can see, since it does not check whether the medium is corrupt 
(this is only checked when opening the block device). This is the reason 
why bs->drv is set to NULL, so noone is able to read from the block 
device anymore, even without checking for corruption (additionally, on 
version 2 images, there is not even a way to indicate corruption other 
than setting bs->drv to NULL).
Normally, every read operation would pass through bdrv_co_readv(), which 
will (have to) check whether bs->drv is valid or NULL - since this code 
decides to skip that and call the driver function directly, we therefore 
have to check bs->drv anyway, since it is the only way of telling us 
whether the image may actually be accessed right here. We could change 
the bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people 
will probably ask why there is a bs->drv check right before.

Max
diff mbox

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 36c1bed..9499df9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -380,6 +380,10 @@  static int coroutine_fn copy_sectors(BlockDriverState *bs,
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+
     /* Call .bdrv_co_readv() directly instead of using the public block-layer
      * interface.  This avoids double I/O throttling and request tracking,
      * which can lead to deadlock when block layer copy-on-read is enabled.