block: fix sector comparism in multiwrite_req_compare

Message ID 20100519185309.GA27591@lst.de
State New

Commit Message

Christoph Hellwig May 19, 2010, 6:53 p.m. UTC
The difference between the start sectors of two requests can be larger
than what fits in the "int" type, which can lead to an incorrectly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.

So instead of doing the cute sector arithmetic trick, spell out the
exact comparisons.

Spotted by Kevin Wolf based on a testcase from Michael Tokarev.

Signed-off-by: Christoph Hellwig <hch@lst.de>
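
To make the failure mode in the commit message concrete, here is a minimal standalone sketch of the two comparison styles. `DemoRequest` is a hypothetical stand-in for `BlockRequest` that models only a 64-bit `sector` field (an assumption for illustration), and the result of narrowing an out-of-range value to `int` is implementation-defined; the comments describe what common compilers do.

```c
#include <stdint.h>

/* Hypothetical stand-in for BlockRequest; only the sort key is modelled. */
typedef struct {
    int64_t sector;
} DemoRequest;

/* Old style: the 64-bit difference is narrowed to the int return type. */
static int compare_by_subtraction(const void *a, const void *b)
{
    const DemoRequest *req1 = a, *req2 = b;

    /*
     * Narrowing an out-of-range value is implementation-defined; common
     * compilers keep the low 32 bits, which can flip the sign.
     */
    return (int)(req1->sector - req2->sector);
}

/* Patched style: explicit three-way comparison, no arithmetic on the keys. */
static int compare_explicit(const void *a, const void *b)
{
    const DemoRequest *req1 = a, *req2 = b;

    if (req1->sector > req2->sector) {
        return 1;
    } else if (req1->sector < req2->sector) {
        return -1;
    } else {
        return 0;
    }
}
```

For a request at sector 0 and one at sector `3 << 30` (the 1.5 TiB mark, assuming 512-byte sectors), `compare_by_subtraction` on a typical compiler reports the higher request as the smaller one, while `compare_explicit` orders them correctly.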

Comments

Michael Tokarev May 19, 2010, 7:26 p.m. UTC | #1
19.05.2010 22:53, Christoph Hellwig wrote:
> The difference between the start sectors of two requests can be larger
> than the size of the "int" type, which can lead to a not correctly
> sorted multiwrite array and thus spurious I/O errors and filesystem
> corruption due to incorrect request merges.
>
> So instead of doing the cute sector arithmetics trick spell out the
> exact comparisms.
>
> Spotted by Kevin Wolf based on a testcase from Michael Tokarev.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: qemu/block.c
> ===================================================================
> --- qemu.orig/block.c	2010-05-19 17:08:24.970255636 +0200
> +++ qemu/block.c	2010-05-19 17:17:34.227006021 +0200
> @@ -1933,7 +1933,19 @@ static void multiwrite_cb(void *opaque,
>
>   static int multiwrite_req_compare(const void *a, const void *b)
>   {
> -    return (((BlockRequest*) a)->sector - ((BlockRequest*) b)->sector);
> +    const BlockRequest *req1 = a, *req2 = b;
> +
> +    /*
> +     * Note that we can't simply subtract req2->sector from req1->sector
> +     * here as that could overflow the return value.
> +     */
> +    if (req1->sector > req2->sector) {
> +        return 1;
> +    } else if (req1->sector < req2->sector) {
> +        return -1;
> +    } else {
> +        return 0;
> +    }
>   }

Fantastic.

I ran numerous mkfs runs on my 1.5 TB image. Before the patch, every
single run resulted in at least one "fun" comparison there.

There are NO MORE OVERLAPPING REQUESTS.  At least I can't
trigger one.

Fantastic!

Thanks guys!

Now the question is whether the previous patch by Avi is actually
worth applying -- I mean this one:
  [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping requests

But it looks like all 3 should be applied:

  [Qemu-devel] [PATCH] block: fix sector comparism in multiwrite_req_compare
  [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping requests
  [Qemu-devel] [PATCH] virtio-blk: fix barrier support

..to -stable.

I wonder how many similar "funny" cases are still around... ;)

/mjt
Christoph Hellwig May 19, 2010, 7:38 p.m. UTC | #2
On Wed, May 19, 2010 at 11:26:58PM +0400, Michael Tokarev wrote:
> Now the question is if the previous patch by Avi is actually
> worth to apply -- I mean this one:
>  [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping 
>  requests

No, that was just a hack, it shouldn't be applied.
Michael Tokarev May 19, 2010, 7:42 p.m. UTC | #3
19.05.2010 23:38, Christoph Hellwig wrote:
> On Wed, May 19, 2010 at 11:26:58PM +0400, Michael Tokarev wrote:
>> Now the question is if the previous patch by Avi is actually
>> worth to apply -- I mean this one:
>>   [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
>>   requests
>
> No, that was just a hack, it shouldn't be applied.

Actually it's not that obvious.  The actual problem
here (besides the mis-comparison) may be due to missing
barriers or flushes.  Avi asked a good question in that
thread.

But I'm hijacking the thread actually :)

/mjt
Kevin Wolf May 19, 2010, 9:09 p.m. UTC | #4
Am Mittwoch, 19. Mai 2010 21:42 schrieb Michael Tokarev:
> 19.05.2010 23:38, Christoph Hellwig wrote:
> > On Wed, May 19, 2010 at 11:26:58PM +0400, Michael Tokarev wrote:
> >> Now the question is if the previous patch by Avi is actually
> >> worth to apply -- I mean this one:
> >>   [Qemu-devel] [PATCH +stable] block: don't attempt to merge overlapping
> >>   requests
> >
> > No, that was just a hack, it shouldn't be applied.
> 
> Actually it's not that obvious.  If the actual problem
> here (besides the mis-comparison) is due to missing
> barriers or flushes.  Avi asked a good question in that
> thread.

It's obvious that it's a hack. It doesn't fix anything, it just disables a 
feature that didn't work. Good for debugging, but not something that you 
would like to commit.

It's reasonable to include something like this when we know that something is 
broken but we haven't found it yet - but I believe Christoph's patch is the 
real fix. If anyone can still find a case that is "fixed" by Avi's patch, I 
could be convinced to apply it anyway, but I'd prefer if I didn't have to.

Note that we actually don't have overlapping requests. It just looks like it 
because the qsort call doesn't work correctly with the broken comparison 
function, so lower sector numbers can come after higher ones.
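
A small sketch of that effect, under stated assumptions: `DemoRequest` is a hypothetical reduction of `BlockRequest` to its `sector` field, and the comparator relies on the narrowing to `int` keeping the low 32 bits, as common compilers do. Sorting just two requests with the subtraction-based comparator is enough to leave the lower sector behind the higher one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockRequest; only the sort key is modelled. */
typedef struct {
    int64_t sector;
} DemoRequest;

/* The comparison function as it looked before the fix. */
static int compare_by_subtraction(const void *a, const void *b)
{
    const DemoRequest *req1 = a, *req2 = b;

    /* The 64-bit difference is narrowed to int (implementation-defined). */
    return (int)(req1->sector - req2->sector);
}

/* Sort the array the way the multiwrite code did, using the old comparator. */
static void sort_requests_old_way(DemoRequest *reqs, size_t n)
{
    qsort(reqs, n, sizeof(*reqs), compare_by_subtraction);
}

/* True if start sectors never decrease from one element to the next. */
static bool is_ascending(const DemoRequest *reqs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (reqs[i - 1].sector > reqs[i].sector) {
            return false;
        }
    }
    return true;
}
```

Calling `sort_requests_old_way` on `{ 0, 3 << 30 }` and then `is_ascending` shows the array is not in ascending order, which is how merge code that assumes sorted input can end up seeing what looks like overlap.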

Kevin
Avi Kivity May 20, 2010, 6:09 a.m. UTC | #5
On 05/20/2010 12:09 AM, Kevin Wolf wrote:
>
>> Actually it's not that obvious.  If the actual problem
>> here (besides the mis-comparison) is due to missing
>> barriers or flushes.  Avi asked a good question in that
>> thread.
>>      
> It's obvious that it's a hack. It doesn't fix anything, it just disables a
> feature that didn't work. Good for debugging, but not something that you
> would like to commit.
>
> It's reasonable to include something like this when we know that something is
> broken but we haven't found it yet - but I believe Christoph's patch is the
> real fix. If anyone can still find a case that is "fixed" by Avi's patch, I
> could be convinced to apply it anyway, but I'd prefer if I didn't have to.
>
> Note that we actually don't have overlapping requests. It just looks like it
> because the qsort call doesn't work correctly with the broken comparison
> function, so lower sector numbers can come after higher ones.
>    

I agree my patch didn't fix the problem, only made it disappear, but 
won't the current code break with overlapping requests?
Kevin Wolf May 20, 2010, 8:19 a.m. UTC | #6
Am 20.05.2010 08:09, schrieb Avi Kivity:
> On 05/20/2010 12:09 AM, Kevin Wolf wrote:
>>
>>> Actually it's not that obvious.  If the actual problem
>>> here (besides the mis-comparison) is due to missing
>>> barriers or flushes.  Avi asked a good question in that
>>> thread.
>>>      
>> It's obvious that it's a hack. It doesn't fix anything, it just disables a
>> feature that didn't work. Good for debugging, but not something that you
>> would like to commit.
>>
>> It's reasonable to include something like this when we know that something is
>> broken but we haven't found it yet - but I believe Christoph's patch is the
>> real fix. If anyone can still find a case that is "fixed" by Avi's patch, I
>> could be convinced to apply it anyway, but I'd prefer if I didn't have to.
>>
>> Note that we actually don't have overlapping requests. It just looks like it
>> because the qsort call doesn't work correctly with the broken comparison
>> function, so lower sector numbers can come after higher ones.
>>    
> 
> I agree my patch didn't fix the problem, only made it disappear, but 
> won't the current code break with overlapping requests?

Maybe --verbose for your patch descriptions would help. I didn't see any
obvious problem. If you know any, care to explain?

Anyway, I started to implement a multiwrite command for qemu-io
yesterday, so that I can actually test such scenarios.

Kevin
Avi Kivity May 20, 2010, 8:30 a.m. UTC | #7
On 05/20/2010 11:19 AM, Kevin Wolf wrote:
> Am 20.05.2010 08:09, schrieb Avi Kivity:
>    
>> On 05/20/2010 12:09 AM, Kevin Wolf wrote:
>>      
>>>        
>>>> Actually it's not that obvious.  If the actual problem
>>>> here (besides the mis-comparison) is due to missing
>>>> barriers or flushes.  Avi asked a good question in that
>>>> thread.
>>>>
>>>>          
>>> It's obvious that it's a hack. It doesn't fix anything, it just disables a
>>> feature that didn't work. Good for debugging, but not something that you
>>> would like to commit.
>>>
>>> It's reasonable to include something like this when we know that something is
>>> broken but we haven't found it yet - but I believe Christoph's patch is the
>>> real fix. If anyone can still find a case that is "fixed" by Avi's patch, I
>>> could be convinced to apply it anyway, but I'd prefer if I didn't have to.
>>>
>>> Note that we actually don't have overlapping requests. It just looks like it
>>> because the qsort call doesn't work correctly with the broken comparison
>>> function, so lower sector numbers can come after higher ones.
>>>
>>>        
>> I agree my patch didn't fix the problem, only made it disappear, but
>> won't the current code break with overlapping requests?
>>      
> Maybe --verbose for your patch descriptions would help. I didn't see any
> obvious problem. If you know any, care to explain?
>    

Looking again, you are right.  There is code to take care of the 
overlap, and even a comment.  So my patch is indeed bogus.

>             size_t size;
>             QEMUIOVector *qiov = qemu_mallocz(sizeof(*qiov));
>             qemu_iovec_init(qiov,
>                 reqs[outidx].qiov->niov + reqs[i].qiov->niov + 1);
>
>             // Add the first request to the merged one. If the requests are
>             // overlapping, drop the last sectors of the first request.
>             size = (reqs[i].sector - reqs[outidx].sector) << 9;
>             qemu_iovec_concat(qiov, reqs[outidx].qiov, size);

size can overflow on 32-bit.
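
A rough illustration of that overflow, not taken from block.c: the helper names are hypothetical, and `uint32_t` stands in for a 32-bit `size_t` so the effect is visible on any host. Whether the merge logic can actually produce a sector distance this large is not established in this thread.

```c
#include <stdint.h>

/*
 * Byte count as a 32-bit size_t would store it: the sector distance is
 * shifted by 9 (512-byte sectors) and then reduced modulo 2^32.
 */
static uint32_t kept_bytes_32bit(int64_t first_sector, int64_t next_sector)
{
    uint64_t bytes = (uint64_t)(next_sector - first_sector) << 9;

    return (uint32_t)bytes; /* unsigned narrowing is well-defined: wraps */
}

/* The same computation kept in 64 bits, for comparison. */
static uint64_t kept_bytes_64bit(int64_t first_sector, int64_t next_sector)
{
    return (uint64_t)(next_sector - first_sector) << 9;
}
```

With a distance of `1 << 23` sectors (4 GiB), `kept_bytes_64bit` yields 4294967296 while `kept_bytes_32bit` wraps to 0; smaller distances give the same value in both.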

Unrelated issue:  it seems we read the request directly from guest 
memory.  Since we access it multiple times, the guest can play with the 
contents meanwhile, invalidating previous decisions.  Shouldn't we copy 
all non-data elements to private storage?
Kevin Wolf May 20, 2010, 8:50 a.m. UTC | #8
Am 19.05.2010 20:53, schrieb Christoph Hellwig:
> The difference between the start sectors of two requests can be larger
> than the size of the "int" type, which can lead to a not correctly
> sorted multiwrite array and thus spurious I/O errors and filesystem
> corruption due to incorrect request merges.
> 
> So instead of doing the cute sector arithmetics trick spell out the
> exact comparisms.
> 
> Spotted by Kevin Wolf based on a testcase from Michael Tokarev.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Thanks, applied to the block branch.

Kevin

Patch

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2010-05-19 17:08:24.970255636 +0200
+++ qemu/block.c	2010-05-19 17:17:34.227006021 +0200
@@ -1933,7 +1933,19 @@  static void multiwrite_cb(void *opaque,
 
 static int multiwrite_req_compare(const void *a, const void *b)
 {
-    return (((BlockRequest*) a)->sector - ((BlockRequest*) b)->sector);
+    const BlockRequest *req1 = a, *req2 = b;
+
+    /*
+     * Note that we can't simply subtract req2->sector from req1->sector
+     * here as that could overflow the return value.
+     */
+    if (req1->sector > req2->sector) {
+        return 1;
+    } else if (req1->sector < req2->sector) {
+        return -1;
+    } else {
+        return 0;
+    }
 }
 
 /*