
[v2] block/vdi: Add locking for parallel requests

Message ID 1425045947-9271-1-git-send-email-mreitz@redhat.com
State New

Commit Message

Max Reitz Feb. 27, 2015, 2:05 p.m. UTC
Concurrently modifying the bmap does not seem to be a good idea; this patch adds
a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
can go wrong without.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
v2:
- Make the mutex cover vdi_co_write() completely [Kevin]
- Add a TODO comment [Kevin]
---
 block/vdi.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Stefan Hajnoczi Feb. 27, 2015, 4:57 p.m. UTC | #1
On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote:
> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
> can go wrong without.
> 
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2:
> - Make the mutex cover vdi_co_write() completely [Kevin]
> - Add a TODO comment [Kevin]
> ---
>  block/vdi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index 74030c6..f5f42ef 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -51,6 +51,7 @@
>  
>  #include "qemu-common.h"
>  #include "block/block_int.h"
> +#include "block/coroutine.h"
>  #include "qemu/module.h"
>  #include "migration/migration.h"
>  
> @@ -196,6 +197,8 @@ typedef struct {
>      /* VDI header (converted to host endianness). */
>      VdiHeader header;
>  
> +    CoMutex bmap_lock;
> +
>      Error *migration_blocker;
>  } BDRVVdiState;
>  
> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail_free_bmap;
>      }
>  
> +    qemu_co_mutex_init(&s->bmap_lock);
> +
>      /* Disable migration when vdi images are used */
>      error_set(&s->migration_blocker,
>                QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>  
>      logout("\n");
>  
> +    /* TODO: Figure out why this is necessary */
> +    qemu_co_mutex_lock(&s->bmap_lock);

If we don't know why bmap_lock works, it would be more appropriate to
take the same approach as VMDK and VHDX where there is a simple s->lock
that protects all reads and writes.  That way we know for sure there is
no parallel I/O going on.

(Since the problem is not understood, maybe reads in parallel with
writes could also cause problems.  Better to really do a coarse lock
instead of just bmap_lock in write.)

Stefan
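
For reference, a minimal sketch of the coarse-lock pattern suggested above, modelled loosely on the VMDK/VHDX drivers; the s->lock field and the unlocked vdi_read()/vdi_write() helpers are assumptions for illustration, not actual vdi.c code:

static int coroutine_fn vdi_co_read(BlockDriverState *bs, int64_t sector_num,
                                    uint8_t *buf, int nb_sectors)
{
    BDRVVdiState *s = bs->opaque;
    int ret;

    /* Hypothetical CoMutex, initialized with qemu_co_mutex_init() in
     * vdi_open(); holding it across the whole request rules out any
     * parallel I/O on the image. */
    qemu_co_mutex_lock(&s->lock);
    ret = vdi_read(bs, sector_num, buf, nb_sectors);
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}

static int coroutine_fn vdi_co_write(BlockDriverState *bs, int64_t sector_num,
                                     const uint8_t *buf, int nb_sectors)
{
    BDRVVdiState *s = bs->opaque;
    int ret;

    qemu_co_mutex_lock(&s->lock);
    ret = vdi_write(bs, sector_num, buf, nb_sectors);
    qemu_co_mutex_unlock(&s->lock);
    return ret;
}
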
Max Reitz Feb. 27, 2015, 4:57 p.m. UTC | #2
On 2015-02-27 at 11:57, Stefan Hajnoczi wrote:
> On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote:
>> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
>> can go wrong without.
>>
>> Cc: qemu-stable <qemu-stable@nongnu.org>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v2:
>> - Make the mutex cover vdi_co_write() completely [Kevin]
>> - Add a TODO comment [Kevin]
>> ---
>>   block/vdi.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/block/vdi.c b/block/vdi.c
>> index 74030c6..f5f42ef 100644
>> --- a/block/vdi.c
>> +++ b/block/vdi.c
>> @@ -51,6 +51,7 @@
>>   
>>   #include "qemu-common.h"
>>   #include "block/block_int.h"
>> +#include "block/coroutine.h"
>>   #include "qemu/module.h"
>>   #include "migration/migration.h"
>>   
>> @@ -196,6 +197,8 @@ typedef struct {
>>       /* VDI header (converted to host endianness). */
>>       VdiHeader header;
>>   
>> +    CoMutex bmap_lock;
>> +
>>       Error *migration_blocker;
>>   } BDRVVdiState;
>>   
>> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>>           goto fail_free_bmap;
>>       }
>>   
>> +    qemu_co_mutex_init(&s->bmap_lock);
>> +
>>       /* Disable migration when vdi images are used */
>>       error_set(&s->migration_blocker,
>>                 QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
>> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>>   
>>       logout("\n");
>>   
>> +    /* TODO: Figure out why this is necessary */
>> +    qemu_co_mutex_lock(&s->bmap_lock);
> If we don't know why bmap_lock works, it would be more appropriate to
> take the same approach as VMDK and VHDX where there is a simple s->lock
> that protects all reads and writes.  That way we know for sure there is
> no parallel I/O going on.
>
> (Since the problem is not understood, maybe reads in parallel with
> writes could also cause problems.  Better to really do a coarse lock
> instead of just bmap_lock in write.)

OK, will do.

Max
Stefan Weil Feb. 27, 2015, 5:25 p.m. UTC | #3
Am 27.02.2015 um 17:57 schrieb Stefan Hajnoczi:
> On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote:
>> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
>> can go wrong without.
>>
>> Cc: qemu-stable <qemu-stable@nongnu.org>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v2:
>> - Make the mutex cover vdi_co_write() completely [Kevin]
>> - Add a TODO comment [Kevin]
[...]
>> If we don't know why bmap_lock works, it would be more appropriate to
>> take the same approach as VMDK and VHDX where there is a simple s->lock
>> that protects all reads and writes.  That way we know for sure there is
>> no parallel I/O going on.
>>
>> (Since the problem is not understood, maybe reads in parallel with
>> writes could also cause problems.  Better to really do a coarse lock
>> instead of just bmap_lock in write.)
>>
>> Stefan

block/vdi.c was never written for multi-threaded access, see my comment
in the header of block/vdi.c:

 * The code is not thread safe (missing locks for changes in header and
 * block table, no problem with current QEMU).

This was true in the past, but obviously later multi-threaded access was
introduced for QEMU. Locking was added for qcow2 and other drivers in
2012 and 2013, but never for vdi.

I must admit that I don't know which parts of the block filesystem
drivers potentially run in parallel threads. Ideally there would be one
or more test cases which test multi-threaded operations and which
trigger a failure with the current vdi code.

If I had a simple test scenario, I could have a look on the problem.

The VMDK approach is fine as an intermediate work around, but please use
conditional compilation to allow easy tests without coarse locks (and
update the comments :-)).

Regards
Stefan (Weil)
Max Reitz Feb. 27, 2015, 5:28 p.m. UTC | #4
On 2015-02-27 at 12:25, Stefan Weil wrote:
> Am 27.02.2015 um 17:57 schrieb Stefan Hajnoczi:
>> On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote:
>>> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
>>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
>>> can go wrong without.
>>>
>>> Cc: qemu-stable <qemu-stable@nongnu.org>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>> v2:
>>> - Make the mutex cover vdi_co_write() completely [Kevin]
>>> - Add a TODO comment [Kevin]
> [...]
>>> If we don't know why bmap_lock works, it would be more appropriate to
>>> take the same approach as VMDK and VHDX where there is a simple s->lock
>>> that protects all reads and writes.  That way we know for sure there is
>>> no parallel I/O going on.
>>>
>>> (Since the problem is not understood, maybe reads in parallel with
>>> writes could also cause problems.  Better to really do a coarse lock
>>> instead of just bmap_lock in write.)
>>>
>>> Stefan
> block/vdi.c was never written for multi-threaded access, see my comment
> in the header of block/vdi.c:
>
>   * The code is not thread safe (missing locks for changes in header and
>   * block table, no problem with current QEMU).
>
> This was true in the past, but obviously later multi-threaded access was
> introduced for QEMU. Locking was added for qcow2 and other drivers in
> 2012 and 2013, but never for vdi.
>
> I must admit that I don't know which parts of the block filesystem
> drivers potentially run in parallel threads. Ideally there would be one
> or more test cases which test multi-threaded operations and which
> trigger a failure with the current vdi code.
>
> If I had a simple test scenario, I could have a look on the problem.

I have one for you. See the attached ruby script.

(If there are no "Pattern verification failed" messages, everything is good)

> The VMDK approach is fine as an intermediate work around, but please use
> conditional compilation to allow easy tests without coarse locks (and
> update the comments :-)).

Will a macro defined in vdi.c be enough?

Max
Stefan Weil Feb. 27, 2015, 5:34 p.m. UTC | #5
Am 27.02.2015 um 18:28 schrieb Max Reitz:
> On 2015-02-27 at 12:25, Stefan Weil wrote:
>> block/vdi.c was never written for multi-threaded access, see my comment
>> in the header of block/vdi.c:
>>
>>   * The code is not thread safe (missing locks for changes in header and
>>   * block table, no problem with current QEMU).
>>
>> This was true in the past, but obviously later multi-threaded access was
>> introduced for QEMU. Locking was added for qcow2 and other drivers in
>> 2012 and 2013, but never for vdi.
>>
>> I must admit that I don't know which parts of the block filesystem
>> drivers potentially run in parallel threads. Ideally there would be one
>> or more test cases which test multi-threaded operations and which
>> trigger a failure with the current vdi code.
>>
>> If I had a simple test scenario, I could have a look on the problem.
>
> I have one for you. See the attached ruby script.
>
> (If there are no "Pattern verification failed" messages, everything is
> good)
>
>> The VMDK approach is fine as an intermediate work around, but please use
>> conditional compilation to allow easy tests without coarse locks (and
>> update the comments :-)).
>
> Will a macro defined in vdi.c be enough?

Yes, that would be fine. vdi.c already has several locally defined
CONFIG_VDI_... macros.

Stefan
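
For illustration, one way such a locally defined macro could gate the coarse lock; the macro name, the helpers and the s->lock field below are made up for this sketch, not existing vdi.c code:

/* Set to 0 to compile the coarse lock out again, e.g. while hunting for the
 * underlying race.  The name is chosen here for illustration only. */
#define CONFIG_VDI_COARSE_LOCK 1

static void coroutine_fn vdi_lock(BDRVVdiState *s)
{
#if CONFIG_VDI_COARSE_LOCK
    qemu_co_mutex_lock(&s->lock);   /* hypothetical CoMutex in BDRVVdiState */
#else
    (void)s;                        /* locking compiled out for testing */
#endif
}

static void coroutine_fn vdi_unlock(BDRVVdiState *s)
{
#if CONFIG_VDI_COARSE_LOCK
    qemu_co_mutex_unlock(&s->lock);
#else
    (void)s;
#endif
}
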
Paolo Bonzini Feb. 27, 2015, 5:35 p.m. UTC | #6
On 27/02/2015 18:25, Stefan Weil wrote:
> block/vdi.c was never written for multi-threaded access, see my comment
> in the header of block/vdi.c:

It is not using threads, only coroutines.  Preemption points of
coroutines are well defined, and I think that the bug could be present
even in the initial AIO-based version.

>  * The code is not thread safe (missing locks for changes in header and
>  * block table, no problem with current QEMU).
> 
> This was true in the past, but obviously later multi-threaded access was
> introduced for QEMU. Locking was added for qcow2 and other drivers in
> 2012 and 2013, but never for vdi.

qcow2 already had locking (based on AsyncContexts) before the conversion
to coroutines.  Other drivers implicitly had locking because they were
synchronous; locking was added because the conversion to coroutines made
them asynchronous.

vdi never got its locking because it was already asynchronous.

Paolo
Paolo Bonzini Feb. 27, 2015, 5:42 p.m. UTC | #7
On 27/02/2015 15:05, Max Reitz wrote:
> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
> can go wrong without.
> 
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2:
> - Make the mutex cover vdi_co_write() completely [Kevin]
> - Add a TODO comment [Kevin]

I think I know what the bug is.  Suppose you have two concurrent writes
to a non-allocated block, one at 16K...32K (in bytes) and one at
32K...48K.  The first write is enlarged to contain zeros, the second is
not.  Then you have two writes in flight:

      0       zeros
      ...     zeros
      16K     data1
      ...     data1
      32K     zeros      data2
      ...     zeros      data2
      48K     zeros
      ...     zeros
      64K

And the contents of 32K...48K are undefined.  If the above diagnosis is
correct, I'm not even sure why Max's v1 patch worked...

An optimized fix could be to use a CoRwLock, then:

- take it shared (read) around the write in the
"VDI_IS_ALLOCATED(bmap_entry)" path

- take it exclusive (write) around the write in the
"!VDI_IS_ALLOCATED(bmap_entry)" path

Paolo
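
A rough sketch of that CoRwLock scheme (the actual QEMU type is spelled CoRwlock); the bmap_lock field, the per-block helper and the allocation function below are assumptions for illustration, not actual vdi.c code:

static int coroutine_fn vdi_co_write_one_block(BlockDriverState *bs,
                                               uint32_t block_index,
                                               int64_t offset, /* sectors in bs->file */
                                               const uint8_t *buf, int n_sectors)
{
    BDRVVdiState *s = bs->opaque;
    /* s->bmap_lock: hypothetical CoRwlock, qemu_co_rwlock_init()ed in vdi_open() */
    uint32_t bmap_entry = le32_to_cpu(s->bmap[block_index]);
    int ret;

    if (VDI_IS_ALLOCATED(bmap_entry)) {
        /* Write into an already allocated block: a shared lock suffices,
         * so such writes still run in parallel with each other. */
        qemu_co_rwlock_rdlock(&s->bmap_lock);
        ret = bdrv_write(bs->file, offset, buf, n_sectors);
        qemu_co_rwlock_unlock(&s->bmap_lock);
    } else {
        /* Allocating write: it updates the bmap and zero-fills the rest of
         * the new block, so it must exclude all other writes. */
        qemu_co_rwlock_wrlock(&s->bmap_lock);
        /* A real implementation would re-check the bmap entry here, since
         * another coroutine may have allocated the block while we waited. */
        ret = vdi_allocate_block_and_write(bs, block_index, offset, buf,
                                           n_sectors); /* hypothetical helper */
        qemu_co_rwlock_unlock(&s->bmap_lock);
    }
    return ret;
}
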

> ---
>  block/vdi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index 74030c6..f5f42ef 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -51,6 +51,7 @@
>  
>  #include "qemu-common.h"
>  #include "block/block_int.h"
> +#include "block/coroutine.h"
>  #include "qemu/module.h"
>  #include "migration/migration.h"
>  
> @@ -196,6 +197,8 @@ typedef struct {
>      /* VDI header (converted to host endianness). */
>      VdiHeader header;
>  
> +    CoMutex bmap_lock;
> +
>      Error *migration_blocker;
>  } BDRVVdiState;
>  
> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail_free_bmap;
>      }
>  
> +    qemu_co_mutex_init(&s->bmap_lock);
> +
>      /* Disable migration when vdi images are used */
>      error_set(&s->migration_blocker,
>                QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>  
>      logout("\n");
>  
> +    /* TODO: Figure out why this is necessary */
> +    qemu_co_mutex_lock(&s->bmap_lock);
> +
>      while (ret >= 0 && nb_sectors > 0) {
>          block_index = sector_num / s->block_sectors;
>          sector_in_block = sector_num % s->block_sectors;
> @@ -656,6 +664,7 @@ static int vdi_co_write(BlockDriverState *bs,
>  
>      logout("finished data write\n");
>      if (ret < 0) {
> +        qemu_co_mutex_unlock(&s->bmap_lock);
>          return ret;
>      }
>  
> @@ -674,6 +683,7 @@ static int vdi_co_write(BlockDriverState *bs,
>          block = NULL;
>  
>          if (ret < 0) {
> +            qemu_co_mutex_unlock(&s->bmap_lock);
>              return ret;
>          }
>  
> @@ -690,6 +700,7 @@ static int vdi_co_write(BlockDriverState *bs,
>          ret = bdrv_write(bs->file, offset, base, n_sectors);
>      }
>  
> +    qemu_co_mutex_unlock(&s->bmap_lock);
>      return ret;
>  }
>  
>
Stefan Weil Feb. 27, 2015, 6:07 p.m. UTC | #8
Am 27.02.2015 um 18:28 schrieb Max Reitz:
> On 2015-02-27 at 12:25, Stefan Weil wrote:
>> If I had a simple test scenario, I could have a look on the problem.
>
> I have one for you. See the attached ruby script.
>
> (If there are no "Pattern verification failed" messages, everything is
> good)

Your test runs without any failure message in my four test scenarios:

* test on x86_64 virtualized host, Debian Wheezy, with and without debug
enabled

* test on x86_64 Intel Core i7, Debian Jessie, with and without debug
enabled

I repeated the test several times. Do I have to run it in an endless loop?

Stefan
Max Reitz Feb. 27, 2015, 6:09 p.m. UTC | #9
On 2015-02-27 at 12:42, Paolo Bonzini wrote:
>
> On 27/02/2015 15:05, Max Reitz wrote:
>> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
>> can go wrong without.
>>
>> Cc: qemu-stable <qemu-stable@nongnu.org>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v2:
>> - Make the mutex cover vdi_co_write() completely [Kevin]
>> - Add a TODO comment [Kevin]
> I think I know what the bug is.  Suppose you have two concurrent writes
> to a non-allocated block, one at 16K...32K (in bytes) and one at
> 32K...48K.  The first write is enlarged to contain zeros, the second is
> not.  Then you have two writes in flight:
>
>        0       zeros
>        ...     zeros
>        16K     data1
>        ...     data1
>        32K     zeros      data2
>        ...     zeros      data2
>        48K     zeros
>        ...     zeros
>        64K
>
> And the contents of 32K...48K are undefined.  If the above diagnosis is
> correct, I'm not even sure why Max's v1 patch worked...

Maybe that's an issue, too; but the test case I sent out does 1 MB 
requests (and it fails), so this shouldn't matter there.

> An optimized fix could be to use a CoRwLock, then:

Yes, I'm actually already working on that.

Max

> - take it shared (read) around the write in the
> "VDI_IS_ALLOCATED(bmap_entry)" path
>
> - take it exclusive (write) around the write in the
> "!VDI_IS_ALLOCATED(bmap_entry)" path
>
> Paolo
>
>> ---
>>   block/vdi.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/block/vdi.c b/block/vdi.c
>> index 74030c6..f5f42ef 100644
>> --- a/block/vdi.c
>> +++ b/block/vdi.c
>> @@ -51,6 +51,7 @@
>>   
>>   #include "qemu-common.h"
>>   #include "block/block_int.h"
>> +#include "block/coroutine.h"
>>   #include "qemu/module.h"
>>   #include "migration/migration.h"
>>   
>> @@ -196,6 +197,8 @@ typedef struct {
>>       /* VDI header (converted to host endianness). */
>>       VdiHeader header;
>>   
>> +    CoMutex bmap_lock;
>> +
>>       Error *migration_blocker;
>>   } BDRVVdiState;
>>   
>> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>>           goto fail_free_bmap;
>>       }
>>   
>> +    qemu_co_mutex_init(&s->bmap_lock);
>> +
>>       /* Disable migration when vdi images are used */
>>       error_set(&s->migration_blocker,
>>                 QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
>> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>>   
>>       logout("\n");
>>   
>> +    /* TODO: Figure out why this is necessary */
>> +    qemu_co_mutex_lock(&s->bmap_lock);
>> +
>>       while (ret >= 0 && nb_sectors > 0) {
>>           block_index = sector_num / s->block_sectors;
>>           sector_in_block = sector_num % s->block_sectors;
>> @@ -656,6 +664,7 @@ static int vdi_co_write(BlockDriverState *bs,
>>   
>>       logout("finished data write\n");
>>       if (ret < 0) {
>> +        qemu_co_mutex_unlock(&s->bmap_lock);
>>           return ret;
>>       }
>>   
>> @@ -674,6 +683,7 @@ static int vdi_co_write(BlockDriverState *bs,
>>           block = NULL;
>>   
>>           if (ret < 0) {
>> +            qemu_co_mutex_unlock(&s->bmap_lock);
>>               return ret;
>>           }
>>   
>> @@ -690,6 +700,7 @@ static int vdi_co_write(BlockDriverState *bs,
>>           ret = bdrv_write(bs->file, offset, base, n_sectors);
>>       }
>>   
>> +    qemu_co_mutex_unlock(&s->bmap_lock);
>>       return ret;
>>   }
>>   
>>
Max Reitz Feb. 27, 2015, 6:09 p.m. UTC | #10
On 2015-02-27 at 13:07, Stefan Weil wrote:
> Am 27.02.2015 um 18:28 schrieb Max Reitz:
>> On 2015-02-27 at 12:25, Stefan Weil wrote:
>>> If I had a simple test scenario, I could have a look on the problem.
>> I have one for you. See the attached ruby script.
>>
>> (If there are no "Pattern verification failed" messages, everything is
>> good)
> Your test runs without any failure message in my four test scenarios:
>
> * test on x86_64 virtualized host, Debian Wheezy, with and without debug
> enabled
>
> * test on x86_64 Intel Core i7, Debian Jessie, with and without debug
> enabled
>
> I repeated the test several times. Do I have to run it in an endless loop?

It always fails for me. Do you have an SSD?

Max
Stefan Weil Feb. 27, 2015, 6:12 p.m. UTC | #11
Am 27.02.2015 um 19:09 schrieb Max Reitz:
> It always fails for me. Do you have an SSD?

On the real machine: yes. On the virtual machine: maybe.
Max Reitz Feb. 27, 2015, 6:15 p.m. UTC | #12
On 2015-02-27 at 13:12, Stefan Weil wrote:
> Am 27.02.2015 um 19:09 schrieb Max Reitz:
>> It always fails for me. Do you have an SSD?
> On the real machine: yes. On the virtual machine: maybe.

I don't, so maybe that's the issue. But running it in tmpfs doesn't 
change anything for me, still fails (every time).

Max
Max Reitz Feb. 27, 2015, 6:27 p.m. UTC | #13
On 2015-02-27 at 13:09, Max Reitz wrote:
> On 2015-02-27 at 12:42, Paolo Bonzini wrote:
>>
>> On 27/02/2015 15:05, Max Reitz wrote:
>>> Concurrently modifying the bmap does not seem to be a good idea; 
>>> this patch adds
>>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for 
>>> what
>>> can go wrong without.
>>>
>>> Cc: qemu-stable <qemu-stable@nongnu.org>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>> v2:
>>> - Make the mutex cover vdi_co_write() completely [Kevin]
>>> - Add a TODO comment [Kevin]
>> I think I know what the bug is.  Suppose you have two concurrent writes
>> to a non-allocated block, one at 16K...32K (in bytes) and one at
>> 32K...48K.  The first write is enlarged to contain zeros, the second is
>> not.  Then you have two writes in flight:
>>
>>        0       zeros
>>        ...     zeros
>>        16K     data1
>>        ...     data1
>>        32K     zeros      data2
>>        ...     zeros      data2
>>        48K     zeros
>>        ...     zeros
>>        64K
>>
>> And the contents of 32K...48K are undefined.  If the above diagnosis is
>> correct, I'm not even sure why Max's v1 patch worked...
>
> Maybe that's an issue, too; but the test case I sent out does 1 MB 
> requests (and it fails), so this shouldn't matter there.

Considering that test case didn't work for Stefan (Weil), and it fails 
in a pretty strange way for me (no output from the qemu-io command at 
all; and while most reads from raw were successful, all reads from vdi 
failed the pattern verification), maybe that's something completely 
different.

Indeed, when I do sub-MB writes, I get sporadic errors which seem much 
more related to the original bug report, so it's probably the issue you 
found that's the real problem.

Also, my test case suddenly stopped reproducing the issue on my HDD and 
only does it on tmpfs. Weird.

Max

>> An optimized fix could be to use a CoRwLock, then:
>
> Yes, I'm actually already working on that.
>
> Max
>
>> - take it shared (read) around the write in the
>> "VDI_IS_ALLOCATED(bmap_entry)" path
>>
>> - take it exclusive (write) around the write in the
>> "!VDI_IS_ALLOCATED(bmap_entry)" path
>>
>> Paolo
Max Reitz Feb. 27, 2015, 6:55 p.m. UTC | #14
On 2015-02-27 at 13:15, Max Reitz wrote:
> On 2015-02-27 at 13:12, Stefan Weil wrote:
>> Am 27.02.2015 um 19:09 schrieb Max Reitz:
>>> It always fails for me. Do you have an SSD?
>> On the real machine: yes. On the virtual machine: maybe.
>
> I don't, so maybe that's the issue. But running it in tmpfs doesn't 
> change anything for me, still fails (every time).

And now it doesn't fail any more on my HDD (but still in tmpfs). Great.

How about this test case (which is in line with what Paolo identified as 
the potential problem)? :-)

Max
Stefan Weil Feb. 27, 2015, 8:21 p.m. UTC | #15
Am 27.02.2015 um 19:55 schrieb Max Reitz:
> On 2015-02-27 at 13:15, Max Reitz wrote:
>> On 2015-02-27 at 13:12, Stefan Weil wrote:
>>> Am 27.02.2015 um 19:09 schrieb Max Reitz:
>>>> It always fails for me. Do you have an SSD?
>>> On the real machine: yes. On the virtual machine: maybe.
>>
>> I don't, so maybe that's the issue. But running it in tmpfs doesn't 
>> change anything for me, still fails (every time).
>
> And now it doesn't fail any more on my HDD (but still in tmpfs). Great.
>
> How about this test case (which is in line with what Paolo identified as 
> the potential problem)? :-)
>
> Max


No failure with the new one, too.

Stefan
Max Reitz Feb. 27, 2015, 8:23 p.m. UTC | #16
On 2015-02-27 at 15:21, Stefan Weil wrote:
> Am 27.02.2015 um 19:55 schrieb Max Reitz:
>> On 2015-02-27 at 13:15, Max Reitz wrote:
>>> On 2015-02-27 at 13:12, Stefan Weil wrote:
>>>> Am 27.02.2015 um 19:09 schrieb Max Reitz:
>>>>> It always fails for me. Do you have an SSD?
>>>> On the real machine: yes. On the virtual machine: maybe.
>>>
>>> I don't, so maybe that's the issue. But running it in tmpfs doesn't 
>>> change anything for me, still fails (every time).
>>
>> And now it doesn't fail any more on my HDD (but still in tmpfs). Great.
>>
>> How about this test case (which is in line with what Paolo identified as 
>> the potential problem)? :-)
>>
>> Max
>
>
> No failure with the new one, too.

Hm, have you tried it multiple times?

Max
Stefan Weil Feb. 27, 2015, 8:37 p.m. UTC | #17
Am 27.02.2015 um 21:23 schrieb Max Reitz:
> On 2015-02-27 at 15:21, Stefan Weil wrote:
>> Am 27.02.2015 um 19:55 schrieb Max Reitz:
>>> On 2015-02-27 at 13:15, Max Reitz wrote:
>>>> On 2015-02-27 at 13:12, Stefan Weil wrote:
>>>>> Am 27.02.2015 um 19:09 schrieb Max Reitz:
>>>>>> It always fails for me. Do you have an SSD?
>>>>> On the real machine: yes. On the virtual machine: maybe.
>>>>
>>>> I don't, so maybe that's the issue. But running it in tmpfs doesn't 
>>>> change anything for me, still fails (every time).
>>>
>>> And now it doesn't fail any more on my HDD (but still in tmpfs). Great.
>>>
>>> How about this test case (which is in line with what Paolo 
>>> identified as the potential problem)? :-)
>>>
>>> Max
>>
>>
>> No failure with the new one, too.
>
> Hm, have you tried it multiple times?
>
> Max

I tried it only a few times. While running it in a loop I now get failures.

Stefan
Stefan Weil Feb. 27, 2015, 9:44 p.m. UTC | #18
Am 27.02.2015 um 18:42 schrieb Paolo Bonzini:
>
> An optimized fix could be to use a CoRwLock, then:


Note related to our previous discussion: why does the code use CoRwlock 
instead of CoRwLock? Should it be renamed before more code uses it?

Stefan
Max Reitz Feb. 27, 2015, 9:46 p.m. UTC | #19
On 2015-02-27 at 16:44, Stefan Weil wrote:
> Am 27.02.2015 um 18:42 schrieb Paolo Bonzini:
>>
>> An optimized fix could be to use a CoRwLock, then:
>
>
> Note related to our previous discussion: why does the code use 
> CoRwlock instead of CoRwLock? Should it be renamed before more code 
> uses it?

Well, I'm simply not using it (in v3 at least...), so I hope I'll get 
away without touching that. :-)

Max

Patch

diff --git a/block/vdi.c b/block/vdi.c
index 74030c6..f5f42ef 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -51,6 +51,7 @@ 
 
 #include "qemu-common.h"
 #include "block/block_int.h"
+#include "block/coroutine.h"
 #include "qemu/module.h"
 #include "migration/migration.h"
 
@@ -196,6 +197,8 @@  typedef struct {
     /* VDI header (converted to host endianness). */
     VdiHeader header;
 
+    CoMutex bmap_lock;
+
     Error *migration_blocker;
 } BDRVVdiState;
 
@@ -498,6 +501,8 @@  static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail_free_bmap;
     }
 
+    qemu_co_mutex_init(&s->bmap_lock);
+
     /* Disable migration when vdi images are used */
     error_set(&s->migration_blocker,
               QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
@@ -607,6 +612,9 @@  static int vdi_co_write(BlockDriverState *bs,
 
     logout("\n");
 
+    /* TODO: Figure out why this is necessary */
+    qemu_co_mutex_lock(&s->bmap_lock);
+
     while (ret >= 0 && nb_sectors > 0) {
         block_index = sector_num / s->block_sectors;
         sector_in_block = sector_num % s->block_sectors;
@@ -656,6 +664,7 @@  static int vdi_co_write(BlockDriverState *bs,
 
     logout("finished data write\n");
     if (ret < 0) {
+        qemu_co_mutex_unlock(&s->bmap_lock);
         return ret;
     }
 
@@ -674,6 +683,7 @@  static int vdi_co_write(BlockDriverState *bs,
         block = NULL;
 
         if (ret < 0) {
+            qemu_co_mutex_unlock(&s->bmap_lock);
             return ret;
         }
 
@@ -690,6 +700,7 @@  static int vdi_co_write(BlockDriverState *bs,
         ret = bdrv_write(bs->file, offset, base, n_sectors);
     }
 
+    qemu_co_mutex_unlock(&s->bmap_lock);
     return ret;
 }