Message ID: 1425045947-9271-1-git-send-email-mreitz@redhat.com
State: New
On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote:
> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
> can go wrong without.
>
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2:
> - Make the mutex cover vdi_co_write() completely [Kevin]
> - Add a TODO comment [Kevin]
> ---
>  block/vdi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/block/vdi.c b/block/vdi.c
> index 74030c6..f5f42ef 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -51,6 +51,7 @@
>
>  #include "qemu-common.h"
>  #include "block/block_int.h"
> +#include "block/coroutine.h"
>  #include "qemu/module.h"
>  #include "migration/migration.h"
>
> @@ -196,6 +197,8 @@ typedef struct {
>      /* VDI header (converted to host endianness). */
>      VdiHeader header;
>
> +    CoMutex bmap_lock;
> +
>      Error *migration_blocker;
>  } BDRVVdiState;
>
> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail_free_bmap;
>      }
>
> +    qemu_co_mutex_init(&s->bmap_lock);
> +
>      /* Disable migration when vdi images are used */
>      error_set(&s->migration_blocker,
>                QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>
>      logout("\n");
>
> +    /* TODO: Figure out why this is necessary */
> +    qemu_co_mutex_lock(&s->bmap_lock);

If we don't know why bmap_lock works, it would be more appropriate to
take the same approach as VMDK and VHDX, where there is simply an s->lock
that protects all reads and writes. That way we know for sure there is
no parallel I/O going on.

(Since the problem is not understood, maybe reads in parallel with
writes could also cause problems. Better to really do a coarse lock
instead of just bmap_lock in write.)

Stefan
On 2015-02-27 at 11:57, Stefan Hajnoczi wrote: > On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote: >> Concurrently modifying the bmap does not seem to be a good idea; this patch adds >> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what >> can go wrong without. >> >> Cc: qemu-stable <qemu-stable@nongnu.org> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> v2: >> - Make the mutex cover vdi_co_write() completely [Kevin] >> - Add a TODO comment [Kevin] >> --- >> block/vdi.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/block/vdi.c b/block/vdi.c >> index 74030c6..f5f42ef 100644 >> --- a/block/vdi.c >> +++ b/block/vdi.c >> @@ -51,6 +51,7 @@ >> >> #include "qemu-common.h" >> #include "block/block_int.h" >> +#include "block/coroutine.h" >> #include "qemu/module.h" >> #include "migration/migration.h" >> >> @@ -196,6 +197,8 @@ typedef struct { >> /* VDI header (converted to host endianness). */ >> VdiHeader header; >> >> + CoMutex bmap_lock; >> + >> Error *migration_blocker; >> } BDRVVdiState; >> >> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags, >> goto fail_free_bmap; >> } >> >> + qemu_co_mutex_init(&s->bmap_lock); >> + >> /* Disable migration when vdi images are used */ >> error_set(&s->migration_blocker, >> QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED, >> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs, >> >> logout("\n"); >> >> + /* TODO: Figure out why this is necessary */ >> + qemu_co_mutex_lock(&s->bmap_lock); > If we don't know why bmap_lock works, it would be more approprate to > take the same approach as VMDK and VHDX where there is a simply s->lock > that protects all reads and writes. That way we know for sure there is > no parallel I/O going on. > > (Since the problem is not understood, maybe reads in parallel with > writes could also cause problems. Better to really do a coarse lock > instead of just bmap_lock in write.) OK, will do. Max
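The coarse VMDK/VHDX-style locking agreed on above can be sketched roughly as follows. This is a hypothetical, self-contained simplification, not the actual block/vdi.c code: `ToyVdiState` and its helpers are made up, and a pthread mutex stands in for QEMU's CoMutex. The point is only the pattern — one lock held for the whole duration of every read and every write, so no I/O runs in parallel:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Toy stand-in for BDRVVdiState; pthread_mutex_t plays the role of
 * QEMU's CoMutex here. */
typedef struct {
    pthread_mutex_t lock;     /* coarse lock, as in VMDK/VHDX */
    unsigned char data[64];
    int in_critical;          /* sanity flag: detects overlapping I/O */
} ToyVdiState;

static void toy_init(ToyVdiState *s)
{
    pthread_mutex_init(&s->lock, NULL);
    memset(s->data, 0, sizeof(s->data));
    s->in_critical = 0;
}

/* Both read and write take the same lock for their whole duration,
 * so no read can run in parallel with a write. */
static void toy_write(ToyVdiState *s, int off, const unsigned char *buf,
                      int len)
{
    pthread_mutex_lock(&s->lock);
    assert(!s->in_critical);
    s->in_critical = 1;
    memcpy(s->data + off, buf, len);
    s->in_critical = 0;
    pthread_mutex_unlock(&s->lock);
}

static int toy_read(ToyVdiState *s, int off)
{
    int v;
    pthread_mutex_lock(&s->lock);
    assert(!s->in_critical);
    s->in_critical = 1;
    v = s->data[off];
    s->in_critical = 0;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

In the real driver the lock would be taken at the top of vdi_co_read()/vdi_co_write() and released at every return path, exactly as the v2 patch already does for vdi_co_write() alone.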
Am 27.02.2015 um 17:57 schrieb Stefan Hajnoczi: > On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote: >> Concurrently modifying the bmap does not seem to be a good idea; this patch adds >> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what >> can go wrong without. >> >> Cc: qemu-stable <qemu-stable@nongnu.org> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> v2: >> - Make the mutex cover vdi_co_write() completely [Kevin] >> - Add a TODO comment [Kevin] [...] >> If we don't know why bmap_lock works, it would be more approprate to >> take the same approach as VMDK and VHDX where there is a simply s->lock >> that protects all reads and writes. That way we know for sure there is >> no parallel I/O going on. >> >> (Since the problem is not understood, maybe reads in parallel with >> writes could also cause problems. Better to really do a coarse lock >> instead of just bmap_lock in write.) >> >> Stefan block/vdi.c was never written for multi-threaded access, see my comment in the header of block/vdi.c: * The code is not thread safe (missing locks for changes in header and * block table, no problem with current QEMU). This was true in the past, but obviously later multi-threaded access was introduced for QEMU. Locking was added for qcow2 and other drivers in 2012 and 2013, but never for vdi. I must admit that I don't know which parts of the block filesystem drivers potentially run in parallel threads. Ideally there would be one or more test cases which test multi-threaded operations and which trigger a failure with the current vdi code. If I had a simple test scenario, I could have a look on the problem. The VMDK approach is fine as an intermediate work around, but please use conditional compilation to allow easy tests without coarse locks (and update the comments :-)). Regards Stefan (Weil)
On 2015-02-27 at 12:25, Stefan Weil wrote: > Am 27.02.2015 um 17:57 schrieb Stefan Hajnoczi: >> On Fri, Feb 27, 2015 at 09:05:47AM -0500, Max Reitz wrote: >>> Concurrently modifying the bmap does not seem to be a good idea; this patch adds >>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what >>> can go wrong without. >>> >>> Cc: qemu-stable <qemu-stable@nongnu.org> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> v2: >>> - Make the mutex cover vdi_co_write() completely [Kevin] >>> - Add a TODO comment [Kevin] > [...] >>> If we don't know why bmap_lock works, it would be more approprate to >>> take the same approach as VMDK and VHDX where there is a simply s->lock >>> that protects all reads and writes. That way we know for sure there is >>> no parallel I/O going on. >>> >>> (Since the problem is not understood, maybe reads in parallel with >>> writes could also cause problems. Better to really do a coarse lock >>> instead of just bmap_lock in write.) >>> >>> Stefan > block/vdi.c was never written for multi-threaded access, see my comment > in the header of block/vdi.c: > > * The code is not thread safe (missing locks for changes in header and > * block table, no problem with current QEMU). > > This was true in the past, but obviously later multi-threaded access was > introduced for QEMU. Locking was added for qcow2 and other drivers in > 2012 and 2013, but never for vdi. > > I must admit that I don't know which parts of the block filesystem > drivers potentially run in parallel threads.Ideally there would be one > or more test cases which test multi-threaded operations and which > trigger a failure with the current vdi code. > > If I had a simple test scenario, I could have a look on the problem. I have one for you. See the attached ruby script. 
(If there are no "Pattern verification failed" messages, everything is good) > The VMDK approach is fine as an intermediate work around, but please use > conditional compilation to allow easy tests without coarse locks (and > update the comments :-)). Will a macro defined in vdi.c be enough? Max
Am 27.02.2015 um 18:28 schrieb Max Reitz: > On 2015-02-27 at 12:25, Stefan Weil wrote: >> block/vdi.c was never written for multi-threaded access, see my comment >> in the header of block/vdi.c: >> >> * The code is not thread safe (missing locks for changes in header and >> * block table, no problem with current QEMU). >> >> This was true in the past, but obviously later multi-threaded access was >> introduced for QEMU. Locking was added for qcow2 and other drivers in >> 2012 and 2013, but never for vdi. >> >> I must admit that I don't know which parts of the block filesystem >> drivers potentially run in parallel threads.Ideally there would be one >> or more test cases which test multi-threaded operations and which >> trigger a failure with the current vdi code. >> >> If I had a simple test scenario, I could have a look on the problem. > > I have one for you. See the attached ruby script. > > (If there are no "Pattern verification failed" messages, everything is > good) > >> The VMDK approach is fine as an intermediate work around, but please use >> conditional compilation to allow easy tests without coarse locks (and >> update the comments :-)). > > Will a macro defined in vdi.c be enough? Yes, that would be fine. vdi.c already has several locally defined CONFIG_VDI_... macros. Stefan
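A possible shape for such a macro, in the style of the existing locally defined CONFIG_VDI_... macros — the name `CONFIG_VDI_LOCKING` and all helper names here are hypothetical, and a pthread mutex again stands in for QEMU's CoMutex:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical CONFIG_VDI_LOCKING switch.  When disabled, the lock
 * operations compile to nothing, so the unlocked behaviour stays easy
 * to test, as requested. */
#define CONFIG_VDI_LOCKING 1

#if CONFIG_VDI_LOCKING
typedef pthread_mutex_t VdiLock;   /* stand-in for QEMU's CoMutex */
#define vdi_lock_init(l)   pthread_mutex_init((l), NULL)
#define vdi_lock(l)        pthread_mutex_lock(l)
#define vdi_unlock(l)      pthread_mutex_unlock(l)
#else
typedef int VdiLock;               /* dummy; all operations are no-ops */
#define vdi_lock_init(l)   ((void)(l))
#define vdi_lock(l)        ((void)(l))
#define vdi_unlock(l)      ((void)(l))
#endif

static VdiLock demo_lock;
static int demo_counter;

/* Code using the macros is identical in both configurations. */
static int demo_locked_increment(void)
{
    vdi_lock(&demo_lock);
    int v = ++demo_counter;
    vdi_unlock(&demo_lock);
    return v;
}
```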
On 27/02/2015 18:25, Stefan Weil wrote: > block/vdi.c was never written for multi-threaded access, see my comment > in the header of block/vdi.c: It is not using threads, only coroutines. Preemption points of coroutines are well defined, and I think that the bug could be present even in the initial AIO-based version. > * The code is not thread safe (missing locks for changes in header and > * block table, no problem with current QEMU). > > This was true in the past, but obviously later multi-threaded access was > introduced for QEMU. Locking was added for qcow2 and other drivers in > 2012 and 2013, but never for vdi. qcow2 already had locking (based on AsyncContexts) before the conversion to coroutines. Other drivers implicitly had locking because they were synchronous; locking was added because the conversion to coroutines made them asynchronous. vdi never got its locking because it was already asynchronous. Paolo
On 27/02/2015 15:05, Max Reitz wrote:
> Concurrently modifying the bmap does not seem to be a good idea; this patch adds
> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
> can go wrong without.
>
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2:
> - Make the mutex cover vdi_co_write() completely [Kevin]
> - Add a TODO comment [Kevin]

I think I know what the bug is. Suppose you have two concurrent writes
to a non-allocated block, one at 16K...32K (in bytes) and one at
32K...48K. The first write is enlarged to contain zeros, the second is
not. Then you have two writes in flight:

      0   zeros
    ...   zeros
    16K   data1
    ...   data1
    32K   zeros   data2
    ...   zeros   data2
    48K   zeros
    ...   zeros
    64K

And the contents of 32K...48K are undefined. If the above diagnosis is
correct, I'm not even sure why Max's v1 patch worked...

An optimized fix could be to use a CoRwLock, then:

- take it shared (read) around the write in the
  "VDI_IS_ALLOCATED(bmap_entry)" path

- take it exclusive (write) around the write in the
  "!VDI_IS_ALLOCATED(bmap_entry)" path

Paolo

> ---
>  block/vdi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/block/vdi.c b/block/vdi.c
> index 74030c6..f5f42ef 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -51,6 +51,7 @@
>
>  #include "qemu-common.h"
>  #include "block/block_int.h"
> +#include "block/coroutine.h"
>  #include "qemu/module.h"
>  #include "migration/migration.h"
>
> @@ -196,6 +197,8 @@ typedef struct {
>      /* VDI header (converted to host endianness). */
>      VdiHeader header;
>
> +    CoMutex bmap_lock;
> +
>      Error *migration_blocker;
>  } BDRVVdiState;
>
> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail_free_bmap;
>      }
>
> +    qemu_co_mutex_init(&s->bmap_lock);
> +
>      /* Disable migration when vdi images are used */
>      error_set(&s->migration_blocker,
>                QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,
>
>      logout("\n");
>
> +    /* TODO: Figure out why this is necessary */
> +    qemu_co_mutex_lock(&s->bmap_lock);
> +
>      while (ret >= 0 && nb_sectors > 0) {
>          block_index = sector_num / s->block_sectors;
>          sector_in_block = sector_num % s->block_sectors;
> @@ -656,6 +664,7 @@ static int vdi_co_write(BlockDriverState *bs,
>
>      logout("finished data write\n");
>      if (ret < 0) {
> +        qemu_co_mutex_unlock(&s->bmap_lock);
>          return ret;
>      }
>
> @@ -674,6 +683,7 @@ static int vdi_co_write(BlockDriverState *bs,
>      block = NULL;
>
>      if (ret < 0) {
> +        qemu_co_mutex_unlock(&s->bmap_lock);
>          return ret;
>      }
>
> @@ -690,6 +700,7 @@ static int vdi_co_write(BlockDriverState *bs,
>          ret = bdrv_write(bs->file, offset, base, n_sectors);
>      }
>
> +    qemu_co_mutex_unlock(&s->bmap_lock);
>      return ret;
>  }
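The scenario Paolo describes — two in-flight writes to the same freshly allocated block, where only the first is zero-padded to the full block size — can be reproduced deterministically outside QEMU. This toy model is made up for illustration (block scaled down to 64 bytes, arbitrary byte patterns for data1/data2); it just applies the two writes in both possible completion orders:

```c
#include <assert.h>
#include <string.h>

enum { BLK = 64 };   /* scaled-down block: 1 byte here ~ 1K in the mail */

/* Write 1: guest data at 16..32, but because the block is freshly
 * allocated, the driver enlarges the request and writes the whole
 * block, padding with zeros. */
static void write1(unsigned char *blk)
{
    unsigned char buf[BLK];
    memset(buf, 0, sizeof(buf));
    memset(buf + 16, 0xD1, 16);   /* "data1" */
    memcpy(blk, buf, BLK);        /* whole-block write */
}

/* Write 2: guest data at 32..48, written without padding. */
static void write2(unsigned char *blk)
{
    memset(blk + 32, 0xD2, 16);   /* "data2" */
}

/* Byte found at offset 40 (inside 32K...48K in the mail's terms)
 * after both writes complete in the given order. */
static int result_at_40(int write1_last)
{
    unsigned char blk[BLK];
    memset(blk, 0xEE, sizeof(blk));   /* stale on-disk content */
    if (write1_last) {
        write2(blk);
        write1(blk);                  /* zero padding clobbers data2 */
    } else {
        write1(blk);
        write2(blk);
    }
    return blk[40];
}
```

Depending on which write lands last, offset 40 ends up holding data2 or zeros — exactly the "contents are undefined" situation in the diagnosis.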
Am 27.02.2015 um 18:28 schrieb Max Reitz: > On 2015-02-27 at 12:25, Stefan Weil wrote: >> If I had a simple test scenario, I could have a look on the problem. > > I have one for you. See the attached ruby script. > > (If there are no "Pattern verification failed" messages, everything is > good) Your test runs without any failure message in my four test scenarios: * test on x86_64 virtualized host, Debian Wheezy, with and without debug enabled * test on x86_64 Intel Core i7, Debian Jessie, with and without debug enabled I repeated the test several times. Do I have to run it in an endless loop? Stefan
On 2015-02-27 at 12:42, Paolo Bonzini wrote: > > On 27/02/2015 15:05, Max Reitz wrote: >> Concurrently modifying the bmap does not seem to be a good idea; this patch adds >> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what >> can go wrong without. >> >> Cc: qemu-stable <qemu-stable@nongnu.org> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> v2: >> - Make the mutex cover vdi_co_write() completely [Kevin] >> - Add a TODO comment [Kevin] > I think I know what the bug is. Suppose you have two concurrent writes > to a non-allocated block, one at 16K...32K (in bytes) and one at > 32K...48K. The first write is enlarged to contain zeros, the second is > not. Then you have two writes in flight: > > 0 zeros > ... zeros > 16K data1 > ... data1 > 32K zeros data2 > ... zeros data2 > 48K zeros > ... zeros > 64K > > And the contents of 32K...48K are undefined. If the above diagnosis is > correct, I'm not even sure why Max's v1 patch worked... Maybe that's an issue, too; but the test case I sent out does 1 MB requests (and it fails), so this shouldn't matter there. > An optimized fix could be to use a CoRwLock, then: Yes, I'm actually already working on that. Max > - take it shared (read) around the write in the > "VDI_IS_ALLOCATED(bmap_entry)" path > > - take it exclusive (write) around the write in the > "!VDI_IS_ALLOCATED(bmap_entry)" path > > Paolo > >> --- >> block/vdi.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/block/vdi.c b/block/vdi.c >> index 74030c6..f5f42ef 100644 >> --- a/block/vdi.c >> +++ b/block/vdi.c >> @@ -51,6 +51,7 @@ >> >> #include "qemu-common.h" >> #include "block/block_int.h" >> +#include "block/coroutine.h" >> #include "qemu/module.h" >> #include "migration/migration.h" >> >> @@ -196,6 +197,8 @@ typedef struct { >> /* VDI header (converted to host endianness). 
*/ >> VdiHeader header; >> >> + CoMutex bmap_lock; >> + >> Error *migration_blocker; >> } BDRVVdiState; >> >> @@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags, >> goto fail_free_bmap; >> } >> >> + qemu_co_mutex_init(&s->bmap_lock); >> + >> /* Disable migration when vdi images are used */ >> error_set(&s->migration_blocker, >> QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED, >> @@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs, >> >> logout("\n"); >> >> + /* TODO: Figure out why this is necessary */ >> + qemu_co_mutex_lock(&s->bmap_lock); >> + >> while (ret >= 0 && nb_sectors > 0) { >> block_index = sector_num / s->block_sectors; >> sector_in_block = sector_num % s->block_sectors; >> @@ -656,6 +664,7 @@ static int vdi_co_write(BlockDriverState *bs, >> >> logout("finished data write\n"); >> if (ret < 0) { >> + qemu_co_mutex_unlock(&s->bmap_lock); >> return ret; >> } >> >> @@ -674,6 +683,7 @@ static int vdi_co_write(BlockDriverState *bs, >> block = NULL; >> >> if (ret < 0) { >> + qemu_co_mutex_unlock(&s->bmap_lock); >> return ret; >> } >> >> @@ -690,6 +700,7 @@ static int vdi_co_write(BlockDriverState *bs, >> ret = bdrv_write(bs->file, offset, base, n_sectors); >> } >> >> + qemu_co_mutex_unlock(&s->bmap_lock); >> return ret; >> } >> >>
On 2015-02-27 at 13:07, Stefan Weil wrote: > Am 27.02.2015 um 18:28 schrieb Max Reitz: >> On 2015-02-27 at 12:25, Stefan Weil wrote: >>> If I had a simple test scenario, I could have a look on the problem. >> I have one for you. See the attached ruby script. >> >> (If there are no "Pattern verification failed" messages, everything is >> good) > Your test runs without any failure message in my four test scenarios: > > * test on x86_64 virtualized host, Debian Wheezy, with and without debug > enabled > > * test on x86_64 Intel Core i7, Debian Jessie, with and without debug > enabled > > I repeated the test several times. Do I have to run it in an endless loop? It always fails for me. Do you have an SSD? Max
Am 27.02.2015 um 19:09 schrieb Max Reitz:
> It always fails for me. Do you have an SSD?
On the real machine: yes. On the virtual machine: maybe.
On 2015-02-27 at 13:12, Stefan Weil wrote: > Am 27.02.2015 um 19:09 schrieb Max Reitz: >> It always fails for me. Do you have an SSD? > On the real machine: yes. On the virtual machine: maybe. I don't, so maybe that's the issue. But running it in tmpfs doesn't change anything for me, still fails (every time). Max
On 2015-02-27 at 13:09, Max Reitz wrote: > On 2015-02-27 at 12:42, Paolo Bonzini wrote: >> >> On 27/02/2015 15:05, Max Reitz wrote: >>> Concurrently modifying the bmap does not seem to be a good idea; >>> this patch adds >>> a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for >>> what >>> can go wrong without. >>> >>> Cc: qemu-stable <qemu-stable@nongnu.org> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> v2: >>> - Make the mutex cover vdi_co_write() completely [Kevin] >>> - Add a TODO comment [Kevin] >> I think I know what the bug is. Suppose you have two concurrent writes >> to a non-allocated block, one at 16K...32K (in bytes) and one at >> 32K...48K. The first write is enlarged to contain zeros, the second is >> not. Then you have two writes in flight: >> >> 0 zeros >> ... zeros >> 16K data1 >> ... data1 >> 32K zeros data2 >> ... zeros data2 >> 48K zeros >> ... zeros >> 64K >> >> And the contents of 32K...48K are undefined. If the above diagnosis is >> correct, I'm not even sure why Max's v1 patch worked... > > Maybe that's an issue, too; but the test case I sent out does 1 MB > requests (and it fails), so this shouldn't matter there. Considering that test case didn't work for Stefan (Weil), and it fails in a pretty strange way for me (no output from the qemu-io command at all, and while most reads from raw were successful, all reads from vdi failed (the pattern verification failed), maybe that's something completely different. Indeed, when I do sub-MB writes, I get sporadic errors which seem much more related to the original bug report, so it's probably the issue you found that's the real problem. Also, my test case suddenly stopped reproducing the issue on my HDD and only does it on tmpfs. Weird. Max >> An optimized fix could be to use a CoRwLock, then: > > Yes, I'm actually already working on that. 
> > Max > >> - take it shared (read) around the write in the >> "VDI_IS_ALLOCATED(bmap_entry)" path >> >> - take it exclusive (write) around the write in the >> "!VDI_IS_ALLOCATED(bmap_entry)" path >> >> Paolo
On 2015-02-27 at 13:15, Max Reitz wrote: > On 2015-02-27 at 13:12, Stefan Weil wrote: >> Am 27.02.2015 um 19:09 schrieb Max Reitz: >>> It always fails for me. Do you have an SSD? >> On the real machine: yes. On the virtual machine: maybe. > > I don't, so maybe that's the issue. But running it in tmpfs doesn't > change anything for me, still fails (every time). And now it doesn't fail any more on my HDD (but still in tmpfs). Great. How about this test case (which is in line with what Paolo identified is the potential problem)? :-) Max
Am 27.02.2015 um 19:55 schrieb Max Reitz: > On 2015-02-27 at 13:15, Max Reitz wrote: >> On 2015-02-27 at 13:12, Stefan Weil wrote: >>> Am 27.02.2015 um 19:09 schrieb Max Reitz: >>>> It always fails for me. Do you have an SSD? >>> On the real machine: yes. On the virtual machine: maybe. >> >> I don't, so maybe that's the issue. But running it in tmpfs doesn't >> change anything for me, still fails (every time). > > And now it doesn't fail any more on my HDD (but still in tmpfs). Great. > > How about this test case (which is in line with what Paolo identified > is the potential problem)? :-) > > Max No failure with the new one, too. Stefan
On 2015-02-27 at 15:21, Stefan Weil wrote: > Am 27.02.2015 um 19:55 schrieb Max Reitz: >> On 2015-02-27 at 13:15, Max Reitz wrote: >>> On 2015-02-27 at 13:12, Stefan Weil wrote: >>>> Am 27.02.2015 um 19:09 schrieb Max Reitz: >>>>> It always fails for me. Do you have an SSD? >>>> On the real machine: yes. On the virtual machine: maybe. >>> >>> I don't, so maybe that's the issue. But running it in tmpfs doesn't >>> change anything for me, still fails (every time). >> >> And now it doesn't fail any more on my HDD (but still in tmpfs). Great. >> >> How about this test case (which is in line with what Paolo identified >> is the potential problem)? :-) >> >> Max > > > No failure with the new one, too. Hm, have you tried it multiple times? Max
Am 27.02.2015 um 21:23 schrieb Max Reitz: > On 2015-02-27 at 15:21, Stefan Weil wrote: >> Am 27.02.2015 um 19:55 schrieb Max Reitz: >>> On 2015-02-27 at 13:15, Max Reitz wrote: >>>> On 2015-02-27 at 13:12, Stefan Weil wrote: >>>>> Am 27.02.2015 um 19:09 schrieb Max Reitz: >>>>>> It always fails for me. Do you have an SSD? >>>>> On the real machine: yes. On the virtual machine: maybe. >>>> >>>> I don't, so maybe that's the issue. But running it in tmpfs doesn't >>>> change anything for me, still fails (every time). >>> >>> And now it doesn't fail any more on my HDD (but still in tmpfs). Great. >>> >>> How about this test case (which is in line with what Paolo >>> identified is the potential problem)? :-) >>> >>> Max >> >> >> No failure with the new one, too. > > Hm, have you tried it multiple times? > > Max I tried it only a few times. While running it in a loop I now get failures. Stefan
Am 27.02.2015 um 18:42 schrieb Paolo Bonzini: > > An optimized fix could be to use a CoRwLock, then: Note related to our previous discussion: why does the code use CoRwlock instead of CoRwLock? Should it be renamed before more code uses it? Stefan
On 2015-02-27 at 16:44, Stefan Weil wrote: > Am 27.02.2015 um 18:42 schrieb Paolo Bonzini: >> >> An optimized fix could be to use a CoRwLock, then: > > > Note related to our previous discussion: why does the code use > CoRwlock instead of CoRwLock? Should it be renamed before more code > uses it? Well, I'm simply not using it (in v3 at least...), so I hope I'll get away without touching that. :-) Max
diff --git a/block/vdi.c b/block/vdi.c
index 74030c6..f5f42ef 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -51,6 +51,7 @@

 #include "qemu-common.h"
 #include "block/block_int.h"
+#include "block/coroutine.h"
 #include "qemu/module.h"
 #include "migration/migration.h"

@@ -196,6 +197,8 @@ typedef struct {
     /* VDI header (converted to host endianness). */
     VdiHeader header;

+    CoMutex bmap_lock;
+
     Error *migration_blocker;
 } BDRVVdiState;

@@ -498,6 +501,8 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail_free_bmap;
     }

+    qemu_co_mutex_init(&s->bmap_lock);
+
     /* Disable migration when vdi images are used */
     error_set(&s->migration_blocker,
               QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
@@ -607,6 +612,9 @@ static int vdi_co_write(BlockDriverState *bs,

     logout("\n");

+    /* TODO: Figure out why this is necessary */
+    qemu_co_mutex_lock(&s->bmap_lock);
+
     while (ret >= 0 && nb_sectors > 0) {
         block_index = sector_num / s->block_sectors;
         sector_in_block = sector_num % s->block_sectors;
@@ -656,6 +664,7 @@ static int vdi_co_write(BlockDriverState *bs,

     logout("finished data write\n");
     if (ret < 0) {
+        qemu_co_mutex_unlock(&s->bmap_lock);
         return ret;
     }

@@ -674,6 +683,7 @@ static int vdi_co_write(BlockDriverState *bs,
     block = NULL;

     if (ret < 0) {
+        qemu_co_mutex_unlock(&s->bmap_lock);
         return ret;
     }

@@ -690,6 +700,7 @@ static int vdi_co_write(BlockDriverState *bs,
         ret = bdrv_write(bs->file, offset, base, n_sectors);
     }

+    qemu_co_mutex_unlock(&s->bmap_lock);
     return ret;
 }
Concurrently modifying the bmap does not seem to be a good idea; this patch adds
a lock for it. See https://bugs.launchpad.net/qemu/+bug/1422307 for what
can go wrong without.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
v2:
- Make the mutex cover vdi_co_write() completely [Kevin]
- Add a TODO comment [Kevin]
---
 block/vdi.c | 11 +++++++++++
 1 file changed, 11 insertions(+)