Message ID | 4F8FC0F5.90707@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, Apr 19, 2012 at 09:38:29AM +0200, Kevin Wolf wrote: > Am 19.04.2012 04:44, schrieb Marcelo Tosatti: > > On Mon, Mar 12, 2012 at 04:19:45PM +0100, Kevin Wolf wrote: > >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> > >> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> > >> --- > >> block/qcow2-cluster.c | 55 ++++++++++++++++++++++++++++++++----------------- > >> 1 files changed, 36 insertions(+), 19 deletions(-) > > > > Kevin, > > > > Autotest installed Fedora.8.64 guests boot with a corrupt disk > > (see screenshot). > > > > Reverting > > > > qcow2: Reduce number of I/O requests > > qcow2: Add qcow2_alloc_clusters_at() > > qcow2: Factor out count_cow_clusters > > > > Fixes the problem. > > Do you really need to revert the latter two? They should be really > harmless, the first is adding yet dead code, the second one is trivial > code motion. > > > Let me know if there is a fix available or you need further information. > > No, this is the first report of such corruption, so any further > information would be very helpful. Does qemu-img check find any problems? [root@nehalem kvm]# qemu-img info /root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2 image: /root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2 file format: qcow2 virtual size: 10G (10737418240 bytes) disk size: 4.0G cluster_size: 65536 [root@nehalem kvm]# qemu-img check /root/git/kvm-autotest/client/tests/kvm/images/fc8-64.qcow2 ERROR OFLAG_COPIED: offset=8000000080000000 refcount=2 ERROR refcount block 1 refcount=2 2 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. > There is one intended change in functionality in this patch, which is > that it allocates new clusters even when it could satisfy the first part > of the request with already allocated clusters. In order to check if > there is a problem with this scenario, the following patch should revert > to the old behaviour: > > --- a/block/qcow2-cluster.c > +++ b/block/qcow2-cluster.c > @@ -847,7 +847,7 @@ again: > keep_clusters = count_contiguous_clusters(nb_clusters, > s->cluster_size, > &l2_table[l2_index], > 0, 0); > assert(keep_clusters <= nb_clusters); > - nb_clusters -= keep_clusters; > + nb_clusters = 0; > } else { > /* For the moment, overwrite compressed clusters one by one */ > if (cluster_offset & QCOW_OFLAG_COMPRESSED) { > > The rest is meant to be a functionally equivalent rewrite of the old > code that was required in order to allow this change. Testing.
On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: > > There is one intended change in functionality in this patch, which is > > that it allocates new clusters even when it could satisfy the first part > > of the request with already allocated clusters. In order to check if > > there is a problem with this scenario, the following patch should revert > > to the old behaviour: > > > > --- a/block/qcow2-cluster.c > > +++ b/block/qcow2-cluster.c > > @@ -847,7 +847,7 @@ again: > > keep_clusters = count_contiguous_clusters(nb_clusters, > > s->cluster_size, > > &l2_table[l2_index], > > 0, 0); > > assert(keep_clusters <= nb_clusters); > > - nb_clusters -= keep_clusters; > > + nb_clusters = 0; > > } else { > > /* For the moment, overwrite compressed clusters one by one */ > > if (cluster_offset & QCOW_OFLAG_COMPRESSED) { > > > > The rest is meant to be a functionally equivalent rewrite of the old > > code that was required in order to allow this change. > > Testing. Corruption gone with patch above.
Am 19.04.2012 23:18, schrieb Marcelo Tosatti: > On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: >>> There is one intended change in functionality in this patch, which is >>> that it allocates new clusters even when it could satisfy the first part >>> of the request with already allocated clusters. In order to check if >>> there is a problem with this scenario, the following patch should revert >>> to the old behaviour: >>> >>> --- a/block/qcow2-cluster.c >>> +++ b/block/qcow2-cluster.c >>> @@ -847,7 +847,7 @@ again: >>> keep_clusters = count_contiguous_clusters(nb_clusters, >>> s->cluster_size, >>> &l2_table[l2_index], >>> 0, 0); >>> assert(keep_clusters <= nb_clusters); >>> - nb_clusters -= keep_clusters; >>> + nb_clusters = 0; >>> } else { >>> /* For the moment, overwrite compressed clusters one by one */ >>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) { >>> >>> The rest is meant to be a functionally equivalent rewrite of the old >>> code that was required in order to allow this change. >> >> Testing. > > Corruption gone with patch above. Okay, so something must be wrong only with the new code paths, which makes things a bit easier. I'll have another closer look. Stefan, can you re-review 250196f1 as well? Kevin
On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <kwolf@redhat.com> wrote: > Am 19.04.2012 23:18, schrieb Marcelo Tosatti: >> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: >>>> There is one intended change in functionality in this patch, which is >>>> that it allocates new clusters even when it could satisfy the first part >>>> of the request with already allocated clusters. In order to check if >>>> there is a problem with this scenario, the following patch should revert >>>> to the old behaviour: >>>> >>>> --- a/block/qcow2-cluster.c >>>> +++ b/block/qcow2-cluster.c >>>> @@ -847,7 +847,7 @@ again: >>>> keep_clusters = count_contiguous_clusters(nb_clusters, >>>> s->cluster_size, >>>> &l2_table[l2_index], >>>> 0, 0); >>>> assert(keep_clusters <= nb_clusters); >>>> - nb_clusters -= keep_clusters; >>>> + nb_clusters = 0; >>>> } else { >>>> /* For the moment, overwrite compressed clusters one by one */ >>>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) { >>>> >>>> The rest is meant to be a functionally equivalent rewrite of the old >>>> code that was required in order to allow this change. >>> >>> Testing. >> >> Corruption gone with patch above. > > Okay, so something must be wrong only with the new code paths, which > makes things a bit easier. I'll have another closer look. Stefan, can > you re-review 250196f1 as well? I'm taking a look. Stefan
On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <kwolf@redhat.com> wrote: >> Am 19.04.2012 23:18, schrieb Marcelo Tosatti: >>> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: >>>>> There is one intended change in functionality in this patch, which is >>>>> that it allocates new clusters even when it could satisfy the first part >>>>> of the request with already allocated clusters. In order to check if >>>>> there is a problem with this scenario, the following patch should revert >>>>> to the old behaviour: >>>>> >>>>> --- a/block/qcow2-cluster.c >>>>> +++ b/block/qcow2-cluster.c >>>>> @@ -847,7 +847,7 @@ again: >>>>> keep_clusters = count_contiguous_clusters(nb_clusters, >>>>> s->cluster_size, >>>>> &l2_table[l2_index], >>>>> 0, 0); >>>>> assert(keep_clusters <= nb_clusters); >>>>> - nb_clusters -= keep_clusters; >>>>> + nb_clusters = 0; >>>>> } else { >>>>> /* For the moment, overwrite compressed clusters one by one */ >>>>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) { >>>>> >>>>> The rest is meant to be a functionally equivalent rewrite of the old >>>>> code that was required in order to allow this change. >>>> >>>> Testing. >>> >>> Corruption gone with patch above. >> >> Okay, so something must be wrong only with the new code paths, which >> makes things a bit easier. I'll have another closer look. Stefan, can >> you re-review 250196f1 as well? > > I'm taking a look. Just looking at the qemu-img check output that Marcelo posted I'm interpreting that as OFLAG_COPIED was set on the offset but its refcount was 2. The refcount shouldn't be 2 unless internal snapshots were used. So we probably incremented the refcount either too often or for the wrong block (which is more likely since the guest sees corruption). Stefan
--- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -847,7 +847,7 @@ again: keep_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size, &l2_table[l2_index], 0, 0); assert(keep_clusters <= nb_clusters); - nb_clusters -= keep_clusters; + nb_clusters = 0; } else {