mbox series

[v7,0/3] Fix qcow2+luks corruption introduced by commit 8ac0f15f335

Message ID 20190915203655.21638-1-mlevitsk@redhat.com
Headers show
Series Fix qcow2+luks corruption introduced by commit 8ac0f15f335 | expand

Message

Maxim Levitsky Sept. 15, 2019, 8:36 p.m. UTC
Commit 8ac0f15f335 accidently broke the COW of non changed areas
of newly allocated clusters, when the write spans multiple clusters,
and needs COW both prior and after the write.
This results in 'after' COW area being encrypted with wrong
sector address, which render it corrupted.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922

CC: qemu-stable <qemu-stable@nongnu.org>

V2: grammar, spelling and code style fixes.
V3: more fixes after the review.
V4: addressed review comments from Max Reitz,
    and futher refactored the qcow2_co_encrypt to just take full host and guest offset
    which simplifies everything.

V5: reworked the patches so one of them fixes the bug
    only and other one is just refactoring

V6: removed do_perform_cow_encrypt

V7: removed do_perform_cow_encrypt take two, this
    time I hopefully did that correctly :-)
    Also updated commit names and messages a bit

Best regards,
	Maxim Levitsky

Maxim Levitsky (3):
  block/qcow2: Fix corruption introduced by commit 8ac0f15f335
  block/qcow2: refactor encryption code
  qemu-iotests: Add test for bz #1745922

 block/qcow2-cluster.c      | 40 ++++++-----------
 block/qcow2-threads.c      | 63 ++++++++++++++++++++------
 block/qcow2.c              |  5 ++-
 block/qcow2.h              |  8 ++--
 tests/qemu-iotests/263     | 91 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/263.out | 40 +++++++++++++++++
 tests/qemu-iotests/group   |  1 +
 7 files changed, 202 insertions(+), 46 deletions(-)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

Comments

Max Reitz Sept. 16, 2019, 1:39 p.m. UTC | #1
On 15.09.19 22:36, Maxim Levitsky wrote:
> Commit 8ac0f15f335 accidently broke the COW of non changed areas
> of newly allocated clusters, when the write spans multiple clusters,
> and needs COW both prior and after the write.
> This results in 'after' COW area being encrypted with wrong
> sector address, which render it corrupted.
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922
> 
> CC: qemu-stable <qemu-stable@nongnu.org>
> 
> V2: grammar, spelling and code style fixes.
> V3: more fixes after the review.
> V4: addressed review comments from Max Reitz,
>     and futher refactored the qcow2_co_encrypt to just take full host and guest offset
>     which simplifies everything.
> 
> V5: reworked the patches so one of them fixes the bug
>     only and other one is just refactoring
> 
> V6: removed do_perform_cow_encrypt
> 
> V7: removed do_perform_cow_encrypt take two, this
>     time I hopefully did that correctly :-)
>     Also updated commit names and messages a bit

Luckily for you (maybe), Vladimir’s series doesn‘t quite pass the
iotests for me, so unfortunately (I find it unfortunate) I had to remove
it from my branch.  Thus, the conflicts are much more tame and I felt
comfortable taking the series to my branch (with the remaining trivial
conflicts resolved, and with Vladimir’s suggestion applied):

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max
Maxim Levitsky Sept. 16, 2019, 1:59 p.m. UTC | #2
On Mon, 2019-09-16 at 15:39 +0200, Max Reitz wrote:
> On 15.09.19 22:36, Maxim Levitsky wrote:
> > Commit 8ac0f15f335 accidently broke the COW of non changed areas
> > of newly allocated clusters, when the write spans multiple clusters,
> > and needs COW both prior and after the write.
> > This results in 'after' COW area being encrypted with wrong
> > sector address, which render it corrupted.
> > 
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922
> > 
> > CC: qemu-stable <qemu-stable@nongnu.org>
> > 
> > V2: grammar, spelling and code style fixes.
> > V3: more fixes after the review.
> > V4: addressed review comments from Max Reitz,
> >     and futher refactored the qcow2_co_encrypt to just take full host and guest offset
> >     which simplifies everything.
> > 
> > V5: reworked the patches so one of them fixes the bug
> >     only and other one is just refactoring
> > 
> > V6: removed do_perform_cow_encrypt
> > 
> > V7: removed do_perform_cow_encrypt take two, this
> >     time I hopefully did that correctly :-)
> >     Also updated commit names and messages a bit
> 
> Luckily for you (maybe), Vladimir’s series doesn‘t quite pass the
> iotests for me, so unfortunately (I find it unfortunate) I had to remove
> it from my branch.  Thus, the conflicts are much more tame and I felt
> comfortable taking the series to my branch (with the remaining trivial
> conflicts resolved, and with Vladimir’s suggestion applied):
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block


First of all, Thanks!

I don't know if this is luckily for me since I already rebased my series on top of  
https://git.xanclic.moe/XanClic/qemu.git,
and run all qcow2 iotests, and only tests 
162 169 194 196 234 262 failed, and I know that 162 always fails
due to that kernel change I talked about here few days ago,
and rest for the AF_UNIX path len, which I need to do something
about in the long term. I sometimes do a separate build in 
directory which path doesn't trigger this, and sometimes,
when I know that I haven't done significant changes to the patches,
I just let these tests fail. In long term, maybe even in a few days
I'll allocate some time to rethink the build environment here to
fix that permanently.

Now I am rerunning the iotests just for fun, in short enough directory
to see if I can reproduce the failure that you had. After looking
in your report, that iotest 026 fails, it does pass here, but
then I am only running these iotests on my laptop so I probably
don't trigger the race you were able to.

So thanks again!
Best regards,
	Maxim Levitsky
Maxim Levitsky Sept. 16, 2019, 2:59 p.m. UTC | #3
On Mon, 2019-09-16 at 16:59 +0300, Maxim Levitsky wrote:
> On Mon, 2019-09-16 at 15:39 +0200, Max Reitz wrote:
> > On 15.09.19 22:36, Maxim Levitsky wrote:
> > > Commit 8ac0f15f335 accidently broke the COW of non changed areas
> > > of newly allocated clusters, when the write spans multiple clusters,
> > > and needs COW both prior and after the write.
> > > This results in 'after' COW area being encrypted with wrong
> > > sector address, which render it corrupted.
> > > 
> > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922
> > > 
> > > CC: qemu-stable <qemu-stable@nongnu.org>
> > > 
> > > V2: grammar, spelling and code style fixes.
> > > V3: more fixes after the review.
> > > V4: addressed review comments from Max Reitz,
> > >     and futher refactored the qcow2_co_encrypt to just take full host and guest offset
> > >     which simplifies everything.
> > > 
> > > V5: reworked the patches so one of them fixes the bug
> > >     only and other one is just refactoring
> > > 
> > > V6: removed do_perform_cow_encrypt
> > > 
> > > V7: removed do_perform_cow_encrypt take two, this
> > >     time I hopefully did that correctly :-)
> > >     Also updated commit names and messages a bit
> > 
> > Luckily for you (maybe), Vladimir’s series doesn‘t quite pass the
> > iotests for me, so unfortunately (I find it unfortunate) I had to remove
> > it from my branch.  Thus, the conflicts are much more tame and I felt
> > comfortable taking the series to my branch (with the remaining trivial
> > conflicts resolved, and with Vladimir’s suggestion applied):
> > 
> > https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> 
> First of all, Thanks!
> 
> I don't know if this is luckily for me since I already rebased my series on top of  
> https://git.xanclic.moe/XanClic/qemu.git,
> and run all qcow2 iotests, and only tests 
> 162 169 194 196 234 262 failed, and I know that 162 always fails
> due to that kernel change I talked about here few days ago,
> and rest for the AF_UNIX path len, which I need to do something
> about in the long term. I sometimes do a separate build in 
> directory which path doesn't trigger this, and sometimes,
> when I know that I haven't done significant changes to the patches,
> I just let these tests fail. In long term, maybe even in a few days
> I'll allocate some time to rethink the build environment here to
> fix that permanently.
> 
> Now I am rerunning the iotests just for fun, in short enough directory
> to see if I can reproduce the failure that you had. After looking
> in your report, that iotest 026 fails, it does pass here, but
> then I am only running these iotests on my laptop so I probably
> don't trigger the race you were able to.

Passed all the tests but 162, not that it matters much to be honest.
Best regards,
	Maxim Levitsky