
[for-2.12,v4] iotests: Test abnormally large size in compressed cluster descriptor

Message ID 20180322124155.16257-1-berto@igalia.com
State New
Series [for-2.12,v4] iotests: Test abnormally large size in compressed cluster descriptor

Commit Message

Alberto Garcia March 22, 2018, 12:41 p.m. UTC
L2 entries for compressed clusters have a field that indicates the
number of sectors used to store the data in the image.

That's however not the size of the compressed data itself, just the
number of sectors where that data is located. The actual data size is
usually not a multiple of the sector size, and therefore cannot be
represented with this field.

The way it works is that QEMU reads all the specified sectors and
starts decompressing the data until there's enough to recover the
original uncompressed cluster. If there are any bytes left that
haven't been decompressed they are simply ignored.

One consequence of this is that even if the size field is larger than
it needs to be, QEMU can handle it just fine: it will read more data
from disk but it will ignore the extra bytes.

This test creates an image with two compressed clusters that use 5
sectors (2.5 KB) each, increases the size field to the maximum (8192
sectors, or 4 MB) and verifies that the data can be read without
problems.
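
For illustration (not part of the patch): a rough sketch of how that
size field is laid out, assuming a 2 MB cluster image as used below.
The field holds the number of additional 512-byte sectors beyond the
one containing the host offset, so QEMU reads field + 1 sectors.

  # Decode the size field of a compressed cluster descriptor for a
  # 2 MB cluster image (cluster_bits = 21): the host offset takes
  # 62 - (cluster_bits - 8) = 49 bits, the sector count the 13 bits above.
  l2_entry=$((0x4008000000a00000))   # first L2 entry created by this test
  csize_shift=49
  csize_mask=$(( (1 << 13) - 1 ))
  nb_csectors=$(( ((l2_entry >> csize_shift) & csize_mask) + 1 ))
  host_offset=$(( l2_entry & ((1 << csize_shift) - 1) ))
  printf 'sectors=%d host_offset=0x%x\n' "$nb_csectors" "$host_offset"
  # prints: sectors=5 host_offset=0xa00000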

This test is important because while the decompressed data takes
exactly one cluster, the maximum value allowed in the compressed size
field is twice the cluster size. So although QEMU won't produce images
with such large values, we need to make sure that it can handle them.

Another effect of increasing the size field is that it can make the
compressed data reach into the following host cluster(s). In this case
'qemu-img check' will detect that the refcounts are not correct, and
we'll need to rebuild them.

Additionally, this patch also tests that decreasing the size corrupts
the image since the original data can no longer be recovered. In this
case QEMU returns an error when trying to read the compressed data,
but 'qemu-img check' doesn't see anything wrong if the refcounts are
consistent.

One possible task for the future is to make 'qemu-img check' verify
the sizes of the compressed clusters, by trying to decompress the data
and checking that the size stored in the L2 entry is correct.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v4: Resend for 2.12

v3: Add TODO comment, as suggested by Eric.

    Corrupt the length of the second compressed cluster as well so the
    uncompressed data would span three host clusters.

v2: We now have two scenarios where we make QEMU read data from the
    next host cluster and from beyond the end of the image. This
    version also runs qemu-img check on the corrupted image.

    If the size field is too small, reading fails but qemu-img check
    succeeds.

    If the size field is too large, reading succeeds but qemu-img
    check fails (this can be repaired, though).
---
 tests/qemu-iotests/122     | 45 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/122.out | 31 +++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

Comments

Max Reitz March 28, 2018, 5:34 p.m. UTC | #1
On 2018-03-22 13:41, Alberto Garcia wrote:
> L2 entries for compressed clusters have a field that indicates the
> number of sectors used to store the data in the image.
> 
> That's however not the size of the compressed data itself, just the
> number of sectors where that data is located. The actual data size is
> usually not a multiple of the sector size, and therefore cannot be
> represented with this field.
> 
> The way it works is that QEMU reads all the specified sectors and
> starts decompressing the data until there's enough to recover the
> original uncompressed cluster. If there are any bytes left that
> haven't been decompressed they are simply ignored.
> 
> One consequence of this is that even if the size field is larger than
> it needs to be QEMU can handle it just fine: it will read more data
> from disk but it will ignore the extra bytes.
> 
> This test creates an image with two compressed clusters that use 5
> sectors (2.5 KB) each, increases the size field to the maximum (8192
> sectors, or 4 MB) and verifies that the data can be read without
> problems.
> 
> This test is important because while the decompressed data takes
> exactly one cluster, the maximum value allowed in the compressed size
> field is twice the cluster size. So although QEMU won't produce images
> with such large values we need to make sure that it can handle them.
> 
> Another effect of increasing the size field is that it can make
> it include data from the following host cluster(s). In this case
> 'qemu-img check' will detect that the refcounts are not correct, and
> we'll need to rebuild them.
> 
> Additionally, this patch also tests that decreasing the size corrupts
> the image since the original data can no longer be recovered. In this
> case QEMU returns an error when trying to read the compressed data,
> but 'qemu-img check' doesn't see anything wrong if the refcounts are
> consistent.
> 
> One possible task for the future is to make 'qemu-img check' verify
> the sizes of the compressed clusters, by trying to decompress the data
> and checking that the size stored in the L2 entry is correct.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
> v4: Resend for 2.12
> 
> v3: Add TODO comment, as suggested by Eric.
> 
>     Corrupt the length of the second compressed cluster as well so the
>     uncompressed data would span three host clusters.
> 
> v2: We now have two scenarios where we make QEMU read data from the
>     next host cluster and from beyond the end of the image. This
>     version also runs qemu-img check on the corrupted image.
> 
>     If the size field is too small, reading fails but qemu-img check
>     succeeds.
> 
>     If the size field is too large, reading succeeds but qemu-img
>     check fails (this can be repaired, though).
> ---
>  tests/qemu-iotests/122     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/122.out | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+)
> 
> diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
> index 45b359c2ba..5b9593016c 100755
> --- a/tests/qemu-iotests/122
> +++ b/tests/qemu-iotests/122

Not sure if 122 is the right file for this...

Or, let me rephrase, it does look to me like it is not the right file.
But on the other hand, I don't see a more suitable file.

> @@ -130,6 +130,51 @@ $QEMU_IO -c "read -P 0    1024k 1022k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _fil
>  
>  
>  echo
> +echo "=== Corrupted size field in compressed cluster descriptor ==="
> +echo
> +# Create an empty image, fill half of it with data and compress it.
> +# The L2 entries of the two compressed clusters are located at
> +# 0x800000 and 0x800008, their original values are 0x4008000000a00000
> +# and 0x4008000000a00802 (5 sectors for compressed data each).
> +TEST_IMG="$TEST_IMG".1 _make_test_img 8M
> +$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
> +$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"

Why not just use "write -c" and drop the convert?  (You'd have to use
two writes, though, one for each cluster.)

> +
> +# Reduce size of compressed data to 4 sectors: this corrupts the image.
> +poke_file "$TEST_IMG" $((0x800000)) "\x40\x06"
> +$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +
> +# 'qemu-img check' however doesn't see anything wrong because it
> +# doesn't try to decompress the data and the refcounts are consistent.
> +# TODO: update qemu-img so this can be detected
> +_check_test_img
> +
> +# Increase size of compressed data to the maximum (8192 sectors).
> +# This makes QEMU read more data (8192 sectors instead of 5, host
> +# addresses [0xa00000, 0xdfffff]), but the decompression algorithm
> +# stops once we have enough to restore the uncompressed cluster, so
> +# the rest of the data is ignored.
> +poke_file "$TEST_IMG" $((0x800000)) "\x7f\xfe"
> +# Do it also for the second compressed cluster (L2 entry at 0x800008).
> +# In this case the compressed data would span 3 host clusters
> +# (host addresses: [0xa00802, 0xe00801])
> +poke_file "$TEST_IMG" $((0x800008)) "\x7f\xfe"
> +
> +# Here the image is too small so we're asking QEMU to read beyond the
> +# end of the image.
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +# But if we grow the image we won't be reading beyond its end anymore.
> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir

Both reads result in exactly the same output, though, so it doesn't seem
like qemu cares.

(This is the reason I'm not merging this patch now, because that looks a
bit fishy.)

> +
> +# The refcount data is however wrong because due to the increased size
> +# of the compressed data it now reaches the following host clusters.
> +# This can be repaired by qemu-img check.

The OFLAG_COPIED repair looks a bit interesting, but, er, well.

Max

(Since a compressed cluster does not correspond 1:1 to a host cluster,
you cannot say that a compressed cluster has a refcount -- only host
clusters are refcounted.  So what is it supposed to mean that a
compressed cluster has a refcount of 1?

I'd argue from a point of usefulness.  In theory, you could modify
compressed clusters in-place, and then you'd need the information
whether you are allowed to.  But that doesn't really depend on whether
the host clusters touched by the compressed cluster have a refcount of
1; instead it depends on how many times the compressed cluster itself is
referenced.  Unfortunately we have no refcounting structure for that.

So all in all OFLAG_COPIED is just useless for compressed clusters.)

> +_check_test_img -r all
> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +$QEMU_IO -c "read  -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> +
> +echo
>  echo "=== Full allocation with -S 0 ==="
>  echo
>  
> diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
> index 47d8656db8..bdda75650f 100644
> --- a/tests/qemu-iotests/122.out
> +++ b/tests/qemu-iotests/122.out
> @@ -99,6 +99,37 @@ read 1024/1024 bytes at offset 1047552
>  read 1046528/1046528 bytes at offset 1048576
>  1022 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
> +=== Corrupted size field in compressed cluster descriptor ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT.1', fmt=IMGFMT size=8388608
> +wrote 4194304/4194304 bytes at offset 0
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read failed: Input/output error
> +No errors were found on the image.
> +read 4194304/4194304 bytes at offset 0
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 4194304/4194304 bytes at offset 4194304
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 4194304/4194304 bytes at offset 0
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +ERROR cluster 6 refcount=1 reference=3
> +ERROR cluster 7 refcount=1 reference=2
> +Repairing cluster 6 refcount=1 reference=3
> +Repairing cluster 7 refcount=1 reference=2
> +Repairing OFLAG_COPIED data cluster: l2_entry=8000000000c00000 refcount=3
> +Repairing OFLAG_COPIED data cluster: l2_entry=8000000000e00000 refcount=2
> +The following inconsistencies were found and repaired:
> +
> +    0 leaked clusters
> +    4 corruptions
> +
> +Double checking the fixed image now...
> +No errors were found on the image.
> +read 4194304/4194304 bytes at offset 0
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 4194304/4194304 bytes at offset 4194304
> +4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
>  === Full allocation with -S 0 ===
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>
Eric Blake March 28, 2018, 5:58 p.m. UTC | #2
On 03/28/2018 12:34 PM, Max Reitz wrote:
> On 2018-03-22 13:41, Alberto Garcia wrote:
>> L2 entries for compressed clusters have a field that indicates the
>> number of sectors used to store the data in the image.
>>
>> That's however not the size of the compressed data itself, just the
>> number of sectors where that data is located. The actual data size is
>> usually not a multiple of the sector size, and therefore cannot be
>> represented with this field.
>>

>> +++ b/tests/qemu-iotests/122
> 
> Not sure if 122 is the right file for this...
> 
> Or, let me rephrase, it does look to me like it is not the right file.
> But on the other hand, I don't see a more suitable file.

Short of cloning 122 as the starting point and creating a new test file.

> 
>> @@ -130,6 +130,51 @@ $QEMU_IO -c "read -P 0    1024k 1022k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _fil
>>   
>>   
>>   echo
>> +echo "=== Corrupted size field in compressed cluster descriptor ==="
>> +echo
>> +# Create an empty image, fill half of it with data and compress it.
>> +# The L2 entries of the two compressed clusters are located at
>> +# 0x800000 and 0x800008, their original values are 0x4008000000a00000
>> +# and 0x4008000000a00802 (5 sectors for compressed data each).
>> +TEST_IMG="$TEST_IMG".1 _make_test_img 8M
>> +$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
>> +$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"
> 
> Why not just use "write -c" and drop the convert?  (You'd have to use
> two writes, though, one for each cluster.)

'write -c' is newer code; it may work, but it may also cause offsets to 
live elsewhere, with knock-on effects later in the test. (It used to be 
that compression was only possible during convert.)


>> +
>> +# Here the image is too small so we're asking QEMU to read beyond the
>> +# end of the image.
>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>> +# But if we grow the image we won't be reading beyond its end anymore.
>> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
> 
> Both reads result in exactly the same output, though, so it doesn't seem
> like qemu cares.
> 
> (This is the reason I'm not merging this patch now, because that looks a
> bit fishy.)

Hmm, you have an interesting point - should the read fail (you asked me 
to read more clusters than I have available) or succeed (since you asked 
me to read beyond EOF, I filled the tail of the buffer with all zeroes, 
then tried decompressing, and since decompression worked, it obviously 
didn't need the bytes that I filled in as zero).  It's a little nicer to 
fail (the image is fishy for requesting more clusters than are present, 
even if what IS present is sufficient to reconstruct the data).

> 
>> +
>> +# The refcount data is however wrong because due to the increased size
>> +# of the compressed data it now reaches the following host clusters.
>> +# This can be repaired by qemu-img check.
> 
> The OFLAG_COPIED repair looks a bit interesting, but, er, well.
> 
> Max
> 
> (Since a compressed cluster does not correspond 1:1 to a host cluster,
> you cannot say that a compressed cluster has a refcount -- only host
> clusters are refcounted.  So what is it supposed to mean that a
> compressed cluster has a refcount of 1?

A compressed cluster may affect the refcount of multiple clusters: the 
cluster that contains the initial offset, and the cluster that contains 
any of the nb_sectors that did not fit in the first cluster.  So, if I 
have a 4k-cluster image (where each character is a sector), and where 
the compressed clusters are nicely sector-aligned:

|1-------|2-------|3-------|
|AAAAAABB|BBBBCCCC|CC------|

Here, the L2 entries for A, B, and C each list an nb_sectors of 5, as 
each compressed cluster occupies 6 sectors, but nb_sectors does not 
include the sector that includes the original offset.  The refcount for 
cluster 1 is 2 (the full contents of compressed A and the first half of 
compressed B); for cluster 2 is 2 (the second half of compressed B and 
the first half of compressed C); and for cluster 3 is 1 (the second half 
of compressed C).

But what this patch is dealing with is when nb_sectors is larger than 
required.  With 4k clusters, qemu will never populate nb_sectors with more 
than 8 (if the output is not nicely aligned, and 4096 bytes compresses 
down to only 4095, we can end up with 1 byte in the first sector, then 7 
complete sectors, then 510 bytes in a final sector, for 8 sectors beyond 
the initial offset).  But the qcow2 image is still valid even if the L2 
entry claims nb_sectors of 15; if that happens, then a compressed 
cluster can now affect the refcount of 3 clusters rather than the usual 
1 or 2.
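
A small sketch of the host-cluster accounting behind those numbers; the 
offset below is made up to match the 4k-cluster diagram above, first 
with the normal nb_sectors of 5 and then with an inflated value of 15.

  # How many host clusters a compressed cluster touches, given the start
  # offset of its data and the stored sector count (additional sectors
  # beyond the one containing that offset).
  cluster_size=4096
  coffset=$((0xC00))    # hypothetical: data starts 6 sectors into cluster 1
  for extra_sectors in 5 15; do
      first_cluster=$(( coffset / cluster_size ))
      last_byte=$(( (coffset & ~511) + (extra_sectors + 1) * 512 - 1 ))
      last_cluster=$(( last_byte / cluster_size ))
      echo "nb_sectors=$extra_sectors touches" \
           "$(( last_cluster - first_cluster + 1 )) host cluster(s)"
  done
  # prints: nb_sectors=5 touches 2 host cluster(s)
  #         nb_sectors=15 touches 3 host cluster(s)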

> 
> I'd argue from a point of usefulness.  In theory, you could modify
> compressed clusters in-place, and then you'd need the information
> whether you are allowed to.  But that doesn't really depend on whether
> the host clusters touched by the compressed cluster have a refcount of
> 1, instead if depends on how many times the compressed cluster itself is
> referenced.  Unfortunately we have no refcounting structure for that.
> 
> So all in all OFLAG_COPIED is just useless for compressed clusters.)

In general, because we intentionally allow multiple compressed clusters 
to (try to) share a single host cluster, the refcount for compressed 
clusters will usually be larger than 1.  And OFLAG_COPIED is only useful 
when the refcount is 1, right?
Alberto Garcia March 29, 2018, 9:39 a.m. UTC | #3
On Wed 28 Mar 2018 07:34:15 PM CEST, Max Reitz wrote:
>> diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
>> index 45b359c2ba..5b9593016c 100755
>> --- a/tests/qemu-iotests/122
>> +++ b/tests/qemu-iotests/122
>
> Not sure if 122 is the right file for this...
>
> Or, let me rephrase, it does look to me like it is not the right file.
> But on the other hand, I don't see a more suitable file.

Yeah I was tempted to create a new one, but in the end I decided to put
it there. We can still create a new one if you feel strongly about it.

>>  echo
>> +echo "=== Corrupted size field in compressed cluster descriptor ==="
>> +echo
>> +# Create an empty image, fill half of it with data and compress it.
>> +# The L2 entries of the two compressed clusters are located at
>> +# 0x800000 and 0x800008, their original values are 0x4008000000a00000
>> +# and 0x4008000000a00802 (5 sectors for compressed data each).
>> +TEST_IMG="$TEST_IMG".1 _make_test_img 8M
>> +$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
>> +$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"
>
> Why not just use "write -c" and drop the convert?  (You'd have to use
> two writes, though, one for each cluster.)

Yeah, it's also possible, you have to do it in the same qemu-io command
though, else it will allocate two host clusters. But yes, I can change
that.

>> +# Reduce size of compressed data to 4 sectors: this corrupts the image.
>> +poke_file "$TEST_IMG" $((0x800000)) "\x40\x06"
>> +$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>> +
>> +# 'qemu-img check' however doesn't see anything wrong because it
>> +# doesn't try to decompress the data and the refcounts are consistent.
>> +# TODO: update qemu-img so this can be detected
>> +_check_test_img
>> +
>> +# Increase size of compressed data to the maximum (8192 sectors).
>> +# This makes QEMU read more data (8192 sectors instead of 5, host
>> +# addresses [0xa00000, 0xdfffff]), but the decompression algorithm
>> +# stops once we have enough to restore the uncompressed cluster, so
>> +# the rest of the data is ignored.
>> +poke_file "$TEST_IMG" $((0x800000)) "\x7f\xfe"
>> +# Do it also for the second compressed cluster (L2 entry at 0x800008).
>> +# In this case the compressed data would span 3 host clusters
>> +# (host addresses: [0xa00802, 0xe00801])
>> +poke_file "$TEST_IMG" $((0x800008)) "\x7f\xfe"
>> +
>> +# Here the image is too small so we're asking QEMU to read beyond the
>> +# end of the image.
>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>> +# But if we grow the image we won't be reading beyond its end anymore.
>> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>
> Both reads result in exactly the same output, though, so it doesn't
> seem like qemu cares.
>
> (This is the reason I'm not merging this patch now, because that looks
> a bit fishy.)

In this example the size stored in the compressed cluster descriptor is
corrupted and larger than the size of the compressed data on disk.

In the first read the image is too small so this makes QEMU read not
only past the end of the compressed data, but also past the end of the
qcow2 image itself.

In the second read the qcow2 image is larger so QEMU reads actual data
from the subsequent clusters.

It doesn't make much difference. In the first case the buffer is padded
with zeroes and in the second with data from other clusters, but QEMU
only uses the data needed to restore the original uncompressed cluster
and the rest is ignored. But I thought I could still test both cases,
that's why there's two reads.

>> +# The refcount data is however wrong because due to the increased size
>> +# of the compressed data it now reaches the following host clusters.
>> +# This can be repaired by qemu-img check.
>
> The OFLAG_COPIED repair looks a bit interesting, but, er, well.
>
> Max
>
> (Since a compressed cluster does not correspond 1:1 to a host cluster,
> you cannot say that a compressed cluster has a refcount -- only host
> clusters are refcounted.  So what is it supposed to mean that a
> compressed cluster has a refcount of 1?

Compressed clusters never have OFLAG_COPIED and we treat it as an error
if they do (see check_refcounts_l2()), but their host clusters still
have a reference count.

A host cluster with 4 compressed clusters has a reference count of 4.

A compressed cluster that uses two host clusters (even if it's only
partially) increases the reference count of both of them.

In this test case the size field of the compressed cluster is too large,
so it includes uncompressed clusters that are contiguous to it. The
uncompressed clusters have refcount == 1 and OFLAG_COPIED, but QEMU
detects that the refcount should be larger and that OFLAG_COPIED should
not be there; that's why it's repaired.

Admittedly the way it's repaired is not the best one: the uncompressed
clusters now have extra references, but other than that I don't see any
harmful effect.

The proper fix for this would be to detect that the compressed cluster
size is incorrect and correct its value. There's a TODO for that (in the
first _check_test_img call where it's more severe), but now that I think
of it I should probably mention that in the second one as well.

Berto
Max Reitz March 29, 2018, 2:10 p.m. UTC | #4
On 2018-03-28 19:58, Eric Blake wrote:
> On 03/28/2018 12:34 PM, Max Reitz wrote:

[...]

>> The OFLAG_COPIED repair looks a bit interesting, but, er, well.
>>
>> Max
>>
>> (Since a compressed cluster does not correspond 1:1 to a host cluster,
>> you cannot say that a compressed cluster has a refcount -- only host
>> clusters are refcounted.  So what is it supposed to mean that a
>> compressed cluster has a refcount of 1?
> 
> A compressed cluster may affect the refcount of multiple clusters: the
> cluster that contains the initial offset, and the cluster that contains
> any of the nb_sectors that did not fit in the first cluster.  So, if I
> have a 4k-cluster image (where each character is a sector), and where
> the compressed clusters are nicely sector-aligned:
> 
> |1-------|2-------|3-------|
> |AAAAAABB|BBBBCCCC|CC------|
> 
> Here, the L2 entry for A, B, and C each list nb_sectors of 5, as it
> takes 6 sectors to list the entire image, but nb_sectors does not
> include the sector that includes the original offset.  The refcount for
> cluster 1 is 2 (the full contents of compressed A and the first half of
> compressed B); for cluster 2 is 2 (the second half of compressed B and
> the first half of compressed C); and for cluster 3 is 1 (the second half
> of compressed C).
> 
> But what this patch is dealing with is when nb_sectors is larger than
> required.  With 4k sectors, qemu will never populate nb_clusters more
> than 8 (if the output is not nicely aligned, and 4096 bytes compresses
> down to only 4095, we can end up with 1 byte in the first sector, then 7
> complete sectors, then 510 bytes in a final sector, for 8 sectors beyond
> the initial offset).  But the qcow2 image is still valid even if the L2
> entry claims nb_sectors of 15; if that happens, then a compressed
> cluster can now affect the refcount of 3 clusters rather than the usual
> 1 or 2.
> 
>>
>> I'd argue from a point of usefulness.  In theory, you could modify
>> compressed clusters in-place, and then you'd need the information
>> whether you are allowed to.  But that doesn't really depend on whether
>> the host clusters touched by the compressed cluster have a refcount of
>> 1, instead if depends on how many times the compressed cluster itself is
>> referenced.  Unfortunately we have no refcounting structure for that.
>>
>> So all in all OFLAG_COPIED is just useless for compressed clusters.)
> 
> In general, because we intentionally allow multiple compressed clusters
> to (try to) share a single host cluster, the refcount for compressed
> clusters will usually be larger than 1.  And OFLAG_COPIED is only useful
> when the refcount is 1, right?

OFLAG_COPIED is only useful when it reflects the number of references to
the data cluster, because only then can we update it in-place without
having to worry about other users.  For normal clusters, one data
cluster is one host cluster, so we can get the number of references from
the refcount structures (which contain the reference counts for host
clusters, but *not* for data clusters).

But for compressed clusters, the data cluster may be part of a host
cluster, or it may even span multiple host clusters.  So the information
we get from the refcount structures is totally useless here; it doesn't
tell us how many references there are to that compressed data cluster.

There is only one exception: If all of the host clusters the compressed
cluster touches have a refcount of 1, we can be certain that there are
no other users, so then we know there is only one reference to the
compressed cluster.  But as soon as they have a higher refcount... We
don't know whether that's because of multiple references to this
compressed cluster or because of multiple compressed clusters sharing a
single host cluster.
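
A trivial sketch of that one safe case, with made-up refcounts (nothing
in QEMU actually does this; it's just to make the condition concrete):

  # In-place modification of a compressed cluster could only be known to
  # be safe if every host cluster touched by its data has refcount 1.
  refcounts="1 1"          # hypothetical refcounts of the touched clusters
  safe=yes
  for rc in $refcounts; do
      [ "$rc" -eq 1 ] || safe=no
  done
  echo "in-place rewrite safe: $safe"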

In practice it doesn't really matter of course because modifying
compressed clusters in-place seems like a whole different can of worms.

Max
Max Reitz March 29, 2018, 2:23 p.m. UTC | #5
On 2018-03-29 11:39, Alberto Garcia wrote:
> On Wed 28 Mar 2018 07:34:15 PM CEST, Max Reitz wrote:
>>> diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
>>> index 45b359c2ba..5b9593016c 100755
>>> --- a/tests/qemu-iotests/122
>>> +++ b/tests/qemu-iotests/122
>>
>> Not sure if 122 is the right file for this...
>>
>> Or, let me rephrase, it does look to me like it is not the right file.
>> But on the other hand, I don't see a more suitable file.
> 
> Yeah I was tempted to create a new one, but in the end I decided to put
> it there. We can still create a new one if you feel strongly about it.
> 
>>>  echo
>>> +echo "=== Corrupted size field in compressed cluster descriptor ==="
>>> +echo
>>> +# Create an empty image, fill half of it with data and compress it.
>>> +# The L2 entries of the two compressed clusters are located at
>>> +# 0x800000 and 0x800008, their original values are 0x4008000000a00000
>>> +# and 0x4008000000a00802 (5 sectors for compressed data each).
>>> +TEST_IMG="$TEST_IMG".1 _make_test_img 8M
>>> +$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
>>> +$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"
>>
>> Why not just use "write -c" and drop the convert?  (You'd have to use
>> two writes, though, one for each cluster.)
> 
> Yeah, it's also possible, you have to do it in the same qemu-io command
> though, else it will allocate two host clusters. But yes, I can change
> that.
> 
>>> +# Reduce size of compressed data to 4 sectors: this corrupts the image.
>>> +poke_file "$TEST_IMG" $((0x800000)) "\x40\x06"
>>> +$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>>> +
>>> +# 'qemu-img check' however doesn't see anything wrong because it
>>> +# doesn't try to decompress the data and the refcounts are consistent.
>>> +# TODO: update qemu-img so this can be detected
>>> +_check_test_img
>>> +
>>> +# Increase size of compressed data to the maximum (8192 sectors).
>>> +# This makes QEMU read more data (8192 sectors instead of 5, host
>>> +# addresses [0xa00000, 0xdfffff]), but the decompression algorithm
>>> +# stops once we have enough to restore the uncompressed cluster, so
>>> +# the rest of the data is ignored.
>>> +poke_file "$TEST_IMG" $((0x800000)) "\x7f\xfe"
>>> +# Do it also for the second compressed cluster (L2 entry at 0x800008).
>>> +# In this case the compressed data would span 3 host clusters
>>> +# (host addresses: [0xa00802, 0xe00801])
>>> +poke_file "$TEST_IMG" $((0x800008)) "\x7f\xfe"
>>> +
>>> +# Here the image is too small so we're asking QEMU to read beyond the
>>> +# end of the image.
>>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>>> +# But if we grow the image we won't be reading beyond its end anymore.
>>> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>>> +$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
>>
>> Both reads result in exactly the same output, though, so it doesn't
>> seem like qemu cares.
>>
>> (This is the reason I'm not merging this patch now, because that looks
>> a bit fishy.)
> 
> In this example the size stored on the compressed cluster descriptor is
> corrupted and larger than the size of the compressed data on disk.
> 
> In the first read the image is too small so this makes QEMU read not
> only past the end of the compressed data, but also past the end of the
> qcow2 image itself.
> 
> In the second read the qcow2 image is larger so QEMU reads actual data
> from the subsequent clusters.
> 
> It doesn't make much difference. In the first case the buffer is padded
> with zeroes and in the second with data from other clusters, but QEMU
> only uses the data needed to restore the original uncompressed cluster
> and the rest is ignored. But I thought I could still test both cases,
> that's why there's two reads.

Ah, OK.  I would have preferred a comment then so that people know it's
just expected to work.

>>> +# The refcount data is however wrong because due to the increased size
>>> +# of the compressed data it now reaches the following host clusters.
>>> +# This can be repaired by qemu-img check.
>>
>> The OFLAG_COPIED repair looks a bit interesting, but, er, well.
>>
>> Max
>>
>> (Since a compressed cluster does not correspond 1:1 to a host cluster,
>> you cannot say that a compressed cluster has a refcount -- only host
>> clusters are refcounted.  So what is it supposed to mean that a
>> compressed cluster has a refcount of 1?
> 
> Compressed clusters never have OFLAG_COPIED and we treat it as an error
> if they do (see check_refcounts_l2()),

Interesting.  Wonder why the qcow2 specification doesn't mention that...

>                                        but their host clusters still
> have a reference count.
> 
> A host cluster with 4 compressed clusters has a reference count of 4.
> 
> A compressed cluster that uses two host clusters (even if it's only
> partially) increases the reference count of both of them.
> 
> In this test case the size field of the compressed cluster is too large,
> so it includes uncompressed clusters that are contiguous to it. The
> uncompressed clusters have refcount == 1 and OFLAG_COPIED, but QEMU
> detects that the refcount should be larger and that OFLAG_COPIED should
> not be there, that's why it's repaired.

Aaah, OK, I see.  Thanks for explaining.

> Admittedly the way it's repaired is not the best one: the uncompressed
> clusters have now extra references, but other than that I don't see any
> harmful effect.
> 
> The proper fix for this would be to detect that the compressed cluster
> size is incorrect and correct its value. There's a TODO for that (in the
> first _check_test_img call where it's more severe), but now that I think
> of it I should probably mention that in the second one as well.

Well, the current repair method will at least prevent data loss, so I'm
OK with it.

Max

Patch

diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
index 45b359c2ba..5b9593016c 100755
--- a/tests/qemu-iotests/122
+++ b/tests/qemu-iotests/122
@@ -130,6 +130,51 @@  $QEMU_IO -c "read -P 0    1024k 1022k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _fil
 
 
 echo
+echo "=== Corrupted size field in compressed cluster descriptor ==="
+echo
+# Create an empty image, fill half of it with data and compress it.
+# The L2 entries of the two compressed clusters are located at
+# 0x800000 and 0x800008, their original values are 0x4008000000a00000
+# and 0x4008000000a00802 (5 sectors for compressed data each).
+TEST_IMG="$TEST_IMG".1 _make_test_img 8M
+$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG"
+
+# Reduce size of compressed data to 4 sectors: this corrupts the image.
+poke_file "$TEST_IMG" $((0x800000)) "\x40\x06"
+$QEMU_IO -c "read  -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+# 'qemu-img check' however doesn't see anything wrong because it
+# doesn't try to decompress the data and the refcounts are consistent.
+# TODO: update qemu-img so this can be detected
+_check_test_img
+
+# Increase size of compressed data to the maximum (8192 sectors).
+# This makes QEMU read more data (8192 sectors instead of 5, host
+# addresses [0xa00000, 0xdfffff]), but the decompression algorithm
+# stops once we have enough to restore the uncompressed cluster, so
+# the rest of the data is ignored.
+poke_file "$TEST_IMG" $((0x800000)) "\x7f\xfe"
+# Do it also for the second compressed cluster (L2 entry at 0x800008).
+# In this case the compressed data would span 3 host clusters
+# (host addresses: [0xa00802, 0xe00801])
+poke_file "$TEST_IMG" $((0x800008)) "\x7f\xfe"
+
+# Here the image is too small so we're asking QEMU to read beyond the
+# end of the image.
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+# But if we grow the image we won't be reading beyond its end anymore.
+$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+# The refcount data is however wrong because due to the increased size
+# of the compressed data it now reaches the following host clusters.
+# This can be repaired by qemu-img check.
+_check_test_img -r all
+$QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "read  -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+
+echo
 echo "=== Full allocation with -S 0 ==="
 echo
 
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 47d8656db8..bdda75650f 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -99,6 +99,37 @@  read 1024/1024 bytes at offset 1047552
 read 1046528/1046528 bytes at offset 1048576
 1022 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+=== Corrupted size field in compressed cluster descriptor ===
+
+Formatting 'TEST_DIR/t.IMGFMT.1', fmt=IMGFMT size=8388608
+wrote 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read failed: Input/output error
+No errors were found on the image.
+read 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4194304/4194304 bytes at offset 4194304
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR cluster 6 refcount=1 reference=3
+ERROR cluster 7 refcount=1 reference=2
+Repairing cluster 6 refcount=1 reference=3
+Repairing cluster 7 refcount=1 reference=2
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000c00000 refcount=3
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000e00000 refcount=2
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    4 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+read 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4194304/4194304 bytes at offset 4194304
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 === Full allocation with -S 0 ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864