
[v7,3/6] qcow2: Reduce REFT_OFFSET_MASK

Message ID 20180628190723.276458-4-eblake@redhat.com
State New
Series minor qcow2 compression improvements

Commit Message

Eric Blake June 28, 2018, 7:07 p.m. UTC
Match our code to the spec change in the previous patch - there's
no reason for the refcount table to allow larger offsets than the
L1/L2 tables. In practice, no image has more than 64PB of
allocated clusters anyways, as anything beyond that can't be
expressed via L2 mappings to host offsets.

Suggested-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>

---
v4: new patch
---
 block/qcow2.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
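
To make the 64PB figure above concrete, here is a minimal standalone sketch
(not part of the patch) of what the narrowed mask admits: it keeps bits 9-55
of a refcount table entry, i.e. 512-byte-aligned host offsets below 2^56
bytes (64 PiB).

  #include <assert.h>
  #include <stdint.h>

  /* The narrowed mask from this patch: bits 9-55 of a refcount table entry. */
  #define REFT_OFFSET_MASK 0x00fffffffffffe00ULL

  int main(void)
  {
      uint64_t limit = 1ULL << 56;               /* 2^56 bytes == 64 PiB */

      /* The largest 512-byte-aligned offset below 64 PiB survives the mask... */
      assert(((limit - 512) & REFT_OFFSET_MASK) == limit - 512);

      /* ...while 64 PiB itself (bit 56) falls outside it, as do bits 56-63
       * that the old 0xfffffffffffffe00ULL mask still allowed. */
      assert((limit & REFT_OFFSET_MASK) == 0);
      return 0;
  }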

Comments

Kevin Wolf June 29, 2018, 8:44 a.m. UTC | #1
Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
> Match our code to the spec change in the previous patch - there's
> no reason for the refcount table to allow larger offsets than the
> L1/L2 tables.

What about internal snapshots? And anyway, because of the metadata
overhead, the physical image size of a fully allocated image is always
going to be at least minimally larger than the virtual disk size.
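
As a rough illustration of that overhead, a sketch with assumed qcow2
defaults (64 KiB clusters, 8-byte L2 entries, 16-bit refcounts; the figures
are not from this thread):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      /* Assumed qcow2 defaults: 64 KiB clusters, 8-byte L2 entries,
       * 16-bit refcount entries. */
      uint64_t cluster = 64 * 1024;
      uint64_t l2_coverage = (cluster / 8) * cluster;        /* 512 MiB of guest data per L2 table */
      uint64_t refblock_coverage = (cluster / 2) * cluster;  /* 2 GiB of host clusters per refcount block */

      /* Metadata needed for 64 PiB of allocated guest data, ignoring the
       * header, L1 table, refcount table and refcounts for metadata clusters. */
      uint64_t guest = 64ULL << 50;
      uint64_t l2_bytes = guest / l2_coverage * cluster;
      uint64_t refblock_bytes = guest / refblock_coverage * cluster;

      printf("L2 tables: %llu GiB, refcount blocks: %llu GiB\n",
             (unsigned long long)(l2_bytes >> 30),
             (unsigned long long)(refblock_bytes >> 30));
      /* Prints roughly 8192 GiB of L2 tables and 2048 GiB of refcount blocks,
       * so a fully allocated image always ends up somewhat larger than its
       * virtual size. */
      return 0;
  }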

I'm not necessarily opposed to making the change if there is a good
reason to make it, but I don't see a real need for it and the
justification used here and also in the previous patch is incorrect.

Kevin

> In practice, no image has more than 64PB of
> allocated clusters anyways, as anything beyond that can't be
> expressed via L2 mappings to host offsets.
> 
> Suggested-by: Alberto Garcia <berto@igalia.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> 
> ---
> v4: new patch
> ---
>  block/qcow2.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 01b5250415f..3774229ef9c 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -439,7 +439,7 @@ typedef enum QCow2MetadataOverlap {
>  #define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
>  #define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
> 
> -#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
> +#define REFT_OFFSET_MASK 0x00fffffffffffe00ULL
> 
>  static inline int64_t start_of_cluster(BDRVQcow2State *s, int64_t offset)
>  {
> -- 
> 2.14.4
>
Eric Blake June 29, 2018, 3:22 p.m. UTC | #2
On 06/29/2018 03:44 AM, Kevin Wolf wrote:
> Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
>> Match our code to the spec change in the previous patch - there's
>> no reason for the refcount table to allow larger offsets than the
>> L1/L2 tables.
> 
> What about internal snapshots? And anyway, because of the metadata
> overhead, the physical image size of a fully allocated image is always
> going to be at least minimally larger than the virtual disk size.
> 
> I'm not necessarily opposed to making the change if there is a good
> reason to make it, but I don't see a real need for it and the
> justification used here and also in the previous patch is incorrect.

The fact that ext4 cannot hold an image this large is already an 
indication that setting this limit on the refcount table is NOT going to 
bite real users.

Yes, you can argue that with lots of metadata, including internal 
snapshots, and on a capable file system (such as tmpfs) that can even 
hold files this large to begin with, then yes, allowing the refcount to 
exceed this limit will allow slightly more metadata to be crammed into a 
single image.  But will it actually help anyone?

Do I need to respin the series to split patch 2 into the obvious changes 
(stuff unrelated to capping refcount size) vs. the controversial stuff 
(refcount cap and this code change)?

> 
> Kevin
> 
>> In practice, no image has more than 64PB of
>> allocated clusters anyways, as anything beyond that can't be
>> expressed via L2 mappings to host offsets.

If you're opposed to an exact 56-bit limit on the grounds that 56-bit 
guest data plus minimal metadata should still be expressible, would it 
be better to cap refcount at 57 bits?
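
For comparison, a 57-bit cap would amount to a mask like the following
(purely hypothetical; neither the spec nor the patch defines this constant):

  /* Hypothetical 57-bit variant of the refcount table offset mask (not part
   * of the patch or the qcow2 spec; shown only to make the question concrete):
   * keep bits 9-56, allowing 512-byte-aligned host offsets up to 128 PiB. */
  #define REFT_OFFSET_MASK_57BIT 0x01fffffffffffe00ULL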

>>
>> Suggested-by: Alberto Garcia <berto@igalia.com>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>

>> +++ b/block/qcow2.h
>> @@ -439,7 +439,7 @@ typedef enum QCow2MetadataOverlap {
>>   #define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
>>   #define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
>>
>> -#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
>> +#define REFT_OFFSET_MASK 0x00fffffffffffe00ULL
>>
>>   static inline int64_t start_of_cluster(BDRVQcow2State *s, int64_t offset)
>>   {
>> -- 
>> 2.14.4
>>
>
Daniel P. Berrangé June 29, 2018, 3:43 p.m. UTC | #3
On Fri, Jun 29, 2018 at 10:22:22AM -0500, Eric Blake wrote:
> On 06/29/2018 03:44 AM, Kevin Wolf wrote:
> > Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
> > > Match our code to the spec change in the previous patch - there's
> > > no reason for the refcount table to allow larger offsets than the
> > > L1/L2 tables.
> > 
> > What about internal snapshots? And anyway, because of the metadata
> > overhead, the physical image size of a fully allocated image is always
> > going to be at least minimally larger than the virtual disk size.
> > 
> > I'm not necessarily opposed to making the change if there is a good
> > reason to make it, but I don't see a real need for it and the
> > justification used here and also in the previous patch is incorrect.
> 
> The fact that ext4 cannot hold an image this large is already an indication
> that setting this limit on the refcount table is NOT going to bite real
> users.

NB, RHEL-7 defaults to xfs and this supports file sizes way larger
than ext4 does, so not sure we should consider ext4 as representative
of real world limits anymore.

Regards,
Daniel
Kevin Wolf June 29, 2018, 3:43 p.m. UTC | #4
Am 29.06.2018 um 17:22 hat Eric Blake geschrieben:
> On 06/29/2018 03:44 AM, Kevin Wolf wrote:
> > Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
> > > Match our code to the spec change in the previous patch - there's
> > > no reason for the refcount table to allow larger offsets than the
> > > L1/L2 tables.
> > 
> > What about internal snapshots? And anyway, because of the metadata
> > overhead, the physical image size of a fully allocated image is always
> > going to be at least minimally larger than the virtual disk size.
> > 
> > I'm not necessarily opposed to making the change if there is a good
> > reason to make it, but I don't see a real need for it and the
> > justification used here and also in the previous patch is incorrect.
> 
> The fact that ext4 cannot hold an image this large is already an indication
> that setting this limit on the refcount table is NOT going to bite real
> users.
> 
> Yes, you can argue that with lots of metadata, including internal snapshots,
> and on a capable file system (such as tmpfs) that can even hold files this
> large to begin with, then yes, allowing the refcount to exceed this limit
> will allow slightly more metadata to be crammed into a single image.  But
> will it actually help anyone?
> 
> Do I need to respin the series to split patch 2 into the obvious changes
> (stuff unrelated to capping refcount size) vs. the controversial stuff
> (refcount cap and this code change)?
> 
> > 
> > Kevin
> > 
> > > In practice, no image has more than 64PB of
> > > allocated clusters anyways, as anything beyond that can't be
> > > expressed via L2 mappings to host offsets.
> 
> If you're opposed to an exact 56-bit limit on the grounds that 56-bit guest
> data plus minimal metadata should still be expressible, would it be better
> to cap refcount at 57 bits?

You're arguing that the patch doesn't hurt in practice, but what I'm
really looking for is what do we gain from it?

We generally don't apply patches just because they don't hurt, but
because they are helpful for something. In the simple, fully compatible
cases "helpful" could just mean "we like the code better".

This is a slightly different category: "it's technically incompatible,
but we don't think anyone uses this". We have done such changes before
when they allowed us to improve something substantially, but I think the
requirements for this are at least a little higher than just "because we
like it better". And I'm not even sure yet if I like it better because
it seems like an arbitrary restriction.

So apart from not hurting in practice, what does the restriction buy us?

Kevin
Eric Blake Sept. 14, 2018, 6:47 p.m. UTC | #5
[revisiting this thread]

On 6/29/18 10:43 AM, Kevin Wolf wrote:
> Am 29.06.2018 um 17:22 hat Eric Blake geschrieben:
>> On 06/29/2018 03:44 AM, Kevin Wolf wrote:
>>> Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
>>>> Match our code to the spec change in the previous patch - there's
>>>> no reason for the refcount table to allow larger offsets than the
>>>> L1/L2 tables.
>>>
>>> What about internal snapshots? And anyway, because of the metadata
>>> overhead, the physical image size of a fully allocated image is always
>>> going to be at least minimally larger than the virtual disk size.
>>>
>>> I'm not necessarily opposed to making the change if there is a good
>>> reason to make it, but I don't see a real need for it and the
>>> justification used here and also in the previous patch is incorrect.
>>
>> The fact that ext4 cannot hold an image this large is already an indication
>> that setting this limit on the refcount table is NOT going to bite real
>> users.
>>
>> Yes, you can argue that with lots of metadata, including internal snapshots,
>> and on a capable file system (such as tmpfs) that can even hold files this
>> large to begin with, then yes, allowing the refcount to exceed this limit
>> will allow slightly more metadata to be crammed into a single image.  But
>> will it actually help anyone?
>>
>> Do I need to respin the series to split patch 2 into the obvious changes
>> (stuff unrelated to capping refcount size) vs. the controversial stuff
>> (refcount cap and this code change)?
>>
>>>
>>> Kevin
>>>
>>>> In practice, no image has more than 64PB of
>>>> allocated clusters anyways, as anything beyond that can't be
>>>> expressed via L2 mappings to host offsets.
>>
>> If you're opposed to an exact 56-bit limit on the grounds that 56-bit guest
>> data plus minimal metadata should still be expressible, would it be better
>> to cap refcount at 57 bits?
> 
> You're arguing that the patch doesn't hurt in practice, but what I'm
> really looking for is what do we gain from it?

I guess one thing to be gained by having qemu's implementation of qcow2 
put a hard stop at 56 bits is that it becomes easier to flag refcount 
entries pointing beyond the end of the file as errant, rather than 
attempting to expand the file during 'qemu-img check' just because a 
refcount entry pointed out that far.  But that's still a pretty weak 
argument (we've already mentioned that qemu-img check should do a better 
job of handling/flagging refcount entries that point way beyond the end 
of the image - but that's true regardless of what limits may be placed 
on the host image size).

> 
> We generally don't apply patches just because they don't hurt, but
> because they are helpful for something. In the simple, fully compatible
> cases "helpful" could just mean "we like the code better".
> 
> This is a slightly different category: "it's technically incompatible,
> but we don't think anyone uses this". We have done such changes before
> when they allowed us to improve something substantially, but I think the
> requirements for this are at least a little higher than just "because we
> like it better". And I'm not even sure yet if I like it better because
> it seems like an arbitrary restriction.
> 
> So apart from not hurting in practice, what does the restriction buy us?

As I'm not coming up with a solid reason for tightening the bounds, I 
will (eventually) respin the series without the spec change.  At this 
point, it's probably easier for me to wait until after the pull request 
queue has started flushing again.

Patch

diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415f..3774229ef9c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -439,7 +439,7 @@  typedef enum QCow2MetadataOverlap {
 #define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
 #define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL

-#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
+#define REFT_OFFSET_MASK 0x00fffffffffffe00ULL

 static inline int64_t start_of_cluster(BDRVQcow2State *s, int64_t offset)
 {