diff mbox series

[v2,4/4] qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value

Message ID e3982d5118a90db2442c6ac18f339ec8ba006df2.1578596897.git.berto@igalia.com
State New
Headers show
Series qcow2: Misc BDRV_SECTOR_SIZE updates | expand

Commit Message

Alberto Garcia Jan. 9, 2020, 7:13 p.m. UTC
This replaces all remaining instances in the qcow2 code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c |  2 +-
 block/qcow2.c         | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Max Reitz Jan. 14, 2020, 2:15 p.m. UTC | #1
On 09.01.20 20:13, Alberto Garcia wrote:
> This replaces all remaining instances in the qcow2 code.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c |  2 +-
>  block/qcow2.c         | 10 ++++++----
>  2 files changed, 7 insertions(+), 5 deletions(-)

The question of course is why we would even have such instances still.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 932fc48919..777ca2d409 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
>   * Writes one sector of the L1 table to the disk (can't update single entries
>   * and we really don't want bdrv_pread to perform a read-modify-write)
>   */
> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)

Here it’s because the comment is wrong: “Can’t update single entries” –
yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.

>  {
>      BDRVQcow2State *s = bs->opaque;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 783d2b9578..c0f3e715ef 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3280,7 +3280,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
>  
>      /* Validate options and set default values */
>      if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
> -        error_setg(errp, "Image size must be a multiple of 512 bytes");
> +        error_setg(errp, "Image size must be a multiple of %u bytes",
> +                   (unsigned) BDRV_SECTOR_SIZE);

Either way how patch 1 goes, this is due to a limitation in qemu.

>          ret = -EINVAL;
>          goto out;
>      }
> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>          case QCOW2_CLUSTER_NORMAL:
>              child = s->data_file;
>              copy_offset += offset_into_cluster(s, src_offset);
> -            if ((copy_offset & 511) != 0) {
> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {

Hm.  I don’t get this one.

>                  ret = -EIO;
>                  goto out;
>              }
> @@ -3958,8 +3959,9 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>          return -ENOTSUP;
>      }
>  
> -    if (offset & 511) {
> -        error_setg(errp, "The new size must be a multiple of 512");
> +    if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
> +        error_setg(errp, "The new size must be a multiple of %u",
> +                   (unsigned) BDRV_SECTOR_SIZE);

This too is due to the limitation in qemu that BDS lengths are only
stored in multiples of BDRV_SECTOR_SIZE, so we disallow any other image
sizes.

>          return -EINVAL;
>      }
So in 3/4 instances this patch looks good to me (it’s really about the
block layers concept of a “sector”, even though that notion is outdated
in the first hunk), but I don’t quite get the third hunk.  (I’m sure it
has to do something with the block layer, so it’s OK to change in this
patch, but I’m still interested in why we have that limitation there.)

Max
Alberto Garcia Jan. 16, 2020, 11:26 p.m. UTC | #2
On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz wrote:
>> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
>>   * Writes one sector of the L1 table to the disk (can't update single entries
>>   * and we really don't want bdrv_pread to perform a read-modify-write)
>>   */
>> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
>> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
>>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>
> Here it’s because the comment is wrong: “Can’t update single entries” –
> yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.

What's the point of qcow2_write_l1_entry() then?

>> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>>          case QCOW2_CLUSTER_NORMAL:
>>              child = s->data_file;
>>              copy_offset += offset_into_cluster(s, src_offset);
>> -            if ((copy_offset & 511) != 0) {
>> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
>
> Hm.  I don’t get this one.

Checking the code (e.g. block_copy_do_copy()) it seems that the whole
chunk must be cluster aligned so I don't get this one either.

Berto
Max Reitz Jan. 17, 2020, 9:12 a.m. UTC | #3
On 17.01.20 00:26, Alberto Garcia wrote:
> On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz wrote:
>>> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
>>>   * Writes one sector of the L1 table to the disk (can't update single entries
>>>   * and we really don't want bdrv_pread to perform a read-modify-write)
>>>   */
>>> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
>>> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
>>>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>
>> Here it’s because the comment is wrong: “Can’t update single entries” –
>> yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.
> 
> What's the point of qcow2_write_l1_entry() then?

I think the point was that we couldn’t, for a long time, because the
block layer only provided sector-granularity access.  This function
simply was never changed when the block layer gained the ability to do
byte-granularity I/O.

(We’d still need this function, but only for the endian swap, I think.)

>>> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>>>          case QCOW2_CLUSTER_NORMAL:
>>>              child = s->data_file;
>>>              copy_offset += offset_into_cluster(s, src_offset);
>>> -            if ((copy_offset & 511) != 0) {
>>> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
>>
>> Hm.  I don’t get this one.
> 
> Checking the code (e.g. block_copy_do_copy()) it seems that the whole
> chunk must be cluster aligned so I don't get this one either.

Hm, how did you get to block_copy_do_copy()?  That’s part of the
block-copy infrastructure that’s only used for the backup job, as far as
I’m aware.  It’s different from copy_range.

I don’t see any limitation for copy_range.  I suppose maybe it doesn’t
work for anything that isn’t aligned to physical sectors?  But the qcow2
driver shouldn’t care about that.

On thing’s for sure, the raw driver doesn’t care about it.

Max
Kevin Wolf Jan. 17, 2020, 9:55 a.m. UTC | #4
Am 17.01.2020 um 10:12 hat Max Reitz geschrieben:
> On 17.01.20 00:26, Alberto Garcia wrote:
> > On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz wrote:
> >>> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
> >>>   * Writes one sector of the L1 table to the disk (can't update single entries
> >>>   * and we really don't want bdrv_pread to perform a read-modify-write)
> >>>   */
> >>> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
> >>> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
> >>>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
> >>
> >> Here it’s because the comment is wrong: “Can’t update single entries” –
> >> yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.
> > 
> > What's the point of qcow2_write_l1_entry() then?
> 
> I think the point was that we couldn’t, for a long time, because the
> block layer only provided sector-granularity access.  This function
> simply was never changed when the block layer gained the ability to do
> byte-granularity I/O.
> 
> (We’d still need this function, but only for the endian swap, I think.)

We still can't do byte-granularity writes with O_DIRECT, because that's
a kernel requirement.

The comment explains that we don't want to do a RMW cycle to write a
single entry because that would be slower than just writing a whole
sector. I think this is still accurate. Maybe we should change the
comment to say "can't necessarily update". (The part that looks really
wrong in the comment is "bdrv_pread", that should be "bdrv_pwrite"...)

Now, what's wrong about the logic to avoid the RMW is that it assumes
a fixed required alignment of 512. What it should do is looking at
bs->file->bl.request_alignment and rounding accordingly.

> >>> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
> >>>          case QCOW2_CLUSTER_NORMAL:
> >>>              child = s->data_file;
> >>>              copy_offset += offset_into_cluster(s, src_offset);
> >>> -            if ((copy_offset & 511) != 0) {
> >>> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
> >>
> >> Hm.  I don’t get this one.
> > 
> > Checking the code (e.g. block_copy_do_copy()) it seems that the whole
> > chunk must be cluster aligned so I don't get this one either.
> 
> Hm, how did you get to block_copy_do_copy()?  That’s part of the
> block-copy infrastructure that’s only used for the backup job, as far as
> I’m aware.  It’s different from copy_range.
> 
> I don’t see any limitation for copy_range.  I suppose maybe it doesn’t
> work for anything that isn’t aligned to physical sectors?  But the qcow2
> driver shouldn’t care about that.
> 
> On thing’s for sure, the raw driver doesn’t care about it.

I don't understand this one either.

Kevin
Max Reitz Jan. 17, 2020, 11:01 a.m. UTC | #5
On 17.01.20 10:55, Kevin Wolf wrote:
> Am 17.01.2020 um 10:12 hat Max Reitz geschrieben:
>> On 17.01.20 00:26, Alberto Garcia wrote:
>>> On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz wrote:
>>>>> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
>>>>>   * Writes one sector of the L1 table to the disk (can't update single entries
>>>>>   * and we really don't want bdrv_pread to perform a read-modify-write)
>>>>>   */
>>>>> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
>>>>> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
>>>>>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>>>
>>>> Here it’s because the comment is wrong: “Can’t update single entries” –
>>>> yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.
>>>
>>> What's the point of qcow2_write_l1_entry() then?
>>
>> I think the point was that we couldn’t, for a long time, because the
>> block layer only provided sector-granularity access.  This function
>> simply was never changed when the block layer gained the ability to do
>> byte-granularity I/O.
>>
>> (We’d still need this function, but only for the endian swap, I think.)
> 
> We still can't do byte-granularity writes with O_DIRECT, because that's
> a kernel requirement.

Ah, yes.  But that makes BDRV_SECTOR_SIZE the wrong choice.

> The comment explains that we don't want to do a RMW cycle to write a
> single entry because that would be slower than just writing a whole
> sector. I think this is still accurate. Maybe we should change the
> comment to say "can't necessarily update". (The part that looks really
> wrong in the comment is "bdrv_pread", that should be "bdrv_pwrite"...)

Hm.  But we wouldn’t do an RMW cycle without O_DIRECT, would we?

> Now, what's wrong about the logic to avoid the RMW is that it assumes
> a fixed required alignment of 512. What it should do is looking at
> bs->file->bl.request_alignment and rounding accordingly.

Yes.

>>>>> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>>>>>          case QCOW2_CLUSTER_NORMAL:
>>>>>              child = s->data_file;
>>>>>              copy_offset += offset_into_cluster(s, src_offset);
>>>>> -            if ((copy_offset & 511) != 0) {
>>>>> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
>>>>
>>>> Hm.  I don’t get this one.
>>>
>>> Checking the code (e.g. block_copy_do_copy()) it seems that the whole
>>> chunk must be cluster aligned so I don't get this one either.
>>
>> Hm, how did you get to block_copy_do_copy()?  That’s part of the
>> block-copy infrastructure that’s only used for the backup job, as far as
>> I’m aware.  It’s different from copy_range.
>>
>> I don’t see any limitation for copy_range.  I suppose maybe it doesn’t
>> work for anything that isn’t aligned to physical sectors?  But the qcow2
>> driver shouldn’t care about that.
>>
>> On thing’s for sure, the raw driver doesn’t care about it.
> 
> I don't understand this one either.

Good. :-)

Max
Kevin Wolf Jan. 17, 2020, 11:51 a.m. UTC | #6
Am 17.01.2020 um 12:01 hat Max Reitz geschrieben:
> On 17.01.20 10:55, Kevin Wolf wrote:
> > Am 17.01.2020 um 10:12 hat Max Reitz geschrieben:
> >> On 17.01.20 00:26, Alberto Garcia wrote:
> >>> On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz wrote:
> >>>>> @@ -219,7 +219,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
> >>>>>   * Writes one sector of the L1 table to the disk (can't update single entries
> >>>>>   * and we really don't want bdrv_pread to perform a read-modify-write)
> >>>>>   */
> >>>>> -#define L1_ENTRIES_PER_SECTOR (512 / 8)
> >>>>> +#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
> >>>>>  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
> >>>>
> >>>> Here it’s because the comment is wrong: “Can’t update single entries” –
> >>>> yes, we can.  We’d just have to do a bdrv_pwrite() to a single entry.
> >>>
> >>> What's the point of qcow2_write_l1_entry() then?
> >>
> >> I think the point was that we couldn’t, for a long time, because the
> >> block layer only provided sector-granularity access.  This function
> >> simply was never changed when the block layer gained the ability to do
> >> byte-granularity I/O.
> >>
> >> (We’d still need this function, but only for the endian swap, I think.)
> > 
> > We still can't do byte-granularity writes with O_DIRECT, because that's
> > a kernel requirement.
> 
> Ah, yes.  But that makes BDRV_SECTOR_SIZE the wrong choice.
> 
> > The comment explains that we don't want to do a RMW cycle to write a
> > single entry because that would be slower than just writing a whole
> > sector. I think this is still accurate. Maybe we should change the
> > comment to say "can't necessarily update". (The part that looks really
> > wrong in the comment is "bdrv_pread", that should be "bdrv_pwrite"...)
> 
> Hm.  But we wouldn’t do an RMW cycle without O_DIRECT, would we?

file-posix with a regular file has request_alignment == 1 and wouldn't
cause an RMW cycle, I think. I won't try to make a statement about all
non-O_DIRECT cases in all protocols.

The important point is that some cases exist which would get us RMW.

> > Now, what's wrong about the logic to avoid the RMW is that it assumes
> > a fixed required alignment of 512. What it should do is looking at
> > bs->file->bl.request_alignment and rounding accordingly.
> 
> Yes.

Who'll write the patch? :-)

Kevin
Alberto Garcia Jan. 17, 2020, 2:34 p.m. UTC | #7
On Fri 17 Jan 2020 12:51:04 PM CET, Kevin Wolf wrote:
>> > Now, what's wrong about the logic to avoid the RMW is that it assumes
>> > a fixed required alignment of 512. What it should do is looking at
>> > bs->file->bl.request_alignment and rounding accordingly.
>> 
>> Yes.
>
> Who'll write the patch? :-)

I have to resend the series anyway so I can do it.

Berto
Alberto Garcia Jan. 18, 2020, 6:07 p.m. UTC | #8
On Tue 14 Jan 2020 03:15:48 PM CET, Max Reitz <mreitz@redhat.com> wrote:
>> @@ -3836,7 +3837,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>>          case QCOW2_CLUSTER_NORMAL:
>>              child = s->data_file;
>>              copy_offset += offset_into_cluster(s, src_offset);
>> -            if ((copy_offset & 511) != 0) {
>> +            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
>
> Hm.  I don’t get this one.

Ok, this came with Fam's "qemu-img convert with copy offloading" series:

   https://lists.gnu.org/archive/html/qemu-block/2018-06/msg00015.html

and qemu-img uses sectors here:

   blk_co_copy_range(..., sector_num << BDRV_SECTOR_BITS,
                     n << BDRV_SECTOR_BITS, ...)

so I guess that's why the check is there. Again, I think this should be
bl.request_alignment, because as far as I can tell copy_file_range()
works just fine unless O_DIRECT is used.

Berto
diff mbox series

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 932fc48919..777ca2d409 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -219,7 +219,7 @@  static int l2_load(BlockDriverState *bs, uint64_t offset,
  * Writes one sector of the L1 table to the disk (can't update single entries
  * and we really don't want bdrv_pread to perform a read-modify-write)
  */
-#define L1_ENTRIES_PER_SECTOR (512 / 8)
+#define L1_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / 8)
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
 {
     BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2.c b/block/qcow2.c
index 783d2b9578..c0f3e715ef 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3280,7 +3280,8 @@  qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 
     /* Validate options and set default values */
     if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
-        error_setg(errp, "Image size must be a multiple of 512 bytes");
+        error_setg(errp, "Image size must be a multiple of %u bytes",
+                   (unsigned) BDRV_SECTOR_SIZE);
         ret = -EINVAL;
         goto out;
     }
@@ -3836,7 +3837,7 @@  qcow2_co_copy_range_from(BlockDriverState *bs,
         case QCOW2_CLUSTER_NORMAL:
             child = s->data_file;
             copy_offset += offset_into_cluster(s, src_offset);
-            if ((copy_offset & 511) != 0) {
+            if (!QEMU_IS_ALIGNED(copy_offset, BDRV_SECTOR_SIZE)) {
                 ret = -EIO;
                 goto out;
             }
@@ -3958,8 +3959,9 @@  static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
         return -ENOTSUP;
     }
 
-    if (offset & 511) {
-        error_setg(errp, "The new size must be a multiple of 512");
+    if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
+        error_setg(errp, "The new size must be a multiple of %u",
+                   (unsigned) BDRV_SECTOR_SIZE);
         return -EINVAL;
     }