
[v3,17/39] qcow2: Update l2_allocate() to support L2 slices

Message ID 07a984cef4f1c97dcca44e365aa90ba5b6881e09.1516978645.git.berto@igalia.com
State New
Series Allow configuring the qcow2 L2 cache entry size

Commit Message

Alberto Garcia Jan. 26, 2018, 2:59 p.m. UTC
This patch updates l2_allocate() to support the qcow2 cache returning
L2 slices instead of full L2 tables.

The old code simply gets an L2 table from the cache and initializes it
with zeroes or with the contents of an existing table. With a cache
that returns slices instead of tables the idea remains the same, but
the code must now iterate over all the slices that are contained in an
L2 table.

Since now we're operating with slices the function can no longer
return the newly-allocated table, so it's up to the caller to retrieve
the appropriate L2 slice after calling l2_allocate() (note that with
this patch the caller is still loading full L2 tables, but we'll deal
with that in a separate patch).

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 22 deletions(-)

Comments

Eric Blake Jan. 26, 2018, 4:30 p.m. UTC | #1
On 01/26/2018 08:59 AM, Alberto Garcia wrote:
> This patch updates l2_allocate() to support the qcow2 cache returning
> L2 slices instead of full L2 tables.
> 
> The old code simply gets an L2 table from the cache and initializes it
> with zeroes or with the contents of an existing table. With a cache
> that returns slices instead of tables the idea remains the same, but
> the code must now iterate over all the slices that are contained in an
> L2 table.
> 
> Since now we're operating with slices the function can no longer
> return the newly-allocated table, so it's up to the caller to retrieve
> the appropriate L2 slice after calling l2_allocate() (note that with
> this patch the caller is still loading full L2 tables, but we'll deal
> with that in a separate patch).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 22 deletions(-)
> 

>  
> -            /* if there was an old l2 table, read it from the disk */
> +            /* if there was an old l2 table, read an slice from the disk */

s/an /a /

Reviewed-by: Eric Blake <eblake@redhat.com>
Max Reitz Jan. 31, 2018, 8:07 p.m. UTC | #2
On 2018-01-26 15:59, Alberto Garcia wrote:
> This patch updates l2_allocate() to support the qcow2 cache returning
> L2 slices instead of full L2 tables.
> 
> The old code simply gets an L2 table from the cache and initializes it
> with zeroes or with the contents of an existing table. With a cache
> that returns slices instead of tables the idea remains the same, but
> the code must now iterate over all the slices that are contained in an
> L2 table.
> 
> Since now we're operating with slices the function can no longer
> return the newly-allocated table, so it's up to the caller to retrieve
> the appropriate L2 slice after calling l2_allocate() (note that with
> this patch the caller is still loading full L2 tables, but we'll deal
> with that in a separate patch).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 57349928a9..2a53c1dc5f 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>   *
>   */
>  
> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      uint64_t old_l2_offset;
> -    uint64_t *l2_table = NULL;
> +    uint64_t *l2_slice = NULL;
> +    unsigned slice, slice_size2, n_slices;

I'd personally prefer size_t, but oh well.

(And also maybe slice_size_bytes or slice_bytes instead of the
not-so-intuitive slice_size2.  I know we're using *_size2 in other
places, but that's bad enough as it is.)

Overall (with that fixed or not, and with the spelling fixed as pointed
out by Eric):

Reviewed-by: Max Reitz <mreitz@redhat.com>

However, I'm wondering whether this is the best approach.  The old L2
table is probably not going to be used after this function, so we're
basically polluting the cache here.  That was bad enough so far, but now
that actually means wasting multiple cache entries on it.

Sure, the code is simpler this way.  But maybe it would still be better
to manually copy the data over from the old offset...  (As long as it's
not much more complicated.)

Max

>      int64_t l2_offset;
>      int ret;
>
Alberto Garcia Feb. 1, 2018, 1:13 p.m. UTC | #3
On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:
> On 2018-01-26 15:59, Alberto Garcia wrote:
>> This patch updates l2_allocate() to support the qcow2 cache returning
>> L2 slices instead of full L2 tables.
>> 
>> The old code simply gets an L2 table from the cache and initializes it
>> with zeroes or with the contents of an existing table. With a cache
>> that returns slices instead of tables the idea remains the same, but
>> the code must now iterate over all the slices that are contained in an
>> L2 table.
>> 
>> Since now we're operating with slices the function can no longer
>> return the newly-allocated table, so it's up to the caller to retrieve
>> the appropriate L2 slice after calling l2_allocate() (note that with
>> this patch the caller is still loading full L2 tables, but we'll deal
>> with that in a separate patch).
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>  block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 34 insertions(+), 22 deletions(-)
>> 
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 57349928a9..2a53c1dc5f 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>   *
>>   */
>>  
>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>>      uint64_t old_l2_offset;
>> -    uint64_t *l2_table = NULL;
>> +    uint64_t *l2_slice = NULL;
>> +    unsigned slice, slice_size2, n_slices;
>
> I'd personally prefer size_t, but oh well.

I don't see any reason not to use int / unsigned. The size of a slice is
always <= cluster_size (an int), and both slice and n_slices are smaller
than that.

> However, I'm wondering whether this is the best approach.  The old L2
> table is probably not going to be used after this function, so we're
> basically polluting the cache here.  That was bad enough so far, but
> now that actually means wasting multiple cache entries on it.
>
> Sure, the code is simpler this way.  But maybe it would still be
> better to manually copy the data over from the old offset...  (As long
> as it's not much more complicated.)

You mean bypassing the cache altogether?

    qcow2_cache_flush(bs, s->l2_table_cache);
    new_table = g_malloc(s->cluster_size);
    if (old_l2_offset & L1E_OFFSET_MASK) {
        bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
    } else {
        memset(new_table, 0, s->cluster_size);
    }
    bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
    g_free(new_table);

??

Berto
Anton Nefedov Feb. 1, 2018, 3:23 p.m. UTC | #4
On 1/2/2018 4:13 PM, Alberto Garcia wrote:
> On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:
>> On 2018-01-26 15:59, Alberto Garcia wrote:
>>> This patch updates l2_allocate() to support the qcow2 cache returning
>>> L2 slices instead of full L2 tables.
>>>
>>> The old code simply gets an L2 table from the cache and initializes it
>>> with zeroes or with the contents of an existing table. With a cache
>>> that returns slices instead of tables the idea remains the same, but
>>> the code must now iterate over all the slices that are contained in an
>>> L2 table.
>>>
>>> Since now we're operating with slices the function can no longer
>>> return the newly-allocated table, so it's up to the caller to retrieve
>>> the appropriate L2 slice after calling l2_allocate() (note that with
>>> this patch the caller is still loading full L2 tables, but we'll deal
>>> with that in a separate patch).
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>   block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
>>>   1 file changed, 34 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 57349928a9..2a53c1dc5f 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>>    *
>>>    */
>>>   
>>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>>   {
>>>       BDRVQcow2State *s = bs->opaque;
>>>       uint64_t old_l2_offset;
>>> -    uint64_t *l2_table = NULL;
>>> +    uint64_t *l2_slice = NULL;
>>> +    unsigned slice, slice_size2, n_slices;
>>
>> I'd personally prefer size_t, but oh well.
> 
> I don't see any reason not to use int / unsigned. The size of a slice is
> always <= cluster_size (an int), and both slice and n_slices are smaller
> than that.
> 
>> However, I'm wondering whether this is the best approach.  The old L2
>> table is probably not going to be used after this function, so we're
>> basically polluting the cache here.  That was bad enough so far, but
>> now that actually means wasting multiple cache entries on it.
>>
>> Sure, the code is simpler this way.  But maybe it would still be
>> better to manually copy the data over from the old offset...  (As long
>> as it's not much more complicated.)
> 
> You mean bypassing the cache altogether?
> 
>      qcow2_cache_flush(bs, s->l2_table_cache);
>      new_table = g_malloc(s->cluster_size);
>      if (old_l2_offset & L1E_OFFSET_MASK) {
>          bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>      } else {
>          memset(new_table, 0, s->cluster_size);
>      }
>      bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>      g_free(new_table);
> 
> ??
> 
> Berto
> 

(I know it's a draft so you probably just skipped that but just in case)
It seems ok to bypass the cache read - perhaps even a flush is
not necessary: old_l2_offset must be read-only and flushed at this
point; I believe new_l2_offset might be cached too, so it
needs to be updated.
Alberto Garcia Feb. 1, 2018, 3:43 p.m. UTC | #5
On Thu 01 Feb 2018 04:23:09 PM CET, Anton Nefedov wrote:
>>> However, I'm wondering whether this is the best approach.  The old
>>> L2 table is probably not going to be used after this function, so
>>> we're basically polluting the cache here.  That was bad enough so
>>> far, but now that actually means wasting multiple cache entries on
>>> it.
>>>
>>> Sure, the code is simpler this way.  But maybe it would still be
>>> better to manually copy the data over from the old offset...  (As
>>> long as it's not much more complicated.)
>> 
>> You mean bypassing the cache altogether?
>> 
>>      qcow2_cache_flush(bs, s->l2_table_cache);
>>      new_table = g_malloc(s->cluster_size);
>>      if (old_l2_offset & L1E_OFFSET_MASK) {
>>          bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>>      } else {
>>          memset(new_table, 0, s->cluster_size);
>>      }
>>      bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>>      g_free(new_table);
>> 
>> ??
>
> (I know it's a draft so you probably just skipped that but just in
> case) It seems ok to bypass the cache read - perhaps even a flush is
> not necessary: old_l2_offset must be read-only and flushed at this
> point; I believe new_l2_offset might be cached too, so it needs to be
> updated.

One problem I see with this is that while we wouldn't pollute the cache
we'd always be reading the table twice from disk in all cases:

 1) Read old table
 2) Write new table
 3) Read new table (after l2_allocate(), using the cache this time)

We can of course improve it by reading the old table from disk but
directly in the cache -so we'd spare step (3)-, but we'd still have to
read at least once from disk.

With the old code (especially if slice_size == cluster_size) we don't
need to read anything if the L2 table is already cached:

 1) Get empty table from the cache
 2) memcpy() the old data
 3) Get new table from the cache (after l2_allocate()).
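
For reference, the "read the old table directly into the cache" variant
would look more or less like this (rough, untested sketch reusing the
helpers from the patch; error handling omitted):

    for (slice = 0; slice < n_slices; slice++) {
        /* allocate the cache entry for the *new* slice... */
        ret = qcow2_cache_get_empty(bs, s->l2_table_cache,
                                    l2_offset + slice * slice_size2,
                                    (void **) &l2_slice);
        /* ...and fill it straight from the old table on disk,
         * bypassing the cache for the read */
        if (old_l2_offset & L1E_OFFSET_MASK) {
            bdrv_pread(bs->file,
                       (old_l2_offset & L1E_OFFSET_MASK) + slice * slice_size2,
                       l2_slice, slice_size2);
        } else {
            memset(l2_slice, 0, slice_size2);
        }
        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
        qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
    }

That spares step (3) and lets the existing qcow2_cache_flush() write the
new table out, but whenever there is an old table we still read it once
from disk even if it happens to be cached.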

Berto
Max Reitz Feb. 1, 2018, 6:15 p.m. UTC | #6
On 2018-02-01 14:13, Alberto Garcia wrote:
> On Wed 31 Jan 2018 09:07:48 PM CET, Max Reitz wrote:
>> On 2018-01-26 15:59, Alberto Garcia wrote:
>>> This patch updates l2_allocate() to support the qcow2 cache returning
>>> L2 slices instead of full L2 tables.
>>>
>>> The old code simply gets an L2 table from the cache and initializes it
>>> with zeroes or with the contents of an existing table. With a cache
>>> that returns slices instead of tables the idea remains the same, but
>>> the code must now iterate over all the slices that are contained in an
>>> L2 table.
>>>
>>> Since now we're operating with slices the function can no longer
>>> return the newly-allocated table, so it's up to the caller to retrieve
>>> the appropriate L2 slice after calling l2_allocate() (note that with
>>> this patch the caller is still loading full L2 tables, but we'll deal
>>> with that in a separate patch).
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>  block/qcow2-cluster.c | 56 +++++++++++++++++++++++++++++++--------------------
>>>  1 file changed, 34 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 57349928a9..2a53c1dc5f 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
>>>   *
>>>   */
>>>  
>>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>>  {
>>>      BDRVQcow2State *s = bs->opaque;
>>>      uint64_t old_l2_offset;
>>> -    uint64_t *l2_table = NULL;
>>> +    uint64_t *l2_slice = NULL;
>>> +    unsigned slice, slice_size2, n_slices;
>>
>> I'd personally prefer size_t, but oh well.
> 
> I don't see any reason not to use int / unsigned. The size of a slice is
> always <= cluster_size (an int), and both slice and n_slices are smaller
> than that.

Well, what's the reason to use unsigned? :-)

The type of the expression used to set slice_size2 simply is size_t.

>> However, I'm wondering whether this is the best approach.  The old L2
>> table is probably not going to be used after this function, so we're
>> basically polluting the cache here.  That was bad enough so far, but
>> now that actually means wasting multiple cache entries on it.
>>
>> Sure, the code is simpler this way.  But maybe it would still be
>> better to manually copy the data over from the old offset...  (As long
>> as it's not much more complicated.)
> 
> You mean bypassing the cache altogether?

Yes.

>     qcow2_cache_flush(bs, s->l2_table_cache);
>     new_table = g_malloc(s->cluster_size);
>     if (old_l2_offset & L1E_OFFSET_MASK) {
>         bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>     } else {
>         memset(new_table, 0, s->cluster_size);
>     }
>     bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>     g_free(new_table);
> 
> ??

Well, we wouldn't need to flush the whole table but only the parts of
the L2 table that we are about to copy.  And in theory we can omit even
that, because that old L2 table is not COPIED, so it can't be dirty anyway.

Max
Max Reitz Feb. 1, 2018, 6:22 p.m. UTC | #7
On 2018-02-01 16:43, Alberto Garcia wrote:
> On Thu 01 Feb 2018 04:23:09 PM CET, Anton Nefedov wrote:
>>>> However, I'm wondering whether this is the best approach.  The old
>>>> L2 table is probably not going to be used after this function, so
>>>> we're basically polluting the cache here.  That was bad enough so
>>>> far, but now that actually means wasting multiple cache entries on
>>>> it.
>>>>
>>>> Sure, the code is simpler this way.  But maybe it would still be
>>>> better to manually copy the data over from the old offset...  (As
>>>> long as it's not much more complicated.)
>>>
>>> You mean bypassing the cache altogether?
>>>
>>>      qcow2_cache_flush(bs, s->l2_table_cache);
>>>      new_table = g_malloc(s->cluster_size);
>>>      if (old_l2_offset & L1E_OFFSET_MASK) {
>>>          bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>>>      } else {
>>>          memset(new_table, 0, s->cluster_size);
>>>      }
>>>      bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>>>      g_free(new_table);
>>>
>>> ??
>>
>> (I know it's a draft so you probably just skipped that but just in
>> case) It seems ok to bypass the cache read - perhaps even a flush is
>> not necessary: old_l2_offset must be read-only and flushed at this
>> point; I believe new_l2_offset might be cached too, so it needs to be
>> updated.
> 
> One problem I see with this is that while we wouldn't pollute the cache
> we'd always be reading the table twice from disk in all cases:
> 
>  1) Read old table
>  2) Write new table
>  3) Read new table (after l2_allocate(), using the cache this time)
> 
> We can of course improve it by reading the old table from disk but
> directly in the cache -so we'd spare step (3)-, but we'd still have to
> read at least once from disk.
> 
> With the old code (especially if slice_size == cluster_size) we don't
> need to read anything if the L2 table is already cached:
> 
>  1) Get empty table from the cache
>  2) memcpy() the old data
>  3) Get new table from the cache (after l2_allocate()).

Well, then scratch the bdrv_pwrite() for the new table and keep using
the cache for that (because that actually sounds useful).

On second thought, though, it's rather probable the old L2 table is
already in the cache...  Before the guest does a write to a location, it
is reasonable to assume it has read from there before.

So I guess we could think about adding a parameter to qcow2_cache_put()
or something to reset the LRU counter because we probably won't need
that entry anymore.  But not something for this series, of course.

Max
Alberto Garcia Feb. 2, 2018, 8:08 a.m. UTC | #8
On Thu 01 Feb 2018 07:22:16 PM CET, Max Reitz wrote:
> On 2018-02-01 16:43, Alberto Garcia wrote:
>> On Thu 01 Feb 2018 04:23:09 PM CET, Anton Nefedov wrote:
>>>>> However, I'm wondering whether this is the best approach.  The old
>>>>> L2 table is probably not going to be used after this function, so
>>>>> we're basically polluting the cache here.  That was bad enough so
>>>>> far, but now that actually means wasting multiple cache entries on
>>>>> it.
>>>>>
>>>>> Sure, the code is simpler this way.  But maybe it would still be
>>>>> better to manually copy the data over from the old offset...  (As
>>>>> long as it's not much more complicated.)
>>>>
>>>> You mean bypassing the cache altogether?
>>>>
>>>>      qcow2_cache_flush(bs, s->l2_table_cache);
>>>>      new_table = g_malloc(s->cluster_size);
>>>>      if (old_l2_offset & L1E_OFFSET_MASK) {
>>>>          bdrv_pread(bs->file, old_l2_offset, new_table, s->cluster_size);
>>>>      } else {
>>>>          memset(new_table, 0, s->cluster_size);
>>>>      }
>>>>      bdrv_pwrite(bs->file, new_l2_offset, new_table, s->cluster_size);
>>>>      g_free(new_table);
>>>>
>>>> ??
>>>
>>> (I know it's a draft so you probably just skipped that but just in
>>> case) It seems ok to bypass the cache read - perhaps even a flush is
>>> not necessary: old_l2_offset must be read-only and flushed at this
>>> point; I believe new_l2_offset might be cached too, so it needs to be
>>> updated.
>> 
>> One problem I see with this is that while we wouldn't pollute the cache
>> we'd always be reading the table twice from disk in all cases:
>> 
>>  1) Read old table
>>  2) Write new table
>>  3) Read new table (after l2_allocate(), using the cache this time)
>> 
>> We can of course improve it by reading the old table from disk but
>> directly in the cache -so we'd spare step (3)-, but we'd still have to
>> read at least once from disk.
>> 
>> With the old code (especially if slice_size == cluster_size) we don't
>> need to read anything if the L2 table is already cached:
>> 
>>  1) Get empty table from the cache
>>  2) memcpy() the old data
>>  3) Get new table from the cache (after l2_allocate()).
>
> Well, then scratch the bdrv_pwrite() for the new table and keep using
> the cache for that (because that actually sounds useful).
>
> On second thought, though, it's rather probable the old L2 table is
> already in the cache...  Before the guest does a write to a location,
> it is reasonable to assume it has read from there before.
>
> So I guess we could think about adding a parameter to qcow2_cache_put()
> or something to reset the LRU counter because we probably won't need
> that entry anymore.  But not something for this series, of course.

That actually doesn't sound like a bad idea; there are maybe more cases
in which we know we're unlikely to need a cache entry soon, but as you
say, let's take a look at it after this series.
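
(Just to write the idea down while it's fresh; this is a completely
made-up name and prototype, nothing that exists today:)

    /* Like qcow2_cache_put(), but hint that the entry is unlikely to be
     * used again soon, so it becomes a preferred candidate for eviction
     * instead of having its LRU counter bumped. */
    void qcow2_cache_put_unused(Qcow2Cache *c, void **table);

Or simply an extra boolean parameter on qcow2_cache_put() itself.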

Berto
Alberto Garcia Feb. 2, 2018, 9:41 a.m. UTC | #9
On Thu 01 Feb 2018 07:15:23 PM CET, Max Reitz <mreitz@redhat.com> wrote:
>>>> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>>> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>>>>  {
>>>>      BDRVQcow2State *s = bs->opaque;
>>>>      uint64_t old_l2_offset;
>>>> -    uint64_t *l2_table = NULL;
>>>> +    uint64_t *l2_slice = NULL;
>>>> +    unsigned slice, slice_size2, n_slices;
>>>
>>> I'd personally prefer size_t, but oh well.
>> 
>> I don't see any reason not to use int / unsigned. The size of a slice
>> is always <= cluster_size (an int), and both slice and n_slices are
>> smaller than that.
>
> Well, what's the reason to use unsigned? :-)
> The type of the expression used to set slice_size2 simply is size_t.

I tend to choose the type of a variable based on what it's going to
hold, and use int (signed or not) whenever possible.

In this case a normal integer can certainly hold all possible values of
slice_size2. And unsigned because it's never going to be negative. It
could also be signed, it's not going to be any different in practice,
it's just more explicit.
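
To put some numbers on it (back-of-the-envelope on my part, and IIRC the
qcow2 spec caps clusters at 2 MB):

    /* slice_size2 = s->l2_slice_size * sizeof(uint64_t) is at most
     * s->cluster_size, which is an int and at most 2 MB, and
     * n_slices = s->cluster_size / slice_size2 is at least 1 and at
     * most s->l2_size, so all three fit comfortably in an unsigned. */
    unsigned slice, slice_size2, n_slices;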

Berto

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 57349928a9..2a53c1dc5f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -264,11 +264,12 @@  int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
  *
  */
 
-static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
+static int l2_allocate(BlockDriverState *bs, int l1_index)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t old_l2_offset;
-    uint64_t *l2_table = NULL;
+    uint64_t *l2_slice = NULL;
+    unsigned slice, slice_size2, n_slices;
     int64_t l2_offset;
     int ret;
 
@@ -299,42 +300,45 @@  static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
     /* allocate a new entry in the l2 cache */
 
+    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    n_slices = s->cluster_size / slice_size2;
+
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
-    {
+    for (slice = 0; slice < n_slices; slice++) {
         ret = qcow2_cache_get_empty(bs, s->l2_table_cache,
-                                    l2_offset,
-                                    (void **) table);
+                                    l2_offset + slice * slice_size2,
+                                    (void **) &l2_slice);
         if (ret < 0) {
             goto fail;
         }
 
-        l2_table = *table;
-
         if ((old_l2_offset & L1E_OFFSET_MASK) == 0) {
-            /* if there was no old l2 table, clear the new table */
-            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
+            /* if there was no old l2 table, clear the new slice */
+            memset(l2_slice, 0, slice_size2);
         } else {
-            uint64_t *old_table;
+            uint64_t *old_slice;
+            uint64_t old_l2_slice_offset =
+                (old_l2_offset & L1E_OFFSET_MASK) + slice * slice_size2;
 
-            /* if there was an old l2 table, read it from the disk */
+            /* if there was an old l2 table, read an slice from the disk */
             BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
-            ret = qcow2_cache_get(bs, s->l2_table_cache,
-                                  old_l2_offset & L1E_OFFSET_MASK,
-                                  (void **) &old_table);
+            ret = qcow2_cache_get(bs, s->l2_table_cache, old_l2_slice_offset,
+                                  (void **) &old_slice);
             if (ret < 0) {
                 goto fail;
             }
 
-            memcpy(l2_table, old_table, s->cluster_size);
+            memcpy(l2_slice, old_slice, slice_size2);
 
-            qcow2_cache_put(s->l2_table_cache, (void **) &old_table);
+            qcow2_cache_put(s->l2_table_cache, (void **) &old_slice);
         }
 
-        /* write the l2 table to the file */
+        /* write the l2 slice to the file */
         BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
 
         trace_qcow2_l2_allocate_write_l2(bs, l1_index);
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
+        qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
     }
 
     ret = qcow2_cache_flush(bs, s->l2_table_cache);
@@ -350,14 +354,13 @@  static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
         goto fail;
     }
 
-    *table = l2_table;
     trace_qcow2_l2_allocate_done(bs, l1_index, 0);
     return 0;
 
 fail:
     trace_qcow2_l2_allocate_done(bs, l1_index, ret);
-    if (l2_table != NULL) {
-        qcow2_cache_put(s->l2_table_cache, (void **) table);
+    if (l2_slice != NULL) {
+        qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
     }
     s->l1_table[l1_index] = old_l2_offset;
     if (l2_offset > 0) {
@@ -702,7 +705,7 @@  static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
         }
     } else {
         /* First allocate a new L2 table (and do COW if needed) */
-        ret = l2_allocate(bs, l1_index, &l2_table);
+        ret = l2_allocate(bs, l1_index);
         if (ret < 0) {
             return ret;
         }
@@ -712,6 +715,15 @@  static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
             qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
                                 QCOW2_DISCARD_OTHER);
         }
+
+        /* Get the offset of the newly-allocated l2 table */
+        l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+        assert(offset_into_cluster(s, l2_offset) == 0);
+        /* Load the l2 table in memory */
+        ret = l2_load(bs, offset, l2_offset, &l2_table);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     /* find the cluster offset for the given disk offset */