[v6,2/2] s390: do not call memory_region_allocate_system_memory() multiple times

Message ID 20190916132347.30676-3-imammedo@redhat.com
State New
Series s390: stop abusing memory_region_allocate_system_memory()

Commit Message

Igor Mammedov Sept. 16, 2019, 1:23 p.m. UTC
s390 was trying to solve the limited KVM memslot size issue by abusing
memory_region_allocate_system_memory(), which breaks the API contract
under which the function may be called only once.

Besides being an invalid use of the API, the approach also introduced a
migration issue, since RAM chunks of KVM_SLOT_MAX_BYTES each are
transferred in the migration stream as separate RAMBlocks.

After discussion [1], it was agreed to break migration from older QEMU
for guests with RAM >8TB (support for such guests is relatively new,
available since 2.12, and considered not to be actually used
downstream).
Migration should keep working for guests with less than 8TB, and for
guests with more than 8TB when both QEMU binaries are 4.2 or newer.
In case a user tries to migrate a guest with more than 8TB between
incompatible QEMU versions, migration should fail gracefully due to a
non-existing RAMBlock ID or a RAMBlock size mismatch.

Taking the above into account, and since the KVM code is now able to
split a too-large MemorySection into several memslots, stop abusing
memory_region_allocate_system_memory() and use only one memory region
for RAM.

1) [PATCH RFC v2 4/4] s390: do not call memory_region_allocate_system_memory() multiple times

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
v3:
  - drop migration compat code

PS:
I don't have access to a suitable system to test it.
---
 hw/s390x/s390-virtio-ccw.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)
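
(Editor's note: the mechanism the commit message relies on is added by
patch 1/2 of this series. Below is a hedged, self-contained C sketch of
the splitting idea only; the helper names and the 4GB cap are invented
for illustration, while the real code lives in accel/kvm/kvm-all.c and
goes through the actual memslot registration path.)

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Invented cap for illustration; the real limit is set per target. */
    #define MAX_SLOT_BYTES (4ULL << 30)

    /* Stand-in for the real KVM_SET_USER_MEMORY_REGION registration. */
    static void register_one_slot(uint64_t gpa, uint64_t size)
    {
        printf("memslot: gpa=0x%" PRIx64 " size=0x%" PRIx64 "\n", gpa, size);
    }

    /*
     * One RAM region, many memslots: carve a single contiguous guest RAM
     * range into chunks no bigger than the per-slot limit.  This is what
     * lets the s390 board code drop its per-chunk RAMBlock allocation loop.
     */
    static void register_ram(uint64_t gpa, uint64_t size)
    {
        while (size) {
            uint64_t chunk = size < MAX_SLOT_BYTES ? size : MAX_SLOT_BYTES;
            register_one_slot(gpa, chunk);
            gpa += chunk;
            size -= chunk;
        }
    }

    int main(void)
    {
        register_ram(0, 10ULL << 30); /* a 10GB guest yields 4G + 4G + 2G */
        return 0;
    }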

Comments

Peter Xu Sept. 17, 2019, 8:44 a.m. UTC | #1
On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> PS:
> I don't have access to a suitable system to test it.

Hmm, I feel like it would be good to have a series like this be at
least smoke tested somehow...

How about manually setting up a very small max memslot size and testing
it on x86?  IMHO we'd test with KVMState.manual_dirty_log_protect both
on & off to make sure we cover the log_clear() code path.  And of
course, to trigger those paths we probably need migrations, but I
believe local migrations would be enough.
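
(Editor's note: a minimal sketch of such a hack, assuming the
kvm_max_slot_size limit that patch 1/2 of this series introduces in
accel/kvm/kvm-all.c; the 512MB value is made up so that even a small
x86 guest gets split across several memslots:)

    -static hwaddr kvm_max_slot_size = ~0;
    +static hwaddr kvm_max_slot_size = 512ULL * 1024 * 1024; /* test hack: force splits */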

And while at it, I'm thinking whether we should start to assert() in
some way in memory_region_allocate_system_memory() to make sure it's
not called twice on any board from now on.

Regards,
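
(Editor's note: one possible shape of the guard suggested above; a
hedged sketch against memory_region_allocate_system_memory() in numa.c,
not a submitted patch:)

     void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
                                               const char *name, uint64_t ram_size)
     {
    +    static bool allocated;
    +
    +    /* The API contract allows exactly one system-RAM allocation. */
    +    assert(!allocated);
    +    allocated = true;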
Igor Mammedov Sept. 17, 2019, 1:42 p.m. UTC | #2
On Tue, 17 Sep 2019 16:44:42 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> > PS:
> > I don't have access to a suitable system to test it.  
> 
> Hmm, I feel like it would be good to have a series like this be at
> least smoke tested somehow...
> 
> How about manually setting up a very small max memslot size and testing
> it on x86?  IMHO we'd test with KVMState.manual_dirty_log_protect both
> on & off to make sure we cover the log_clear() code path.  And of
> course, to trigger those paths we probably need migrations, but I
> believe local migrations would be enough.

I did smoke test it (a Fedora boot loop) [*] on an s390 host with a
hacked 1G max section. I guess I could hack x86 and do the same for an
x86 guest. Anyway, suggestions on how to test it better are welcome.

*) I don't have much faith in the tests we have, though, as they didn't
   catch the broken v5 in my case. Hence CCing those who are more
   familiar with the migration parts.

   I used a 2-memslot split config at 1GB with offline migration like this:

   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -monitor stdio 
     (monitor) stop
     (monitor) migrate "exec: cat > savefile"
     (monitor) q
   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -incoming "exec: cat savefile"

> 
> And while at it, I'm thinking whether we should start to assert() in
> some way in memory_region_allocate_system_memory() to make sure it's
> not called twice on any board from now on.

There is another broken board that misuses it as well; I intend to
clean that up and drop memory_region_allocate_system_memory()
altogether once the s390 case is dealt with.

Peter Xu Sept. 18, 2019, 12:16 a.m. UTC | #3
On Tue, Sep 17, 2019 at 03:42:12PM +0200, Igor Mammedov wrote:
> On Tue, 17 Sep 2019 16:44:42 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> > > PS:
> > > I don't have access to a suitable system to test it.  
> > 
> > Hmm, I feel like it would be good to have a series like this be at
> > least smoke tested somehow...
> > 
> > How about manually setting up a very small max memslot size and testing
> > it on x86?  IMHO we'd test with KVMState.manual_dirty_log_protect both
> > on & off to make sure we cover the log_clear() code path.  And of
> > course, to trigger those paths we probably need migrations, but I
> > believe local migrations would be enough.
> 
> I did smoke test it (a Fedora boot loop) [*] on an s390 host with a
> hacked 1G max section. I guess I could hack x86 and do the same for an
> x86 guest. Anyway, suggestions on how to test it better are welcome.
> 
> *) I don't have much faith in the tests we have, though, as they didn't
>    catch the broken v5 in my case. Hence CCing those who are more
>    familiar with the migration parts.
> 
>    I used a 2-memslot split config at 1GB with offline migration like this:
> 
>    $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
>         --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
>         -monitor stdio 
>      (monitor) stop
>      (monitor) migrate "exec: cat > savefile"
>      (monitor) q
>    $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
>         --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
>         -incoming "exec: cat savefile"

Yeah this looks good already. A better one could be (AFAICS):

  1) as mentioned, enable KVMState.manual_dirty_log_protect to test
     the log_clear path by running a host kernel new enough (Linux
     5.2+); then it'll be on by default, otherwise the default is
     off.  If uncertain, we can enable some trace points, e.g.
     trace_kvm_clear_dirty_log, to make sure those code paths are
     triggered.

  2) more aggressive dirtying.  This can be done by:

    - running a mem-dirtying workload inside the guest.  I normally use
      [1] myself ("mig_mon mm_dirty 1024 500" will dirty 1024MB of
      memory at a 500MB/s dirty rate), but any tool would work

    - turning down the migration bandwidth using "migrate_set_speed" so
      the migration has an even harder time converging; then the dirty
      bit path is tortured more.  Otherwise a local full-speed migration
      normally completes super fast due to pure mem moves.

Though with 2) above I'd suggest using unix sockets or tcp, otherwise
the dumped file could be super big (hopefully not eating all the
laptop's disk!).

[1] https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

Regards,
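
(Editor's note: putting the suggestions above together, a hedged example
of a throttled local migration over a unix socket; the socket path,
bandwidth value, and guest options are illustrative, and the destination
is started first so it listens on the socket. Adding -snapshot on both
sides avoids the two instances fighting over fedora.qcow2.)

   $ qemu-system-s390x -M s390-ccw-virtio,accel=kvm -m 2048 -cpu max -smp 2 \
        --nographic --hda fedora.qcow2 -snapshot -incoming unix:/tmp/mig.sock &
   $ qemu-system-s390x -M s390-ccw-virtio,accel=kvm -m 2048 -cpu max -smp 2 \
        --nographic --hda fedora.qcow2 -snapshot -monitor stdio \
        -trace enable=kvm_clear_dirty_log
     (monitor) migrate_set_speed 100m
     (monitor) migrate "unix:/tmp/mig.sock"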

Patch

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index baa95aad38..18ad279a00 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -157,27 +157,12 @@  static void virtio_ccw_register_hcalls(void)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
-    ram_addr_t chunk, offset = 0;
-    unsigned int number = 0;
+    MemoryRegion *ram = g_new(MemoryRegion, 1);
     Error *local_err = NULL;
-    gchar *name;
 
     /* allocate RAM for core */
-    name = g_strdup_printf("s390.ram");
-    while (mem_size) {
-        MemoryRegion *ram = g_new(MemoryRegion, 1);
-        uint64_t size = mem_size;
-
-        /* KVM does not allow memslots >= 8 TB */
-        chunk = MIN(size, KVM_SLOT_MAX_BYTES);
-        memory_region_allocate_system_memory(ram, NULL, name, chunk);
-        memory_region_add_subregion(sysmem, offset, ram);
-        mem_size -= chunk;
-        offset += chunk;
-        g_free(name);
-        name = g_strdup_printf("s390.ram.%u", ++number);
-    }
-    g_free(name);
+    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
+    memory_region_add_subregion(sysmem, 0, ram);
 
     /*
      * Configure the maximum page size. As no memory devices were created