[v7,4/4] s390: do not call memory_region_allocate_system_memory() multiple times

Message ID 20190924144751.24149-5-imammedo@redhat.com
State New
Headers show
Series s390: stop abusing memory_region_allocate_system_memory() | expand

Commit Message

Igor Mammedov Sept. 24, 2019, 2:47 p.m. UTC
s390 was trying to work around the limited KVM memslot size by abusing
memory_region_allocate_system_memory(), which breaks the API contract
that the function may be called only once.

Besides being an invalid use of the API, the approach also introduced
a migration issue, since the RAM chunks for each KVM_SLOT_MAX_BYTES
are transferred in the migration stream as separate RAMBlocks.

After discussion [1], it was agreed to break migration from older
QEMU for guests with RAM >8TB (as the feature was relatively new
(since 2.12) and considered not to be actually used downstream).
Migration should keep working for guests with less than 8TB, and for
more than 8TB with QEMU 4.2 and newer binaries.
In case a user tries to migrate a guest with more than 8TB between
incompatible QEMU versions, migration should fail gracefully due to a
non-existing RAMBlock ID or a RAMBlock size mismatch.

Taking the above into account, and that the KVM code is now able to
split a too-big MemorySection into several memslots, partially revert
commit bb223055b9 ("s390-ccw-virtio: allow for systems larger that
7.999TB") and use kvm_set_max_memslot_size() to set the KVMSlot size
to KVM_SLOT_MAX_BYTES.

1) [PATCH RFC v2 4/4] s390: do not call memory_region_allocate_system_memory() multiple times

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v7:
  - move KVM_SLOT_MAX_BYTES movement from kvm specific patch to here
    (Peter Xu <peterx@redhat.com>)
v3:
  - drop migration compat code

PS:
I don't have access to a suitable system to test the 8TB split, so
I've only tested a hacked-up KVM_SLOT_MAX_BYTES = 1GB variant.
---
 hw/s390x/s390-virtio-ccw.c | 30 +++---------------------------
 target/s390x/kvm.c         | 11 +++++++++++
 2 files changed, 14 insertions(+), 27 deletions(-)

Comments

Peter Xu Sept. 25, 2019, 3:27 a.m. UTC | #1
On Tue, Sep 24, 2019 at 10:47:51AM -0400, Igor Mammedov wrote:
> [...]

Acked-by: Peter Xu <peterx@redhat.com>

IMHO it would be good to at least mention bb223055b9 in the commit
message even if not with a "Fixed:" tag.  May be amended during commit
if anyone prefers.

Also, this only applies the split limitation to s390.  Would that be a
good thing for some other archs as well?

Thanks,
Igor Mammedov Sept. 25, 2019, 11:51 a.m. UTC | #2
On Wed, 25 Sep 2019 11:27:00 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Tue, Sep 24, 2019 at 10:47:51AM -0400, Igor Mammedov wrote:
> > [...]
> 
> Acked-by: Peter Xu <peterx@redhat.com>
> 
> IMHO it would be good to at least mention bb223055b9 in the commit
> message even if not with a "Fixed:" tag.  May be amended during commit
> if anyone prefers.

/me confused, bb223055b9 is mentioned in commit message
 
> Also, this only applies the split limitation to s390.  Would that be a
> good thing to some other archs as well?

Don't we have a similar bitmap size issue in KVM for other archs?

> 
> Thanks,
>
Peter Xu Sept. 25, 2019, 11:52 p.m. UTC | #3
On Wed, Sep 25, 2019 at 01:51:05PM +0200, Igor Mammedov wrote:
> [...]
> > IMHO it would be good to at least mention bb223055b9 in the commit
> > message even if not with a "Fixed:" tag.  May be amended during commit
> > if anyone prefers.
> 
> /me confused, bb223055b9 is mentioned in commit message

I'm sorry, I overlooked that.

>  
> > Also, this only applies the split limitation to s390.  Would that be a
> > good thing to some other archs as well?
> 
> Don't we have the similar bitmap size issue in KVM for other archs?

Yes, I thought we had.  So I feel like it would be good to also allow
other archs to support >8TB mem as well.  Thanks,
Igor Mammedov Sept. 27, 2019, 1:33 p.m. UTC | #4
On Thu, 26 Sep 2019 07:52:35 +0800
Peter Xu <peterx@redhat.com> wrote:

> [...]
> Yes I thought we had.  So I feel like it would be good to also allow
> other archs to support >8TB mem as well.  Thanks,
Another question: are there other archs with that much RAM
available/used in real life? (If not, I'd wait for demand to arise
first.)

If we are to generalize it to other targets, then instead of using an
arbitrary memslot max size per target, we could just hardcode, or get
from KVM, the maximum supported bitmap size, and use that to calculate
kvm_max_slot_size depending on the target page size.

Then there wouldn't be a need for machine-specific code to care about
it and pick/set arbitrary values.

Another aspect to think about, if we are to enable it for other
targets, is memslot accounting. It doesn't affect s390, but other
targets that support memory hotplug currently assume a 1:1
memoryregion:memslot relation, which holds true today but would need
to be amended in case the split is enabled there.
Peter Xu Sept. 28, 2019, 1:28 a.m. UTC | #5
On Fri, Sep 27, 2019 at 03:33:20PM +0200, Igor Mammedov wrote:
> [...]
> Another question, Is there another archs with that much RAM that are
> available/used in real life (if not I'd wait for demand to arise first)?

I don't know, so it was a pure question beside the series.  Sorry if
that holds up your series somehow; it was not my intention.

> 
> If we are to generalize it to other targets, then instead of using
> arbitrary memslot max size per target, we could just hardcode or get
> from KVM, max supported size of bitmap and use that to calculate
> kvm_max_slot_size depending on target page size.

Right, I think hardcoding would be fine for now, probably with the
smallest value across all archs (it should depend on the smallest
page size, I guess).

> 
> Then there wouldn't be need for having machine specific code
> to care about it and pick/set arbitrary values.
> 
> Another aspect to think about if we are to enable it for
> other targets is memslot accounting. It doesn't affect s390
> but other targets that support memory hotplug now assume 1:1
> relation between memoryregion:memslot, which currently holds
> true but would need to amended in case split is enabled there.

I didn't know this.  So maybe it makes more sense to have s390 only
here.  Thanks,
Christian Borntraeger Sept. 30, 2019, 7:09 a.m. UTC | #6
On 28.09.19 03:28, Peter Xu wrote:
> [...]
> 
> I didn't know this.  So maybe it makes more sense to have s390 only
> here.  Thanks,

OK. So shall I take the series as is via the s390 tree?
I would like to add the following patch on top if nobody minds:

Subject: [PATCH 1/1] s390/kvm: split kvm mem slots at 4TB

Instead of splitting at an unaligned address, we can simply split at
4TB.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 target/s390x/kvm.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index ad2dd14f7e78..611f56f4b5ac 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -126,12 +126,11 @@
 /*
  * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
  * as the dirty bitmap must be managed by bitops that take an int as
- * position indicator. If we have a guest beyond that we will split off
- * new subregions. The split must happen on a segment boundary (1MB).
+ * position indicator. This would end at an unaligned address
+ * (0x7fffff00000). As future variants might provide larger pages
+ * and to make all addresses properly aligned, let us split at 4TB.
  */
-#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
-#define SEG_MSK (~0xfffffULL)
-#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
+#define KVM_SLOT_MAX_BYTES 4096UL*1024*1024*1024
 
 static CPUWatchpoint hw_watchpoint;
 /*
Igor Mammedov Sept. 30, 2019, 9:33 a.m. UTC | #7
On Mon, 30 Sep 2019 09:09:59 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> [...]
> OK. So shall I take the series as is via the s390 tree?
Yes, I'd appreciate it.

> I would like to add the following patch on top if nobody minds:
> 
> Subject: [PATCH 1/1] s390/kvm: split kvm mem slots at 4TB
> 
> Instead of splitting at an unaligned address, we can simply split at
> 4TB.
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

Looks fine to me

Acked-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  target/s390x/kvm.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index ad2dd14f7e78..611f56f4b5ac 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -126,12 +126,11 @@
>  /*
>   * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
>   * as the dirty bitmap must be managed by bitops that take an int as
> - * position indicator. If we have a guest beyond that we will split off
> - * new subregions. The split must happen on a segment boundary (1MB).
> + * position indicator. This would end at an unaligned  address
> + * (0x7fffff00000). As future variants might provide larger pages
> + * and to make all addresses properly aligned, let us split at 4TB.
>   */
> -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
> -#define SEG_MSK (~0xfffffULL)
> -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
> +#define KVM_SLOT_MAX_BYTES 4096UL*1024*1024*1024

I'd use TiB instead of 1024*1024*1024

>  
>  static CPUWatchpoint hw_watchpoint;
>  /*
Christian Borntraeger Sept. 30, 2019, 10:04 a.m. UTC | #8
On 30.09.19 11:33, Igor Mammedov wrote:
> [...]
>> OK. So shall I take the series as is via the s390 tree?
> Yes, I'd appreciate it.


Paolo, ok if I pick the first 3 patches as well? Can you ack?
Paolo Bonzini Sept. 30, 2019, 10:35 a.m. UTC | #9
On 30/09/19 12:04, Christian Borntraeger wrote:
> Paolo, ok it I pick the first 3 patches as well? Can you ack?

Yes, please.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

Patch

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8bfb6684cb..18ad279a00 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -154,39 +154,15 @@  static void virtio_ccw_register_hcalls(void)
                                    virtio_ccw_hcall_early_printk);
 }
 
-/*
- * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
- * as the dirty bitmap must be managed by bitops that take an int as
- * position indicator. If we have a guest beyond that we will split off
- * new subregions. The split must happen on a segment boundary (1MB).
- */
-#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
-#define SEG_MSK (~0xfffffULL)
-#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
-    ram_addr_t chunk, offset = 0;
-    unsigned int number = 0;
+    MemoryRegion *ram = g_new(MemoryRegion, 1);
     Error *local_err = NULL;
-    gchar *name;
 
     /* allocate RAM for core */
-    name = g_strdup_printf("s390.ram");
-    while (mem_size) {
-        MemoryRegion *ram = g_new(MemoryRegion, 1);
-        uint64_t size = mem_size;
-
-        /* KVM does not allow memslots >= 8 TB */
-        chunk = MIN(size, KVM_SLOT_MAX_BYTES);
-        memory_region_allocate_system_memory(ram, NULL, name, chunk);
-        memory_region_add_subregion(sysmem, offset, ram);
-        mem_size -= chunk;
-        offset += chunk;
-        g_free(name);
-        name = g_strdup_printf("s390.ram.%u", ++number);
-    }
-    g_free(name);
+    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
+    memory_region_add_subregion(sysmem, 0, ram);
 
     /*
      * Configure the maximum page size. As no memory devices were created
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index 97a662ad0e..54864c259c 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -28,6 +28,7 @@ 
 #include "cpu.h"
 #include "internal.h"
 #include "kvm_s390x.h"
+#include "sysemu/kvm_int.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
@@ -122,6 +123,15 @@ 
  */
 #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \
                                      (max_cpus + NR_LOCAL_IRQS))
+/*
+ * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
+ * as the dirty bitmap must be managed by bitops that take an int as
+ * position indicator. If we have a guest beyond that we will split off
+ * new subregions. The split must happen on a segment boundary (1MB).
+ */
+#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
+#define SEG_MSK (~0xfffffULL)
+#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
 
 static CPUWatchpoint hw_watchpoint;
 /*
@@ -355,6 +365,7 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
      */
     /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */
 
+    kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
     return 0;
 }