diff mbox

[21/27] pc: add memory hotplug 440fx machine

Message ID 1385001528-12003-22-git-send-email-imammedo@redhat.com
State New
Headers show

Commit Message

Igor Mammedov Nov. 21, 2013, 2:38 a.m. UTC
Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
on initial memory size and hotplug memory size.

* if ram_size is less than 32-bit PCI hole start, reserve
  hotplug memory region as [ram_size,32bit-PCIhole-start)
  if hotplug memory region fits there,
  otherwise reserve hotplug memory region after "0x100000000ULL
  + above_4g_mem_size"

* setup memory hotplug callback to pass events to ACPI hardware.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc.c               |   52 ++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/pc_piix.c          |    6 ++++-
 hw/pci-host/piix.c         |   13 ++++++++++-
 include/hw/i386/pc.h       |   16 +++++++++++++
 include/hw/pci-host/piix.h |    2 +
 5 files changed, 87 insertions(+), 2 deletions(-)
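
To make the placement policy above concrete, here is a small standalone C
sketch of the same decision with example numbers (names and constants are
simplified stand-ins for the patch's pc_hotplug_memory_init_impl(), not the
actual QEMU code):

/* Standalone sketch of the hotplug-region placement described above. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GiB (1ULL << 30)
#define ROUND_UP(n, d) (((n) + (d) - 1) / (d) * (d))

int main(void)
{
    uint64_t ram_size = 2 * GiB;           /* -m, all of it below 4G here */
    uint64_t maxmem = 6 * GiB;             /* maximum memory after hotplug */
    uint64_t pci_hole_start = 0xe0000000;  /* 32-bit PCI hole start */
    uint64_t high_mem_end = 4 * GiB;       /* 0x100000000ULL + above_4g_mem_size */

    uint64_t hotplug_size = maxmem - ram_size;       /* 4 GiB to reserve */
    uint64_t base;

    if (hotplug_size <= pci_hole_start - ram_size) { /* only 1.5 GiB of low room */
        base = ram_size;                             /* fits below the PCI hole */
    } else {
        base = ROUND_UP(high_mem_end, GiB);          /* falls back above 4G */
        high_mem_end = base + hotplug_size;
    }
    /* prints: hotplug region [0x100000000, 0x200000000) */
    printf("hotplug region [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           base, base + hotplug_size);
    return 0;
}

Assuming the -m extensions this series builds on, such a layout would
correspond to an invocation along the lines of "-m 2G,slots=4,maxmem=6G"
(hypothetical syntax for this revision of the series).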

Comments

liguang Nov. 21, 2013, 5:48 a.m. UTC | #1
Hi, Igor

Igor Mammedov wrote:
> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> on initial memory size and hotplug memory size.
>    
...
> +static
> +void pc_hotplug_memory_init_impl(Object *owner,
> +                                 MemoryRegion *system_memory,
> +                                 ram_addr_t low_hotplug_mem_start,
> +                                 ram_addr_t low_hotplug_mem_end,
> +                                 DimmBus *hotplug_mem_bus,
> +                                 ram_addr_t *high_mem_end)
> +{
> +    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
> +    ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0);
> +    ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0);
> +    ram_addr_t hotplug_mem_size;
> +
> +    if (maxmem <= ram_size) {
> +        /* Disable ACPI migration code and creation of memory devices in SSDT */
>    

Why not give the memory that is not hot-added a chance to be placed in
one memory slot?
If all memory can be hot-added and hot-removed, then we can bring in
more flexibility for the memory hotplug feature.

Thanks!

> +        qemu_opt_set_number(opts, "slots", 0);
> +        return;
> +    }
> +
> +    hotplug_mem_size = maxmem - ram_size;
> +    if (hotplug_mem_size <= (low_hotplug_mem_end - low_hotplug_mem_start)) {
> +        hotplug_mem_bus->base = low_hotplug_mem_start;
> +    } else {
> +        hotplug_mem_bus->base = ROUND_UP(*high_mem_end, 1ULL << 30);
> +        *high_mem_end = hotplug_mem_bus->base + hotplug_mem_size;
> +    }
> +
>    
...
Andreas Färber Nov. 21, 2013, 2:13 p.m. UTC | #2
On 21.11.2013 06:48, Li Guang wrote:
> Hi, Igor
> 
> Igor Mammedov wrote:
>> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
>> on initial memory size and hotplug memory size.
>>    
> ...
>> +static
>> +void pc_hotplug_memory_init_impl(Object *owner,
>> +                                 MemoryRegion *system_memory,
>> +                                 ram_addr_t low_hotplug_mem_start,
>> +                                 ram_addr_t low_hotplug_mem_end,
>> +                                 DimmBus *hotplug_mem_bus,
>> +                                 ram_addr_t *high_mem_end)
>> +{
>> +    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"),
>> NULL);
>> +    ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0);
>> +    ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0);
>> +    ram_addr_t hotplug_mem_size;
>> +
>> +    if (maxmem <= ram_size) {
>> +        /* Disable ACPI migration code and creation of memory devices
>> in SSDT */
>>    
> 
> Why not give the memory that is not hot-added a chance to be placed in
> one memory slot?

Seconded, I believe I requested that on the previous version already.

Andreas

> If all memory can be hot-added and hot-removed, then we can bring in
> more flexibility for the memory hotplug feature.
Igor Mammedov Nov. 21, 2013, 2:34 p.m. UTC | #3
On Thu, 21 Nov 2013 15:13:12 +0100
Andreas Färber <afaerber@suse.de> wrote:

> On 21.11.2013 06:48, Li Guang wrote:
> > Hi, Igor
> > 
> > Igor Mammedov wrote:
> >> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> >> on initial memory size and hotplug memory size.
> >>    
> > ...
> >> +static
> >> +void pc_hotplug_memory_init_impl(Object *owner,
> >> +                                 MemoryRegion *system_memory,
> >> +                                 ram_addr_t low_hotplug_mem_start,
> >> +                                 ram_addr_t low_hotplug_mem_end,
> >> +                                 DimmBus *hotplug_mem_bus,
> >> +                                 ram_addr_t *high_mem_end)
> >> +{
> >> +    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"),
> >> NULL);
> >> +    ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0);
> >> +    ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0);
> >> +    ram_addr_t hotplug_mem_size;
> >> +
> >> +    if (maxmem <= ram_size) {
> >> +        /* Disable ACPI migration code and creation of memory devices
> >> in SSDT */
> >>    
> > 
> > Why not give the memory that is not hot-added a chance to be placed in
> > one memory slot?
> 
> Seconded, I believe I requested that on the previous version already.
Because the current initial memory allocation is a mess and this already
large series would become even larger and more intrusive; so far the series
is relatively non-intrusive and self-contained.

I believe re-factoring of initial memory to use Dimm devices should be
done later, on top of the infrastructure this series provides.

> Andreas
> 
> > If all memory can be hot-added and hot-removed, then we can bring in
> > more flexibility for the memory hotplug feature.
>
Michael S. Tsirkin Nov. 21, 2013, 2:39 p.m. UTC | #4
On Thu, Nov 21, 2013 at 03:34:53PM +0100, Igor Mammedov wrote:
> On Thu, 21 Nov 2013 15:13:12 +0100
> Andreas Färber <afaerber@suse.de> wrote:
> 
> > On 21.11.2013 06:48, Li Guang wrote:
> > > Hi, Igor
> > > 
> > > Igor Mammedov wrote:
> > >> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> > >> on initial memory size and hotplug memory size.
> > >>    
> > > ...
> > >> +static
> > >> +void pc_hotplug_memory_init_impl(Object *owner,
> > >> +                                 MemoryRegion *system_memory,
> > >> +                                 ram_addr_t low_hotplug_mem_start,
> > >> +                                 ram_addr_t low_hotplug_mem_end,
> > >> +                                 DimmBus *hotplug_mem_bus,
> > >> +                                 ram_addr_t *high_mem_end)
> > >> +{
> > >> +    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"),
> > >> NULL);
> > >> +    ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0);
> > >> +    ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0);
> > >> +    ram_addr_t hotplug_mem_size;
> > >> +
> > >> +    if (maxmem <= ram_size) {
> > >> +        /* Disable ACPI migration code and creation of memory devices
> > >> in SSDT */
> > >>    
> > > 
> > > Why not give the memory that is not hot-added a chance to be placed in
> > > one memory slot?
> > 
> > Seconded, I believe I requested that on the previous version already.
> Because the current initial memory allocation is a mess and this already
> large series would become even larger and more intrusive; so far the series
> is relatively non-intrusive and self-contained.
> 
> I believe re-factoring of initial memory to use Dimm devices should be
> done later, on top of the infrastructure this series provides.

Kind of makes sense; it looks like a feature request rather than
a bug report.
Maybe add some comments in the code/commit logs?

> > Andreas
> > 
> > > If all memory can be hot-added and hot-removed, then we can bring in
> > > more flexibility for the memory hotplug feature.
> >
Andreas Färber Nov. 21, 2013, 4:09 p.m. UTC | #5
On 21.11.2013 15:34, Igor Mammedov wrote:
> On Thu, 21 Nov 2013 15:13:12 +0100
> Andreas Färber <afaerber@suse.de> wrote:
>> On 21.11.2013 06:48, Li Guang wrote:
>>> Why not give the memory that is not hot-added a chance to be placed in
>>> one memory slot?
>>
>> Seconded, I believe I requested that on the previous version already.
> Because the current initial memory allocation is a mess and this already
> large series would become even larger and more intrusive; so far the series
> is relatively non-intrusive and self-contained.
> 
> I believe re-factoring of initial memory to use Dimm devices should be
> done later, on top of the infrastructure this series provides.

My problem with that is that that would be a semantically incompatible
modeling change. With your series we might have /machine/dimm.0/child[0]
be the first hot-plugged memory and once such a refactoring is done,
child[0] silently becomes -m and child[1] is the hot-added one.

So if we know that we want/need to change the infrastructure, why not
add a single patch (?) to allocate one additional object on the bus now?
Unless we actually write the code, we won't know whether there are some
incorrect hot-plug assumptions in the dimm code.

Andreas
Michael S. Tsirkin Nov. 21, 2013, 4:17 p.m. UTC | #6
On Thu, Nov 21, 2013 at 05:09:27PM +0100, Andreas Färber wrote:
> On 21.11.2013 15:34, Igor Mammedov wrote:
> > On Thu, 21 Nov 2013 15:13:12 +0100
> > Andreas Färber <afaerber@suse.de> wrote:
> >> On 21.11.2013 06:48, Li Guang wrote:
> >>> Why not give the memory that is not hot-added a chance to be placed in
> >>> one memory slot?
> >>
> >> Seconded, I believe I requested that on the previous version already.
> > Because the current initial memory allocation is a mess and this already
> > large series would become even larger and more intrusive; so far the series
> > is relatively non-intrusive and self-contained.
> > 
> > I believe re-factoring of initial memory to use Dimm devices should be
> > done later, on top of the infrastructure this series provides.
> 
> My problem with that is that that would be a semantically incompatible
> modeling change.

Yes, but we are not merging this for 1.7; there's time enough to make
changes before 1.8.

> With your series we might have /machine/dimm.0/child[0]
> be the first hot-plugged memory and once such a refactoring is done,
> child[0] silently becomes -m and child[1] is the hot-added one.
> 
> So if we know that we want/need to change the infrastructure, why not
> add a single patch (?) to allocate one additional object on the bus now?
> Unless we actually write the code, we won't know whether there are some
> incorrect hot-plug assumptions in the dimm code.
> 
> Andreas
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Gerd Hoffmann Nov. 22, 2013, 2:23 p.m. UTC | #7
On Thu, 2013-11-21 at 03:38 +0100, Igor Mammedov wrote:
> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> on initial memory size and hotplug memory size.
> 
> * if ram_size is less than 32-bit PCI hole start, reserve
>   hotplug memory region as [ram_size,32bit-PCIhole-start)
>   if hotplug memory region fits there,
>   otherwise reserve hotplug memory region after "0x100000000ULL
>   + above_4g_mem_size"

Hmm, 32-bit pci hole start depends on ram size ...

Does it make sense to hotplug memory above 4g unconditionally to
simplify things?

cheers,
  Gerd
Igor Mammedov Nov. 25, 2013, 10:41 a.m. UTC | #8
On Thu, 21 Nov 2013 17:09:27 +0100
Andreas Färber <afaerber@suse.de> wrote:

> On 21.11.2013 15:34, Igor Mammedov wrote:
> > On Thu, 21 Nov 2013 15:13:12 +0100
> > Andreas Färber <afaerber@suse.de> wrote:
> >> On 21.11.2013 06:48, Li Guang wrote:
> >>> Why not give the memory that is not hot-added a chance to be placed in
> >>> one memory slot?
> >>
> >> Seconded, I believe I requested that on the previous version already.
> > Because the current initial memory allocation is a mess and this already
> > large series would become even larger and more intrusive; so far the series
> > is relatively non-intrusive and self-contained.
> > 
> > I believe re-factoring of initial memory to use Dimm devices should be
> > done later, on top of the infrastructure this series provides.
> 
> My problem with that is that that would be a semantically incompatible
> modeling change. With your series we might have /machine/dimm.0/child[0]
> be the first hot-plugged memory and once such a refactoring is done,
> child[0] silently becomes -m and child[1] is the hot-added one.

I think there won't be a silent change in child[0], since most likely
initial RAM would require an additional DimmBus (maybe even two)
for its devices.

But anyway, why would this be an issue?

> So if we know that we want/need to change the infrastructure, why not
> add a single patch (?) to allocate one additional object on the bus now?
> Unless we actually write the code, we won't know whether there are some
> incorrect hot-plug assumptions in the dimm code.
It wouldn't be a single simple patch for PC, I'm afraid.
I don't see the point in adding a dummy DIMM device for initial memory and then
re-aliasing its memory region in the GPA as is done in the current code.

As I see it (taking into account Marcelo's/Paolo's alignment patch), the current
single MR for initial RAM should be split into 1-4 separate MRs depending on
the initial RAM amount and alignment requirements between HPA/GPA addresses.

That would probably introduce additional, non-hotpluggable DimmBuses (1-2)
for low and high memory, which would be incompatible with old machine types'
devices and GPA layout, so why care about what
/machine/dimm.0/child[0] would be in a new machine version?

> Andreas
>
Igor Mammedov Nov. 25, 2013, 11 a.m. UTC | #9
On Fri, 22 Nov 2013 15:23:56 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On Thu, 2013-11-21 at 03:38 +0100, Igor Mammedov wrote:
> > Add DimmBus for memory hotplug below 4Gb or above 4Gb depending
> > on initial memory size and hotplug memory size.
> > 
> > * if ram_size is less than 32-bit PCI hole start, reserve
> >   hotplug memory region as [ram_size,32bit-PCIhole-start)
> >   if hotplug memory region fits there,
> >   otherwise reserve hotplug memory region after "0x100000000ULL
> >   + above_4g_mem_size"
> 
> Hmm, 32-bit pci hole start depends on ram size ...
> 
> Does it make sense to hotplug memory above 4g unconditionally to
> simplify things?
It does, and it was so in the v6 RFC,
but it would rule out hotplug for 32-bit guests that don't support
more than 4GB.

As a use case, a 32-bit guest could start with small initial memory
and hotplug additional memory if needed, up to the point where the 32-bit
PCI hole starts. That would allow guests to launch with a small amount
but balloon up to 2-3.5GB depending on machine type.
I could drop 32-bit guest support and do only high-mem hotplug if
this case is not interesting to the community; any suggestions?

I'm now experimenting with removing pci-info and allowing the BIOS to
do placement of 32-bit PCI BARs akin to how it was done with 64-bit BARs
(I'm looking into using E820 reservations for it, but I'm not sure it
will work with every guest OS, so it needs more testing).

> cheers,
>   Gerd
> 
> 
>
Gerd Hoffmann Nov. 25, 2013, 11:39 a.m. UTC | #10
> > Does it make sense to hotplug memory above 4g unconditionally to
> > simplify things?
> It does, and it was so in the v6 RFC,
> but it would rule out hotplug for 32-bit guests that don't support
> more than 4GB.

Indeed.

> As a use case, a 32-bit guest could start with small initial memory
> and hotplug additional memory if needed, up to the point where the 32-bit
> PCI hole starts. That would allow guests to launch with a small amount
> but balloon up to 2-3.5GB depending on machine type.
> I could drop 32-bit guest support and do only high-mem hotplug if
> this case is not interesting to the community; any suggestions?

32bit limits start to hurt with 1GB already.  Kernel address space is 1G
on 32bit, so the kernel can't map all RAM all the time any more.  Which
in turn adds overhead for mapping/unmapping pages if the kernel must
access highmem pages.  So it's better to run 64bit guests even with a lot
less than 4G of memory.

I'd tend to just not support 32bit guests, I think it simply isn't worth
the trouble.

cheers,
  Gerd
Igor Mammedov Nov. 25, 2013, 1:34 p.m. UTC | #11
On Mon, 25 Nov 2013 12:39:05 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:

> > > Does it make sense to hotplug memory above 4g unconditionally to
> > > simplify things?
> > It does, and it was so in the v6 RFC,
> > but it would rule out hotplug for 32-bit guests that don't support
> > more than 4GB.
> 
> Indeed.
> 
> > As a use case, a 32-bit guest could start with small initial memory
> > and hotplug additional memory if needed, up to the point where the 32-bit
> > PCI hole starts. That would allow guests to launch with a small amount
> > but balloon up to 2-3.5GB depending on machine type.
> > I could drop 32-bit guest support and do only high-mem hotplug if
> > this case is not interesting to the community; any suggestions?
> 
> 32bit limits start to hurt with 1GB already.  Kernel address space is 1G
> on 32bit, so the kernel can't map all RAM all the time any more.  Which
> in turn adds overhead for mapping/unmapping pages if the kernel must
> access highmem pages.  So it's better to run 64bit guests even with a lot
> less than 4G of memory.
> 
> I'd tend to just not support 32bit guests, I think it simply isn't worth
> the trouble.
Fine with me. If there is no objection to it, I'll drop 32-bit guest support
in the next respin.

> cheers,
>   Gerd
> 
>
Paolo Bonzini Nov. 25, 2013, 1:35 p.m. UTC | #12
On 25/11/2013 12:39, Gerd Hoffmann wrote:
>> > As a use case, a 32-bit guest could start with small initial memory
>> > and hotplug additional memory if needed, up to the point where the 32-bit
>> > PCI hole starts. That would allow guests to launch with a small amount
>> > but balloon up to 2-3.5GB depending on machine type.
>> > I could drop 32-bit guest support and do only high-mem hotplug if
>> > this case is not interesting to the community; any suggestions?
> 32bit limits start to hurt with 1GB already.  Kernel address space is 1G
> on 32bit, so the kernel can't map all RAM all the time any more.  Which
> in turn adds overhead for mapping/unmapping pages if the kernel must
> access highmem pages.  So it's better to run 64bit guests even with a lot
> less than 4G of memory.
> 
> I'd tend to just not support 32bit guests, I think it simply isn't worth
> the trouble.

Also because it's just non-PAE 32-bit guests, no?  PAE guests would
support hotplug just fine.

Paolo
Igor Mammedov Nov. 25, 2013, 2:24 p.m. UTC | #13
On Mon, 25 Nov 2013 14:35:18 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 25/11/2013 12:39, Gerd Hoffmann wrote:
> >> > As a use case, a 32-bit guest could start with small initial memory
> >> > and hotplug additional memory if needed, up to the point where the 32-bit
> >> > PCI hole starts. That would allow guests to launch with a small amount
> >> > but balloon up to 2-3.5GB depending on machine type.
> >> > I could drop 32-bit guest support and do only high-mem hotplug if
> >> > this case is not interesting to the community; any suggestions?
> > 32bit limits start to hurt with 1GB already.  Kernel address space is 1G
> > on 32bit, so the kernel can't map all RAM all the time any more.  Which
> > in turn adds overhead for mapping/unmapping pages if the kernel must
> > access highmem pages.  So it's better to run 64bit guests even with a lot
> > less than 4G of memory.
> > 
> > I'd tend to just not support 32bit guests, I think it simply isn't worth
> > the trouble.
> 
> Also because it's just non-PAE 32-bit guests, no?  PAE guests would
> support hotplug just fine.
Yes, it shouldn't be an issue for PAE in general.

But it might not work for Windows Server 2003 (including the 64-bit one), since
it BSODs when it sees a 64-bit CRS (we saw that while playing with the 64-bit
PCI window). Not sure if we care about it, though.

> Paolo
>
Andreas Färber Nov. 25, 2013, 5 p.m. UTC | #14
On 25.11.2013 11:41, Igor Mammedov wrote:
> On Thu, 21 Nov 2013 17:09:27 +0100
> Andreas Färber <afaerber@suse.de> wrote:
> 
>> On 21.11.2013 15:34, Igor Mammedov wrote:
>>> On Thu, 21 Nov 2013 15:13:12 +0100
>>> Andreas Färber <afaerber@suse.de> wrote:
>>>> On 21.11.2013 06:48, Li Guang wrote:
>>>>> Why not give the memory that is not hot-added a chance to be placed in
>>>>> one memory slot?
>>>>
>>>> Seconded, I believe I requested that on the previous version already.
>>> Because the current initial memory allocation is a mess and this already
>>> large series would become even larger and more intrusive; so far the series
>>> is relatively non-intrusive and self-contained.
>>>
>>> I believe re-factoring of initial memory to use Dimm devices should be
>>> done later, on top of the infrastructure this series provides.
>>
>> My problem with that is that that would be a semantically incompatible
>> modeling change. With your series we might have /machine/dimm.0/child[0]
>> be the first hot-plugged memory and once such a refactoring is done,
>> child[0] silently becomes -m and child[1] is the hot-added one.
> 
> I think there won't be a silent change in child[0], since most likely
> initial RAM would require an additional DimmBus (maybe even two)
> for its devices.
> 
> But anyway, why would this be an issue?
> 
>> So if we know that we want/need to change the infrastructure, why not
>> add a single patch (?) to allocate one additional object on the bus now?
>> Unless we actually write the code, we won't know whether there are some
>> incorrect hot-plug assumptions in the dimm code.
> It wouldn't be a single simple patch for PC, I'm afraid.
> I don't see the point in adding a dummy DIMM device for initial memory and then
> re-aliasing its memory region in the GPA as is done in the current code.
> 
> As I see it (taking into account Marcelo's/Paolo's alignment patch), the current
> single MR for initial RAM should be split into 1-4 separate MRs depending on
> the initial RAM amount and alignment requirements between HPA/GPA addresses.
> 
> That would probably introduce additional, non-hotpluggable DimmBuses (1-2)
> for low and high memory, which would be incompatible with old machine types'
> devices and GPA layout, so why care about what
> /machine/dimm.0/child[0] would be in a new machine version?

I feel we're talking about two very different things here.

What I am talking about is the user experience. A mainboard has 4 or
maybe 8 DIMM slots where the user can plug in greenish memory bars.
That's what I would like to see implemented in QEMU because that's
understandable without reading code and ACPI specs.

What you seem to be talking about by contrast is your DimmBus
implementation and its limitations/assumptions. You can easily use
dev->hotplugged to distinguish between initial and hot-plugged devices
as done elsewhere, including PCI and ICC bus, no?

In short, what I am fighting against is having a machine with 4 slots:

slot[0] = 42
slot[1] = 0
slot[2] = 0
slot[3] = 0

meaning 42 + implicit -m now, later getting fixed to explicit:

slot[0] = -m
slot[1] = 42
slot[2] = 0
slot[3] = 0

Whether -m maps to one or more slots can easily be scaled in the
example. I had previously asked whether there were upper limits per slot
but believe that got denied from an ACPI perspective; my point is the
slot offset and the inconsistent sum exposed via QOM/QMP.

On your ICC bus we had the initial -smp CPUs alongside hot-plugged CPUs
right from the start.

I can't think of a reason why there would be multiple DimmBuses in a
single machine from a user's point of view.
Different implementations for different memory controllers / machines
make perfect sense of course. But even from a developer's point of view
multiple buses don't make that much sense either if we keep
http://wiki.qemu.org/Features/QOM#TODO in mind - devices exposing
multiple buses need to be split up and in the end we just have named
link<> properties on some container object as sketched in the example
above - question then becomes should we have multiple containers, and I
think the answer for a PC will be no.

Embedded systems with a mix of small on-chip SRAM and on-board SDRAM may
be a different beast to model, but well beyond the scope of this series
anyway, which IIUC doesn't expose any DimmBus outside of the two PCs.

Also, once memory has been hot-plugged and the machine gets rebooted,
shouldn't that be the same to BIOS/ACPI as if the memory was cold-plugged?

Regards,
Andreas
Igor Mammedov Nov. 26, 2013, 8:26 p.m. UTC | #15
On Mon, 25 Nov 2013 18:00:56 +0100
Andreas Färber <afaerber@suse.de> wrote:

> On 25.11.2013 11:41, Igor Mammedov wrote:
> > On Thu, 21 Nov 2013 17:09:27 +0100
> > Andreas Färber <afaerber@suse.de> wrote:
> > 
> >> On 21.11.2013 15:34, Igor Mammedov wrote:
> >>> On Thu, 21 Nov 2013 15:13:12 +0100
> >>> Andreas Färber <afaerber@suse.de> wrote:
> >>>> On 21.11.2013 06:48, Li Guang wrote:
> >>>>> Why not give the memory that is not hot-added a chance to be placed in
> >>>>> one memory slot?
> >>>>
> >>>> Seconded, I believe I requested that on the previous version already.
> >>> Because the current initial memory allocation is a mess and this already
> >>> large series would become even larger and more intrusive; so far the series
> >>> is relatively non-intrusive and self-contained.
> >>>
> >>> I believe re-factoring of initial memory to use Dimm devices should be
> >>> done later, on top of the infrastructure this series provides.
> >>
> >> My problem with that is that that would be a semantically incompatible
> >> modeling change. With your series we might have /machine/dimm.0/child[0]
> >> be the first hot-plugged memory and once such a refactoring is done,
> >> child[0] silently becomes -m and child[1] is the hot-added one.
> > 
> > I think there won't be a silent change in child[0], since most likely
> > initial RAM would require an additional DimmBus (maybe even two)
> > for its devices.
> > 
> > But anyway, why would this be an issue?
> > 
> >> So if we know that we want/need to change the infrastructure, why not
> >> add a single patch (?) to allocate one additional object on the bus now?
> >> Unless we actually write the code, we won't know whether there are some
> >> incorrect hot-plug assumptions in the dimm code.
> > It wouldn't be a single simple patch for PC, I'm afraid.
> > I don't see the point in adding a dummy DIMM device for initial memory and then
> > re-aliasing its memory region in the GPA as is done in the current code.
> > 
> > As I see it (taking into account Marcelo's/Paolo's alignment patch), the current
> > single MR for initial RAM should be split into 1-4 separate MRs depending on
> > the initial RAM amount and alignment requirements between HPA/GPA addresses.
> > 
> > That would probably introduce additional, non-hotpluggable DimmBuses (1-2)
> > for low and high memory, which would be incompatible with old machine types'
> > devices and GPA layout, so why care about what
> > /machine/dimm.0/child[0] would be in a new machine version?
> 
> I feel we're talking about two very different things here.
> 
> What I am talking about is the user experience. A mainboard has 4 or
> maybe 8 DIMM slots where the user can plug in greenish memory bars.
> That's what I would like to see implemented in QEMU because that's
> understandable without reading code and ACPI specs.
> 
> What you seem to be talking about by contrast is your DimmBus
> implementation and its limitations/assumptions. You can easily use
> dev->hotplugged to distinguish between initial and hot-plugged devices
> as done elsewhere, including PCI and ICC bus, no?
Yes, that's what the user would be interested in when doing hot-unplug. I'll add
properties to DimmDevice so the user can see whether it's "hotpluggable" &
"hotplugged".

> 
> In short, what I am fighting against is having a machine with 4 slots:
> 
> slot[0] = 42
> slot[1] = 0
> slot[2] = 0
> slot[3] = 0
> 
> meaning 42 + implicit -m now, later getting fixed to explicit:
> 
> slot[0] = -m
> slot[1] = 42
> slot[2] = 0
> slot[3] = 0
> 
> Whether -m maps to one or more slots can easily be scaled in the
> example. I had previously asked whether there were upper limits per slot
> but believe that got denied from an ACPI perspective; my point is the
> slot offset and the inconsistent sum exposed via QOM/QMP.
Such a change would be machine-incompatible, so why would the slot offset be
important? Depending on initial memory size, the slot offset would change.
Relying on a stable offset to do something would be just a wrong use of the
interface.

I see the issue with a sum exposed via QOM/QMP whether it's a links- or bus-based
implementation, but it looks like an additional feature not related to
memory hotplug:
 "let me count all present memory"
This series doesn't provide that; it only provides
 "current hotplug memory enumeration"


> 
> On your ICC bus we had the initial -smp CPUs alongside hot-plugged CPUs
> right from the start.
As Michael said, 1.8 is not in freeze yet, so if there is time I'll
try to convert initial memory to DIMMs as well, for the sake of
cleaning up the mess it is now and not producing yet another
migration-incompatible machine.
But it's not trivial and not directly related to memory hotplug.
Doing a dummy conversion would help the SUM case from above, but it would
make the current code even messier. So I'd rather do it incrementally,
cleaning it up in the process, versus making it messier.
 
> 
> I can't think of a reason why there would be multiple DimmBuses in a
> single machine from a user's point of view.
> Different implementations for different memory controllers / machines
> make perfect sense of course. But even from a developer's point of view
> multiple buses don't make that much sense either if we keep
> http://wiki.qemu.org/Features/QOM#TODO in mind - devices exposing
> multiple buses need to be split up and in the end we just have named
> link<> properties on some container object as sketched in the example
> above - question then becomes should we have multiple containers, and I
> think the answer for a PC will be no.
In the PC case we have to have container memory regions to contain DIMMs, given
the split below/above-4GB memory organization. One way would be to replace the
current initial memory region with a non-hotpluggable bus that will hold the
initial memory DIMMs.
Once buses are converted to links<> with all the hotplug
machinery around them ready, this could be reorganized into 1 container with
2 MR containers.
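
(For illustration, a hedged sketch of that organization with the existing
memory_region_init()/memory_region_add_subregion() API; every name and
parameter below is an assumption, not code from the series:)

/* Sketch of the "1 container with 2 MR containers" idea above. */
static void dimm_containers_init(Object *owner, MemoryRegion *system_memory,
                                 hwaddr low_base, uint64_t low_size,
                                 uint64_t high_size)
{
    MemoryRegion *low = g_new(MemoryRegion, 1);
    MemoryRegion *high = g_new(MemoryRegion, 1);

    /* one logical DIMM container, split into below-4G and above-4G parts */
    memory_region_init(low, owner, "dimm-low", low_size);
    memory_region_init(high, owner, "dimm-high", high_size);
    memory_region_add_subregion(system_memory, low_base, low);
    memory_region_add_subregion(system_memory, 0x100000000ULL, high);
}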

> Embedded systems with a mix of small on-chip SRAM and on-board SDRAM may
> be a different beast to model, but well beyond the scope of this series
> anyway, which IIUC doesn't expose any DimmBus outside of the two PCs.
> 
> Also, once memory has been hot-plugged and the machine gets rebooted,
> shouldn't that be the same to BIOS/ACPI as if the memory was cold-plugged?
The guest sees earlier hotplugged memory after reboot, during enumeration of
ACPI memory device objects. Windows & Linux work with it just fine (the only
difference is that Linux doesn't online them automatically; it's up to udev
to deal with it).

I also have a TODO item to evaluate whether it's acceptable to add them as a
reservation to the E820 table, so that the guest could see them even before
ACPI is processed.
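
(A sketch of that idea using the existing e820_add_entry() helper from
hw/i386/pc.c; whether reserving the whole hotplug window like this is
acceptable to every guest OS is exactly the open TODO, so treat it as
illustration only, and the wrapper name is made up:)

/* Sketch only: make the (possibly still unpopulated) hotplug window
 * visible to the guest before ACPI runs, as a reserved E820 range. */
static void reserve_hotplug_window_e820(uint64_t base, uint64_t size)
{
    e820_add_entry(base, size, E820_RESERVED);
}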

> 
> Regards,
> Andreas
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
diff mbox

Patch

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 85d0862..306ed22 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1236,6 +1236,58 @@  void pc_acpi_dev_memory_hotplug_init(DimmBus *hotplug_mem_bus,
     }
 }
 
+static
+void pc_hotplug_memory_init_impl(Object *owner,
+                                 MemoryRegion *system_memory,
+                                 ram_addr_t low_hotplug_mem_start,
+                                 ram_addr_t low_hotplug_mem_end,
+                                 DimmBus *hotplug_mem_bus,
+                                 ram_addr_t *high_mem_end)
+{
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+    ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0);
+    ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0);
+    ram_addr_t hotplug_mem_size;
+
+    if (maxmem <= ram_size) {
+        /* Disable ACPI migration code and creation of memory devices in SSDT */
+        qemu_opt_set_number(opts, "slots", 0);
+        return;
+    }
+
+    hotplug_mem_size = maxmem - ram_size;
+    if (hotplug_mem_size <= (low_hotplug_mem_end - low_hotplug_mem_start)) {
+        hotplug_mem_bus->base = low_hotplug_mem_start;
+    } else {
+        hotplug_mem_bus->base = ROUND_UP(*high_mem_end, 1ULL << 30);
+        *high_mem_end = hotplug_mem_bus->base + hotplug_mem_size;
+    }
+
+    memory_region_init(&hotplug_mem_bus->as, owner, "hotplug-memory",
+                       hotplug_mem_size);
+    memory_region_add_subregion(system_memory, hotplug_mem_bus->base,
+                                &hotplug_mem_bus->as);
+}
+
+pc_hotplug_memory_init_fn pc_hotplug_memory_init = pc_hotplug_memory_init_impl;
+
+void pc_hotplug_memory_init_compat_1_7(Object *owner,
+                                       MemoryRegion *system_memory,
+                                       ram_addr_t low_hotplug_mem_start,
+                                       ram_addr_t low_hotplug_mem_end,
+                                       DimmBus *hotplug_mem_bus,
+                                       ram_addr_t *high_mem_end)
+{
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), NULL);
+    /*
+     * clearing slots tells acpi code that memory hotplug is disabled,
+     * so there is not need to migrate its state and create related
+     * SSDT table objects
+     */
+    qemu_opt_set_number(opts, "slots", 0);
+}
+
+
 qemu_irq *pc_allocate_cpu_irq(void)
 {
     return qemu_allocate_irqs(pic_irq_request, NULL, 1);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index e1fe85a..7d30c12 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -48,6 +48,8 @@ 
 #include "exec/address-spaces.h"
 #include "hw/acpi/acpi.h"
 #include "cpu.h"
+#include "hw/acpi/piix4.h"
+#include "hw/pci-host/piix.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -230,6 +232,8 @@  static void pc_init1(QEMUMachineInitArgs *args,
                               gsi[9], *smi_irq,
                               kvm_enabled(), fw_cfg);
         smbus_eeprom_init(smbus, 8, NULL, 0);
+        pc_acpi_dev_memory_hotplug_init(&i440fx_state->hotplug_mem_bus,
+                                        piix4_mem_hotplug, piix4_pm_find());
     }
 
     if (pci_enabled) {
@@ -250,6 +254,7 @@  static void pc_compat_1_7(QEMUMachineInitArgs *args)
 {
     smbios_type1_defaults = false;
     pc_pci_as_mapping_init = pc_pci_as_mapping_init_1_7;
+    pc_hotplug_memory_init = pc_hotplug_memory_init_compat_1_7;
 }
 
 static void pc_compat_1_6(QEMUMachineInitArgs *args)
@@ -367,7 +372,6 @@  static void pc_xen_hvm_init(QEMUMachineInitArgs *args)
     .desc = "Standard PC (i440FX + PIIX, 1996)", \
     .hot_add_cpu = pc_hot_add_cpu
 
-
 #define PC_I440FX_1_8_MACHINE_OPTIONS PC_I440FX_MACHINE_OPTIONS
 static QEMUMachine pc_i440fx_machine_v1_8 = {
     PC_I440FX_1_8_MACHINE_OPTIONS,
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 8351430..2f2785b 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -282,6 +282,11 @@  static int i440fx_initfn(PCIDevice *dev)
     dev->config[I440FX_SMRAM] = 0x02;
 
     cpu_smm_register(&i440fx_set_smm, d);
+
+    qbus_create_inplace(&d->hotplug_mem_bus,
+                        sizeof(d->hotplug_mem_bus),
+                        TYPE_DIMM_BUS, DEVICE(d),
+                        "hotplug-membus");
     return 0;
 }
 
@@ -303,6 +308,8 @@  PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
     PCII440FXState *f;
     unsigned i;
     I440FXState *i440fx;
+    ram_addr_t below_4g_mem_size = ram_size - above_4g_mem_size;
+    ram_addr_t high_memory_end = 0x100000000ULL + above_4g_mem_size;
 
     dev = qdev_create(NULL, TYPE_I440FX_PCI_HOST_BRIDGE);
     s = PCI_HOST_BRIDGE(dev);
@@ -330,10 +337,14 @@  PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
         i440fx->pci_info.w32.begin = 0xe0000000;
     }
 
+    pc_hotplug_memory_init(OBJECT(f), f->system_memory,
+                           below_4g_mem_size, i440fx->pci_info.w32.begin,
+                           &f->hotplug_mem_bus, &high_memory_end);
+
     /* setup pci memory mapping */
     pc_pci_as_mapping_init(OBJECT(f), f->system_memory,
                            f->pci_address_space,
-                           0x100000000ULL + above_4g_mem_size);
+                           high_memory_end);
 
     memory_region_init_alias(&f->smram_region, OBJECT(d), "smram-region",
                              f->pci_address_space, 0xa0000, 0x20000);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 0c86ee5..fea7d60 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -154,6 +154,22 @@  void pc_acpi_dev_memory_hotplug_init(DimmBus *hotplug_mem_bus,
                                      hotplug_fn hotplug,
                                      Object *acpi_dev);
 
+typedef
+void (*pc_hotplug_memory_init_fn)(Object *owner,
+                                  MemoryRegion *system_memory,
+                                  ram_addr_t low_hotplug_mem_start,
+                                  ram_addr_t low_hotplug_mem_end,
+                                  DimmBus *hotplug_mem_bus,
+                                  ram_addr_t *high_mem_end);
+extern pc_hotplug_memory_init_fn pc_hotplug_memory_init;
+
+void pc_hotplug_memory_init_compat_1_7(Object *owner,
+                                       MemoryRegion *system_memory,
+                                       ram_addr_t low_hotplug_mem_start,
+                                       ram_addr_t low_hotplug_mem_end,
+                                       DimmBus *hotplug_mem_bus,
+                                       ram_addr_t *high_mem_end);
+
 qemu_irq *pc_allocate_cpu_irq(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
diff --git a/include/hw/pci-host/piix.h b/include/hw/pci-host/piix.h
index 1155996..a301c9d 100644
--- a/include/hw/pci-host/piix.h
+++ b/include/hw/pci-host/piix.h
@@ -6,6 +6,7 @@ 
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_host.h"
 #include "hw/pci-host/pam.h"
+#include "hw/mem/dimm.h"
 
 #define TYPE_I440FX_PCI_DEVICE "i440FX"
 #define I440FX_PCI_DEVICE(obj) \
@@ -22,6 +23,7 @@  struct PCII440FXState {
     PAMMemoryRegion pam_regions[13];
     MemoryRegion smram_region;
     uint8_t smm_enabled;
+    DimmBus hotplug_mem_bus;
 };
 
 #endif