
[PATCHv2,5/7] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM

Message ID 1476247497-6976-6-git-send-email-david@gibson.dropbear.id.au
State New

Commit Message

David Gibson Oct. 12, 2016, 4:44 a.m. UTC
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space.  This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes the collision by altering the placement of PHBs on
large-RAM VMs.  Instead of always placing the first PHB at 1 TiB, it is
placed at the next 1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with more than 1 TiB of maximum memory, but since such
configurations were broken anyway, this seems like a reasonable
trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Laurent Vivier Oct. 12, 2016, 10:07 a.m. UTC | #1
On 12/10/2016 06:44, David Gibson wrote:
> [...]
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f6e9c2a..7cb167c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2376,15 +2376,23 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>      const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>      const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>      const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>      const uint32_t max_index = 255;
> +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> -    hwaddr phb_base;
> +    uint64_t ram_top = MACHINE(spapr)->ram_size;
> +    hwaddr phb0_base, phb_base;
>      int i;
>  
> +    if (MACHINE(spapr)->maxram_size > ram_top) {
> +        ram_top = spapr->hotplug_memory.base +
> +            memory_region_size(&spapr->hotplug_memory.mr);
> +    }

Why don't you set ram_top directly to maxram_size?

> +
> +    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +
>      if (index > max_index) {
>          error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
>                     max_index);
> 

Thanks,
Laurent
David Gibson Oct. 12, 2016, 10:55 a.m. UTC | #2
On Wed, Oct 12, 2016 at 12:07:50PM +0200, Laurent Vivier wrote:
> 
> 
> On 12/10/2016 06:44, David Gibson wrote:
> > [...]
> > +    if (MACHINE(spapr)->maxram_size > ram_top) {
> > +        ram_top = spapr->hotplug_memory.base +
> > +            memory_region_size(&spapr->hotplug_memory.mr);
> > +    }
> 
> Why don't you set ram_top directly to maxram_size?

Because there may be an alignment gap between ram_size and the bottom
of the hotplug region.

Laurent Vivier Oct. 12, 2016, 12:06 p.m. UTC | #3
On 12/10/2016 12:55, David Gibson wrote:
> On Wed, Oct 12, 2016 at 12:07:50PM +0200, Laurent Vivier wrote:
>>
>>
>> On 12/10/2016 06:44, David Gibson wrote:
>>> [...]
>>> +    if (MACHINE(spapr)->maxram_size > ram_top) {
>>> +        ram_top = spapr->hotplug_memory.base +
>>> +            memory_region_size(&spapr->hotplug_memory.mr);
>>> +    }
>>
>> Why don't you set directly ram_top to maxram_size?
> 
> Because there may be an alignment gap between ram_size and the bottom
> of the hotplug region.

Perhaps you could add a comment explaining why we have this check:

when machine->ram_size == machine->maxram_size, there is no hotpluggable
memory.

BTW, there is something strange in the hotpluggable memory setup in
hw/ppc/spapr.c:

     /* initialize hotplug memory address space */
     if (machine->ram_size < machine->maxram_size) {
         ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
...
         spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
                                               SPAPR_HOTPLUG_MEM_ALIGN);
         memory_region_init(&spapr->hotplug_memory.mr, OBJECT(spapr),
                            "hotplug-memory", hotplug_mem_size);

So the end of hotpluggable memory can extend beyond maxram_size; is that
normal?

Thanks,
Laurent
David Gibson Oct. 12, 2016, 11:48 p.m. UTC | #4
On Wed, Oct 12, 2016 at 02:06:10PM +0200, Laurent Vivier wrote:
> [...]
> Perhaps you could add a comment explaining why we have this check:
> 
> when machine->ram_size == machine->maxram_size, there is no hotpluggable
> memory.

Good idea, I've added something.

> BTW, there is something strange in the hotpluggable memory setup in
> hw/ppc/spapr.c:
> 
>      /* initialize hotplug memory address space */
>      if (machine->ram_size < machine->maxram_size) {
>          ram_addr_t hotplug_mem_size = machine->maxram_size - machine->ram_size;
> ...
>          spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
>                                                SPAPR_HOTPLUG_MEM_ALIGN);
>          memory_region_init(&spapr->hotplug_memory.mr, OBJECT(spapr),
>                             "hotplug-memory", hotplug_mem_size);
> 
> So the end of hotpluggable memory can extend beyond maxram_size; is that
> normal?

Yes.  maxram_size is the total possible amount of RAM not counting
gaps, rather than the maximum possible RAM address.

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f6e9c2a..7cb167c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2376,15 +2376,23 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
     const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
     const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
     const hwaddr pio_offset = 0x80000000; /* 2 GiB */
     const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
 
-    hwaddr phb_base;
+    uint64_t ram_top = MACHINE(spapr)->ram_size;
+    hwaddr phb0_base, phb_base;
     int i;
 
+    if (MACHINE(spapr)->maxram_size > ram_top) {
+        ram_top = spapr->hotplug_memory.base +
+            memory_region_size(&spapr->hotplug_memory.mr);
+    }
+
+    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+
     if (index > max_index) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
                    max_index);