| Message ID | 1476247497-6976-6-git-send-email-david@gibson.dropbear.id.au |
|---|---|
| State | New |
On 12/10/2016 06:44, David Gibson wrote:
> Currently the default PCI host bridge for the 'pseries' machine type is
> constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> guest memory space. This means that if > 1TiB of guest RAM is specified,
> the RAM will collide with the PCI IO windows, causing serious problems.
>
> Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
> there's a little unused space at the bottom of the area reserved for PCI,
> but essentially this means that > 1TiB of RAM has never worked with the
> pseries machine type.
>
> This patch fixes this by altering the placement of PHBs on large-RAM VMs.
> Instead of always placing the first PHB at 1TiB, it is placed at the next
> 1 TiB boundary after the maximum RAM address.
>
> Technically, this changes behaviour in a migration-breaking way for
> existing machines with > 1TiB maximum memory, but since having > 1 TiB
> memory was broken anyway, this seems like a reasonable trade-off.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f6e9c2a..7cb167c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2376,15 +2376,23 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>      const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>      const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>      const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>      const uint32_t max_index = 255;
> +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>
> -    hwaddr phb_base;
> +    uint64_t ram_top = MACHINE(spapr)->ram_size;
> +    hwaddr phb0_base, phb_base;
>      int i;
>
> +    if (MACHINE(spapr)->maxram_size > ram_top) {
> +        ram_top = spapr->hotplug_memory.base +
> +            memory_region_size(&spapr->hotplug_memory.mr);
> +    }

Why don't you set directly ram_top to maxram_size?

> +
> +    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
> +
>      if (index > max_index) {
>          error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
>                     max_index);

Thanks,
Laurent
On Wed, Oct 12, 2016 at 12:07:50PM +0200, Laurent Vivier wrote:
> On 12/10/2016 06:44, David Gibson wrote:
> > Currently the default PCI host bridge for the 'pseries' machine type is
> > constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> > guest memory space. This means that if > 1TiB of guest RAM is specified,
> > the RAM will collide with the PCI IO windows, causing serious problems.
> > [...]
> > +    if (MACHINE(spapr)->maxram_size > ram_top) {
> > +        ram_top = spapr->hotplug_memory.base +
> > +            memory_region_size(&spapr->hotplug_memory.mr);
> > +    }
>
> Why don't you set directly ram_top to maxram_size?

Because there may be an alignment gap between ram_size and the bottom
of the hotplug region.
On 12/10/2016 12:55, David Gibson wrote:
> On Wed, Oct 12, 2016 at 12:07:50PM +0200, Laurent Vivier wrote:
>> [...]
>>> +    if (MACHINE(spapr)->maxram_size > ram_top) {
>>> +        ram_top = spapr->hotplug_memory.base +
>>> +            memory_region_size(&spapr->hotplug_memory.mr);
>>> +    }
>>
>> Why don't you set directly ram_top to maxram_size?
>
> Because there may be an alignment gap between ram_size and the bottom
> of the hotplug region.

Perhaps you could add a comment why we have this check: when
machine->ram_size == machine->maxram_size, there is no hotpluggable
memory.

BTW, something strange in hotpluggable memory in hw/ppc/spapr.c:

    /* initialize hotplug memory address space */
    if (machine->ram_size < machine->maxram_size) {
        ram_addr_t hotplug_mem_size = machine->maxram_size -
                                      machine->ram_size;
        ...
        spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
                                              SPAPR_HOTPLUG_MEM_ALIGN);
        memory_region_init(&spapr->hotplug_memory.mr, OBJECT(spapr),
                           "hotplug-memory", hotplug_mem_size);

So the end of hotpluggable memory can be beyond maxram_size, is that
normal?

Thanks,
Laurent
On Wed, Oct 12, 2016 at 02:06:10PM +0200, Laurent Vivier wrote:
> On 12/10/2016 12:55, David Gibson wrote:
> > [...]
> > Because there may be an alignment gap between ram_size and the bottom
> > of the hotplug region.
>
> Perhaps you could add a comment why we have this check: when
> machine->ram_size == machine->maxram_size, there is no hotpluggable
> memory.

Good idea, I've added something.

> BTW, something strange in hotpluggable memory in hw/ppc/spapr.c:
>
>     /* initialize hotplug memory address space */
>     if (machine->ram_size < machine->maxram_size) {
>         ram_addr_t hotplug_mem_size = machine->maxram_size -
>                                       machine->ram_size;
>         ...
>         spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
>                                               SPAPR_HOTPLUG_MEM_ALIGN);
>         memory_region_init(&spapr->hotplug_memory.mr, OBJECT(spapr),
>                            "hotplug-memory", hotplug_mem_size);
>
> So the end of hotpluggable memory can be beyond maxram_size, is that
> normal?

Yes.  maxram_size is the total possible amount of RAM not counting
gaps, rather than the maximum possible RAM address.
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f6e9c2a..7cb167c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2376,15 +2376,23 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
     const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
     const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
     const hwaddr pio_offset = 0x80000000; /* 2 GiB */
     const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */

-    hwaddr phb_base;
+    uint64_t ram_top = MACHINE(spapr)->ram_size;
+    hwaddr phb0_base, phb_base;
     int i;

+    if (MACHINE(spapr)->maxram_size > ram_top) {
+        ram_top = spapr->hotplug_memory.base +
+            memory_region_size(&spapr->hotplug_memory.mr);
+    }
+
+    phb0_base = QEMU_ALIGN_UP(ram_top, phb0_alignment);
+
     if (index > max_index) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
                    max_index);
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space. This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)