diff mbox

[v3] spapr: Ensure all LMBs are represented in ibm, dynamic-memory

Message ID 1465276743-7340-1-git-send-email-bharata@linux.vnet.ibm.com
State New
Headers show

Commit Message

Bharata B Rao June 7, 2016, 5:19 a.m. UTC
Memory hotplug can fail for some combinations of RAM and maxmem when
DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
on maximum addressable memory returned by guest and this value is currently
being calculated wrongly by the guest kernel routine memory_hotplug_max().
While there is an attempt to fix the guest kernel, this patch works
around the problem within QEMU itself.

memory_hotplug_max() routine in the guest kernel arrives at max
addressable memory by multiplying lmb-size with the lmb-count obtained
from ibm,dynamic-memory property. There are two assumptions here:

- All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
  where only hot-pluggable LMBs are present in this property.
- The memory area comprising of RAM and hotplug region is contiguous: This
  needn't be true always for PowerKVM as there can be gap between
  boot time RAM and hotplug region.

To work around this guest kernel bug, ensure that ibm,dynamic-memory
has information about all the LMBs (RMA, boot-time LMBs, future
hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
hotpluggable region).

RMA is represented separately by memory@0 node. Hence mark RMA LMBs
and also the LMBs for the gap b/n RAM and hotpluggable region as
reserved so that these LMBs are not recounted/counted by guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
Changes in v3:

- Not touching spapr_create_lmb_dr_connectors() so that we continue
  to have DRC objects for only hotpluggable LMBs.
- Simplified the logic of creating dynamic-memory node based on comments
  from Michael Roth and David Gibson.

v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html

 hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
 include/hw/ppc/spapr.h |  5 +++--
 2 files changed, 36 insertions(+), 20 deletions(-)

Comments

Michael Roth June 7, 2016, 11:37 p.m. UTC | #1
Quoting Bharata B Rao (2016-06-07 00:19:03)
> Memory hotplug can fail for some combinations of RAM and maxmem when
> DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> on maximum addressable memory returned by guest and this value is currently
> being calculated wrongly by the guest kernel routine memory_hotplug_max().
> While there is an attempt to fix the guest kernel, this patch works
> around the problem within QEMU itself.
> 
> memory_hotplug_max() routine in the guest kernel arrives at max
> addressable memory by multiplying lmb-size with the lmb-count obtained
> from ibm,dynamic-memory property. There are two assumptions here:
> 
> - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
>   where only hot-pluggable LMBs are present in this property.
> - The memory area comprising of RAM and hotplug region is contiguous: This
>   needn't be true always for PowerKVM as there can be gap between
>   boot time RAM and hotplug region.
> 
> To work around this guest kernel bug, ensure that ibm,dynamic-memory
> has information about all the LMBs (RMA, boot-time LMBs, future
> hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> hotpluggable region).
> 
> RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> and also the LMBs for the gap b/n RAM and hotpluggable region as
> reserved so that these LMBs are not recounted/counted by guest.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
> Changes in v3:
> 
> - Not touching spapr_create_lmb_dr_connectors() so that we continue
>   to have DRC objects for only hotpluggable LMBs.
> - Simplified the logic of creating dynamic-memory node based on comments
>   from Michael Roth and David Gibson.
> 
> v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> 
>  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
>  include/hw/ppc/spapr.h |  5 +++--
>  2 files changed, 36 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0636642..9d1d43d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
>      int ret, i, offset;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> +                       lmb_size;
>      uint32_t *int_buf, *cur_index, buf_len;
>      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> 
>      /*
> -     * Don't create the node if there are no DR LMBs.
> +     * Don't create the node if there is no hotpluggable memory
>       */
> -    if (!nr_lmbs) {
> +    if (machine->ram_size == machine->maxram_size) {
>          return 0;
>      }
> 
> @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
>      for (i = 0; i < nr_lmbs; i++) {
>          sPAPRDRConnector *drc;
>          sPAPRDRConnectorClass *drck;

Since these ^ are only used if (i >= hotplug_lmb_start), it might be
clearer to move them there now.

> -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> +        uint64_t addr = i * lmb_size;
>          uint32_t *dynamic_memory = cur_index;
> 
> -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> -                                       addr/lmb_size);
> -        g_assert(drc);
> -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> -        if (addr < machine->ram_size ||
> -                    memory_region_present(get_system_memory(), addr)) {
> -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> +        if (i >= hotplug_lmb_start) {
> +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +                                           addr / lmb_size);

Could just be i

> +            g_assert(drc);
> +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> +            if (memory_region_present(get_system_memory(), addr)) {
> +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> +            } else {
> +                dynamic_memory[5] = cpu_to_be32(0);
> +            }
>          } else {
> -            dynamic_memory[5] = cpu_to_be32(0);
> +            /*
> +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> +             * hotplug memory region -- all these are marked as reserved.
> +             */
> +            dynamic_memory[0] = cpu_to_be32(0);
> +            dynamic_memory[1] = cpu_to_be32(0);

Are we sure we shouldn't still encode the addr here?

> +            dynamic_memory[2] = cpu_to_be32(0);
> +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> +            dynamic_memory[4] = cpu_to_be32(-1);
> +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);

LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
think is what we'll want for cases where there's no backing DRC.

>          }
> 
>          cur_index += SPAPR_DR_LMB_LIST_ENTRY_SIZE;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 971df3d..bb265a2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -620,9 +620,10 @@ int spapr_rng_populate_dt(void *fdt);
>  #define SPAPR_DR_LMB_LIST_ENTRY_SIZE 6
> 
>  /*
> - * This flag value defines the LMB as assigned in ibm,dynamic-memory
> - * property under ibm,dynamic-reconfiguration-memory node.
> + * Defines for flag value in ibm,dynamic-memory property under
> + * ibm,dynamic-reconfiguration-memory node.
>   */
>  #define SPAPR_LMB_FLAGS_ASSIGNED 0x00000008
> +#define SPAPR_LMB_FLAGS_RESERVED 0x00000080
> 
>  #endif /* !defined (__HW_SPAPR_H__) */
> -- 
> 2.1.0
>
Bharata B Rao June 8, 2016, 2:35 a.m. UTC | #2
On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 00:19:03)
> > Memory hotplug can fail for some combinations of RAM and maxmem when
> > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > on maximum addressable memory returned by guest and this value is currently
> > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > While there is an attempt to fix the guest kernel, this patch works
> > around the problem within QEMU itself.
> > 
> > memory_hotplug_max() routine in the guest kernel arrives at max
> > addressable memory by multiplying lmb-size with the lmb-count obtained
> > from ibm,dynamic-memory property. There are two assumptions here:
> > 
> > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> >   where only hot-pluggable LMBs are present in this property.
> > - The memory area comprising of RAM and hotplug region is contiguous: This
> >   needn't be true always for PowerKVM as there can be gap between
> >   boot time RAM and hotplug region.
> > 
> > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > has information about all the LMBs (RMA, boot-time LMBs, future
> > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > hotpluggable region).
> > 
> > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > reserved so that these LMBs are not recounted/counted by guest.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> > Changes in v3:
> > 
> > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> >   to have DRC objects for only hotpluggable LMBs.
> > - Simplified the logic of creating dynamic-memory node based on comments
> >   from Michael Roth and David Gibson.
> > 
> > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > 
> >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> >  include/hw/ppc/spapr.h |  5 +++--
> >  2 files changed, 36 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0636642..9d1d43d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >      int ret, i, offset;
> >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > +                       lmb_size;
> >      uint32_t *int_buf, *cur_index, buf_len;
> >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > 
> >      /*
> > -     * Don't create the node if there are no DR LMBs.
> > +     * Don't create the node if there is no hotpluggable memory
> >       */
> > -    if (!nr_lmbs) {
> > +    if (machine->ram_size == machine->maxram_size) {
> >          return 0;
> >      }
> > 
> > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >      for (i = 0; i < nr_lmbs; i++) {
> >          sPAPRDRConnector *drc;
> >          sPAPRDRConnectorClass *drck;
> 
> Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> clearer to move them there now.

Yes.

> 
> > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > +        uint64_t addr = i * lmb_size;
> >          uint32_t *dynamic_memory = cur_index;
> > 
> > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > -                                       addr/lmb_size);
> > -        g_assert(drc);
> > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > -
> > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > -        if (addr < machine->ram_size ||
> > -                    memory_region_present(get_system_memory(), addr)) {
> > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > +        if (i >= hotplug_lmb_start) {
> > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > +                                           addr / lmb_size);
> 
> Could just be i

Hmm I thought I got all such occurances covered :(

> 
> > +            g_assert(drc);
> > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > +            if (memory_region_present(get_system_memory(), addr)) {
> > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > +            } else {
> > +                dynamic_memory[5] = cpu_to_be32(0);
> > +            }
> >          } else {
> > -            dynamic_memory[5] = cpu_to_be32(0);
> > +            /*
> > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > +             * hotplug memory region -- all these are marked as reserved.
> > +             */
> > +            dynamic_memory[0] = cpu_to_be32(0);
> > +            dynamic_memory[1] = cpu_to_be32(0);
> 
> Are we sure we shouldn't still encode the addr here?

Since kernel won't look at reserved LMBs, I thought it should be fine.
We could populate the addr, but we don't have DRC for them and hence
no DRC index, so anyway we will not have full information.

> 
> > +            dynamic_memory[2] = cpu_to_be32(0);
> > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > +            dynamic_memory[4] = cpu_to_be32(-1);
> > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> 
> LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> think is what we'll want for cases where there's no backing DRC.

Seems to work. I started with reserved based on Nathan's original
suggestion.

So Nathan, which would be more appropriate here from kernel point of view ?
Reserved or 'DRC Invalid' ?

Regards,
Bharata.
Michael Roth June 8, 2016, 3:05 p.m. UTC | #3
Quoting Bharata B Rao (2016-06-07 21:35:54)
> On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > on maximum addressable memory returned by guest and this value is currently
> > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > While there is an attempt to fix the guest kernel, this patch works
> > > around the problem within QEMU itself.
> > > 
> > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > from ibm,dynamic-memory property. There are two assumptions here:
> > > 
> > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > >   where only hot-pluggable LMBs are present in this property.
> > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > >   needn't be true always for PowerKVM as there can be gap between
> > >   boot time RAM and hotplug region.
> > > 
> > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > hotpluggable region).
> > > 
> > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > reserved so that these LMBs are not recounted/counted by guest.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > > Changes in v3:
> > > 
> > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > >   to have DRC objects for only hotpluggable LMBs.
> > > - Simplified the logic of creating dynamic-memory node based on comments
> > >   from Michael Roth and David Gibson.
> > > 
> > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > 
> > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > >  include/hw/ppc/spapr.h |  5 +++--
> > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 0636642..9d1d43d 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > >      int ret, i, offset;
> > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > +                       lmb_size;
> > >      uint32_t *int_buf, *cur_index, buf_len;
> > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > 
> > >      /*
> > > -     * Don't create the node if there are no DR LMBs.
> > > +     * Don't create the node if there is no hotpluggable memory
> > >       */
> > > -    if (!nr_lmbs) {
> > > +    if (machine->ram_size == machine->maxram_size) {
> > >          return 0;
> > >      }
> > > 
> > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > >      for (i = 0; i < nr_lmbs; i++) {
> > >          sPAPRDRConnector *drc;
> > >          sPAPRDRConnectorClass *drck;
> > 
> > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > clearer to move them there now.
> 
> Yes.
> 
> > 
> > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > +        uint64_t addr = i * lmb_size;
> > >          uint32_t *dynamic_memory = cur_index;
> > > 
> > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > -                                       addr/lmb_size);
> > > -        g_assert(drc);
> > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > -
> > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > -        if (addr < machine->ram_size ||
> > > -                    memory_region_present(get_system_memory(), addr)) {
> > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > +        if (i >= hotplug_lmb_start) {
> > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > +                                           addr / lmb_size);
> > 
> > Could just be i
> 
> Hmm I thought I got all such occurances covered :(
> 
> > 
> > > +            g_assert(drc);
> > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +
> > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > +            } else {
> > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > +            }
> > >          } else {
> > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > +            /*
> > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > +             * hotplug memory region -- all these are marked as reserved.
> > > +             */
> > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > +            dynamic_memory[1] = cpu_to_be32(0);
> > 
> > Are we sure we shouldn't still encode the addr here?
> 
> Since kernel won't look at reserved LMBs, I thought it should be fine.
> We could populate the addr, but we don't have DRC for them and hence
> no DRC index, so anyway we will not have full information.

The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
to have LMBs with no backing drc, so I would think the other fields
should still be filled out appropriately, even if it might not get
used by guest either way.

I think an argument could be made for not setting addr for LMBs to used
to cover the RAM->hotplug gap, since that's padding rather than real
memory. But effectively re-using addr 0 seems like more of a liability
than a safeguard. If 'reserved' flag is doing what we expect, then it's
probably best not to deviate from normal values any more than necessary,
IMO.

> 
> > 
> > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > 
> > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > think is what we'll want for cases where there's no backing DRC.
> 
> Seems to work. I started with reserved based on Nathan's original
> suggestion.
> 
> So Nathan, which would be more appropriate here from kernel point of view ?
> Reserved or 'DRC Invalid' ?

Wouldn't we want an OR of both?

> 
> Regards,
> Bharata.
>
Bharata B Rao June 8, 2016, 3:50 p.m. UTC | #4
On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 21:35:54)
> > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > on maximum addressable memory returned by guest and this value is currently
> > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > While there is an attempt to fix the guest kernel, this patch works
> > > > around the problem within QEMU itself.
> > > > 
> > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > 
> > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > >   where only hot-pluggable LMBs are present in this property.
> > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > >   needn't be true always for PowerKVM as there can be gap between
> > > >   boot time RAM and hotplug region.
> > > > 
> > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > hotpluggable region).
> > > > 
> > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > 
> > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > ---
> > > > Changes in v3:
> > > > 
> > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > >   to have DRC objects for only hotpluggable LMBs.
> > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > >   from Michael Roth and David Gibson.
> > > > 
> > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > 
> > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > >  include/hw/ppc/spapr.h |  5 +++--
> > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0636642..9d1d43d 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      int ret, i, offset;
> > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > +                       lmb_size;
> > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > 
> > > >      /*
> > > > -     * Don't create the node if there are no DR LMBs.
> > > > +     * Don't create the node if there is no hotpluggable memory
> > > >       */
> > > > -    if (!nr_lmbs) {
> > > > +    if (machine->ram_size == machine->maxram_size) {
> > > >          return 0;
> > > >      }
> > > > 
> > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      for (i = 0; i < nr_lmbs; i++) {
> > > >          sPAPRDRConnector *drc;
> > > >          sPAPRDRConnectorClass *drck;
> > > 
> > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > clearer to move them there now.
> > 
> > Yes.
> > 
> > > 
> > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > +        uint64_t addr = i * lmb_size;
> > > >          uint32_t *dynamic_memory = cur_index;
> > > > 
> > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > -                                       addr/lmb_size);
> > > > -        g_assert(drc);
> > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > -
> > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > -        if (addr < machine->ram_size ||
> > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +        if (i >= hotplug_lmb_start) {
> > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > +                                           addr / lmb_size);
> > > 
> > > Could just be i
> > 
> > Hmm I thought I got all such occurances covered :(
> > 
> > > 
> > > > +            g_assert(drc);
> > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > +
> > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +            } else {
> > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > +            }
> > > >          } else {
> > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > +            /*
> > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > +             */
> > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > 
> > > Are we sure we shouldn't still encode the addr here?
> > 
> > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > We could populate the addr, but we don't have DRC for them and hence
> > no DRC index, so anyway we will not have full information.
> 
> The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> to have LMBs with no backing drc, so I would think the other fields
> should still be filled out appropriately, even if it might not get
> used by guest either way.

addr can be populated, yes, but since there are no backing DRCs for
RMA and rest of the boot-time RAM, we can't be populating DRC index.

'DRC Invalid' in Table 248 of LoPAPR reads: If b '0/1', the DRC field of
"ibm,dynamic-memory" property is valid/invalid.

In ibm,dyanmic-memory, anything sounding close to "DRC field" is DRC index,
so it should be ok to have 0 for DRC index for such LMBs I suppose.

> 
> I think an argument could be made for not setting addr for LMBs to used
> to cover the RAM->hotplug gap, since that's padding rather than real
> memory. But effectively re-using addr 0 seems like more of a liability
> than a safeguard. If 'reserved' flag is doing what we expect, then it's
> probably best not to deviate from normal values any more than necessary,
> IMO.

Fine.

> 
> > 
> > > 
> > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > > 
> > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > think is what we'll want for cases where there's no backing DRC.
> > 
> > Seems to work. I started with reserved based on Nathan's original
> > suggestion.
> > 
> > So Nathan, which would be more appropriate here from kernel point of view ?
> > Reserved or 'DRC Invalid' ?
> 
> Wouldn't we want an OR of both?

To be fully sure that these LMBs would never be enumerated by the guest ?

Regards,
Bharata.
Michael Roth June 8, 2016, 4:09 p.m. UTC | #5
Quoting Bharata B Rao (2016-06-08 10:50:03)
> On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> > Quoting Bharata B Rao (2016-06-07 21:35:54)
> > > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > > on maximum addressable memory returned by guest and this value is currently
> > > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > > While there is an attempt to fix the guest kernel, this patch works
> > > > > around the problem within QEMU itself.
> > > > > 
> > > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > > 
> > > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > > >   where only hot-pluggable LMBs are present in this property.
> > > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > > >   needn't be true always for PowerKVM as there can be gap between
> > > > >   boot time RAM and hotplug region.
> > > > > 
> > > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > > hotpluggable region).
> > > > > 
> > > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > > ---
> > > > > Changes in v3:
> > > > > 
> > > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > > >   to have DRC objects for only hotpluggable LMBs.
> > > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > > >   from Michael Roth and David Gibson.
> > > > > 
> > > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > > 
> > > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > > >  include/hw/ppc/spapr.h |  5 +++--
> > > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 0636642..9d1d43d 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      int ret, i, offset;
> > > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > > +                       lmb_size;
> > > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > > 
> > > > >      /*
> > > > > -     * Don't create the node if there are no DR LMBs.
> > > > > +     * Don't create the node if there is no hotpluggable memory
> > > > >       */
> > > > > -    if (!nr_lmbs) {
> > > > > +    if (machine->ram_size == machine->maxram_size) {
> > > > >          return 0;
> > > > >      }
> > > > > 
> > > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      for (i = 0; i < nr_lmbs; i++) {
> > > > >          sPAPRDRConnector *drc;
> > > > >          sPAPRDRConnectorClass *drck;
> > > > 
> > > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > > clearer to move them there now.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > > +        uint64_t addr = i * lmb_size;
> > > > >          uint32_t *dynamic_memory = cur_index;
> > > > > 
> > > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > -                                       addr/lmb_size);
> > > > > -        g_assert(drc);
> > > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > -
> > > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > -        if (addr < machine->ram_size ||
> > > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +        if (i >= hotplug_lmb_start) {
> > > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > +                                           addr / lmb_size);
> > > > 
> > > > Could just be i
> > > 
> > > Hmm I thought I got all such occurances covered :(
> > > 
> > > > 
> > > > > +            g_assert(drc);
> > > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > +
> > > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +            } else {
> > > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            }
> > > > >          } else {
> > > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            /*
> > > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > > +             */
> > > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > > 
> > > > Are we sure we shouldn't still encode the addr here?
> > > 
> > > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > > We could populate the addr, but we don't have DRC for them and hence
> > > no DRC index, so anyway we will not have full information.
> > 
> > The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> > to have LMBs with no backing drc, so I would think the other fields
> > should still be filled out appropriately, even if it might not get
> > used by guest either way.
> 
> addr can be populated, yes, but since there are no backing DRCs for
> RMA and rest of the boot-time RAM, we can't be populating DRC index.
> 
> 'DRC Invalid' in Table 248 of LoPAPR reads: If b '0/1', the DRC field of
> "ibm,dynamic-memory" property is valid/invalid.
> 
> In ibm,dyanmic-memory, anything sounding close to "DRC field" is DRC index,
> so it should be ok to have 0 for DRC index for such LMBs I suppose.

Agreed, for the drc index case I think leaving it 0/unset is the only
sensible approach, but if there's a flag to denote this situation we
should probably make use of it.

> 
> > 
> > I think an argument could be made for not setting addr for LMBs to used
> > to cover the RAM->hotplug gap, since that's padding rather than real
> > memory. But effectively re-using addr 0 seems like more of a liability
> > than a safeguard. If 'reserved' flag is doing what we expect, then it's
> > probably best not to deviate from normal values any more than necessary,
> > IMO.
> 
> Fine.
> 
> > 
> > > 
> > > > 
> > > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > > > 
> > > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > > think is what we'll want for cases where there's no backing DRC.
> > > 
> > > Seems to work. I started with reserved based on Nathan's original
> > > suggestion.
> > > 
> > > So Nathan, which would be more appropriate here from kernel point of view ?
> > > Reserved or 'DRC Invalid' ?
> > 
> > Wouldn't we want an OR of both?
> 
> To be fully sure that these LMBs would never be enumerated by the guest ?

I think 'reserved' flag covers that aspect, I'm really only suggesting
'drc invalid' flag also be included in the unlikely case that some
drmgr-like tool out there attempts to interpret the DRC index field as a
real index otherwise.

> 
> Regards,
> Bharata.
>
David Gibson June 9, 2016, 2:03 a.m. UTC | #6
On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 21:35:54)
> > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > on maximum addressable memory returned by guest and this value is currently
> > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > While there is an attempt to fix the guest kernel, this patch works
> > > > around the problem within QEMU itself.
> > > > 
> > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > 
> > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > >   where only hot-pluggable LMBs are present in this property.
> > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > >   needn't be true always for PowerKVM as there can be gap between
> > > >   boot time RAM and hotplug region.
> > > > 
> > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > hotpluggable region).
> > > > 
> > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > 
> > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > ---
> > > > Changes in v3:
> > > > 
> > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > >   to have DRC objects for only hotpluggable LMBs.
> > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > >   from Michael Roth and David Gibson.
> > > > 
> > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > 
> > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > >  include/hw/ppc/spapr.h |  5 +++--
> > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 0636642..9d1d43d 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      int ret, i, offset;
> > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > +                       lmb_size;
> > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > 
> > > >      /*
> > > > -     * Don't create the node if there are no DR LMBs.
> > > > +     * Don't create the node if there is no hotpluggable memory
> > > >       */
> > > > -    if (!nr_lmbs) {
> > > > +    if (machine->ram_size == machine->maxram_size) {
> > > >          return 0;
> > > >      }
> > > > 
> > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > >      for (i = 0; i < nr_lmbs; i++) {
> > > >          sPAPRDRConnector *drc;
> > > >          sPAPRDRConnectorClass *drck;
> > > 
> > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > clearer to move them there now.
> > 
> > Yes.
> > 
> > > 
> > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > +        uint64_t addr = i * lmb_size;
> > > >          uint32_t *dynamic_memory = cur_index;
> > > > 
> > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > -                                       addr/lmb_size);
> > > > -        g_assert(drc);
> > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > -
> > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > -        if (addr < machine->ram_size ||
> > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +        if (i >= hotplug_lmb_start) {
> > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > +                                           addr / lmb_size);
> > > 
> > > Could just be i
> > 
> > Hmm I thought I got all such occurances covered :(
> > 
> > > 
> > > > +            g_assert(drc);
> > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > +
> > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > +            } else {
> > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > +            }
> > > >          } else {
> > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > +            /*
> > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > +             */
> > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > 
> > > Are we sure we shouldn't still encode the addr here?
> > 
> > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > We could populate the addr, but we don't have DRC for them and hence
> > no DRC index, so anyway we will not have full information.
> 
> The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> to have LMBs with no backing drc, so I would think the other fields
> should still be filled out appropriately, even if it might not get
> used by guest either way.

That makes sense - but we should check that existing guests will
actually respect that DRC invalid bit.

> I think an argument could be made for not setting addr for LMBs to used
> to cover the RAM->hotplug gap, since that's padding rather than real
> memory. But effectively re-using addr 0 seems like more of a liability
> than a safeguard. If 'reserved' flag is doing what we expect, then it's
> probably best not to deviate from normal values any more than necessary,
> IMO.

Yes, I think addr should be set, even for the dummy LMBs.

> > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > > 
> > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > think is what we'll want for cases where there's no backing DRC.
> > 
> > Seems to work. I started with reserved based on Nathan's original
> > suggestion.
> > 
> > So Nathan, which would be more appropriate here from kernel point of view ?
> > Reserved or 'DRC Invalid' ?
> 
> Wouldn't we want an OR of both?

That might be the safest option, yes.
David Gibson June 9, 2016, 2:05 a.m. UTC | #7
On Wed, Jun 08, 2016 at 11:09:11AM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-08 10:50:03)
> > On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> > > Quoting Bharata B Rao (2016-06-07 21:35:54)
> > > > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > > > on maximum addressable memory returned by guest and this value is currently
> > > > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > > > While there is an attempt to fix the guest kernel, this patch works
> > > > > > around the problem within QEMU itself.
> > > > > > 
> > > > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > > > 
> > > > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > > > >   where only hot-pluggable LMBs are present in this property.
> > > > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > > > >   needn't be true always for PowerKVM as there can be gap between
> > > > > >   boot time RAM and hotplug region.
> > > > > > 
> > > > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > > > hotpluggable region).
> > > > > > 
> > > > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > > > 
> > > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > > > ---
> > > > > > Changes in v3:
> > > > > > 
> > > > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > > > >   to have DRC objects for only hotpluggable LMBs.
> > > > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > > > >   from Michael Roth and David Gibson.
> > > > > > 
> > > > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > > > 
> > > > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > > > >  include/hw/ppc/spapr.h |  5 +++--
> > > > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > > index 0636642..9d1d43d 100644
> > > > > > --- a/hw/ppc/spapr.c
> > > > > > +++ b/hw/ppc/spapr.c
> > > > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > > >      int ret, i, offset;
> > > > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > > > +                       lmb_size;
> > > > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > > > 
> > > > > >      /*
> > > > > > -     * Don't create the node if there are no DR LMBs.
> > > > > > +     * Don't create the node if there is no hotpluggable memory
> > > > > >       */
> > > > > > -    if (!nr_lmbs) {
> > > > > > +    if (machine->ram_size == machine->maxram_size) {
> > > > > >          return 0;
> > > > > >      }
> > > > > > 
> > > > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > > >      for (i = 0; i < nr_lmbs; i++) {
> > > > > >          sPAPRDRConnector *drc;
> > > > > >          sPAPRDRConnectorClass *drck;
> > > > > 
> > > > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > > > clearer to move them there now.
> > > > 
> > > > Yes.
> > > > 
> > > > > 
> > > > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > > > +        uint64_t addr = i * lmb_size;
> > > > > >          uint32_t *dynamic_memory = cur_index;
> > > > > > 
> > > > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > > -                                       addr/lmb_size);
> > > > > > -        g_assert(drc);
> > > > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > > -
> > > > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > > -        if (addr < machine->ram_size ||
> > > > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > > +        if (i >= hotplug_lmb_start) {
> > > > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > > +                                           addr / lmb_size);
> > > > > 
> > > > > Could just be i
> > > > 
> > > > Hmm I thought I got all such occurances covered :(
> > > > 
> > > > > 
> > > > > > +            g_assert(drc);
> > > > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > > +
> > > > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > > +            } else {
> > > > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > > > +            }
> > > > > >          } else {
> > > > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > > > +            /*
> > > > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > > > +             */
> > > > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > > > 
> > > > > Are we sure we shouldn't still encode the addr here?
> > > > 
> > > > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > > > We could populate the addr, but we don't have DRC for them and hence
> > > > no DRC index, so anyway we will not have full information.
> > > 
> > > The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> > > to have LMBs with no backing drc, so I would think the other fields
> > > should still be filled out appropriately, even if it might not get
> > > used by guest either way.
> > 
> > addr can be populated, yes, but since there are no backing DRCs for
> > RMA and rest of the boot-time RAM, we can't be populating DRC index.
> > 
> > 'DRC Invalid' in Table 248 of LoPAPR reads: If b '0/1', the DRC field of
> > "ibm,dynamic-memory" property is valid/invalid.
> > 
> > In ibm,dyanmic-memory, anything sounding close to "DRC field" is DRC index,
> > so it should be ok to have 0 for DRC index for such LMBs I suppose.
> 
> Agreed, for the drc index case I think leaving it 0/unset is the only
> sensible approach, but if there's a flag to denote this situation we
> should probably make use of it.
> 
> > 
> > > 
> > > I think an argument could be made for not setting addr for LMBs to used
> > > to cover the RAM->hotplug gap, since that's padding rather than real
> > > memory. But effectively re-using addr 0 seems like more of a liability
> > > than a safeguard. If 'reserved' flag is doing what we expect, then it's
> > > probably best not to deviate from normal values any more than necessary,
> > > IMO.
> > 
> > Fine.
> > 
> > > 
> > > > 
> > > > > 
> > > > > > +            dynamic_memory[2] = cpu_to_be32(0);
> > > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > > +            dynamic_memory[4] = cpu_to_be32(-1);
> > > > > > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> > > > > 
> > > > > LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> > > > > think is what we'll want for cases where there's no backing DRC.
> > > > 
> > > > Seems to work. I started with reserved based on Nathan's original
> > > > suggestion.
> > > > 
> > > > So Nathan, which would be more appropriate here from kernel point of view ?
> > > > Reserved or 'DRC Invalid' ?
> > > 
> > > Wouldn't we want an OR of both?
> > 
> > To be fully sure that these LMBs would never be enumerated by the guest ?
> 
> I think 'reserved' flag covers that aspect, I'm really only suggesting
> 'drc invalid' flag also be included in the unlikely case that some
> drmgr-like tool out there attempts to interpret the DRC index field as a
> real index otherwise.

Right, if I'm understanding this correctly, it sounds like these bits
have slightly different meanings.  The DRC invalid bit is essentially
talking about the specific encoding here in the DT - the DRC index
field is not populated.  The Reserved bit is talking about how the
guest should use (or in this case, not use) the area which is
theoretically separate from whether it has a DRC backing it or not.

So, yes, I think we want both bits set.
Bharata B Rao June 10, 2016, 5:07 a.m. UTC | #8
On Thu, Jun 09, 2016 at 12:03:30PM +1000, David Gibson wrote:
> On Wed, Jun 08, 2016 at 10:05:12AM -0500, Michael Roth wrote:
> > Quoting Bharata B Rao (2016-06-07 21:35:54)
> > > On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> > > > Quoting Bharata B Rao (2016-06-07 00:19:03)
> > > > > Memory hotplug can fail for some combinations of RAM and maxmem when
> > > > > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > > > > on maximum addressable memory returned by guest and this value is currently
> > > > > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > > > > While there is an attempt to fix the guest kernel, this patch works
> > > > > around the problem within QEMU itself.
> > > > > 
> > > > > memory_hotplug_max() routine in the guest kernel arrives at max
> > > > > addressable memory by multiplying lmb-size with the lmb-count obtained
> > > > > from ibm,dynamic-memory property. There are two assumptions here:
> > > > > 
> > > > > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> > > > >   where only hot-pluggable LMBs are present in this property.
> > > > > - The memory area comprising of RAM and hotplug region is contiguous: This
> > > > >   needn't be true always for PowerKVM as there can be gap between
> > > > >   boot time RAM and hotplug region.
> > > > > 
> > > > > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > > > > has information about all the LMBs (RMA, boot-time LMBs, future
> > > > > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > > > > hotpluggable region).
> > > > > 
> > > > > RMA is represented separately by memory@0 node. Hence mark RMA LMBs
> > > > > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > > > > reserved so that these LMBs are not recounted/counted by guest.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > > ---
> > > > > Changes in v3:
> > > > > 
> > > > > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> > > > >   to have DRC objects for only hotpluggable LMBs.
> > > > > - Simplified the logic of creating dynamic-memory node based on comments
> > > > >   from Michael Roth and David Gibson.
> > > > > 
> > > > > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > > > > 
> > > > >  hw/ppc/spapr.c         | 51 ++++++++++++++++++++++++++++++++------------------
> > > > >  include/hw/ppc/spapr.h |  5 +++--
> > > > >  2 files changed, 36 insertions(+), 20 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 0636642..9d1d43d 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -762,14 +762,17 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      int ret, i, offset;
> > > > >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> > > > >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > > > > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > > > > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > > > > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > > > > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > > > > +                       lmb_size;
> > > > >      uint32_t *int_buf, *cur_index, buf_len;
> > > > >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > > > > 
> > > > >      /*
> > > > > -     * Don't create the node if there are no DR LMBs.
> > > > > +     * Don't create the node if there is no hotpluggable memory
> > > > >       */
> > > > > -    if (!nr_lmbs) {
> > > > > +    if (machine->ram_size == machine->maxram_size) {
> > > > >          return 0;
> > > > >      }
> > > > > 
> > > > > @@ -805,24 +808,36 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> > > > >      for (i = 0; i < nr_lmbs; i++) {
> > > > >          sPAPRDRConnector *drc;
> > > > >          sPAPRDRConnectorClass *drck;
> > > > 
> > > > Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> > > > clearer to move them there now.
> > > 
> > > Yes.
> > > 
> > > > 
> > > > > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > > > > +        uint64_t addr = i * lmb_size;
> > > > >          uint32_t *dynamic_memory = cur_index;
> > > > > 
> > > > > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > -                                       addr/lmb_size);
> > > > > -        g_assert(drc);
> > > > > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > -
> > > > > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > -        if (addr < machine->ram_size ||
> > > > > -                    memory_region_present(get_system_memory(), addr)) {
> > > > > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +        if (i >= hotplug_lmb_start) {
> > > > > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > > +                                           addr / lmb_size);
> > > > 
> > > > Could just be i
> > > 
> > > Hmm I thought I got all such occurances covered :(
> > > 
> > > > 
> > > > > +            g_assert(drc);
> > > > > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > +
> > > > > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > > > > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > > > > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > > > > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > > > > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > > > > +            if (memory_region_present(get_system_memory(), addr)) {
> > > > > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > > > > +            } else {
> > > > > +                dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            }
> > > > >          } else {
> > > > > -            dynamic_memory[5] = cpu_to_be32(0);
> > > > > +            /*
> > > > > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > > > > +             * hotplug memory region -- all these are marked as reserved.
> > > > > +             */
> > > > > +            dynamic_memory[0] = cpu_to_be32(0);
> > > > > +            dynamic_memory[1] = cpu_to_be32(0);
> > > > 
> > > > Are we sure we shouldn't still encode the addr here?
> > > 
> > > Since kernel won't look at reserved LMBs, I thought it should be fine.
> > > We could populate the addr, but we don't have DRC for them and hence
> > > no DRC index, so anyway we will not have full information.
> > 
> > The 'DRC invalid' bit seems to suggests it's a perfectly valid scenario
> > to have LMBs with no backing drc, so I would think the other fields
> > should still be filled out appropriately, even if it might not get
> > used by guest either way.
> 
> That makes sense - but we should check that existing guests will
> actually respect that DRC invalid bit.

Existing guests don't even define DRC invalid bit yet.

As Nathan said elsewhere, existing guest will check for reserved bit
and not consider such LMBs.

Regards,
Bharata.
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0636642..9d1d43d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -762,14 +762,17 @@  static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
     int ret, i, offset;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
     uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
-    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
+    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
+    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
+                       memory_region_size(&spapr->hotplug_memory.mr)) /
+                       lmb_size;
     uint32_t *int_buf, *cur_index, buf_len;
     int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
 
     /*
-     * Don't create the node if there are no DR LMBs.
+     * Don't create the node if there is no hotpluggable memory
      */
-    if (!nr_lmbs) {
+    if (machine->ram_size == machine->maxram_size) {
         return 0;
     }
 
@@ -805,24 +808,36 @@  static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
     for (i = 0; i < nr_lmbs; i++) {
         sPAPRDRConnector *drc;
         sPAPRDRConnectorClass *drck;
-        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
+        uint64_t addr = i * lmb_size;
         uint32_t *dynamic_memory = cur_index;
 
-        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
-                                       addr/lmb_size);
-        g_assert(drc);
-        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-
-        dynamic_memory[0] = cpu_to_be32(addr >> 32);
-        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
-        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
-        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
-        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
-        if (addr < machine->ram_size ||
-                    memory_region_present(get_system_memory(), addr)) {
-            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
+        if (i >= hotplug_lmb_start) {
+            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                                           addr / lmb_size);
+            g_assert(drc);
+            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+            dynamic_memory[0] = cpu_to_be32(addr >> 32);
+            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
+            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
+            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
+            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
+            if (memory_region_present(get_system_memory(), addr)) {
+                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
+            } else {
+                dynamic_memory[5] = cpu_to_be32(0);
+            }
         } else {
-            dynamic_memory[5] = cpu_to_be32(0);
+            /*
+             * LMB information for RMA, boot time RAM and gap b/n RAM and
+             * hotplug memory region -- all these are marked as reserved.
+             */
+            dynamic_memory[0] = cpu_to_be32(0);
+            dynamic_memory[1] = cpu_to_be32(0);
+            dynamic_memory[2] = cpu_to_be32(0);
+            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
+            dynamic_memory[4] = cpu_to_be32(-1);
+            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
         }
 
         cur_index += SPAPR_DR_LMB_LIST_ENTRY_SIZE;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 971df3d..bb265a2 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -620,9 +620,10 @@  int spapr_rng_populate_dt(void *fdt);
 #define SPAPR_DR_LMB_LIST_ENTRY_SIZE 6
 
 /*
- * This flag value defines the LMB as assigned in ibm,dynamic-memory
- * property under ibm,dynamic-reconfiguration-memory node.
+ * Defines for flag value in ibm,dynamic-memory property under
+ * ibm,dynamic-reconfiguration-memory node.
  */
 #define SPAPR_LMB_FLAGS_ASSIGNED 0x00000008
+#define SPAPR_LMB_FLAGS_RESERVED 0x00000080
 
 #endif /* !defined (__HW_SPAPR_H__) */