[v4,00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

Message ID 20190508061726.27631-1-tao3.xu@intel.com

Message

Tao Xu May 8, 2019, 6:17 a.m. UTC
This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
according to the command line. The ACPI HMAT describes the memory attributes,
such as memory side cache attributes and bandwidth and latency details,
related to the System Physical Address (SPA) Memory Ranges.
The software is expected to use this information as a hint for optimization.

OSPM evaluates HMAT only during system initialization. Any changes to the HMAT
state at runtime or information regarding HMAT for hot plug are communicated
using the _HMA method.

The V3 patches link:
https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg08076.html
The V2 patches link:
https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg02276.html
The V1 RESEND patches link:
https://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg05368.html

Changelog:
v4:
    - send the patch of "move numa global variables into MachineState"
    together with HMAT patches.
    https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg03662.html
    - split patch 1/8 of v3 into two patches: 4/11 introduces
    build_mem_ranges() and 5/11 builds HMAT (Igor)
    - use build_append_int_noprefix() to build parts of ACPI table in
    all patches (Igor)
    - split patch 8/8 of v3 into two parts: 10/11 introduces NFIT
    generalizations (build_acpi_aml_common), and 11/11 uses it to
    simplify hmat_build_aml (Igor)
    - use MachineState instead of PCMachineState to make the HMAT build
    more generic (Igor)
    - fold patch 7/8 of v3 into the earlier patches
    - update the version tag from 4.0 to 4.1
v3:
    - rebase the fix patch into Jingqi's patches (Eric)
    - update the version tag from 3.10 to 4.0 (Eric)
v2:
  Per Igor and Eric's comments, fix some coding style and small issues:
    - update the version number in qapi/misc.json
    - include the expansion of the acronym HMAT in qapi/misc.json
    - correct spelling mistakes in qapi/misc.json and qemu-options.hx
    - fix the comment style in hw/i386/acpi-build.c
    and hw/acpi/hmat.h
    - remove some unnecessary header files in hw/acpi/hmat.c
    - use hardcoded numbers from spec to generate
    Memory Subsystem Address Range Structure in hw/acpi/hmat.c
    - drop the struct AcpiHmat and AcpiHmatSpaRange
    in hw/acpi/hmat.h
    - rewrite NFIT code to build _HMA method

Liu Jingqi (6):
  hmat acpi: Build Memory Subsystem Address Range Structure(s) in ACPI
    HMAT
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s) in ACPI HMAT
  hmat acpi: Build Memory Side Cache Information Structure(s) in ACPI
    HMAT
  numa: Extend the command-line to provide memory latency and bandwidth
    information
  numa: Extend the command-line to provide memory side cache information
  hmat acpi: Implement _HMA method to update HMAT at runtime

Tao Xu (5):
  numa: move numa global variable nb_numa_nodes into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable numa_info into MachineState
  acpi: introduce AcpiDeviceIfClass.build_mem_ranges hook
  acpi: introduce build_acpi_aml_common for NFIT generalizations

 exec.c                               |   5 +-
 hw/acpi/Kconfig                      |   5 +
 hw/acpi/Makefile.objs                |   1 +
 hw/acpi/aml-build.c                  |   9 +-
 hw/acpi/hmat.c                       | 574 +++++++++++++++++++++++++++
 hw/acpi/hmat.h                       | 179 +++++++++
 hw/acpi/nvdimm.c                     |  49 ++-
 hw/acpi/piix4.c                      |   1 +
 hw/arm/boot.c                        |   4 +-
 hw/arm/virt-acpi-build.c             |  17 +-
 hw/arm/virt.c                        |   8 +-
 hw/core/machine.c                    |  24 +-
 hw/i386/acpi-build.c                 | 125 +++---
 hw/i386/pc.c                         |  14 +-
 hw/i386/pc_piix.c                    |   4 +
 hw/i386/pc_q35.c                     |   4 +
 hw/isa/lpc_ich9.c                    |   1 +
 hw/mem/pc-dimm.c                     |   2 +
 hw/pci-bridge/pci_expander_bridge.c  |   2 +
 hw/ppc/spapr.c                       |  20 +-
 hw/ppc/spapr_pci.c                   |   2 +
 include/hw/acpi/acpi_dev_interface.h |   3 +
 include/hw/acpi/aml-build.h          |   2 +-
 include/hw/boards.h                  |  43 ++
 include/hw/i386/pc.h                 |   1 +
 include/hw/mem/nvdimm.h              |   6 +
 include/qemu/typedefs.h              |   3 +
 include/sysemu/numa.h                |  13 +-
 include/sysemu/sysemu.h              |  30 ++
 monitor.c                            |   4 +-
 numa.c                               | 282 +++++++++++--
 qapi/misc.json                       | 162 +++++++-
 qemu-options.hx                      |  28 +-
 stubs/Makefile.objs                  |   1 +
 stubs/pc_build_mem_ranges.c          |   6 +
 35 files changed, 1501 insertions(+), 133 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h
 create mode 100644 stubs/pc_build_mem_ranges.c

Comments

Dan Williams May 31, 2019, 4:55 a.m. UTC | #1
On Tue, May 7, 2019 at 11:32 PM Tao Xu <tao3.xu@intel.com> wrote:
>
> This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
> according to the command line. The ACPI HMAT describes the memory attributes,
> such as memory side cache attributes and bandwidth and latency details,
> related to the System Physical Address (SPA) Memory Ranges.
> The software is expected to use this information as a hint for optimization.
>
> OSPM evaluates HMAT only during system initialization. Any changes to the HMAT
> state at runtime or information regarding HMAT for hot plug are communicated
> using the _HMA method.
[..]

Hi,

I gave these patches a try while developing support for the new EFI
v2.8 Specific Purpose Memory attribute [1]. I found one gap, which I
am raising as a feature request: closing it would make this
implementation capable of emulating the BIOS implementations of
currently shipping persistent memory platforms.

The NUMA configuration I tested was:

        -numa node,mem=4G,cpus=0-19,nodeid=0
        -numa node,mem=4G,cpus=20-39,nodeid=1
        -numa node,mem=4G,nodeid=2
        -numa node,mem=4G,nodeid=3

...and it produced an entry like the following for proximity domain 2.

[0C8h 0200   2]               Structure Type : 0000 [Memory Proximity Domain Attributes]
[0CAh 0202   2]                     Reserved : 0000
[0CCh 0204   4]                       Length : 00000028
[0D0h 0208   2]        Flags (decoded below) : 0002
            Processor Proximity Domain Valid : 0
[0D2h 0210   2]                    Reserved1 : 0000
[0D4h 0212   4]   Processor Proximity Domain : 00000002
[0D8h 0216   4]      Memory Proximity Domain : 00000002
[0DCh 0220   4]                    Reserved2 : 00000000
[0E0h 0224   8]                    Reserved3 : 0000000240000000
[0E8h 0232   8]                    Reserved4 : 0000000100000000
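
For readers who want to check the dump by hand, here is a minimal Python sketch that packs and decodes the same 40-byte entry. The field order and widths are taken from the disassembly above. The names `HMAT_MPDA` and `decode_mpda` are made up for this illustration and are not part of QEMU or iasl. The two trailing 8-byte fields are shown as reserved by the disassembler; their values look like a base address and a 4G length, but that reading is an assumption.

```python
import struct

# Layout read off the dump above: little-endian, no padding, 0x28 bytes.
# u16 type, u16 reserved, u32 length, u16 flags, u16 reserved1,
# u32 processor PD, u32 memory PD, u32 reserved2, u64 reserved3, u64 reserved4
HMAT_MPDA = struct.Struct("<HHIHHIIIQQ")

FLAG_PROC_PD_VALID = 1 << 0  # "Processor Proximity Domain Valid" (bit 0)


def decode_mpda(raw: bytes) -> dict:
    """Decode one Memory Proximity Domain Attributes entry (illustrative)."""
    (stype, _rsvd, length, flags, _rsvd1,
     proc_pd, mem_pd, _rsvd2, rsvd3, rsvd4) = HMAT_MPDA.unpack(raw)
    return {
        "type": stype,
        "length": length,
        "flags": flags,
        "processor_pd_valid": bool(flags & FLAG_PROC_PD_VALID),
        "processor_pd": proc_pd,
        "memory_pd": mem_pd,
        "reserved3": rsvd3,  # assumption: may be a base address
        "reserved4": rsvd4,  # assumption: may be a range length
    }


# Values copied from the dump for proximity domain 2.
raw = HMAT_MPDA.pack(0x0000, 0, 0x28, 0x0002, 0, 2, 2, 0,
                     0x0000000240000000, 0x0000000100000000)
entry = decode_mpda(raw)
```

With `Flags = 0002`, bit 0 is not set, so `entry["processor_pd_valid"]` is false even though the Processor Proximity Domain field holds 2.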

Notice that the "Processor Proximity Domain Valid" bit is clear. I
understand that the implementation is keying off of whether cpus are
defined for that same node or not, but that's not how current
persistent memory platforms implement "Processor Proximity Domain". On
these platforms persistent memory indeed has its own proximity domain,
but the Processor Proximity Domain is expected to be assigned to the
domain that houses the memory controller for that persistent memory.
So to emulate that configuration it would be useful to have a way to
specify "Processor Proximity Domain" without needing to define CPUs in
that domain.

Something like:

        -numa node,mem=4G,cpus=0-19,nodeid=0
        -numa node,mem=4G,cpus=20-39,nodeid=1
        -numa node,mem=4G,nodeid=2,localnodeid=0
        -numa node,mem=4G,nodeid=3,localnodeid=1

...to specify that node2 memory is connected / local to node0 and
node3 memory is connected / local to node1. In general HMAT specifies
that all performance differentiated memory ranges have their own
proximity domain, but those are expected to still be associated with a
local/host/home-socket memory controller.
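
The requested behaviour can be sketched in a few lines of Python. This is only a model of the proposal, not code from the series: `localnodeid` is the hypothetical option suggested above, `NumaNode` and `processor_pd` are illustrative names, and the "node has CPUs" rule is my understanding of the current implementation rather than something verified against the patches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NumaNode:
    nodeid: int
    has_cpus: bool
    localnodeid: Optional[int] = None  # hypothetical option, not in QEMU


def processor_pd(node: NumaNode) -> Tuple[bool, int]:
    """Return (valid bit, Processor Proximity Domain) for a memory node."""
    if node.has_cpus:
        # Assumed current rule: CPUs in the node make its own domain valid.
        return True, node.nodeid
    if node.localnodeid is not None:
        # Proposed rule: point at the domain housing the memory controller.
        return True, node.localnodeid
    # Matches the dump above: valid bit clear, field still holds the node id.
    return False, node.nodeid


nodes = [
    NumaNode(0, has_cpus=True),
    NumaNode(1, has_cpus=True),
    NumaNode(2, has_cpus=False, localnodeid=0),
    NumaNode(3, has_cpus=False, localnodeid=1),
]
result = {n.nodeid: processor_pd(n) for n in nodes}
```

Under this model node2 reports processor domain 0 and node3 reports processor domain 1, both with the valid bit set, while a CPU-less node without `localnodeid` keeps the behaviour shown in the dump.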

[1]: https://lists.01.org/pipermail/linux-nvdimm/2019-May/021668.html