[RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled

Message ID 20201124122507.1014839-1-imammedo@redhat.com
State New
Series [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled

Commit Message

Igor Mammedov Nov. 24, 2020, 12:25 p.m. UTC
If the firmware negotiates the ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
then on CPU eject OSPM will set bit #4 in the CPU hotplug register block
for the CPU being ejected, to mark it for removal by the firmware, and
will trigger an SMI upcall to let the firmware perform the actual eject.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
PS:
  - abuse the 5.1 machine type for now to turn off the unplug feature
    (it will be moved to the 5.2 machine type once the new merge window opens)
---
 include/hw/acpi/cpu.h           |  2 ++
 docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
 hw/acpi/cpu.c                   | 18 ++++++++++++++++--
 hw/i386/acpi-build.c            |  5 +++++
 hw/i386/pc.c                    |  1 +
 hw/isa/lpc_ich9.c               |  2 +-
 6 files changed, 34 insertions(+), 5 deletions(-)

Comments

Laszlo Ersek Nov. 24, 2020, 10:58 p.m. UTC | #1
On 11/24/20 13:25, Igor Mammedov wrote:
> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
> ejected CPU to mark it for removal by firmware and trigger SMI
> upcall to let firmware do actual eject.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> PS:
>   - abuse 5.1 machine type for now to turn off unplug feature
>     (it will be moved to 5.2 machine type once new merge window is open)
> ---
>  include/hw/acpi/cpu.h           |  2 ++
>  docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>  hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>  hw/i386/acpi-build.c            |  5 +++++
>  hw/i386/pc.c                    |  1 +
>  hw/isa/lpc_ich9.c               |  2 +-
>  6 files changed, 34 insertions(+), 5 deletions(-)

Thanks -- I've tagged this for later; I can't tell when I'll get to it.
I'll first have to re-read the previous discussion, from the start of October.

Ankur -- please feel free to comment!

Thanks
Laszlo

> 
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 0eeedaa491..999caaf510 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>      uint64_t arch_id;
>      bool is_inserting;
>      bool is_removing;
> +    bool fw_remove;
>      uint32_t ost_event;
>      uint32_t ost_status;
>  } AcpiCpuStatus;
> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>  typedef struct CPUHotplugFeatures {
>      bool acpi_1_compatible;
>      bool has_legacy_cphp;
> +    bool fw_unplugs_cpu;
>      const char *smi_path;
>  } CPUHotplugFeatures;
>  
> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> index 9bb22d1270..f68ef6e06c 100644
> --- a/docs/specs/acpi_cpu_hotplug.txt
> +++ b/docs/specs/acpi_cpu_hotplug.txt
> @@ -57,7 +57,11 @@ read access:
>                It's valid only when bit 0 is set.
>             2: Device remove event, used to distinguish device for which
>                no device eject request to OSPM was issued.
> -           3-7: reserved and should be ignored by OSPM
> +           3: reserved and should be ignored by OSPM
> +           4: if set to 1, OSPM requests firmware to perform device eject,
> +              firmware shall clear this event by writing 1 into it before
> +              performing device eject.
> +           5-7: reserved and should be ignored by OSPM
>      [0x5-0x7] reserved
>      [0x8] Command data: (DWORD access)
>            contains 0 unless value last stored in 'Command field' is one of:
> @@ -82,7 +86,10 @@ write access:
>                 selected CPU device
>              3: if set to 1 initiates device eject, set by OSPM when it
>                 triggers CPU device removal and calls _EJ0 method
> -            4-7: reserved, OSPM must clear them before writing to register
> +            4: if set to 1 OSPM hands over device eject to firmware,
> +               Firmware shall issue device eject request as described above
> +               (bit #3) and OSPM should not touch device eject bit (#3),
> +            5-7: reserved, OSPM must clear them before writing to register
>      [0x5] Command field: (1 byte access)
>            value:
>              0: selects a CPU device with inserting/removing events and
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index f099b50927..09d2f20dae 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>          val |= cdev->cpu ? 1 : 0;
>          val |= cdev->is_inserting ? 2 : 0;
>          val |= cdev->is_removing  ? 4 : 0;
> +        val |= cdev->fw_remove  ? 16 : 0;
>          trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>          break;
>      case ACPI_CPU_CMD_DATA_OFFSET_RW:
> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>              hotplug_ctrl = qdev_get_hotplug_handler(dev);
>              hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>              object_unparent(OBJECT(dev));
> +        } else if (data & 16) {
> +            cdev->fw_remove = !cdev->fw_remove;
>          }
>          break;
>      case ACPI_CPU_CMD_OFFSET_WR:
> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>  #define CPU_INSERT_EVENT  "CINS"
>  #define CPU_REMOVE_EVENT  "CRMV"
>  #define CPU_EJECT_EVENT   "CEJ0"
> +#define CPU_FW_EJECT_EVENT "CEJF"
>  
>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                      hwaddr io_base,
> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>          /* initiates device eject, write only */
>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> -        aml_append(field, aml_reserved_field(4));
> +        aml_append(field, aml_reserved_field(1));
> +        /* tell firmware to do device eject, write only */
> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> +        aml_append(field, aml_reserved_field(2));
>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
>          aml_append(cpu_ctrl_dev, field);
>  
> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>  
>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>  
>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>              aml_append(method, aml_store(idx, cpu_selector));
> -            aml_append(method, aml_store(one, ej_evt));
> +            if (opts.fw_unplugs_cpu) {
> +                aml_append(method, aml_store(one, fw_ej_evt));
> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                           aml_name("%s", opts.smi_path)));
> +            } else {
> +                aml_append(method, aml_store(one, ej_evt));
> +            }
>              aml_append(method, aml_release(ctrl_lock));
>          }
>          aml_append(cpus_dev, method);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1f5c211245..475e76f514 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>      bool s4_disabled;
>      bool pcihp_bridge_en;
>      bool smi_on_cpuhp;
> +    bool smi_on_cpu_unplug;
>      bool pcihp_root_en;
>      uint8_t s4_val;
>      AcpiFadtData fadt;
> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>      pm->pcihp_io_base = 0;
>      pm->pcihp_io_len = 0;
>      pm->smi_on_cpuhp = false;
> +    pm->smi_on_cpu_unplug = false;
>  
>      assert(obj);
>      init_common_fadt_data(machine, obj, &pm->fadt);
> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>          pm->smi_on_cpuhp =
>              !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> +        pm->smi_on_cpu_unplug =
> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>      }
>  
>      /* The above need not be conditional on machine type because the reset port
> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          CPUHotplugFeatures opts = {
>              .acpi_1_compatible = true, .has_legacy_cphp = true,
>              .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>          };
>          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>                         "\\_SB.PCI0", "\\_GPE._E02");
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 17b514d1da..2952a00fe6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@
>  
>  GlobalProperty pc_compat_5_1[] = {
>      { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>  };
>  const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>  
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 087a18d04d..8c667b7166 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>      DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>                        ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>      DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
>
Ankur Arora Nov. 26, 2020, 10:24 a.m. UTC | #2
On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
> ejected CPU to mark it for removal by firmware and trigger SMI
> upcall to let firmware do actual eject.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> PS:
>    - abuse 5.1 machine type for now to turn off unplug feature
>      (it will be moved to 5.2 machine type once new merge window is open)
> ---
>   include/hw/acpi/cpu.h           |  2 ++
>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>   hw/i386/acpi-build.c            |  5 +++++
>   hw/i386/pc.c                    |  1 +
>   hw/isa/lpc_ich9.c               |  2 +-
>   6 files changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 0eeedaa491..999caaf510 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>       uint64_t arch_id;
>       bool is_inserting;
>       bool is_removing;
> +    bool fw_remove;
>       uint32_t ost_event;
>       uint32_t ost_status;
>   } AcpiCpuStatus;
> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>   typedef struct CPUHotplugFeatures {
>       bool acpi_1_compatible;
>       bool has_legacy_cphp;
> +    bool fw_unplugs_cpu;
>       const char *smi_path;
>   } CPUHotplugFeatures;
>   
> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> index 9bb22d1270..f68ef6e06c 100644
> --- a/docs/specs/acpi_cpu_hotplug.txt
> +++ b/docs/specs/acpi_cpu_hotplug.txt
> @@ -57,7 +57,11 @@ read access:
>                 It's valid only when bit 0 is set.
>              2: Device remove event, used to distinguish device for which
>                 no device eject request to OSPM was issued.
> -           3-7: reserved and should be ignored by OSPM
> +           3: reserved and should be ignored by OSPM
> +           4: if set to 1, OSPM requests firmware to perform device eject,
> +              firmware shall clear this event by writing 1 into it before
> +              performing device eject.
> +           5-7: reserved and should be ignored by OSPM
>       [0x5-0x7] reserved
>       [0x8] Command data: (DWORD access)
>             contains 0 unless value last stored in 'Command field' is one of:
> @@ -82,7 +86,10 @@ write access:
>                  selected CPU device
>               3: if set to 1 initiates device eject, set by OSPM when it
>                  triggers CPU device removal and calls _EJ0 method
> -            4-7: reserved, OSPM must clear them before writing to register
> +            4: if set to 1 OSPM hands over device eject to firmware,
> +               Firmware shall issue device eject request as described above
> +               (bit #3) and OSPM should not touch device eject bit (#3),
> +            5-7: reserved, OSPM must clear them before writing to register
>       [0x5] Command field: (1 byte access)
>             value:
>               0: selects a CPU device with inserting/removing events and
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index f099b50927..09d2f20dae 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>           val |= cdev->cpu ? 1 : 0;
>           val |= cdev->is_inserting ? 2 : 0;
>           val |= cdev->is_removing  ? 4 : 0;
> +        val |= cdev->fw_remove  ? 16 : 0;

I might be missing something, but I don't see where cdev->fw_remove is being
set. We do set cdev->is_removing in acpi_cpu_unplug_request_cb(), so AFAICS
we would always end up setting this bit:
>           val |= cdev->is_removing  ? 4 : 0;

Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
(4 | 16). I'm guessing that in that case the AML determines which case gets
handled but it might make sense to set just one of these?


>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>           break;
>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>               object_unparent(OBJECT(dev));
> +        } else if (data & 16) {
> +            cdev->fw_remove = !cdev->fw_remove;
>           }
>           break;
>       case ACPI_CPU_CMD_OFFSET_WR:
> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>   #define CPU_INSERT_EVENT  "CINS"
>   #define CPU_REMOVE_EVENT  "CRMV"
>   #define CPU_EJECT_EVENT   "CEJ0"
> +#define CPU_FW_EJECT_EVENT "CEJF"
>   
>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                       hwaddr io_base,
> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>           /* initiates device eject, write only */
>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> -        aml_append(field, aml_reserved_field(4));
> +        aml_append(field, aml_reserved_field(1));
> +        /* tell firmware to do device eject, write only */
> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> +        aml_append(field, aml_reserved_field(2));
>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
>           aml_append(cpu_ctrl_dev, field);
>   
> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>   
>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>   
>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>               aml_append(method, aml_store(idx, cpu_selector));
> -            aml_append(method, aml_store(one, ej_evt));
> +            if (opts.fw_unplugs_cpu) {
> +                aml_append(method, aml_store(one, fw_ej_evt));
> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                           aml_name("%s", opts.smi_path)));
> +            } else {
> +                aml_append(method, aml_store(one, ej_evt));
> +            }
My knowledge of AML is rather rudimentary but this looks mostly reasonable to me.

One question: the corresponding code for CPU hotplug does not send an SMI_CMD.
Why the difference?

                     aml_append(while_ctx,
                         aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
                                   uid));
                     aml_append(while_ctx,
                         aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
                     aml_append(while_ctx, aml_store(uid, cpu_selector));
                     aml_append(while_ctx, aml_store(one, ins_evt));
                     aml_append(while_ctx, aml_increment(cpu_idx));


>               aml_append(method, aml_release(ctrl_lock));
>           }
>           aml_append(cpus_dev, method);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1f5c211245..475e76f514 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>       bool s4_disabled;
>       bool pcihp_bridge_en;
>       bool smi_on_cpuhp;
> +    bool smi_on_cpu_unplug;
>       bool pcihp_root_en;
>       uint8_t s4_val;
>       AcpiFadtData fadt;
> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>       pm->pcihp_io_base = 0;
>       pm->pcihp_io_len = 0;
>       pm->smi_on_cpuhp = false;
> +    pm->smi_on_cpu_unplug = false;
>   
>       assert(obj);
>       init_common_fadt_data(machine, obj, &pm->fadt);
> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>           pm->smi_on_cpuhp =
>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> +        pm->smi_on_cpu_unplug =
> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>       }
>   
>       /* The above need not be conditional on machine type because the reset port
> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>           CPUHotplugFeatures opts = {
>               .acpi_1_compatible = true, .has_legacy_cphp = true,
>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>           };
>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>                          "\\_SB.PCI0", "\\_GPE._E02");
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 17b514d1da..2952a00fe6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@
>   
>   GlobalProperty pc_compat_5_1[] = {
>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>   };
>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>   
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 087a18d04d..8c667b7166 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> 

Thanks for sending out the patch btw. This helped me crystallize some of the
corresponding OVMF code.

Ankur
Laszlo Ersek Nov. 26, 2020, 11:17 a.m. UTC | #3
On 11/24/20 13:25, Igor Mammedov wrote:
> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
> ejected CPU to mark it for removal by firmware and trigger SMI
> upcall to let firmware do actual eject.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> PS:
>   - abuse 5.1 machine type for now to turn off unplug feature
>     (it will be moved to 5.2 machine type once new merge window is open)
> ---
>  include/hw/acpi/cpu.h           |  2 ++
>  docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>  hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>  hw/i386/acpi-build.c            |  5 +++++
>  hw/i386/pc.c                    |  1 +
>  hw/isa/lpc_ich9.c               |  2 +-
>  6 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 0eeedaa491..999caaf510 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>      uint64_t arch_id;
>      bool is_inserting;
>      bool is_removing;
> +    bool fw_remove;
>      uint32_t ost_event;
>      uint32_t ost_status;
>  } AcpiCpuStatus;
> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>  typedef struct CPUHotplugFeatures {
>      bool acpi_1_compatible;
>      bool has_legacy_cphp;
> +    bool fw_unplugs_cpu;
>      const char *smi_path;
>  } CPUHotplugFeatures;
>
> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> index 9bb22d1270..f68ef6e06c 100644
> --- a/docs/specs/acpi_cpu_hotplug.txt
> +++ b/docs/specs/acpi_cpu_hotplug.txt
> @@ -57,7 +57,11 @@ read access:
>                It's valid only when bit 0 is set.
>             2: Device remove event, used to distinguish device for which
>                no device eject request to OSPM was issued.
> -           3-7: reserved and should be ignored by OSPM
> +           3: reserved and should be ignored by OSPM
> +           4: if set to 1, OSPM requests firmware to perform device eject,
> +              firmware shall clear this event by writing 1 into it before

(1) s/clear this event/clear this event bit/

> +              performing device eject.

(2) move the second and third lines ("firmware shall clear....") over to
the write documentation, below? In particular:

> +           5-7: reserved and should be ignored by OSPM
>      [0x5-0x7] reserved
>      [0x8] Command data: (DWORD access)
>            contains 0 unless value last stored in 'Command field' is one of:
> @@ -82,7 +86,10 @@ write access:
>                 selected CPU device
>              3: if set to 1 initiates device eject, set by OSPM when it
>                 triggers CPU device removal and calls _EJ0 method
> -            4-7: reserved, OSPM must clear them before writing to register
> +            4: if set to 1 OSPM hands over device eject to firmware,
> +               Firmware shall issue device eject request as described above
> +               (bit #3) and OSPM should not touch device eject bit (#3),

(3) it would be clearer if we documented the exact bit writing order
here:
- clear bit#4, *then* set bit#3 (two write accesses)
- versus clear bit#4 *and* set bit#3 (single access)



> +            5-7: reserved, OSPM must clear them before writing to register
>      [0x5] Command field: (1 byte access)
>            value:
>              0: selects a CPU device with inserting/removing events and
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index f099b50927..09d2f20dae 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>          val |= cdev->cpu ? 1 : 0;
>          val |= cdev->is_inserting ? 2 : 0;
>          val |= cdev->is_removing  ? 4 : 0;
> +        val |= cdev->fw_remove  ? 16 : 0;
>          trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>          break;
>      case ACPI_CPU_CMD_DATA_OFFSET_RW:
> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>              hotplug_ctrl = qdev_get_hotplug_handler(dev);
>              hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>              object_unparent(OBJECT(dev));
> +        } else if (data & 16) {
> +            cdev->fw_remove = !cdev->fw_remove;

hm... so I guess the ACPI code will first write bit#4 to flip
"fw_remove" from "off" to "on". Then the firmware will write bit#4 to
flip "fw_remove" back to "off". And finally, the firmware will write
bit#3 (strictly as a separate access) to unplug the CPU.

(4) But anyway, taking a step back: what do we need the new bit for?

>          }
>          break;
>      case ACPI_CPU_CMD_OFFSET_WR:
> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>  #define CPU_INSERT_EVENT  "CINS"
>  #define CPU_REMOVE_EVENT  "CRMV"
>  #define CPU_EJECT_EVENT   "CEJ0"
> +#define CPU_FW_EJECT_EVENT "CEJF"
>
>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                      hwaddr io_base,
> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>          /* initiates device eject, write only */
>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> -        aml_append(field, aml_reserved_field(4));
> +        aml_append(field, aml_reserved_field(1));
> +        /* tell firmware to do device eject, write only */
> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> +        aml_append(field, aml_reserved_field(2));
>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
>          aml_append(cpu_ctrl_dev, field);
>
> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>
>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>
>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>              aml_append(method, aml_store(idx, cpu_selector));
> -            aml_append(method, aml_store(one, ej_evt));
> +            if (opts.fw_unplugs_cpu) {
> +                aml_append(method, aml_store(one, fw_ej_evt));
> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                           aml_name("%s", opts.smi_path)));
> +            } else {
> +                aml_append(method, aml_store(one, ej_evt));
> +            }
>              aml_append(method, aml_release(ctrl_lock));
>          }
>          aml_append(cpus_dev, method);

Hmmm, OK, let me parse this.

Assume there is a big bunch of device_del QMP commands, QEMU marks the
"remove" event pending on the corresponding set of CPUs, plus also makes
the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
and calls CSCN. CSCN runs a loop, and for each CPU where the remove
event is pending, notifies the OS one by one. The OS in turn forgets
about the subject CPU, and calls the _EJ0 method on the affected CPU
ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
in the affected CPU's identifier.

The above hunk modifies the CEJ0 method.

(5) Question: pre-patch, both the CSCN method and the CEJ0 method
acquire the CPLK lock, but CEJ0 is actually called within CSCN
(indirectly, with the OS's cooperation). Is CPLK a recursive lock?

Anyway, let's see the CEJ0 modification. After the OS is done forgetting
about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
sets the new bit#4 in the register block, and raises an SMI.

(6) So that's one SMI per CPU being removed. Is that OK?

(7) What if there are asynchronous plugs going on, and the firmware
notices them in the register block? ... Hm, I hope that should be OK,
because ultimately the CSCN method will learn about those too, and
inform the OS. On plug, the firmware doesn't modify the register block.

Ah! OK. I think I understand why bit#4 is important. The firmware may
notice pending remove events, but it must not act upon them -- it must
simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
clear means the event is pending (QEMU got a device_del), but the OS has
not forgotten about the CPU yet -- so the firmware must not unplug it
yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
the already set bit#2, advertising that the OS has *already* abandoned
the CPU.

This means we'll have to modify the QemuCpuhpCollectApicIds() function
in OVMF as well -- for collecting a CPU for unplug, just bit#2
(QEMU_CPUHP_STAT_REMOVE) is insufficient -- on such CPUs, the OS may
still be executing threads.

OK, this approach sounds plausible to me.

(8) Please extend the description of bit#2 in the "status flags read
access" section: "firmware must ignore bit#2 unless bit#4 is set".



> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1f5c211245..475e76f514 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>      bool s4_disabled;
>      bool pcihp_bridge_en;
>      bool smi_on_cpuhp;
> +    bool smi_on_cpu_unplug;
>      bool pcihp_root_en;
>      uint8_t s4_val;
>      AcpiFadtData fadt;
> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>      pm->pcihp_io_base = 0;
>      pm->pcihp_io_len = 0;
>      pm->smi_on_cpuhp = false;
> +    pm->smi_on_cpu_unplug = false;
>
>      assert(obj);
>      init_common_fadt_data(machine, obj, &pm->fadt);
> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>          pm->smi_on_cpuhp =
>              !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> +        pm->smi_on_cpu_unplug =
> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>      }
>
>      /* The above need not be conditional on machine type because the reset port
> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          CPUHotplugFeatures opts = {
>              .acpi_1_compatible = true, .has_legacy_cphp = true,
>              .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>          };
>          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>                         "\\_SB.PCI0", "\\_GPE._E02");
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 17b514d1da..2952a00fe6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@
>
>  GlobalProperty pc_compat_5_1[] = {
>      { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>  };
>  const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 087a18d04d..8c667b7166 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>      DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>                        ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>      DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
>

(9) You have to extend smi_features_ok_callback() as well -- it is
invalid for the firmware to negotiate unplug, without negotiating plug.

In fact, as far as I can tell, that would even crash QEMU, given this
patch: "opts.smi_path" would be set to NULL, but "opts.fw_unplugs_cpu"
would be set to "true". As a consequence, the CPU_EJECT_METHOD change
above would call aml_name("%s", NULL), i.e., pass NULL for a "%s"
conversion, which is undefined behavior.

So something like the following looks necessary:

> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 8c667b7166c7..5bc3f212fe77 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>  {
>      ICH9LPCState *lpc = opaque;
>      uint64_t guest_features;
> +    uint64_t guest_cpu_hotplug_features;
>
>      if (lpc->smi_features_ok) {
>          /* negotiation already complete, features locked */
> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>          /* guest requests invalid features, leave @features_ok at zero */
>          return;
>      }
> +    guest_cpu_hotplug_features = guest_features &
> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>      if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> +        guest_cpu_hotplug_features) {
>          /*
>           * cpu hot-[un]plug with SMI requires SMI broadcast,
>           * leave @features_ok at zero
> @@ -388,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>          return;
>      }
>
> +    if (guest_cpu_hotplug_features ==
> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> +        return;
> +    }
> +
>      /* valid feature subset requested, lock it down, report success */
>      lpc->smi_negotiated_features = guest_features;
>      lpc->smi_features_ok = 1;
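
The acceptance rules that smi_features_ok_callback() would enforce after this change can be modeled in isolation, as a minimal sketch. The bit positions and the smi_features_acceptable() name are illustrative only. The real ICH9_LPC_SMI_F_*_BIT values live in QEMU's ICH9 header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; see QEMU's ICH9 header for the real ones. */
enum {
    SMI_F_BROADCAST_BIT      = 0,
    SMI_F_CPU_HOTPLUG_BIT    = 1,
    SMI_F_CPU_HOT_UNPLUG_BIT = 2,
};
#define SMI_F(bit) (UINT64_C(1) << (bit))

/* Models the checks of smi_features_ok_callback() with change (9) applied. */
static bool smi_features_acceptable(uint64_t host, uint64_t guest)
{
    uint64_t cpuhp = guest & (SMI_F(SMI_F_CPU_HOTPLUG_BIT) |
                              SMI_F(SMI_F_CPU_HOT_UNPLUG_BIT));

    if (guest & ~host) {
        return false; /* guest requests features the host does not offer */
    }
    if (cpuhp && !(guest & SMI_F(SMI_F_BROADCAST_BIT))) {
        return false; /* cpu hot-[un]plug with SMI requires SMI broadcast */
    }
    if (cpuhp == SMI_F(SMI_F_CPU_HOT_UNPLUG_BIT)) {
        return false; /* cpu hot-unplug is unsupported without cpu-hotplug */
    }
    return true;
}
```

In this model, a guest that asks for "broadcast + unplug" without "plug" is refused. That refusal is the combination which would otherwise reach aml_name("%s", NULL).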


(10) It would be nice to separate this work into multiple patches. I
propose:

- [PATCH 1/5] x86: ich9: factor out "guest_cpu_hotplug_features"

>  hw/isa/lpc_ich9.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index 087a18d04de4..c46eefd13fd4 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>  {
>      ICH9LPCState *lpc = opaque;
>      uint64_t guest_features;
> +    uint64_t guest_cpu_hotplug_features;
>
>      if (lpc->smi_features_ok) {
>          /* negotiation already complete, features locked */
> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>          /* guest requests invalid features, leave @features_ok at zero */
>          return;
>      }
> +    guest_cpu_hotplug_features = guest_features &
> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>      if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> +        guest_cpu_hotplug_features) {
>          /*
>           * cpu hot-[un]plug with SMI requires SMI broadcast,
>           * leave @features_ok at zero


- [PATCH 2/5] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature

>  hw/i386/pc.c      | 1 +
>  hw/isa/lpc_ich9.c | 8 +++++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 17b514d1da50..2952a00fe694 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -99,6 +99,7 @@
>
>  GlobalProperty pc_compat_5_1[] = {
>      { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>  };
>  const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index c46eefd13fd4..5bc3f212fe77 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -391,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>          return;
>      }
>
> +    if (guest_cpu_hotplug_features ==
> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> +        return;
> +    }
> +
>      /* valid feature subset requested, lock it down, report success */
>      lpc->smi_negotiated_features = guest_features;
>      lpc->smi_features_ok = 1;
> @@ -773,7 +779,7 @@ static Property ich9_lpc_properties[] = {
>      DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>                        ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>      DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>


- [PATCH 3/5] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug

>  hw/i386/acpi-build.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1f5c2112452a..9036e5594c92 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>      bool s4_disabled;
>      bool pcihp_bridge_en;
>      bool smi_on_cpuhp;
> +    bool smi_on_cpu_unplug;
>      bool pcihp_root_en;
>      uint8_t s4_val;
>      AcpiFadtData fadt;
> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>      pm->pcihp_io_base = 0;
>      pm->pcihp_io_len = 0;
>      pm->smi_on_cpuhp = false;
> +    pm->smi_on_cpu_unplug = false;
>
>      assert(obj);
>      init_common_fadt_data(machine, obj, &pm->fadt);
> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>          pm->smi_on_cpuhp =
>              !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> +        pm->smi_on_cpu_unplug =
> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>      }
>
>      /* The above need not be conditional on machine type because the reset port


- [PATCH 4/5] acpi: cpuhp: introduce 'firmware performs eject' status/control bits

>  docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>  include/hw/acpi/cpu.h           |  1 +
>  hw/acpi/cpu.c                   |  3 +++
>  3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> index 9bb22d1270a9..f68ef6e06c7a 100644
> --- a/docs/specs/acpi_cpu_hotplug.txt
> +++ b/docs/specs/acpi_cpu_hotplug.txt
> @@ -57,7 +57,11 @@ read access:
>                It's valid only when bit 0 is set.
>             2: Device remove event, used to distinguish device for which
>                no device eject request to OSPM was issued.
> -           3-7: reserved and should be ignored by OSPM
> +           3: reserved and should be ignored by OSPM
> +           4: if set to 1, OSPM requests firmware to perform device eject,
> +              firmware shall clear this event by writing 1 into it before
> +              performing device eject.
> +           5-7: reserved and should be ignored by OSPM
>      [0x5-0x7] reserved
>      [0x8] Command data: (DWORD access)
>            contains 0 unless value last stored in 'Command field' is one of:
> @@ -82,7 +86,10 @@ write access:
>                 selected CPU device
>              3: if set to 1 initiates device eject, set by OSPM when it
>                 triggers CPU device removal and calls _EJ0 method
> -            4-7: reserved, OSPM must clear them before writing to register
> +            4: if set to 1 OSPM hands over device eject to firmware,
> +               Firmware shall issue device eject request as described above
> +               (bit #3) and OSPM should not touch device eject bit (#3),
> +            5-7: reserved, OSPM must clear them before writing to register
>      [0x5] Command field: (1 byte access)
>            value:
>              0: selects a CPU device with inserting/removing events and
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 0eeedaa491c1..d71edde456f2 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>      uint64_t arch_id;
>      bool is_inserting;
>      bool is_removing;
> +    bool fw_remove;
>      uint32_t ost_event;
>      uint32_t ost_status;
>  } AcpiCpuStatus;
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index f099b5092730..3dc83d73e20b 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>          val |= cdev->cpu ? 1 : 0;
>          val |= cdev->is_inserting ? 2 : 0;
>          val |= cdev->is_removing  ? 4 : 0;
> +        val |= cdev->fw_remove  ? 16 : 0;
>          trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>          break;
>      case ACPI_CPU_CMD_DATA_OFFSET_RW:
> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>              hotplug_ctrl = qdev_get_hotplug_handler(dev);
>              hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>              object_unparent(OBJECT(dev));
> +        } else if (data & 16) {
> +            cdev->fw_remove = !cdev->fw_remove;
>          }
>          break;
>      case ACPI_CPU_CMD_OFFSET_WR:


- [PATCH 5/5] x86: acpi: let the firmware handle pending "CPU remove" events in SMM

>  include/hw/acpi/cpu.h |  1 +
>  hw/acpi/cpu.c         | 15 +++++++++++++--
>  hw/i386/acpi-build.c  |  1 +
>  3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index d71edde456f2..999caaf51060 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>  typedef struct CPUHotplugFeatures {
>      bool acpi_1_compatible;
>      bool has_legacy_cphp;
> +    bool fw_unplugs_cpu;
>      const char *smi_path;
>  } CPUHotplugFeatures;
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 3dc83d73e20b..09d2f20daec0 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -335,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>  #define CPU_INSERT_EVENT  "CINS"
>  #define CPU_REMOVE_EVENT  "CRMV"
>  #define CPU_EJECT_EVENT   "CEJ0"
> +#define CPU_FW_EJECT_EVENT "CEJF"
>
>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                      hwaddr io_base,
> @@ -387,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>          /* initiates device eject, write only */
>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> -        aml_append(field, aml_reserved_field(4));
> +        aml_append(field, aml_reserved_field(1));
> +        /* tell firmware to do device eject, write only */
> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> +        aml_append(field, aml_reserved_field(2));
>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
>          aml_append(cpu_ctrl_dev, field);
>
> @@ -422,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>
>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> @@ -464,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>
>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>              aml_append(method, aml_store(idx, cpu_selector));
> -            aml_append(method, aml_store(one, ej_evt));
> +            if (opts.fw_unplugs_cpu) {
> +                aml_append(method, aml_store(one, fw_ej_evt));
> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                           aml_name("%s", opts.smi_path)));
> +            } else {
> +                aml_append(method, aml_store(one, ej_evt));
> +            }
>              aml_append(method, aml_release(ctrl_lock));
>          }
>          aml_append(cpus_dev, method);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 9036e5594c92..475e76f514ff 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>          CPUHotplugFeatures opts = {
>              .acpi_1_compatible = true, .has_legacy_cphp = true,
>              .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>          };
>          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>                         "\\_SB.PCI0", "\\_GPE._E02");

Thanks!
Laszlo
Laszlo Ersek Nov. 26, 2020, 12:46 p.m. UTC | #4
On 11/26/20 11:24, Ankur Arora wrote:
> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
>> ejected CPU to mark it for removal by firmware and trigger SMI
>> upcall to let firmware do actual eject.
>>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>> ---
>> PS:
>>    - abuse 5.1 machine type for now to turn off unplug feature
>>      (it will be moved to 5.2 machine type once new merge window is open)
>> ---
>>   include/hw/acpi/cpu.h           |  2 ++
>>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>   hw/i386/acpi-build.c            |  5 +++++
>>   hw/i386/pc.c                    |  1 +
>>   hw/isa/lpc_ich9.c               |  2 +-
>>   6 files changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>> index 0eeedaa491..999caaf510 100644
>> --- a/include/hw/acpi/cpu.h
>> +++ b/include/hw/acpi/cpu.h
>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>       uint64_t arch_id;
>>       bool is_inserting;
>>       bool is_removing;
>> +    bool fw_remove;
>>       uint32_t ost_event;
>>       uint32_t ost_status;
>>   } AcpiCpuStatus;
>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>   typedef struct CPUHotplugFeatures {
>>       bool acpi_1_compatible;
>>       bool has_legacy_cphp;
>> +    bool fw_unplugs_cpu;
>>       const char *smi_path;
>>   } CPUHotplugFeatures;
>>
>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>> index 9bb22d1270..f68ef6e06c 100644
>> --- a/docs/specs/acpi_cpu_hotplug.txt
>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>> @@ -57,7 +57,11 @@ read access:
>>                 It's valid only when bit 0 is set.
>>              2: Device remove event, used to distinguish device for which
>>                 no device eject request to OSPM was issued.
>> -           3-7: reserved and should be ignored by OSPM
>> +           3: reserved and should be ignored by OSPM
>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>> +              firmware shall clear this event by writing 1 into it before
>> +              performing device eject.
>> +           5-7: reserved and should be ignored by OSPM
>>       [0x5-0x7] reserved
>>       [0x8] Command data: (DWORD access)
>>             contains 0 unless value last stored in 'Command field' is one of:
>> @@ -82,7 +86,10 @@ write access:
>>                  selected CPU device
>>               3: if set to 1 initiates device eject, set by OSPM when it
>>                  triggers CPU device removal and calls _EJ0 method
>> -            4-7: reserved, OSPM must clear them before writing to register
>> +            4: if set to 1 OSPM hands over device eject to firmware,
>> +               Firmware shall issue device eject request as described above
>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>> +            5-7: reserved, OSPM must clear them before writing to register
>>       [0x5] Command field: (1 byte access)
>>             value:
>>               0: selects a CPU device with inserting/removing events and
>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>> index f099b50927..09d2f20dae 100644
>> --- a/hw/acpi/cpu.c
>> +++ b/hw/acpi/cpu.c
>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>           val |= cdev->cpu ? 1 : 0;
>>           val |= cdev->is_inserting ? 2 : 0;
>>           val |= cdev->is_removing  ? 4 : 0;
>> +        val |= cdev->fw_remove  ? 16 : 0;
> 
> I might be missing something but I don't see where cdev->fw_remove is being
> set.

See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
which happens through the ACPI code change --, fw_remove is inverted.
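
A minimal model of that write path may make the toggle semantics easier to see. CpuFlags and flags_write() are hypothetical stand-ins. They mirror the if/else-if chain of the patched cpu_hotplug_wr() for the flags register, including the eject branch, which is modeled here as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AcpiCpuStatus fields touched here. */
typedef struct {
    bool is_inserting;
    bool is_removing;
    bool fw_remove;
    bool ejected;   /* models the unplug performed by the bit#3 branch */
} CpuFlags;

/*
 * Models the patched flags-register write: the branches form an
 * if/else-if chain, so only the lowest-numbered of bits 1-4 that is set
 * takes effect.
 */
static void flags_write(CpuFlags *c, uint8_t data)
{
    if (data & 2) {
        c->is_inserting = false;       /* clear insert event */
    } else if (data & 4) {
        c->is_removing = false;        /* clear remove event */
    } else if (data & 8) {
        c->ejected = true;             /* device eject */
    } else if (data & 16) {
        c->fw_remove = !c->fw_remove;  /* bit#4 toggles, per this patch */
    }
}
```

Because bit#4 toggles, ACPI sets fw_remove and the firmware "clears" it by writing 1 to the same bit. A write that combines bit#4 with bit#2 acts only on bit#2.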


> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
> we would always end up setting this bit:
>>           val |= cdev->is_removing  ? 4 : 0;
> 
> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
> (4 | 16). I'm guessing that in that case the AML determines which case gets
> handled but it might make sense to set just one of these?

"is_removing" is set directly in response to the device_del QMP command.
That QMP command is asynchronous to the execution of the guest OS.

"fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
executed by the guest OS's ACPI interpreter, after the guest OS has
de-scheduled all processes from the CPU being removed (= basically after
the OS has willfully forgotten about the CPU).

Therefore, considering the bitmask (is_removing, fw_remove), three of
the four combinations make sense; an unplug passes through them as follows:

#1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested

#2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
                                   is processing the request

#3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
                                   the CPU, firmware is permitted /
                                   required to forget about the CPU as
                                   well, and then unplug the CPU

#4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU

#5 (is_removing=0, fw_remove=0) -- firmware performing unplug


The variation (is_removing=0, fw_remove=1) is invalid / unused.


The firmware may be investigating the CPU register block between steps
#2 and #3 -- in other words, the firmware may see a CPU for which
is_removing is set (unplug requested via QMP), but the OS has not vacated
yet (fw_remove=0). In that case, the firmware must just skip the CPU --
once the OS is done, it will set fw_remove too, and raise another SMI.
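
The table above maps to a per-CPU decision in the firmware's SMI handler. The following is a minimal sketch under the assumption that the table holds as stated. FwCpuAction and fw_cpu_action() are hypothetical names. The firmware's own transient steps #4 and #5 simply show up as "skip" and "idle". If is_removing is in fact cleared when the OS scans for events, the (0,1) case would have to map to "eject" instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU action for a firmware SMI handler. */
typedef enum {
    FW_CPU_IDLE,     /* (0,0): nothing pending for this CPU */
    FW_CPU_SKIP,     /* (1,0): unplug requested, OS not done yet */
    FW_CPU_EJECT,    /* (1,1): OS has vacated the CPU */
    FW_CPU_INVALID,  /* (0,1): unused combination */
} FwCpuAction;

static FwCpuAction fw_cpu_action(bool is_removing, bool fw_remove)
{
    if (fw_remove) {
        return is_removing ? FW_CPU_EJECT : FW_CPU_INVALID;
    }
    /* A skipped CPU is revisited on the SMI raised after the OS is done. */
    return is_removing ? FW_CPU_SKIP : FW_CPU_IDLE;
}
```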


> 
> 
>>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>           break;
>>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>               object_unparent(OBJECT(dev));
>> +        } else if (data & 16) {
>> +            cdev->fw_remove = !cdev->fw_remove;
>>           }
>>           break;
>>       case ACPI_CPU_CMD_OFFSET_WR:
>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>   #define CPU_INSERT_EVENT  "CINS"
>>   #define CPU_REMOVE_EVENT  "CRMV"
>>   #define CPU_EJECT_EVENT   "CEJ0"
>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>
>>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>                       hwaddr io_base,
>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>           /* initiates device eject, write only */
>>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>> -        aml_append(field, aml_reserved_field(4));
>> +        aml_append(field, aml_reserved_field(1));
>> +        /* tell firmware to do device eject, write only */
>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>> +        aml_append(field, aml_reserved_field(2));
>>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>           aml_append(cpu_ctrl_dev, field);
>>
>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>
>>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>
>>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>               aml_append(method, aml_store(idx, cpu_selector));
>> -            aml_append(method, aml_store(one, ej_evt));
>> +            if (opts.fw_unplugs_cpu) {
>> +                aml_append(method, aml_store(one, fw_ej_evt));
>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>> +                           aml_name("%s", opts.smi_path)));
>> +            } else {
>> +                aml_append(method, aml_store(one, ej_evt));
>> +            }
> My knowledge of AML is rather rudimentary but this looks mostly
> reasonable to me.
> 
> One question: the corresponding code for CPU hotplug does not send an
> SMI_CMD. Why the difference?

This code (on eject) is executing *after* the OS kernel has processed
the event. But on hotplug, the ordering is different (it must be): in
that case, the CSCN (scan) method first notifies the firmware, and then
the OS.

Thanks
Laszlo

> 
>                     aml_append(while_ctx,
>                         aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
>                                   uid));
>                     aml_append(while_ctx,
>                         aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
>                     aml_append(while_ctx, aml_store(uid, cpu_selector));
>                     aml_append(while_ctx, aml_store(one, ins_evt));
>                     aml_append(while_ctx, aml_increment(cpu_idx));
> 
> 
>>               aml_append(method, aml_release(ctrl_lock));
>>           }
>>           aml_append(cpus_dev, method);
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 1f5c211245..475e76f514 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>       bool s4_disabled;
>>       bool pcihp_bridge_en;
>>       bool smi_on_cpuhp;
>> +    bool smi_on_cpu_unplug;
>>       bool pcihp_root_en;
>>       uint8_t s4_val;
>>       AcpiFadtData fadt;
>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>       pm->pcihp_io_base = 0;
>>       pm->pcihp_io_len = 0;
>>       pm->smi_on_cpuhp = false;
>> +    pm->smi_on_cpu_unplug = false;
>>
>>       assert(obj);
>>       init_common_fadt_data(machine, obj, &pm->fadt);
>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>           pm->smi_on_cpuhp =
>>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>> +        pm->smi_on_cpu_unplug =
>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>       }
>>
>>       /* The above need not be conditional on machine type because the reset port
>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>           CPUHotplugFeatures opts = {
>>               .acpi_1_compatible = true, .has_legacy_cphp = true,
>>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>           };
>>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>                          "\\_SB.PCI0", "\\_GPE._E02");
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 17b514d1da..2952a00fe6 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -99,6 +99,7 @@
>>
>>   GlobalProperty pc_compat_5_1[] = {
>>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>   };
>>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>
>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>> index 087a18d04d..8c667b7166 100644
>> --- a/hw/isa/lpc_ich9.c
>> +++ b/hw/isa/lpc_ich9.c
>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
>>  
> 
> Thanks for sending out the patch btw. This helped me crystallize some of
> the corresponding OVMF code.
> 
> Ankur
>
Igor Mammedov Nov. 26, 2020, 7:50 p.m. UTC | #5
On Thu, 26 Nov 2020 13:46:32 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/26/20 11:24, Ankur Arora wrote:
> > On 2020-11-24 4:25 a.m., Igor Mammedov wrote:  
> >> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> >> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
> >> ejected CPU to mark it for removal by firmware and trigger SMI
> >> upcall to let firmware do actual eject.
> >>
> >> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >> ---
> >> PS:
> >>    - abuse 5.1 machine type for now to turn off unplug feature
> >>      (it will be moved to 5.2 machine type once new merge window is open)
> >> ---
> >>   include/hw/acpi/cpu.h           |  2 ++
> >>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >>   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
> >>   hw/i386/acpi-build.c            |  5 +++++
> >>   hw/i386/pc.c                    |  1 +
> >>   hw/isa/lpc_ich9.c               |  2 +-
> >>   6 files changed, 34 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> >> index 0eeedaa491..999caaf510 100644
> >> --- a/include/hw/acpi/cpu.h
> >> +++ b/include/hw/acpi/cpu.h
> >> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >>       uint64_t arch_id;
> >>       bool is_inserting;
> >>       bool is_removing;
> >> +    bool fw_remove;
> >>       uint32_t ost_event;
> >>       uint32_t ost_status;
> >>   } AcpiCpuStatus;
> >> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >>   typedef struct CPUHotplugFeatures {
> >>       bool acpi_1_compatible;
> >>       bool has_legacy_cphp;
> >> +    bool fw_unplugs_cpu;
> >>       const char *smi_path;
> >>   } CPUHotplugFeatures;
> >>
> >> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >> index 9bb22d1270..f68ef6e06c 100644
> >> --- a/docs/specs/acpi_cpu_hotplug.txt
> >> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >> @@ -57,7 +57,11 @@ read access:
> >>                 It's valid only when bit 0 is set.
> >>              2: Device remove event, used to distinguish device for which
> >>                 no device eject request to OSPM was issued.
> >> -           3-7: reserved and should be ignored by OSPM
> >> +           3: reserved and should be ignored by OSPM
> >> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >> +              firmware shall clear this event by writing 1 into it before
> >> +              performing device eject.
> >> +           5-7: reserved and should be ignored by OSPM
> >>       [0x5-0x7] reserved
> >>       [0x8] Command data: (DWORD access)
> >>             contains 0 unless value last stored in 'Command field' is one of:
> >> @@ -82,7 +86,10 @@ write access:
> >>                  selected CPU device
> >>               3: if set to 1 initiates device eject, set by OSPM when it
> >>                  triggers CPU device removal and calls _EJ0 method
> >> -            4-7: reserved, OSPM must clear them before writing to register
> >> +            4: if set to 1 OSPM hands over device eject to firmware,
> >> +               Firmware shall issue device eject request as described above
> >> +               (bit #3) and OSPM should not touch device eject bit (#3),
> >> +            5-7: reserved, OSPM must clear them before writing to register
> >>       [0x5] Command field: (1 byte access)
> >>             value:
> >>               0: selects a CPU device with inserting/removing events and
> >> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> >> index f099b50927..09d2f20dae 100644
> >> --- a/hw/acpi/cpu.c
> >> +++ b/hw/acpi/cpu.c
> >> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >>           val |= cdev->cpu ? 1 : 0;
> >>           val |= cdev->is_inserting ? 2 : 0;
> >>           val |= cdev->is_removing  ? 4 : 0;
> >> +        val |= cdev->fw_remove  ? 16 : 0;  
> > 
> > I might be missing something but I don't see where cdev->fw_remove is being
> > set.  
> 
> See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
> which happens through the ACPI code change --, fw_remove is inverted.
> 
> 
> > We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
> > we would always end up setting this bit:  
> >>           val |= cdev->is_removing  ? 4 : 0;  
> > 
> > Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
> > (4 | 16). I'm guessing that in that case the AML determines which case gets
> > handled but it might make sense to set just one of these?  
> 
> "is_removing" is set directly in response to the device_del QMP command.
> That QMP command is asynchronous to the execution of the guest OS.

is_removing is a notification to OSPM, which is cleared when the ACPI code
scans for events, if I'm not mistaken.


> 
> "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
> executed by the guest OS's ACPI interpreter, after the guest OS has
> de-scheduled all processes from the CPU being removed (= basically after
> the OS has willfully forgotten about the CPU).
> 
> Therefore, considering the bitmask (is_removing, fw_remove), three
> variations make sense:
> 
> #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
> 
> #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
>                                    is processing the request


> #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
>                                    the CPU, firmware is permitted /
>                                    required to forget about the CPU as
>                                    well, and then unplug the CPU
shouldn't be possible

> 
> #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
ditto


> #5 (is_removing=0, fw_remove=0) -- firmware performing unplug
> 
> 
> The variation (is_removing=0, fw_remove=1) is invalid / unused.
>
> 
> The firmware may be investigating the CPU register block between steps
> #2 and #3 -- in other words, the firmware may see a CPU for which
> is_removing is set (unplug requested via QMP), but the OS has not vacated
> yet (fw_remove=0). In that case, the firmware must just skip the CPU --
> once the OS is done, it will set fw_remove too, and raise another SMI.
> 
> 
> > 
> >   
> >>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >>           break;
> >>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
> >> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >>               object_unparent(OBJECT(dev));
> >> +        } else if (data & 16) {
> >> +            cdev->fw_remove = !cdev->fw_remove;
> >>           }
> >>           break;
> >>       case ACPI_CPU_CMD_OFFSET_WR:
> >> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>   #define CPU_INSERT_EVENT  "CINS"
> >>   #define CPU_REMOVE_EVENT  "CRMV"
> >>   #define CPU_EJECT_EVENT   "CEJ0"
> >> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>     void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>                       hwaddr io_base,
> >> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>           /* initiates device eject, write only */
> >>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >> -        aml_append(field, aml_reserved_field(4));
> >> +        aml_append(field, aml_reserved_field(1));
> >> +        /* tell firmware to do device eject, write only */
> >> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >> +        aml_append(field, aml_reserved_field(2));
> >>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>           aml_append(cpu_ctrl_dev, field);
> >>   @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>             aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>                 aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>               aml_append(method, aml_store(idx, cpu_selector));
> >> -            aml_append(method, aml_store(one, ej_evt));
> >> +            if (opts.fw_unplugs_cpu) {
> >> +                aml_append(method, aml_store(one, fw_ej_evt));
> >> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >> +                           aml_name("%s", opts.smi_path)));
> >> +            } else {
> >> +                aml_append(method, aml_store(one, ej_evt));
> >> +            }  
> > My knowledge of AML is rather rudimentary but this looks mostly
> > reasonable to me.
> > 
> > One question: the corresponding code for CPU hotplug does not send an
> > SMI_CMD. Why the difference?
> 
> This code (on eject) is executing *after* the OS kernel has processed
> the event. But on hotplug, the ordering is different (it must be): in
> that case, the CSCN (scan) method first notifies the firmware, and then
> the OS.
> 
> Thanks
> Laszlo
> 
> > 
> >                     aml_append(while_ctx,
> >                         aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
> >                                   uid));
> >                     aml_append(while_ctx,
> >                         aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
> >                     aml_append(while_ctx, aml_store(uid, cpu_selector));
> >                     aml_append(while_ctx, aml_store(one, ins_evt));
> >                     aml_append(while_ctx, aml_increment(cpu_idx));
> > 
> >   
> >>               aml_append(method, aml_release(ctrl_lock));
> >>           }
> >>           aml_append(cpus_dev, method);
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index 1f5c211245..475e76f514 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >>       bool s4_disabled;
> >>       bool pcihp_bridge_en;
> >>       bool smi_on_cpuhp;
> >> +    bool smi_on_cpu_unplug;
> >>       bool pcihp_root_en;
> >>       uint8_t s4_val;
> >>       AcpiFadtData fadt;
> >> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>       pm->pcihp_io_base = 0;
> >>       pm->pcihp_io_len = 0;
> >>       pm->smi_on_cpuhp = false;
> >> +    pm->smi_on_cpu_unplug = false;
> >>         assert(obj);
> >>       init_common_fadt_data(machine, obj, &pm->fadt);
> >> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >>           pm->smi_on_cpuhp =
> >>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> >> +        pm->smi_on_cpu_unplug =
> >> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>       }
> >>         /* The above need not be conditional on machine type because the reset port
> >> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>           CPUHotplugFeatures opts = {
> >>               .acpi_1_compatible = true, .has_legacy_cphp = true,
> >>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> >> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >>           };
> >>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >>                          "\\_SB.PCI0", "\\_GPE._E02");
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 17b514d1da..2952a00fe6 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -99,6 +99,7 @@
> >>     GlobalProperty pc_compat_5_1[] = {
> >>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> >> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >>   };
> >>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >>   diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >> index 087a18d04d..8c667b7166 100644
> >> --- a/hw/isa/lpc_ich9.c
> >> +++ b/hw/isa/lpc_ich9.c
> >> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
> >>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> >> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> >> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >>       DEFINE_PROP_END_OF_LIST(),
> >>   };
> >>    
> > 
> > Thanks for sending out the patch btw. This helped me crystallize some of the
> > corresponding OVMF code.
> > 
> > Ankur
> >   
>
Igor Mammedov Nov. 26, 2020, 8:38 p.m. UTC | #6
On Thu, 26 Nov 2020 12:17:27 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/24/20 13:25, Igor Mammedov wrote:
> > If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> > OSPM on CPU eject will set bit #4 in the CPU hotplug block for the CPU
> > to be ejected, to mark it for removal by firmware, and trigger an SMI
> > upcall to let firmware do the actual eject.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > PS:
> >   - abuse 5.1 machine type for now to turn off unplug feature
> >     (it will be moved to 5.2 machine type once new merge window is open)
> > ---
> >  include/hw/acpi/cpu.h           |  2 ++
> >  docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >  hw/acpi/cpu.c                   | 18 ++++++++++++++++--
> >  hw/i386/acpi-build.c            |  5 +++++
> >  hw/i386/pc.c                    |  1 +
> >  hw/isa/lpc_ich9.c               |  2 +-
> >  6 files changed, 34 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 0eeedaa491..999caaf510 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >      uint64_t arch_id;
> >      bool is_inserting;
> >      bool is_removing;
> > +    bool fw_remove;
> >      uint32_t ost_event;
> >      uint32_t ost_status;
> >  } AcpiCpuStatus;
> > @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >  typedef struct CPUHotplugFeatures {
> >      bool acpi_1_compatible;
> >      bool has_legacy_cphp;
> > +    bool fw_unplugs_cpu;
> >      const char *smi_path;
> >  } CPUHotplugFeatures;
> >
> > diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> > index 9bb22d1270..f68ef6e06c 100644
> > --- a/docs/specs/acpi_cpu_hotplug.txt
> > +++ b/docs/specs/acpi_cpu_hotplug.txt
> > @@ -57,7 +57,11 @@ read access:
> >                It's valid only when bit 0 is set.
> >             2: Device remove event, used to distinguish device for which
> >                no device eject request to OSPM was issued.
> > -           3-7: reserved and should be ignored by OSPM
> > +           3: reserved and should be ignored by OSPM
> > +           4: if set to 1, OSPM requests firmware to perform device eject,
> > +              firmware shall clear this event by writing 1 into it before  
> 
> (1) s/clear this event/clear this event bit/
> 
> > +              performing device eject.  
> 
> (2) move the second and third lines ("firmware shall clear....") over to
> the write documentation, below? In particular:
> 
> > +           5-7: reserved and should be ignored by OSPM
> >      [0x5-0x7] reserved
> >      [0x8] Command data: (DWORD access)
> >            contains 0 unless value last stored in 'Command field' is one of:
> > @@ -82,7 +86,10 @@ write access:
> >                 selected CPU device
> >              3: if set to 1 initiates device eject, set by OSPM when it
> >                 triggers CPU device removal and calls _EJ0 method
> > -            4-7: reserved, OSPM must clear them before writing to register
> > +            4: if set to 1 OSPM hands over device eject to firmware,
> > +               Firmware shall issue device eject request as described above
> > +               (bit #3) and OSPM should not touch device eject bit (#3),  
> 
> (3) it would be clearer if we documented the exact bit writing order
> here:
> - clear bit#4, *then* set bit#3 (two write accesses)
> - versus clear bit#4 *and* set bit#3 (single access)

I was thinking that FW should not bother with clearing bit #4,
and QEMU should clear it when handling a write to bit #3.
(It looks like I forgot to actually implement that.)
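A sketch of how the register might behave under this alternative. It assumes (this is not stated in the thread) that a bit #4 write sets fw_remove rather than inverting it, since nobody needs to clear it by writing anymore. `CpuEjectModel` and both functions are hypothetical names, and the eject is again reduced to clearing a `present` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced model of one CPU's eject-related state. */
typedef struct {
    bool present;
    bool fw_remove;
} CpuEjectModel;

/* Flags register write under the proposal: firmware never writes
 * bit #4; QEMU drops the pending request itself when bit #3 ejects. */
static void proposed_flags_write(CpuEjectModel *s, uint8_t data)
{
    if (data & 8) {
        s->present = false;     /* eject the selected CPU ... */
        s->fw_remove = false;   /* ... and clear bit #4 on QEMU's side */
    } else if (data & 16) {
        s->fw_remove = true;    /* OSPM hands the eject over to firmware */
    }
}

/* Firmware only reads bit #4 (and then writes bit #3). */
static uint8_t proposed_flags_read(const CpuEjectModel *s)
{
    return (uint8_t)((s->present ? 1 : 0) | (s->fw_remove ? 16 : 0));
}
```

With set semantics a repeated bit #4 write is idempotent, so a retried CEJ0 cannot accidentally cancel a pending request the way the RFC's inversion could.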

> 
> 
> 
> > +            5-7: reserved, OSPM must clear them before writing to register
> >      [0x5] Command field: (1 byte access)
> >            value:
> >              0: selects a CPU device with inserting/removing events and
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index f099b50927..09d2f20dae 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >          val |= cdev->cpu ? 1 : 0;
> >          val |= cdev->is_inserting ? 2 : 0;
> >          val |= cdev->is_removing  ? 4 : 0;
> > +        val |= cdev->fw_remove  ? 16 : 0;
> >          trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >          break;
> >      case ACPI_CPU_CMD_DATA_OFFSET_RW:
> > @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >              hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >              hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >              object_unparent(OBJECT(dev));
> > +        } else if (data & 16) {
> > +            cdev->fw_remove = !cdev->fw_remove;  
> 
> hm... so I guess the ACPI code will first write bit#4 to flip
> "fw_remove" from "off" to "on". Then the firmware will write bit#4 to
> flip "fw_remove" back to "off". And finally, the firmware will write
> bit#3 (strictly as a separate access) to unplug the CPU.
Sorry for the confusion between the doc and the implementation: FW should only read bit #4, and should only write bit #3.
 
> (4) But anyway, taking a step back: what do we need the new bit for?
> 
> >          }
> >          break;
> >      case ACPI_CPU_CMD_OFFSET_WR:
> > @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >  #define CPU_INSERT_EVENT  "CINS"
> >  #define CPU_REMOVE_EVENT  "CRMV"
> >  #define CPU_EJECT_EVENT   "CEJ0"
> > +#define CPU_FW_EJECT_EVENT "CEJF"
> >
> >  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >                      hwaddr io_base,
> > @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >          /* initiates device eject, write only */
> >          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> > -        aml_append(field, aml_reserved_field(4));
> > +        aml_append(field, aml_reserved_field(1));
> > +        /* tell firmware to do device eject, write only */
> > +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> > +        aml_append(field, aml_reserved_field(2));
> >          aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >          aml_append(cpu_ctrl_dev, field);
> >
> > @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> > +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >
> >          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> > @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >
> >              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >              aml_append(method, aml_store(idx, cpu_selector));
> > -            aml_append(method, aml_store(one, ej_evt));
> > +            if (opts.fw_unplugs_cpu) {
> > +                aml_append(method, aml_store(one, fw_ej_evt));
> > +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> > +                           aml_name("%s", opts.smi_path)));
> > +            } else {
> > +                aml_append(method, aml_store(one, ej_evt));
> > +            }
> >              aml_append(method, aml_release(ctrl_lock));
> >          }
> >          aml_append(cpus_dev, method);  
> 
> Hmmm, OK, let me parse this.
> 
> Assume there is a big bunch of device_del QMP commands, QEMU marks the
> "remove" event pending on the corresponding set of CPUs, plus also makes
> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
> event is pending, notifies the OS one by one. The OS in turn forgets
> about the subject CPU, and calls the _EJ0 method on the affected CPU
> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
> in the affected CPU's identifier.
> 
> The above hunk modifies the CEJ0 method.
> 
> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
> acquire the CPLK lock, but CEJ0 is actually called within CSCN
> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?
Theoretically the spec supports recursive mutexes, but I don't think that's the case here.

Considering that it currently works, I think the OS implements the Notify event
asynchronously, hence there is no clash with respect to the mutex. If EJ0 were handled
within the CSCN context, EJ0 would clobber the cpu_selector value that CSCN is also using.

> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
> sets the new bit#4 in the register block, and raises an SMI.
> 
> (6) So that's one SMI per CPU being removed. Is that OK?

I guess it has a performance penalty, but there is nothing we can do about it:
OSPM makes EJ0 calls asynchronously.
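A toy simulation of the modified CEJ0 shows why each removed CPU costs one SMI. Everything here is hypothetical (the arrays, `firmware_smi()`, `cej0()`). The SMI is modelled as a direct function call, and the firmware handler is reduced to ejecting every CPU whose fw_remove flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4

/* Hypothetical per-CPU state for the sketch. */
static bool present[NCPUS]   = { true, true, true, true };
static bool fw_remove[NCPUS] = { false };
static unsigned smi_count;

/* Stand-in for the firmware SMI handler: eject every CPU whose
 * fw_remove flag is set. Who clears fw_remove (QEMU or firmware) is
 * still under discussion; here it is simply cleared with the eject. */
static void firmware_smi(void)
{
    smi_count++;
    for (size_t i = 0; i < NCPUS; i++) {
        if (fw_remove[i]) {
            present[i] = false;
            fw_remove[i] = false;
        }
    }
}

/* Stand-in for the modified CEJ0: mark the CPU, then raise an SMI. */
static void cej0(size_t idx)
{
    fw_remove[idx] = true;   /* Store (One, CEJF) */
    firmware_smi();          /* Store (OVMF_CPUHP_SMI_CMD, <smi_path>) */
}
```

Because OSPM invokes _EJ0 once per CPU, removing two CPUs this way raises two SMIs.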
 
> (7) What if there are asynchronous plugs going on, and the firmware
> notices them in the register block? ... Hm, I hope that should be OK,
> because ultimately the CSCN method will learn about those too, and
> inform the OS. On plug, the firmware doesn't modify the register block.
It shouldn't be an issue (modulo bugs). I haven't tried to hot-add and hot-remove
the same CPU at the same time, i.e.:
(QEMU) pause
(QEMU) device_add
(QEMU) device_del
(QEMU) cont

> Ah! OK. I think I understand why bit#4 is important. The firmware may
> notice pending remove events, but it must not act upon them -- it must
> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
> clear means the event is pending (QEMU got a device_del), but the OS has
> not forgotten about the CPU yet -- so the firmware must not unplug it
> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
> the already set bit#2, advertising that the OS has *already* abandoned
> the CPU.
Firmware should ignore bit #2; it doesn't mean anything to it, since OSPM might
ignore or not support CPU removal. What firmware must care about is bit #4,
which tells it that OSPM is done with the CPU and asks for it to be removed by firmware.

> 
> This means we'll have to modify the QemuCpuhpCollectApicIds() function
> in OVMF as well -- for collecting a CPU for unplug, just bit#2
> (QEMU_CPUHP_STAT_REMOVE) is insufficient -- on such CPUs, the OS may
> still be executing threads.
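The selection rule both replies converge on (key on bit #4, ignore bit #2) could look roughly like this in a firmware-side scan. This is not OVMF's QemuCpuhpCollectApicIds(); the macro and function names are hypothetical, and register access is abstracted as an array of already-read status bytes indexed by CPU selector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the status byte bits in this RFC. */
#define CPUHP_STAT_ENABLED    (1u << 0)
#define CPUHP_STAT_INSERT     (1u << 1)
#define CPUHP_STAT_REMOVE     (1u << 2)
#define CPUHP_STAT_FW_REMOVE  (1u << 4)

/*
 * status[i] is the flags byte read after selecting CPU i. Only bit #4
 * qualifies a CPU for unplug; bit #2 is not consulted at all, because
 * on its own it does not mean the OS has stopped using the CPU.
 * Returns the number of selectors written to out[].
 */
static size_t collect_unplug_candidates(const uint8_t *status, size_t n,
                                        uint32_t *out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (status[i] & CPUHP_STAT_FW_REMOVE) {
            out[count++] = (uint32_t)i;
        }
    }
    return count;
}
```

For example, status bytes { 0x01, 0x05, 0x11, 0x15 } yield selectors 2 and 3: CPU 1 has only the remove event (bit #2) pending and is skipped, while CPUs 2 and 3 are collected regardless of bit #2.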
> 
> OK, this approach sounds plausible to me.
> 
> (8) Please extend the description of bit#2 in the "status flags read
> access" section: "firmware must ignore bit#2 unless bit#4 is set".
> 
> 
> 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 1f5c211245..475e76f514 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >      bool s4_disabled;
> >      bool pcihp_bridge_en;
> >      bool smi_on_cpuhp;
> > +    bool smi_on_cpu_unplug;
> >      bool pcihp_root_en;
> >      uint8_t s4_val;
> >      AcpiFadtData fadt;
> > @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >      pm->pcihp_io_base = 0;
> >      pm->pcihp_io_len = 0;
> >      pm->smi_on_cpuhp = false;
> > +    pm->smi_on_cpu_unplug = false;
> >
> >      assert(obj);
> >      init_common_fadt_data(machine, obj, &pm->fadt);
> > @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >          pm->smi_on_cpuhp =
> >              !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> > +        pm->smi_on_cpu_unplug =
> > +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >      }
> >
> >      /* The above need not be conditional on machine type because the reset port
> > @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >          CPUHotplugFeatures opts = {
> >              .acpi_1_compatible = true, .has_legacy_cphp = true,
> >              .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> > +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >          };
> >          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >                         "\\_SB.PCI0", "\\_GPE._E02");
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 17b514d1da..2952a00fe6 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -99,6 +99,7 @@
> >
> >  GlobalProperty pc_compat_5_1[] = {
> >      { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> > +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >  };
> >  const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >
> > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> > index 087a18d04d..8c667b7166 100644
> > --- a/hw/isa/lpc_ich9.c
> > +++ b/hw/isa/lpc_ich9.c
> > @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
> >      DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >                        ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >      DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> > -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> > +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> >  
> 
> (9) You have to extend smi_features_ok_callback() as well -- it is
> invalid for the firmware to negotiate unplug, without negotiating plug.
> 
> In fact, as far as I can tell, that would even crash QEMU, given this
> patch. Because, "opts.smi_path" would be set to NULL, but
> "opts.fw_unplugs_cpu" would be set to "true". As a consequence, the
> CPU_EJECT_METHOD change above would call aml_name("%s", NULL).
> 
> So something like the following looks necessary:

Thanks for suggestions,
I'll respin v2 with your feedback included.

> 
> > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> > index 8c667b7166c7..5bc3f212fe77 100644
> > --- a/hw/isa/lpc_ich9.c
> > +++ b/hw/isa/lpc_ich9.c
> > @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
> >  {
> >      ICH9LPCState *lpc = opaque;
> >      uint64_t guest_features;
> > +    uint64_t guest_cpu_hotplug_features;
> >
> >      if (lpc->smi_features_ok) {
> >          /* negotiation already complete, features locked */
> > @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
> >          /* guest requests invalid features, leave @features_ok at zero */
> >          return;
> >      }
> > +    guest_cpu_hotplug_features = guest_features &
> > +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> > +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >      if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> > -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> > -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> > +        guest_cpu_hotplug_features) {
> >          /*
> >           * cpu hot-[un]plug with SMI requires SMI broadcast,
> >           * leave @features_ok at zero
> > @@ -388,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
> >          return;
> >      }
> >
> > +    if (guest_cpu_hotplug_features ==
> > +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> > +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> > +        return;
> > +    }
> > +
> >      /* valid feature subset requested, lock it down, report success */
> >      lpc->smi_negotiated_features = guest_features;
> >      lpc->smi_features_ok = 1;  
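The acceptance rules in the proposed smi_features_ok_callback() can be checked in isolation by restating them as a pure function. This is a sketch: `smi_features_acceptable()` is a hypothetical name, the bit positions (broadcast = 0, hotplug = 1, hot-unplug = 2) are assumed to match QEMU's ich9.h, and the earlier "guest requests invalid features" check is assumed to be a subset test against the host feature mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match ICH9_LPC_SMI_F_*_BIT in QEMU's ich9.h. */
#define SMI_F_BROADCAST_BIT       0
#define SMI_F_CPU_HOTPLUG_BIT     1
#define SMI_F_CPU_HOT_UNPLUG_BIT  2

#define BIT_ULL(n) (1ULL << (n))

static bool smi_features_acceptable(uint64_t host_features,
                                    uint64_t guest_features)
{
    uint64_t hp = guest_features & (BIT_ULL(SMI_F_CPU_HOTPLUG_BIT) |
                                    BIT_ULL(SMI_F_CPU_HOT_UNPLUG_BIT));

    if (guest_features & ~host_features) {
        return false;   /* guest requests features the host lacks */
    }
    if (!(guest_features & BIT_ULL(SMI_F_BROADCAST_BIT)) && hp) {
        return false;   /* CPU hot-[un]plug with SMI needs broadcast */
    }
    if (hp == BIT_ULL(SMI_F_CPU_HOT_UNPLUG_BIT)) {
        return false;   /* hot-unplug without hotplug is rejected */
    }
    return true;
}
```

The last check is what keeps `fw_unplugs_cpu` from ever being true while `smi_path` is NULL, which is the aml_name("%s", NULL) case described in (9).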
> 
> 
> (10) It would be nice to separate this work into multiple patches. I
> propose:
> 
> - [PATCH 1/5] x86: ich9: factor out "guest_cpu_hotplug_features"
> 
> >  hw/isa/lpc_ich9.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> > index 087a18d04de4..c46eefd13fd4 100644
> > --- a/hw/isa/lpc_ich9.c
> > +++ b/hw/isa/lpc_ich9.c
> > @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
> >  {
> >      ICH9LPCState *lpc = opaque;
> >      uint64_t guest_features;
> > +    uint64_t guest_cpu_hotplug_features;
> >
> >      if (lpc->smi_features_ok) {
> >          /* negotiation already complete, features locked */
> > @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
> >          /* guest requests invalid features, leave @features_ok at zero */
> >          return;
> >      }
> > +    guest_cpu_hotplug_features = guest_features &
> > +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> > +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >      if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> > -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> > -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> > +        guest_cpu_hotplug_features) {
> >          /*
> >           * cpu hot-[un]plug with SMI requires SMI broadcast,
> >           * leave @features_ok at zero  
> 
> 
> - [PATCH 2/5] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature
> 
> >  hw/i386/pc.c      | 1 +
> >  hw/isa/lpc_ich9.c | 8 +++++++-
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 17b514d1da50..2952a00fe694 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -99,6 +99,7 @@
> >
> >  GlobalProperty pc_compat_5_1[] = {
> >      { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> > +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >  };
> >  const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >
> > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> > index c46eefd13fd4..5bc3f212fe77 100644
> > --- a/hw/isa/lpc_ich9.c
> > +++ b/hw/isa/lpc_ich9.c
> > @@ -391,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
> >          return;
> >      }
> >
> > +    if (guest_cpu_hotplug_features ==
> > +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> > +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> > +        return;
> > +    }
> > +
> >      /* valid feature subset requested, lock it down, report success */
> >      lpc->smi_negotiated_features = guest_features;
> >      lpc->smi_features_ok = 1;
> > @@ -773,7 +779,7 @@ static Property ich9_lpc_properties[] = {
> >      DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >                        ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >      DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> > -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> > +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> 
> 
> - [PATCH 3/5] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug
> 
> >  hw/i386/acpi-build.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 1f5c2112452a..9036e5594c92 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >      bool s4_disabled;
> >      bool pcihp_bridge_en;
> >      bool smi_on_cpuhp;
> > +    bool smi_on_cpu_unplug;
> >      bool pcihp_root_en;
> >      uint8_t s4_val;
> >      AcpiFadtData fadt;
> > @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >      pm->pcihp_io_base = 0;
> >      pm->pcihp_io_len = 0;
> >      pm->smi_on_cpuhp = false;
> > +    pm->smi_on_cpu_unplug = false;
> >
> >      assert(obj);
> >      init_common_fadt_data(machine, obj, &pm->fadt);
> > @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >          pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >          pm->smi_on_cpuhp =
> >              !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> > +        pm->smi_on_cpu_unplug =
> > +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >      }
> >
> >      /* The above need not be conditional on machine type because the reset port  
> 
> 
> - [PATCH 4/5] acpi: cpuhp: introduce 'firmware performs eject' status/control bits
> 
> >  docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >  include/hw/acpi/cpu.h           |  1 +
> >  hw/acpi/cpu.c                   |  3 +++
> >  3 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> > index 9bb22d1270a9..f68ef6e06c7a 100644
> > --- a/docs/specs/acpi_cpu_hotplug.txt
> > +++ b/docs/specs/acpi_cpu_hotplug.txt
> > @@ -57,7 +57,11 @@ read access:
> >                It's valid only when bit 0 is set.
> >             2: Device remove event, used to distinguish device for which
> >                no device eject request to OSPM was issued.
> > -           3-7: reserved and should be ignored by OSPM
> > +           3: reserved and should be ignored by OSPM
> > +           4: if set to 1, OSPM requests firmware to perform device eject,
> > +              firmware shall clear this event by writing 1 into it before
> > +              performing device eject.
> > +           5-7: reserved and should be ignored by OSPM
> >      [0x5-0x7] reserved
> >      [0x8] Command data: (DWORD access)
> >            contains 0 unless value last stored in 'Command field' is one of:
> > @@ -82,7 +86,10 @@ write access:
> >                 selected CPU device
> >              3: if set to 1 initiates device eject, set by OSPM when it
> >                 triggers CPU device removal and calls _EJ0 method
> > -            4-7: reserved, OSPM must clear them before writing to register
> > +            4: if set to 1 OSPM hands over device eject to firmware,
> > +               Firmware shall issue device eject request as described above
> > +               (bit #3) and OSPM should not touch device eject bit (#3),
> > +            5-7: reserved, OSPM must clear them before writing to register
> >      [0x5] Command field: (1 byte access)
> >            value:
> >              0: selects a CPU device with inserting/removing events and
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 0eeedaa491c1..d71edde456f2 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >      uint64_t arch_id;
> >      bool is_inserting;
> >      bool is_removing;
> > +    bool fw_remove;
> >      uint32_t ost_event;
> >      uint32_t ost_status;
> >  } AcpiCpuStatus;
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index f099b5092730..3dc83d73e20b 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >          val |= cdev->cpu ? 1 : 0;
> >          val |= cdev->is_inserting ? 2 : 0;
> >          val |= cdev->is_removing  ? 4 : 0;
> > +        val |= cdev->fw_remove  ? 16 : 0;
> >          trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >          break;
> >      case ACPI_CPU_CMD_DATA_OFFSET_RW:
> > @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >              hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >              hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >              object_unparent(OBJECT(dev));
> > +        } else if (data & 16) {
> > +            cdev->fw_remove = !cdev->fw_remove;
> >          }
> >          break;
> >      case ACPI_CPU_CMD_OFFSET_WR:  
> 
> 
> - [PATCH 5/5] x86: acpi: let the firmware handle pending "CPU remove" events in SMM
> 
> >  include/hw/acpi/cpu.h |  1 +
> >  hw/acpi/cpu.c         | 15 +++++++++++++--
> >  hw/i386/acpi-build.c  |  1 +
> >  3 files changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index d71edde456f2..999caaf51060 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >  typedef struct CPUHotplugFeatures {
> >      bool acpi_1_compatible;
> >      bool has_legacy_cphp;
> > +    bool fw_unplugs_cpu;
> >      const char *smi_path;
> >  } CPUHotplugFeatures;
> >
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 3dc83d73e20b..09d2f20daec0 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -335,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >  #define CPU_INSERT_EVENT  "CINS"
> >  #define CPU_REMOVE_EVENT  "CRMV"
> >  #define CPU_EJECT_EVENT   "CEJ0"
> > +#define CPU_FW_EJECT_EVENT "CEJF"
> >
> >  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >                      hwaddr io_base,
> > @@ -387,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >          /* initiates device eject, write only */
> >          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> > -        aml_append(field, aml_reserved_field(4));
> > +        aml_append(field, aml_reserved_field(1));
> > +        /* tell firmware to do device eject, write only */
> > +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> > +        aml_append(field, aml_reserved_field(2));
> >          aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >          aml_append(cpu_ctrl_dev, field);
> >
> > @@ -422,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> > +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >
> >          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> > @@ -464,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >
> >              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >              aml_append(method, aml_store(idx, cpu_selector));
> > -            aml_append(method, aml_store(one, ej_evt));
> > +            if (opts.fw_unplugs_cpu) {
> > +                aml_append(method, aml_store(one, fw_ej_evt));
> > +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> > +                           aml_name("%s", opts.smi_path)));
> > +            } else {
> > +                aml_append(method, aml_store(one, ej_evt));
> > +            }
> >              aml_append(method, aml_release(ctrl_lock));
> >          }
> >          aml_append(cpus_dev, method);
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 9036e5594c92..475e76f514ff 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >          CPUHotplugFeatures opts = {
> >              .acpi_1_compatible = true, .has_legacy_cphp = true,
> >              .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> > +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >          };
> >          build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >                         "\\_SB.PCI0", "\\_GPE._E02");  
> 
> Thanks!
> Laszlo
> 
>
Igor Mammedov Nov. 26, 2020, 8:45 p.m. UTC | #7
On Thu, 26 Nov 2020 02:24:27 -0800
Ankur Arora <ankur.a.arora@oracle.com> wrote:

> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
> > If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> > OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
> > ejected CPU to mark it for removal by firmware and trigger SMI
> > upcall to let firmware do actual eject.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > PS:
> >    - abuse 5.1 machine type for now to turn off unplug feature
> >      (it will be moved to 5.2 machine type once new merge window is open)
> > ---
> >   include/hw/acpi/cpu.h           |  2 ++
> >   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
> >   hw/i386/acpi-build.c            |  5 +++++
> >   hw/i386/pc.c                    |  1 +
> >   hw/isa/lpc_ich9.c               |  2 +-
> >   6 files changed, 34 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 0eeedaa491..999caaf510 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >       uint64_t arch_id;
> >       bool is_inserting;
> >       bool is_removing;
> > +    bool fw_remove;
> >       uint32_t ost_event;
> >       uint32_t ost_status;
> >   } AcpiCpuStatus;
> > @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >   typedef struct CPUHotplugFeatures {
> >       bool acpi_1_compatible;
> >       bool has_legacy_cphp;
> > +    bool fw_unplugs_cpu;
> >       const char *smi_path;
> >   } CPUHotplugFeatures;
> >   
> > diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> > index 9bb22d1270..f68ef6e06c 100644
> > --- a/docs/specs/acpi_cpu_hotplug.txt
> > +++ b/docs/specs/acpi_cpu_hotplug.txt
> > @@ -57,7 +57,11 @@ read access:
> >                 It's valid only when bit 0 is set.
> >              2: Device remove event, used to distinguish device for which
> >                 no device eject request to OSPM was issued.
> > -           3-7: reserved and should be ignored by OSPM
> > +           3: reserved and should be ignored by OSPM
> > +           4: if set to 1, OSPM requests firmware to perform device eject,
> > +              firmware shall clear this event by writing 1 into it before
> > +              performing device eject.
> > +           5-7: reserved and should be ignored by OSPM
> >       [0x5-0x7] reserved
> >       [0x8] Command data: (DWORD access)
> >             contains 0 unless value last stored in 'Command field' is one of:
> > @@ -82,7 +86,10 @@ write access:
> >                  selected CPU device
> >               3: if set to 1 initiates device eject, set by OSPM when it
> >                  triggers CPU device removal and calls _EJ0 method
> > -            4-7: reserved, OSPM must clear them before writing to register
> > +            4: if set to 1 OSPM hands over device eject to firmware,
> > +               Firmware shall issue device eject request as described above
> > +               (bit #3) and OSPM should not touch device eject bit (#3),
> > +            5-7: reserved, OSPM must clear them before writing to register
> >       [0x5] Command field: (1 byte access)
> >             value:
> >               0: selects a CPU device with inserting/removing events and
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index f099b50927..09d2f20dae 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >           val |= cdev->cpu ? 1 : 0;
> >           val |= cdev->is_inserting ? 2 : 0;
> >           val |= cdev->is_removing  ? 4 : 0;
> > +        val |= cdev->fw_remove  ? 16 : 0;  
> 
> I might be missing something but I don't see where cdev->fw_remove is being
> set. We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
> we would always end up setting this bit:
> >           val |= cdev->is_removing  ? 4 : 0;  
> 
> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
> (4 | 16). I'm guessing that in that case the AML determines which case gets
> handled but it might make sense to set just one of these?

cdev->fw_remove is set by AML when OSPM thinks it's ready to remove the CPU,
see "aml_append(method, aml_store(one, fw_ej_evt));" in this patch.
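As a cross-check on where that store lands, the bit offset of each field in the flags byte is the sum of the widths of the entries declared before it. The sketch below is a hypothetical helper (FieldEntry, rfc_flags_layout and field_bit_offset() are not QEMU code). It assumes the two fields before CRMV are CPEN and CINS at one bit each, matching the bit 0/1 meanings in the spec text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one aml_named_field()/aml_reserved_field() entry. */
typedef struct {
    const char *name;   /* NULL for a reserved field */
    unsigned width;     /* width in bits */
} FieldEntry;

/* Flags-byte layout in the order this RFC declares it
 * (CPEN and CINS at bits 0-1 are an assumption). */
static const FieldEntry rfc_flags_layout[] = {
    { "CPEN", 1 }, { "CINS", 1 }, { "CRMV", 1 }, { "CEJ0", 1 },
    { NULL, 1 },               /* aml_reserved_field(1) */
    { "CEJF", 1 },
    { NULL, 2 },               /* aml_reserved_field(2) */
};

/* Bit offset of the named field, or -1 if it is not in the layout. */
static int field_bit_offset(const FieldEntry *f, size_t n, const char *name)
{
    unsigned off = 0;
    assert(f != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (f[i].name != NULL && strcmp(f[i].name, name) == 0) {
            return (int)off;
        }
        off += f[i].width;
    }
    return -1;
}

/* Total width in bits; should be 8 for the one-byte flags register. */
static unsigned field_total_width(const FieldEntry *f, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += f[i].width;
    }
    return total;
}
```

Under that assumption, field_bit_offset(rfc_flags_layout, 7, "CEJF") returns 5, so a store to CEJF would write value 32, whereas the cpu_hotplug_wr() hunk tests `data & 16` (bit 4). Declaring CEJF directly after CEJ0 and following it with aml_reserved_field(3) would put it at bit 4 while keeping the total at 8 bits.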

cdev->is_removing is set by QEMU's device_del command and processed by AML,
which clears it before OSPM calls _EJ0; it only serves as a flag for
generating the eject notification to OSPM.

> 
> 
> >           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >           break;
> >       case ACPI_CPU_CMD_DATA_OFFSET_RW:
> > @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >               hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >               object_unparent(OBJECT(dev));
> > +        } else if (data & 16) {
> > +            cdev->fw_remove = !cdev->fw_remove;
> >           }
> >           break;
> >       case ACPI_CPU_CMD_OFFSET_WR:
> > @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >   #define CPU_INSERT_EVENT  "CINS"
> >   #define CPU_REMOVE_EVENT  "CRMV"
> >   #define CPU_EJECT_EVENT   "CEJ0"
> > +#define CPU_FW_EJECT_EVENT "CEJF"
> >   
> >   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >                       hwaddr io_base,
> > @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >           /* initiates device eject, write only */
> >           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> > -        aml_append(field, aml_reserved_field(4));
> > +        aml_append(field, aml_reserved_field(1));
> > +        /* tell firmware to do device eject, write only */
> > +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> > +        aml_append(field, aml_reserved_field(2));
> >           aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >           aml_append(cpu_ctrl_dev, field);
> >   
> > @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> > +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >   
> >           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> > @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >   
> >               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >               aml_append(method, aml_store(idx, cpu_selector));
> > -            aml_append(method, aml_store(one, ej_evt));
> > +            if (opts.fw_unplugs_cpu) {
> > +                aml_append(method, aml_store(one, fw_ej_evt));
> > +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> > +                           aml_name("%s", opts.smi_path)));
> > +            } else {
> > +                aml_append(method, aml_store(one, ej_evt));
> > +            }  
> My knowledge of AML is rather rudimentary but this looks mostly reasonable to me.
> 
> One question: the corresponding code for CPU hotplug does not send an SMI_CMD.
> Why the difference?
The SMI for hotplug is sent at CSCN time, before OSPM gets notified about
the new CPU(s); see the block below the 'in case FW negotiated
ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT,' comment.
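The difference in ordering can be summarized as two step sequences. This is only a sketch: the Step labels and step_index() are hypothetical, and the unplug sequence assumes the fw_unplugs_cpu path of this patch, where _EJ0 stores to CEJF and then writes OVMF_CPUHP_SMI_CMD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the steps discussed in this thread. */
typedef enum {
    STEP_SMI_UPCALL,   /* AML writes OVMF_CPUHP_SMI_CMD to the SMI command port */
    STEP_NOTIFY_OSPM,  /* CSCN notifies OSPM (device check / eject request) */
    STEP_OS_OFFLINE,   /* guest OS vacates the CPU (unplug only) */
    STEP_EJ0,          /* OSPM evaluates _EJ0 (unplug only) */
} Step;

/* Hot-add: firmware pulls in the new CPU before the OS is told about it. */
static const Step hotplug_seq[] = { STEP_SMI_UPCALL, STEP_NOTIFY_OSPM };

/* Hot-unplug with fw_unplugs_cpu: the SMI is raised from _EJ0, after the OS. */
static const Step unplug_seq[] = {
    STEP_NOTIFY_OSPM, STEP_OS_OFFLINE, STEP_EJ0, STEP_SMI_UPCALL,
};

/* Position of a step within a sequence, or -1 if absent. */
static int step_index(const Step *seq, size_t n, Step s)
{
    assert(seq != NULL);
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == s) {
            return (int)i;
        }
    }
    return -1;
}
```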

> 
>                      aml_append(while_ctx,
>                          aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
>                                    uid));
>                      aml_append(while_ctx,
>                          aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
>                      aml_append(while_ctx, aml_store(uid, cpu_selector));
>                      aml_append(while_ctx, aml_store(one, ins_evt));
>                      aml_append(while_ctx, aml_increment(cpu_idx));
> 
> 
> >               aml_append(method, aml_release(ctrl_lock));
> >           }
> >           aml_append(cpus_dev, method);
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 1f5c211245..475e76f514 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >       bool s4_disabled;
> >       bool pcihp_bridge_en;
> >       bool smi_on_cpuhp;
> > +    bool smi_on_cpu_unplug;
> >       bool pcihp_root_en;
> >       uint8_t s4_val;
> >       AcpiFadtData fadt;
> > @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >       pm->pcihp_io_base = 0;
> >       pm->pcihp_io_len = 0;
> >       pm->smi_on_cpuhp = false;
> > +    pm->smi_on_cpu_unplug = false;
> >   
> >       assert(obj);
> >       init_common_fadt_data(machine, obj, &pm->fadt);
> > @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >           pm->smi_on_cpuhp =
> >               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> > +        pm->smi_on_cpu_unplug =
> > +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >       }
> >   
> >       /* The above need not be conditional on machine type because the reset port
> > @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >           CPUHotplugFeatures opts = {
> >               .acpi_1_compatible = true, .has_legacy_cphp = true,
> >               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> > +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >           };
> >           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >                          "\\_SB.PCI0", "\\_GPE._E02");
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 17b514d1da..2952a00fe6 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -99,6 +99,7 @@
> >   
> >   GlobalProperty pc_compat_5_1[] = {
> >       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> > +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >   };
> >   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >   
> > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> > index 087a18d04d..8c667b7166 100644
> > --- a/hw/isa/lpc_ich9.c
> > +++ b/hw/isa/lpc_ich9.c
> > @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
> >       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> > -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> > +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
> >   
> >   
> 
> Thanks for sending out the patch btw. This helped me crystallize some of the
> corresponding OVMF code.
> 
> Ankur
>
Ankur Arora Nov. 27, 2020, 3:35 a.m. UTC | #8
On 2020-11-26 4:46 a.m., Laszlo Ersek wrote:
> On 11/26/20 11:24, Ankur Arora wrote:
>> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
>>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
>>> ejected CPU to mark it for removal by firmware and trigger SMI
>>> upcall to let firmware do actual eject.
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>> PS:
>>>     - abuse 5.1 machine type for now to turn off unplug feature
>>>       (it will be moved to 5.2 machine type once new merge window is open)
>>> ---
>>>    include/hw/acpi/cpu.h           |  2 ++
>>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>    hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>    hw/i386/acpi-build.c            |  5 +++++
>>>    hw/i386/pc.c                    |  1 +
>>>    hw/isa/lpc_ich9.c               |  2 +-
>>>    6 files changed, 34 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>> index 0eeedaa491..999caaf510 100644
>>> --- a/include/hw/acpi/cpu.h
>>> +++ b/include/hw/acpi/cpu.h
>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>        uint64_t arch_id;
>>>        bool is_inserting;
>>>        bool is_removing;
>>> +    bool fw_remove;
>>>        uint32_t ost_event;
>>>        uint32_t ost_status;
>>>    } AcpiCpuStatus;
>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>    typedef struct CPUHotplugFeatures {
>>>        bool acpi_1_compatible;
>>>        bool has_legacy_cphp;
>>> +    bool fw_unplugs_cpu;
>>>        const char *smi_path;
>>>    } CPUHotplugFeatures;
>>>
>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>> index 9bb22d1270..f68ef6e06c 100644
>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>> @@ -57,7 +57,11 @@ read access:
>>>                  It's valid only when bit 0 is set.
>>>               2: Device remove event, used to distinguish device for which
>>>                  no device eject request to OSPM was issued.
>>> -           3-7: reserved and should be ignored by OSPM
>>> +           3: reserved and should be ignored by OSPM
>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>> +              firmware shall clear this event by writing 1 into it before
>>> +              performing device eject.
>>> +           5-7: reserved and should be ignored by OSPM
>>>        [0x5-0x7] reserved
>>>        [0x8] Command data: (DWORD access)
>>>              contains 0 unless value last stored in 'Command field' is one of:
>>> @@ -82,7 +86,10 @@ write access:
>>>                   selected CPU device
>>>                3: if set to 1 initiates device eject, set by OSPM when it
>>>                   triggers CPU device removal and calls _EJ0 method
>>> -            4-7: reserved, OSPM must clear them before writing to register
>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>> +               Firmware shall issue device eject request as described above
>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>        [0x5] Command field: (1 byte access)
>>>              value:
>>>                0: selects a CPU device with inserting/removing events and
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index f099b50927..09d2f20dae 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>            val |= cdev->cpu ? 1 : 0;
>>>            val |= cdev->is_inserting ? 2 : 0;
>>>            val |= cdev->is_removing  ? 4 : 0;
>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>
>> I might be missing something but I don't see where cdev->fw_remove is being
>> set.
> 
> See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
> which happens through the ACPI code change --, fw_remove is inverted.
Thanks, that makes sense. I was reading the AML building code all wrong.

> 
> 
>> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
>> we would always end up setting this bit:
>>>            val |= cdev->is_removing  ? 4 : 0;
>>
>> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
>> (4 | 16). I'm guessing that in that case the AML determines which case gets
>> handled but it might make sense to set just one of these?
> 
> "is_removing" is set directly in response to the device_del QMP command.
> That QMP command is asynchronous to the execution of the guest OS.
>
> "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
> executed by the guest OS's ACPI interpreter, after the guest OS has
> de-scheduled all processes from the CPU being removed (= basically after
> the OS has willfully forgotten about the CPU).
> 
> Therefore, considering the bitmask (is_removing, fw_remove), three
> variations make sense:

Just annotating these with the corresponding ACPI code to make sure
I have it straight. Please correct me if my interpretation is wrong. Also,
a few questions inline:

> 
> #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
> 
> #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
>                                     is processing the request

The guest executes the CSCN method and reads rm_evt (bit 2), thus noticing
is_removing=1, and then sends an eject request for the CPU to be removed
via the CTFY method.

    ifctx = aml_if(aml_equal(rm_evt, one));
    {
            aml_append(ifctx,
                       aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
            aml_append(ifctx, aml_store(one, rm_evt));
            aml_append(ifctx, aml_store(one, has_event));
    }

Then it stores 1 to rm_evt (bit 2), which clears is_removing.
(Igor mentions that in a separate mail.)

1. Do we need to clear is_removing at all? AFAICS, clearing it only
serves as an ack to QEMU, and I can't think of why that's useful. OTOH,
is_removing doesn't serve any purpose once the guest OS has seen the request.

2. Would it make sense to clear it first and then call CPU_NOTIFY_METHOD?
CPU_NOTIFY_METHOD (and _EJ0, COST) don't depend on is_removing today, but
that might change in the future.

The notify would end up calling acpi_hotplug_schedule(), which is
responsible for queuing work (on CPU0) to detach and unplug the CPU.

Once the OS-level detach succeeds, the worker evaluates the "_EJ0"
method, which does the actual CPU_EJECT_METHOD work.

If the detach fails, it evaluates CPU_OST_METHOD instead, which records
the event and its status.

At this point the state is back to:

(is_removing=0, fw_remove=0)

> #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
>                                     the CPU, firmware is permitted /
>                                     required to forget about the CPU as
>                                     well, and then unplug the CPU

CPU_EJECT_METHOD stores 1 to bit 4, which inverts (and thus sets)
fw_remove, and then raises the SMI.

So, this would be
> #3 (is_removing=0, fw_remove=1)

At this point the firmware calls QemuCPUhpCollectApicIds(), which
(after changes) notices the CPU(s) with fw_remove set.

It collects them and stores 1 to bit 4, which clears fw_remove.

> 
> #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
> 
> #5 (is_removing=0, fw_remove=0) -- firmware performing unplug
Firmware does an unplug and writes to bit 3, thus clearing is_removing.

On return from the firmware, the guest evaluates COST again.

It eventually goes back to CSCN, where it processes further hotplug or
unplug events.
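The annotated walkthrough above can be traced as (is_removing, fw_remove) pairs, one per event. This minimal sketch assumes the event order described in it (CSCN clears is_removing before _EJ0 runs) and leaves out the final eject through bit 3; RemoveState, the event names and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* (is_removing, fw_remove) as discussed in this thread. */
typedef struct {
    bool is_removing;
    bool fw_remove;
} RemoveState;

/* Hypothetical event names for the steps in the annotated walkthrough. */
typedef enum {
    EV_QMP_DEVICE_DEL,   /* device_del sets is_removing */
    EV_CSCN_ACK_REMOVE,  /* CSCN stores 1 to CRMV (bit 2): clears is_removing */
    EV_EJ0_FW_EJECT,     /* _EJ0 stores 1 to CEJF (bit 4): toggles fw_remove */
    EV_FW_ACK_REMOVE,    /* firmware stores 1 to bit 4: toggles fw_remove back */
} Event;

static RemoveState apply_event(RemoveState st, Event ev)
{
    switch (ev) {
    case EV_QMP_DEVICE_DEL:
        st.is_removing = true;
        break;
    case EV_CSCN_ACK_REMOVE:
        st.is_removing = false;
        break;
    case EV_EJ0_FW_EJECT:    /* same register write as the firmware's ack */
    case EV_FW_ACK_REMOVE:
        st.fw_remove = !st.fw_remove;
        break;
    }
    return st;
}

/* Apply events in order, recording the state after each one into out[]. */
static void trace_events(const Event *evs, size_t n, RemoveState *out)
{
    RemoveState st = { false, false };
    assert(n == 0 || (evs != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        st = apply_event(st, evs[i]);
        out[i] = st;
    }
}
```

Fed the four events in declaration order, it records (1,0), (0,0), (0,1) and (0,0).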

> 
> The variation (is_removing=0, fw_remove=1) is invalid / unused.

/nods
> 
> 
> The firmware may be investigating the CPU register block between steps
> #2 and #3 -- in other words, the firmware may see a CPU for which
> is_removing is set (unplug requested via QMP), but the OS has not vacated
> yet (fw_remove=0). In that case, the firmware must just skip the CPU --
> once the OS is done, it will set fw_remove too, and raise another SMI.
Yeah, it makes sense for the firmware to only care about a CPU once it
sees fw_remove=1. (And as currently situated, the firmware would never
see is_removing=1 at all.)
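The firmware-side rule (skip a CPU until fw_remove is set) can be sketched as a filter over the flags bytes. This is not the OVMF QemuCPUhpCollectApicIds() code: collect_fw_remove(), the two macros and the flags[] array (assumed to hold the flags byte read after selecting each CPU) are hypothetical. A real implementation would also write 1 to bit 4 for each collected CPU to clear fw_remove, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUHP_FLAG_REMOVE_EVT 0x04u  /* bit 2: is_removing */
#define CPUHP_FLAG_FW_REMOVE  0x10u  /* bit 4: fw_remove */

/*
 * Hypothetical firmware-side scan: flags[i] is the flags byte read after
 * selecting CPU i. Only CPUs with fw_remove set are collected; a CPU that
 * only has is_removing set has not been vacated by the OS yet and is skipped.
 * Returns the number of selectors written to out[] (at most out_cap).
 */
static size_t collect_fw_remove(const uint8_t *flags, size_t nslots,
                                uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    assert(flags != NULL || nslots == 0);
    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < nslots && n < out_cap; i++) {
        if ((flags[i] & CPUHP_FLAG_FW_REMOVE) == 0) {
            continue;   /* includes the "is_removing only" case */
        }
        out[n++] = (uint32_t)i;
    }
    return n;
}
```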


Thanks
Ankur

> 
> 
>>
>>
>>>            trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>            break;
>>>        case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>                hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>                hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>                object_unparent(OBJECT(dev));
>>> +        } else if (data & 16) {
>>> +            cdev->fw_remove = !cdev->fw_remove;
>>>            }
>>>            break;
>>>        case ACPI_CPU_CMD_OFFSET_WR:
>>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>    #define CPU_INSERT_EVENT  "CINS"
>>>    #define CPU_REMOVE_EVENT  "CRMV"
>>>    #define CPU_EJECT_EVENT   "CEJ0"
>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>
>>>    void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>                        hwaddr io_base,
>>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>            aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>            /* initiates device eject, write only */
>>>            aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>> -        aml_append(field, aml_reserved_field(4));
>>> +        aml_append(field, aml_reserved_field(1));
>>> +        /* tell firmware to do device eject, write only */
>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>> +        aml_append(field, aml_reserved_field(2));
>>>            aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>            aml_append(cpu_ctrl_dev, field);
>>>
>>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>            Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>            Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>            Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>
>>>            aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>            aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>
>>>                aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>                aml_append(method, aml_store(idx, cpu_selector));
>>> -            aml_append(method, aml_store(one, ej_evt));
>>> +            if (opts.fw_unplugs_cpu) {
>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>> +                           aml_name("%s", opts.smi_path)));
>>> +            } else {
>>> +                aml_append(method, aml_store(one, ej_evt));
>>> +            }
>> My knowledge of AML is rather rudimentary but this looks mostly
>> reasonable to me.
>>
>> One question: the corresponding code for CPU hotplug does not send an
>> SMI_CMD. Why the difference?
> 
> This code (on eject) is executing *after* the OS kernel has processed
> the event. But on hotplug, the ordering is different (it must be): in
> that case, the CSCN (scan) method first notifies the firmware, and then
> the OS.
> 
> Thanks
> Laszlo
> 
>>
>>                      aml_append(while_ctx,
>>                          aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
>>                                    uid));
>>                      aml_append(while_ctx,
>>                          aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
>>                      aml_append(while_ctx, aml_store(uid, cpu_selector));
>>                      aml_append(while_ctx, aml_store(one, ins_evt));
>>                      aml_append(while_ctx, aml_increment(cpu_idx));
>>
>>
>>>                aml_append(method, aml_release(ctrl_lock));
>>>            }
>>>            aml_append(cpus_dev, method);
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index 1f5c211245..475e76f514 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>        bool s4_disabled;
>>>        bool pcihp_bridge_en;
>>>        bool smi_on_cpuhp;
>>> +    bool smi_on_cpu_unplug;
>>>        bool pcihp_root_en;
>>>        uint8_t s4_val;
>>>        AcpiFadtData fadt;
>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>        pm->pcihp_io_base = 0;
>>>        pm->pcihp_io_len = 0;
>>>        pm->smi_on_cpuhp = false;
>>> +    pm->smi_on_cpu_unplug = false;
>>>
>>>        assert(obj);
>>>        init_common_fadt_data(machine, obj, &pm->fadt);
>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>            pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>            pm->smi_on_cpuhp =
>>>                !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>> +        pm->smi_on_cpu_unplug =
>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>        }
>>>
>>>        /* The above need not be conditional on machine type because the reset port
>>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>            CPUHotplugFeatures opts = {
>>>                .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>                .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>            };
>>>            build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>                           "\\_SB.PCI0", "\\_GPE._E02");
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 17b514d1da..2952a00fe6 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -99,6 +99,7 @@
>>>
>>>    GlobalProperty pc_compat_5_1[] = {
>>>        { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>    };
>>>    const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index 087a18d04d..8c667b7166 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>                          ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>        DEFINE_PROP_END_OF_LIST(),
>>>    };
>>>   
>>
>> Thanks for sending out the patch btw. This helped me crystallize some of
>> the corresponding OVMF code.
>>
>> Ankur
>>
>
Ankur Arora Nov. 27, 2020, 3:39 a.m. UTC | #9
On 2020-11-26 11:50 a.m., Igor Mammedov wrote:
> On Thu, 26 Nov 2020 13:46:32 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 11/26/20 11:24, Ankur Arora wrote:
>>> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
>>>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>>> OSPM on CPU eject will set bit #4 in the CPU hotplug block for the CPU to be
>>>> ejected, to mark it for removal by firmware, and trigger an SMI
>>>> upcall to let firmware do the actual eject.
>>>>
>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>> ---
>>>> PS:
>>>>     - abuse 5.1 machine type for now to turn off unplug feature
>>>>       (it will be moved to 5.2 machine type once new merge window is open)
>>>> ---
>>>>    include/hw/acpi/cpu.h           |  2 ++
>>>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>>    hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>>    hw/i386/acpi-build.c            |  5 +++++
>>>>    hw/i386/pc.c                    |  1 +
>>>>    hw/isa/lpc_ich9.c               |  2 +-
>>>>    6 files changed, 34 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>> index 0eeedaa491..999caaf510 100644
>>>> --- a/include/hw/acpi/cpu.h
>>>> +++ b/include/hw/acpi/cpu.h
>>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>>        uint64_t arch_id;
>>>>        bool is_inserting;
>>>>        bool is_removing;
>>>> +    bool fw_remove;
>>>>        uint32_t ost_event;
>>>>        uint32_t ost_status;
>>>>    } AcpiCpuStatus;
>>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>>    typedef struct CPUHotplugFeatures {
>>>>        bool acpi_1_compatible;
>>>>        bool has_legacy_cphp;
>>>> +    bool fw_unplugs_cpu;
>>>>        const char *smi_path;
>>>>    } CPUHotplugFeatures;
>>>>
>>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>>> index 9bb22d1270..f68ef6e06c 100644
>>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>>> @@ -57,7 +57,11 @@ read access:
>>>>                  It's valid only when bit 0 is set.
>>>>               2: Device remove event, used to distinguish device for which
>>>>                  no device eject request to OSPM was issued.
>>>> -           3-7: reserved and should be ignored by OSPM
>>>> +           3: reserved and should be ignored by OSPM
>>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>>> +              firmware shall clear this event by writing 1 into it before
>>>> +              performing device eject.
>>>> +           5-7: reserved and should be ignored by OSPM
>>>>        [0x5-0x7] reserved
>>>>        [0x8] Command data: (DWORD access)
>>>>              contains 0 unless value last stored in 'Command field' is one of:
>>>> @@ -82,7 +86,10 @@ write access:
>>>>                   selected CPU device
>>>>                3: if set to 1 initiates device eject, set by OSPM when it
>>>>                   triggers CPU device removal and calls _EJ0 method
>>>> -            4-7: reserved, OSPM must clear them before writing to register
>>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>>> +               Firmware shall issue device eject request as described above
>>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>>        [0x5] Command field: (1 byte access)
>>>>              value:
>>>>                0: selects a CPU device with inserting/removing events and
>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>> index f099b50927..09d2f20dae 100644
>>>> --- a/hw/acpi/cpu.c
>>>> +++ b/hw/acpi/cpu.c
>>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>>            val |= cdev->cpu ? 1 : 0;
>>>>            val |= cdev->is_inserting ? 2 : 0;
>>>>            val |= cdev->is_removing  ? 4 : 0;
>>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>
>>> I might be missing something but I don't see where cdev->fw_remove is being
>>> set.
>>
>> See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
>> which happens through the ACPI code change --, fw_remove is inverted.
>>
>>
>>> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
>>> we would always end up setting this bit:
>>>>            val |= cdev->is_removing  ? 4 : 0;
>>>
>>> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
>>> (4 | 16). I'm guessing that in that case the AML determines which case gets
>>> handled but it might make sense to set just one of these?
>>
>> "is_removing" is set directly in response to the device_del QMP command.
>> That QMP command is asynchronous to the execution of the guest OS.
> 
> is_removing is a notification to OSPM, which is cleared when ACPI scans
> for events, if I'm not mistaken.

Yeah, I think I finally have it straight. Though I have a couple of questions
for you in my reply to Laszlo.

Thanks
Ankur

> 
> 
>>
>> "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
>> executed by the guest OS's ACPI interpreter, after the guest OS has
>> de-scheduled all processes from the CPU being removed (= basically after
>> the OS has willfully forgotten about the CPU).
>>
>> Therefore, considering the bitmask (is_removing, fw_remove), three
>> variations make sense:
>>
>> #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
>>
>> #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
>>                                     is processing the request
> 
> 
>> #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
>>                                     the CPU, firmware is permitted /
>>                                     required to forget about the CPU as
>>                                     well, and then unplug the CPU
> shouldn't be possible
> 
>>
>> #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
> ditto
> 
> 
>> #5 (is_removing=0, fw_remove=0) -- firmware performing unplug
>>
>>
>> The variation (is_removing=0, fw_remove=1) is invalid / unused.
>>
>>
>> The firmware may be investigating the CPU register block between steps
>> #2 and #3 -- in other words, the firmware may see a CPU for which
>> is_removing is set (unplug requested via QMP), but the OS has not vacated
>> yet (fw_remove=0). In that case, the firmware must just skip the CPU --
>> once the OS is done, it will set fw_remove too, and raise another SMI.
>>
>>
>>>
>>>    
>>>>            trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>>            break;
>>>>        case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>>                hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>>                hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>>                object_unparent(OBJECT(dev));
>>>> +        } else if (data & 16) {
>>>> +            cdev->fw_remove = !cdev->fw_remove;
>>>>            }
>>>>            break;
>>>>        case ACPI_CPU_CMD_OFFSET_WR:
>>>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>>    #define CPU_INSERT_EVENT  "CINS"
>>>>    #define CPU_REMOVE_EVENT  "CRMV"
>>>>    #define CPU_EJECT_EVENT   "CEJ0"
>>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>>
>>>>    void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>                        hwaddr io_base,
>>>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>            aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>>            /* initiates device eject, write only */
>>>>            aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>>> -        aml_append(field, aml_reserved_field(4));
>>>> +        aml_append(field, aml_reserved_field(1));
>>>> +        /* tell firmware to do device eject, write only */
>>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>>> +        aml_append(field, aml_reserved_field(2));
>>>>            aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>>            aml_append(cpu_ctrl_dev, field);
>>>>
>>>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>            Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>>            Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>>            Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>>
>>>>            aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>>            aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>
>>>>                aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>>                aml_append(method, aml_store(idx, cpu_selector));
>>>> -            aml_append(method, aml_store(one, ej_evt));
>>>> +            if (opts.fw_unplugs_cpu) {
>>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>>> +                           aml_name("%s", opts.smi_path)));
>>>> +            } else {
>>>> +                aml_append(method, aml_store(one, ej_evt));
>>>> +            }
>>> My knowledge of AML is rather rudimentary but this looks mostly
>>> reasonable to me.
>>>
>>> One question: the corresponding code for CPU hotplug does not send an
>>> SMI_CMD. Why the difference?
>>
>> This code (on eject) is executing *after* the OS kernel has processed
>> the event. But on hotplug, the ordering is different (it must be): in
>> that case, the CSCN (scan) method first notifies the firmware, and then
>> the OS.
>>
>> Thanks
>> Laszlo
>>
>>>
>>>                      aml_append(while_ctx,
>>>                          aml_store(aml_derefof(aml_index(new_cpus,
>>> cpu_idx)),
>>>                                    uid));
>>>                      aml_append(while_ctx,
>>>                          aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
>>>                      aml_append(while_ctx, aml_store(uid, cpu_selector));
>>>                      aml_append(while_ctx, aml_store(one, ins_evt));
>>>                      aml_append(while_ctx, aml_increment(cpu_idx));
>>>
>>>    
>>>>                aml_append(method, aml_release(ctrl_lock));
>>>>            }
>>>>            aml_append(cpus_dev, method);
>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>> index 1f5c211245..475e76f514 100644
>>>> --- a/hw/i386/acpi-build.c
>>>> +++ b/hw/i386/acpi-build.c
>>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>>        bool s4_disabled;
>>>>        bool pcihp_bridge_en;
>>>>        bool smi_on_cpuhp;
>>>> +    bool smi_on_cpu_unplug;
>>>>        bool pcihp_root_en;
>>>>        uint8_t s4_val;
>>>>        AcpiFadtData fadt;
>>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>        pm->pcihp_io_base = 0;
>>>>        pm->pcihp_io_len = 0;
>>>>        pm->smi_on_cpuhp = false;
>>>> +    pm->smi_on_cpu_unplug = false;
>>>>
>>>>        assert(obj);
>>>>        init_common_fadt_data(machine, obj, &pm->fadt);
>>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>            pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>>            pm->smi_on_cpuhp =
>>>>                !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>>> +        pm->smi_on_cpu_unplug =
>>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>>        }
>>>>
>>>>        /* The above need not be conditional on machine type because the reset port
>>>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>>            CPUHotplugFeatures opts = {
>>>>                .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>>                .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>>            };
>>>>            build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>>                           "\\_SB.PCI0", "\\_GPE._E02");
>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>> index 17b514d1da..2952a00fe6 100644
>>>> --- a/hw/i386/pc.c
>>>> +++ b/hw/i386/pc.c
>>>> @@ -99,6 +99,7 @@
>>>>
>>>>    GlobalProperty pc_compat_5_1[] = {
>>>>        { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>>    };
>>>>    const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>>
>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>> index 087a18d04d..8c667b7166 100644
>>>> --- a/hw/isa/lpc_ich9.c
>>>> +++ b/hw/isa/lpc_ich9.c
>>>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>>                          ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>>        DEFINE_PROP_END_OF_LIST(),
>>>>    };
>>>>     
>>>
>>> Thanks for sending out the patch btw. This helped me crystallize some of
>>> the corresponding OVMF code.
>>>
>>> Ankur
>>>    
>>
>
Ankur Arora Nov. 27, 2020, 4:10 a.m. UTC | #10
On 2020-11-26 12:38 p.m., Igor Mammedov wrote:
> On Thu, 26 Nov 2020 12:17:27 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 11/24/20 13:25, Igor Mammedov wrote:
>>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>> OSPM on CPU eject will set bit #4 in the CPU hotplug block for the CPU to be
>>> ejected, to mark it for removal by firmware, and trigger an SMI
>>> upcall to let firmware do the actual eject.
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>> PS:
>>>    - abuse 5.1 machine type for now to turn off unplug feature
>>>      (it will be moved to 5.2 machine type once new merge window is open)
>>> ---
>>>   include/hw/acpi/cpu.h           |  2 ++
>>>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>   hw/i386/acpi-build.c            |  5 +++++
>>>   hw/i386/pc.c                    |  1 +
>>>   hw/isa/lpc_ich9.c               |  2 +-
>>>   6 files changed, 34 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>> index 0eeedaa491..999caaf510 100644
>>> --- a/include/hw/acpi/cpu.h
>>> +++ b/include/hw/acpi/cpu.h
>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>       uint64_t arch_id;
>>>       bool is_inserting;
>>>       bool is_removing;
>>> +    bool fw_remove;
>>>       uint32_t ost_event;
>>>       uint32_t ost_status;
>>>   } AcpiCpuStatus;
>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>   typedef struct CPUHotplugFeatures {
>>>       bool acpi_1_compatible;
>>>       bool has_legacy_cphp;
>>> +    bool fw_unplugs_cpu;
>>>       const char *smi_path;
>>>   } CPUHotplugFeatures;
>>>
>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>> index 9bb22d1270..f68ef6e06c 100644
>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>> @@ -57,7 +57,11 @@ read access:
>>>                 It's valid only when bit 0 is set.
>>>              2: Device remove event, used to distinguish device for which
>>>                 no device eject request to OSPM was issued.
>>> -           3-7: reserved and should be ignored by OSPM
>>> +           3: reserved and should be ignored by OSPM
>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>> +              firmware shall clear this event by writing 1 into it before
>>
>> (1) s/clear this event/clear this event bit/
>>
>>> +              performing device eject.
>>
>> (2) move the second and third lines ("firmware shall clear....") over to
>> the write documentation, below? In particular:
>>
>>> +           5-7: reserved and should be ignored by OSPM
>>>       [0x5-0x7] reserved
>>>       [0x8] Command data: (DWORD access)
>>>             contains 0 unless value last stored in 'Command field' is one of:
>>> @@ -82,7 +86,10 @@ write access:
>>>                  selected CPU device
>>>               3: if set to 1 initiates device eject, set by OSPM when it
>>>                  triggers CPU device removal and calls _EJ0 method
>>> -            4-7: reserved, OSPM must clear them before writing to register
>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>> +               Firmware shall issue device eject request as described above
>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>
>> (3) it would be clearer if we documented the exact bit writing order
>> here:
>> - clear bit#4, *then* set bit#3 (two write accesses)
>> - versus clear bit#4 *and* set bit#3 (single access)
> 
> I was thinking that FW should not bother with clearing bit #4,
> and QEMU should clear it when handling a write to bit #3
> (it looks like I forgot to actually do that).
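A minimal C model of the write semantics being discussed, assuming bit #4 keeps the toggle behaviour of the RFC and that a write of bit #3 ejects the CPU and also clears fw_remove (the QEMU-side clear described above as missing). The struct and function names are hypothetical; this is a sketch, not QEMU's cpu_hotplug_wr().

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUHP_WR_EJECT    0x08u /* bit #3: eject the selected CPU */
#define CPUHP_WR_FW_EJECT 0x10u /* bit #4: OSPM hands eject over to firmware */

/* Hypothetical per-CPU state (not QEMU's AcpiCpuStatus). */
struct cpu_slot {
    bool present;
    bool fw_remove;
};

/*
 * Flags-register write handler. As in cpu_hotplug_wr(), the bits are
 * tested in an if/else-if chain, so bit #3 wins over bit #4 in one write.
 */
static void flags_write(struct cpu_slot *slot, uint8_t data)
{
    if (data & CPUHP_WR_EJECT) {
        slot->present = false;
        slot->fw_remove = false;            /* assumed QEMU-side clear on eject */
    } else if (data & CPUHP_WR_FW_EJECT) {
        slot->fw_remove = !slot->fw_remove; /* toggle, as in the RFC */
    }
}
```

In this model firmware only reads bit #4 and writes bit #3 once. Under toggle semantics, a second write of 1 to bit #4 (for example by firmware trying to acknowledge it) would flip fw_remove back off, which is one reason to keep firmware from writing that bit.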

Why involve the firmware with bit #3 at all? If the firmware only reads bit #4
to detect fw_remove and then writes (and thus resets) bit #4, isn't that
good enough?


> 
>>
>>
>>
>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>       [0x5] Command field: (1 byte access)
>>>             value:
>>>               0: selects a CPU device with inserting/removing events and
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index f099b50927..09d2f20dae 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>           val |= cdev->cpu ? 1 : 0;
>>>           val |= cdev->is_inserting ? 2 : 0;
>>>           val |= cdev->is_removing  ? 4 : 0;
>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>           break;
>>>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>               object_unparent(OBJECT(dev));
>>> +        } else if (data & 16) {
>>> +            cdev->fw_remove = !cdev->fw_remove;
>>
>> hm... so I guess the ACPI code will first write bit#4 to flip
>> "fw_remove" from "off" to "on". Then the firmware will write bit#4 to
>> flip "fw_remove" back to "off". And finally, the firmware will write
>> bit#3 (strictly as a separate access) to unplug the CPU.
> Sorry for the confusion between the doc and the implementation: FW should only read bit #4, and only write bit #3.
>   
>> (4) But anyway, taking a step back: what do we need the new bit for?
>>
>>>           }
>>>           break;
>>>       case ACPI_CPU_CMD_OFFSET_WR:
>>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>   #define CPU_INSERT_EVENT  "CINS"
>>>   #define CPU_REMOVE_EVENT  "CRMV"
>>>   #define CPU_EJECT_EVENT   "CEJ0"
>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>
>>>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>                       hwaddr io_base,
>>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>           /* initiates device eject, write only */
>>>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>> -        aml_append(field, aml_reserved_field(4));
>>> +        aml_append(field, aml_reserved_field(1));
>>> +        /* tell firmware to do device eject, write only */
>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>> +        aml_append(field, aml_reserved_field(2));
>>>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>           aml_append(cpu_ctrl_dev, field);
>>>
>>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>
>>>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>
>>>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>               aml_append(method, aml_store(idx, cpu_selector));
>>> -            aml_append(method, aml_store(one, ej_evt));
>>> +            if (opts.fw_unplugs_cpu) {
>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>> +                           aml_name("%s", opts.smi_path)));
>>> +            } else {
>>> +                aml_append(method, aml_store(one, ej_evt));
>>> +            }
>>>               aml_append(method, aml_release(ctrl_lock));
>>>           }
>>>           aml_append(cpus_dev, method);
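ACPI packs the units of a Field back to back in declaration order, so each named field's bit position is the sum of the widths declared before it. The C sketch below does that arithmetic for the flags byte in the hunk above, assuming (as the spec text says) that bits 0-2 are the enabled, insert (CINS) and remove (CRMV) flags; the "ENABLED" label for bit 0 is hypothetical, since that declaration is not part of this hunk.

```c
#include <stddef.h>
#include <string.h>

/* One entry per aml_named_field()/aml_reserved_field() call, in order. */
struct field_decl {
    const char *name; /* NULL for a reserved field */
    unsigned width;   /* width in bits */
};

#define NFIELDS(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Bit offset of @name from the start of the flags byte, or -1 if absent:
 * field units are packed back to back in declaration order.
 */
static int field_bit_offset(const struct field_decl *f, size_t n,
                            const char *name)
{
    unsigned bit = 0;

    for (size_t i = 0; i < n; i++) {
        if (f[i].name && strcmp(f[i].name, name) == 0) {
            return (int)bit;
        }
        bit += f[i].width;
    }
    return -1;
}

/* Flags byte as declared by the hunk above (bit 0 label is hypothetical). */
static const struct field_decl rfc_flags[] = {
    { "ENABLED", 1 }, /* bit 0 per the spec text */
    { "CINS", 1 },
    { "CRMV", 1 },
    { "CEJ0", 1 },
    { NULL, 1 },      /* aml_reserved_field(1) */
    { "CEJF", 1 },
    { NULL, 2 },      /* aml_reserved_field(2) */
};
```

Under that assumption CEJF works out to bit 5, whereas cpu_hotplug_rd(), cpu_hotplug_wr() and the spec text use the value 16 (bit #4); declaring CEJF directly after CEJ0 and reserving the remaining 3 bits would place it at bit #4.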
>>
>> Hmmm, OK, let me parse this.
>>
>> Assume there is a big bunch of device_del QMP commands, QEMU marks the
>> "remove" event pending on the corresponding set of CPUs, plus also makes
>> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
>> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
>> event is pending, notifies the OS one by one. The OS in turn forgets
>> about the subject CPU, and calls the _EJ0 method on the affected CPU
>> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
>> in the affected CPU's identifier.
>>
>> The above hunk modifies the CEJ0 method.
>>
>> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
>> acquire the CPLK lock, but CEJ0 is actually called within CSCN
>> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?
> Theoretically the spec supports recursive mutexes, but I don't think that's the case here.
> 
> Considering that it currently works, I think the OS handles Notify events
> asynchronously, hence no clash on the mutex. If EJ0 were handled within the
> CSCN context, EJ0 would clobber the cpu_selector value that CSCN is also using.

From my read of the Linux code, yeah, the EJ0 execution happens in an
async worker on CPU 0, which first detaches the CPU and then executes EJ0.
>> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
>> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
>> sets the new bit#4 in the register block, and raises an SMI.
>>
>> (6) So that's one SMI per CPU being removed. Is that OK?
> 
> I guess it has a performance penalty, but there is nothing we can do about it;
> OSPM makes EJ0 calls asynchronously.
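A toy C model of the sequence walked through above makes the per-CPU SMI cost concrete: device_del sets the remove event, CSCN clears it and notifies the OS, and the OS later (asynchronously, in a worker) calls CEJ0 for each CPU, which with fw_unplugs_cpu sets bit #4 and raises one SMI. All names are hypothetical; this is a simulation under those assumptions, not QEMU, OVMF or Linux code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 8

/* Hypothetical model of the per-CPU hotplug register state. */
struct cpu {
    bool present;
    bool is_removing;  /* bit #2: set by device_del */
    bool fw_remove;    /* bit #4: set by CEJ0 */
};

static struct cpu cpus[NCPUS];
static unsigned smi_count;
static size_t eject_queue[NCPUS]; /* CPUs the OS was notified about */
static size_t queued;

/* device_del: QEMU marks the remove event pending. */
static void device_del(size_t idx)
{
    cpus[idx].is_removing = true;
}

/* SMI handler: eject every present CPU that has bit #4 set. */
static void smi_handler(void)
{
    smi_count++;
    for (size_t i = 0; i < NCPUS; i++) {
        if (cpus[i].present && cpus[i].fw_remove) {
            cpus[i].present = false;
            cpus[i].fw_remove = false;
        }
    }
}

/* CEJ0 with fw_unplugs_cpu: hand the CPU to firmware, then raise an SMI. */
static void cej0(size_t idx)
{
    cpus[idx].fw_remove = true;
    smi_handler();
}

/* CSCN: clear each pending remove event and notify the OS (queue it). */
static void cscn(void)
{
    for (size_t i = 0; i < NCPUS; i++) {
        if (cpus[i].present && cpus[i].is_removing) {
            cpus[i].is_removing = false;
            eject_queue[queued++] = i;
        }
    }
}

/* OS worker, running after CSCN returns: offline each CPU, then call CEJ0. */
static void os_eject_worker(void)
{
    for (size_t q = 0; q < queued; q++) {
        cej0(eject_queue[q]);
    }
    queued = 0;
}
```

Because CSCN only notifies and the OS issues the EJ0 calls one at a time, the model raises exactly one SMI per ejected CPU, and no SMI at all while CSCN itself runs.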
>   
>> (7) What if there are asynchronous plugs going on, and the firmware
>> notices them in the register block? ... Hm, I hope that should be OK,
>> because ultimately the CSCN method will learn about those too, and
>> inform the OS. On plug, the firmware doesn't modify the register block.
> shouldn't be an issue (modulo bugs; I haven't tried to hot add and hot remove
> the same CPU at the same time)

Yeah I was wondering what would happen for simultaneous hot add and remove.
I guess we would always do remove first and then the add, unless we hit
the break due to max_cpus_per_pass and switch to hot-add mode.

> 
> i.e.
> (QEMU) pause
> (QEMU) device_add
> (QEMU) device_del
> (QEMU) cont
> 
>> Ah! OK. I think I understand why bit#4 is important. The firmware may
>> notice pending remove events, but it must not act upon them -- it must
>> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
>> clear means the event is pending (QEMU got a device_del), but the OS has
>> not forgotten about the CPU yet -- so the firmware must not unplug it
>> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
>> the already set bit#2, advertising that the OS has *already* abandoned
>> the CPU.
> Firmware should ignore bit #2; it doesn't mean anything to it, and OSPM might
> ignore or not support CPU removal. What firmware must care about is bit #4,
> which tells it that OSPM is done with the CPU and asks for it to be removed by firmware.

In my other mail, I was suggesting that the guest OS not reset bit #2, but
on second thought, this makes sense.


Thanks
Ankur

> 
>>
>> This means we'll have to modify the QemuCpuhpCollectApicIds() function
>> in OVMF as well -- for collecting a CPU for unplug, just bit#2
>> (QEMU_CPUHP_STAT_REMOVE) is insufficient -- on such CPUs, the OS may
>> still be executing threads.
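Following the rule that firmware keys on bit #4 and ignores bit #2, the unplug side of such a collection loop could look like the C sketch below. It assumes firmware has already read each CPU's flags byte and APIC ID into arrays, and it additionally requires bit 0 (CPU present), which is this sketch's own assumption; collect_unplug_ids() is hypothetical and not OVMF's QemuCpuhpCollectApicIds().

```c
#include <stddef.h>
#include <stdint.h>

#define CPUHP_STAT_ENABLED   0x01u /* bit #0: CPU present */
#define CPUHP_STAT_REMOVE    0x04u /* bit #2: remove event, deliberately ignored */
#define CPUHP_STAT_FW_REMOVE 0x10u /* bit #4: OSPM is done with the CPU */

/*
 * Copy the APIC IDs of present CPUs whose flags have bit #4 set into @out
 * and return how many were collected. A CPU with only bit #2 set may still
 * be running OS threads, so it is skipped.
 */
static size_t collect_unplug_ids(const uint8_t *flags, const uint32_t *apic_ids,
                                 size_t n, uint32_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if ((flags[i] & CPUHP_STAT_ENABLED) &&
            (flags[i] & CPUHP_STAT_FW_REMOVE)) {
            out[count++] = apic_ids[i];
        }
    }
    return count;
}
```

For flags {0x01, 0x05, 0x11, 0x15}, only the last two CPUs are collected: the one reading 0x05 has a pending remove event but has not been handed over by OSPM yet.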
>>
>> OK, this approach sounds plausible to me.
>>
>> (8) Please extend the description of bit#2 in the "status flags read
>> access" section: "firmware must ignore bit#2 unless bit#4 is set".
>>
>>
>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index 1f5c211245..475e76f514 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>       bool s4_disabled;
>>>       bool pcihp_bridge_en;
>>>       bool smi_on_cpuhp;
>>> +    bool smi_on_cpu_unplug;
>>>       bool pcihp_root_en;
>>>       uint8_t s4_val;
>>>       AcpiFadtData fadt;
>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>       pm->pcihp_io_base = 0;
>>>       pm->pcihp_io_len = 0;
>>>       pm->smi_on_cpuhp = false;
>>> +    pm->smi_on_cpu_unplug = false;
>>>
>>>       assert(obj);
>>>       init_common_fadt_data(machine, obj, &pm->fadt);
>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>           pm->smi_on_cpuhp =
>>>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>> +        pm->smi_on_cpu_unplug =
>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>       }
>>>
>>>       /* The above need not be conditional on machine type because the reset port
>>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>           CPUHotplugFeatures opts = {
>>>               .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>           };
>>>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>                          "\\_SB.PCI0", "\\_GPE._E02");
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 17b514d1da..2952a00fe6 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -99,6 +99,7 @@
>>>
>>>   GlobalProperty pc_compat_5_1[] = {
>>>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>   };
>>>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index 087a18d04d..8c667b7166 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>>>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>>
>>>   
>>
>> (9) You have to extend smi_features_ok_callback() as well -- it is
>> invalid for the firmware to negotiate unplug, without negotiating plug.
>>
>> In fact, as far as I can tell, that would even crash QEMU, given this
>> patch. Because, "opts.smi_path" would be set to NULL, but
>> "opts.fw_unplugs_cpu" would be set to "true". As a consequence, the
>> CPU_EJECT_METHOD change above would call aml_name("%s", NULL).
>>
>> So something like the following looks necessary:
> 
> Thanks for suggestions,
> I'll respin v2 with your feedback included.
> 
>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index 8c667b7166c7..5bc3f212fe77 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>>>   {
>>>       ICH9LPCState *lpc = opaque;
>>>       uint64_t guest_features;
>>> +    uint64_t guest_cpu_hotplug_features;
>>>
>>>       if (lpc->smi_features_ok) {
>>>           /* negotiation already complete, features locked */
>>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>>>           /* guest requests invalid features, leave @features_ok at zero */
>>>           return;
>>>       }
>>> +    guest_cpu_hotplug_features = guest_features &
>>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>       if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
>>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
>>> +        guest_cpu_hotplug_features) {
>>>           /*
>>>            * cpu hot-[un]plug with SMI requires SMI broadcast,
>>>            * leave @features_ok at zero
>>> @@ -388,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>>>           return;
>>>       }
>>>
>>> +    if (guest_cpu_hotplug_features ==
>>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
>>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
>>> +        return;
>>> +    }
>>> +
>>>       /* valid feature subset requested, lock it down, report success */
>>>       lpc->smi_negotiated_features = guest_features;
>>>       lpc->smi_features_ok = 1;
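The negotiation rules enforced by the diff above can be restated as a standalone predicate, which makes the accepted and rejected feature combinations easy to enumerate. This is a sketch: the bit numbers are assumptions for illustration (the thread names the ICH9_LPC_SMI_F_* constants but not their values), and the subset check stands in for the "invalid features" test that precedes this hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit numbers, for illustration only. */
#define F_BROADCAST_BIT      0
#define F_CPU_HOTPLUG_BIT    1
#define F_CPU_HOT_UNPLUG_BIT 2
#define BIT_ULL(n) (1ULL << (n))

/*
 * Return true if @guest is an acceptable feature set given @host:
 *  - it must be a subset of what the host offers;
 *  - CPU hot-(un)plug with SMI requires SMI broadcast;
 *  - hot-unplug may not be negotiated without hotplug.
 */
static bool smi_features_ok(uint64_t host, uint64_t guest)
{
    uint64_t cpuhp = guest & (BIT_ULL(F_CPU_HOTPLUG_BIT) |
                              BIT_ULL(F_CPU_HOT_UNPLUG_BIT));

    if (guest & ~host) {
        return false;
    }
    if (!(guest & BIT_ULL(F_BROADCAST_BIT)) && cpuhp) {
        return false;
    }
    if (cpuhp == BIT_ULL(F_CPU_HOT_UNPLUG_BIT)) {
        return false;
    }
    return true;
}
```

The third rule is the one that rules out the combination noted above, where opts.smi_path would be NULL while opts.fw_unplugs_cpu is true.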
>>
>>
>> (10) It would be nice to separate this work into multiple patches. I
>> propose:
>>
>> - [PATCH 1/5] x86: ich9: factor out "guest_cpu_hotplug_features"
>>
>>>   hw/isa/lpc_ich9.c | 7 +++++--
>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index 087a18d04de4..c46eefd13fd4 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>>>   {
>>>       ICH9LPCState *lpc = opaque;
>>>       uint64_t guest_features;
>>> +    uint64_t guest_cpu_hotplug_features;
>>>
>>>       if (lpc->smi_features_ok) {
>>>           /* negotiation already complete, features locked */
>>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>>>           /* guest requests invalid features, leave @features_ok at zero */
>>>           return;
>>>       }
>>> +    guest_cpu_hotplug_features = guest_features &
>>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>       if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
>>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
>>> +        guest_cpu_hotplug_features) {
>>>           /*
>>>            * cpu hot-[un]plug with SMI requires SMI broadcast,
>>>            * leave @features_ok at zero
>>
>>
>> - [PATCH 2/5] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature
>>
>>>   hw/i386/pc.c      | 1 +
>>>   hw/isa/lpc_ich9.c | 8 +++++++-
>>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 17b514d1da50..2952a00fe694 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -99,6 +99,7 @@
>>>
>>>   GlobalProperty pc_compat_5_1[] = {
>>>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>   };
>>>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index c46eefd13fd4..5bc3f212fe77 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -391,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>>>           return;
>>>       }
>>>
>>> +    if (guest_cpu_hotplug_features ==
>>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
>>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
>>> +        return;
>>> +    }
>>> +
>>>       /* valid feature subset requested, lock it down, report success */
>>>       lpc->smi_negotiated_features = guest_features;
>>>       lpc->smi_features_ok = 1;
>>> @@ -773,7 +779,7 @@ static Property ich9_lpc_properties[] = {
>>>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>>   
>>
>>
>> - [PATCH 3/5] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug
>>
>>>   hw/i386/acpi-build.c | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index 1f5c2112452a..9036e5594c92 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>       bool s4_disabled;
>>>       bool pcihp_bridge_en;
>>>       bool smi_on_cpuhp;
>>> +    bool smi_on_cpu_unplug;
>>>       bool pcihp_root_en;
>>>       uint8_t s4_val;
>>>       AcpiFadtData fadt;
>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>       pm->pcihp_io_base = 0;
>>>       pm->pcihp_io_len = 0;
>>>       pm->smi_on_cpuhp = false;
>>> +    pm->smi_on_cpu_unplug = false;
>>>
>>>       assert(obj);
>>>       init_common_fadt_data(machine, obj, &pm->fadt);
>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>           pm->smi_on_cpuhp =
>>>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>> +        pm->smi_on_cpu_unplug =
>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>       }
>>>
>>>       /* The above need not be conditional on machine type because the reset port
>>
>>
>> - [PATCH 4/5] acpi: cpuhp: introduce 'firmware performs eject' status/control bits
>>
>>>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>   include/hw/acpi/cpu.h           |  1 +
>>>   hw/acpi/cpu.c                   |  3 +++
>>>   3 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>> index 9bb22d1270a9..f68ef6e06c7a 100644
>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>> @@ -57,7 +57,11 @@ read access:
>>>                 It's valid only when bit 0 is set.
>>>              2: Device remove event, used to distinguish device for which
>>>                 no device eject request to OSPM was issued.
>>> -           3-7: reserved and should be ignored by OSPM
>>> +           3: reserved and should be ignored by OSPM
>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>> +              firmware shall clear this event by writing 1 into it before
>>> +              performing device eject.
>>> +           5-7: reserved and should be ignored by OSPM
>>>       [0x5-0x7] reserved
>>>       [0x8] Command data: (DWORD access)
>>>             contains 0 unless value last stored in 'Command field' is one of:
>>> @@ -82,7 +86,10 @@ write access:
>>>                  selected CPU device
>>>               3: if set to 1 initiates device eject, set by OSPM when it
>>>                  triggers CPU device removal and calls _EJ0 method
>>> -            4-7: reserved, OSPM must clear them before writing to register
>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>> +               Firmware shall issue device eject request as described above
>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>       [0x5] Command field: (1 byte access)
>>>             value:
>>>               0: selects a CPU device with inserting/removing events and
>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>> index 0eeedaa491c1..d71edde456f2 100644
>>> --- a/include/hw/acpi/cpu.h
>>> +++ b/include/hw/acpi/cpu.h
>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>       uint64_t arch_id;
>>>       bool is_inserting;
>>>       bool is_removing;
>>> +    bool fw_remove;
>>>       uint32_t ost_event;
>>>       uint32_t ost_status;
>>>   } AcpiCpuStatus;
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index f099b5092730..3dc83d73e20b 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>           val |= cdev->cpu ? 1 : 0;
>>>           val |= cdev->is_inserting ? 2 : 0;
>>>           val |= cdev->is_removing  ? 4 : 0;
>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>           break;
>>>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>               object_unparent(OBJECT(dev));
>>> +        } else if (data & 16) {
>>> +            cdev->fw_remove = !cdev->fw_remove;
>>>           }
>>>           break;
>>>       case ACPI_CPU_CMD_OFFSET_WR:
>>
>>
>> - [PATCH 5/5] x86: acpi: let the firmware handle pending "CPU remove" events in SMM
>>
>>>   include/hw/acpi/cpu.h |  1 +
>>>   hw/acpi/cpu.c         | 15 +++++++++++++--
>>>   hw/i386/acpi-build.c  |  1 +
>>>   3 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>> index d71edde456f2..999caaf51060 100644
>>> --- a/include/hw/acpi/cpu.h
>>> +++ b/include/hw/acpi/cpu.h
>>> @@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>   typedef struct CPUHotplugFeatures {
>>>       bool acpi_1_compatible;
>>>       bool has_legacy_cphp;
>>> +    bool fw_unplugs_cpu;
>>>       const char *smi_path;
>>>   } CPUHotplugFeatures;
>>>
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index 3dc83d73e20b..09d2f20daec0 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -335,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>   #define CPU_INSERT_EVENT  "CINS"
>>>   #define CPU_REMOVE_EVENT  "CRMV"
>>>   #define CPU_EJECT_EVENT   "CEJ0"
>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>
>>>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>                       hwaddr io_base,
>>> @@ -387,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>           /* initiates device eject, write only */
>>>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>> -        aml_append(field, aml_reserved_field(4));
>>> +        aml_append(field, aml_reserved_field(1));
>>> +        /* tell firmware to do device eject, write only */
>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>> +        aml_append(field, aml_reserved_field(2));
>>>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>           aml_append(cpu_ctrl_dev, field);
>>>
>>> @@ -422,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>
>>>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>> @@ -464,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>
>>>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>               aml_append(method, aml_store(idx, cpu_selector));
>>> -            aml_append(method, aml_store(one, ej_evt));
>>> +            if (opts.fw_unplugs_cpu) {
>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>> +                           aml_name("%s", opts.smi_path)));
>>> +            } else {
>>> +                aml_append(method, aml_store(one, ej_evt));
>>> +            }
>>>               aml_append(method, aml_release(ctrl_lock));
>>>           }
>>>           aml_append(cpus_dev, method);
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index 9036e5594c92..475e76f514ff 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>           CPUHotplugFeatures opts = {
>>>               .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>           };
>>>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>                          "\\_SB.PCI0", "\\_GPE._E02");
>>
>> Thanks!
>> Laszlo
>>
>>
>
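For illustration, the negotiation rules that the proposed patches 1/5 and 2/5 put into smi_features_ok_callback() can be reduced to one standalone predicate. This is a sketch under stated assumptions, not QEMU code. The helper names smi_features_valid() and smi_features_examples() are hypothetical. The bit positions are assumed to match include/hw/i386/ich9.h. The "subset of host features" check is assumed to be the existing test that precedes the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))

/* Assumed to mirror include/hw/i386/ich9.h; verify before relying on them. */
#define ICH9_LPC_SMI_F_BROADCAST_BIT       0
#define ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT     1
#define ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT  2

/*
 * Hypothetical helper: returns true if the guest's requested feature set
 * would be accepted, i.e. smi_features_ok would be set to 1.
 */
static bool smi_features_valid(uint64_t host_features, uint64_t guest_features)
{
    uint64_t guest_cpu_hotplug_features =
        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));

    /* guest requests features the host does not offer */
    if (guest_features & ~host_features) {
        return false;
    }
    /* cpu hot-[un]plug with SMI requires SMI broadcast */
    if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
        guest_cpu_hotplug_features) {
        return false;
    }
    /* cpu hot-unplug is unsupported without cpu hotplug */
    if (guest_cpu_hotplug_features ==
        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
        return false;
    }
    return true;
}

/* Hypothetical example: the combinations discussed in the review. */
static void smi_features_examples(void)
{
    const uint64_t host = BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) |
                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT);

    assert(smi_features_valid(host, host));           /* all three bits */
    assert(!smi_features_valid(host,                  /* no broadcast */
                               BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT)));
    assert(!smi_features_valid(host,                  /* unplug w/o plug */
                               BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) |
                               BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)));
}
```

With the checks factored out this way, "unplug requires plug" is a separate rule from "plug or unplug requires broadcast". Either rule can be tested, or relaxed later, on its own.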
Igor Mammedov Nov. 27, 2020, 11:33 a.m. UTC | #11
On Thu, 26 Nov 2020 19:35:30 -0800
Ankur Arora <ankur.a.arora@oracle.com> wrote:

> On 2020-11-26 4:46 a.m., Laszlo Ersek wrote:
> > On 11/26/20 11:24, Ankur Arora wrote:  
> >> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:  
> >>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> >>> OSPM on CPU eject will set bit #4 in the CPU hotplug block for the
> >>> to-be-ejected CPU to mark it for removal by firmware and trigger an SMI
> >>> upcall to let firmware do the actual eject.
> >>>
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>> PS:
> >>>     - abuse 5.1 machine type for now to turn off unplug feature
> >>>       (it will be moved to 5.2 machine type once new merge window is open)
> >>> ---
> >>>    include/hw/acpi/cpu.h           |  2 ++
> >>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >>>    hw/acpi/cpu.c                   | 18 ++++++++++++++++--
> >>>    hw/i386/acpi-build.c            |  5 +++++
> >>>    hw/i386/pc.c                    |  1 +
> >>>    hw/isa/lpc_ich9.c               |  2 +-
> >>>    6 files changed, 34 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> >>> index 0eeedaa491..999caaf510 100644
> >>> --- a/include/hw/acpi/cpu.h
> >>> +++ b/include/hw/acpi/cpu.h
> >>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >>>        uint64_t arch_id;
> >>>        bool is_inserting;
> >>>        bool is_removing;
> >>> +    bool fw_remove;
> >>>        uint32_t ost_event;
> >>>        uint32_t ost_status;
> >>>    } AcpiCpuStatus;
> >>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >>>    typedef struct CPUHotplugFeatures {
> >>>        bool acpi_1_compatible;
> >>>        bool has_legacy_cphp;
> >>> +    bool fw_unplugs_cpu;
> >>>        const char *smi_path;
> >>>    } CPUHotplugFeatures;
> >>>
> >>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >>> index 9bb22d1270..f68ef6e06c 100644
> >>> --- a/docs/specs/acpi_cpu_hotplug.txt
> >>> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >>> @@ -57,7 +57,11 @@ read access:
> >>>                  It's valid only when bit 0 is set.
> >>>               2: Device remove event, used to distinguish device for which
> >>>                  no device eject request to OSPM was issued.
> >>> -           3-7: reserved and should be ignored by OSPM
> >>> +           3: reserved and should be ignored by OSPM
> >>> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >>> +              firmware shall clear this event by writing 1 into it before
> >>> +              performing device eject.
> >>> +           5-7: reserved and should be ignored by OSPM
> >>>        [0x5-0x7] reserved
> >>>        [0x8] Command data: (DWORD access)
> >>>              contains 0 unless value last stored in 'Command field' is one of:
> >>> @@ -82,7 +86,10 @@ write access:
> >>>                   selected CPU device
> >>>                3: if set to 1 initiates device eject, set by OSPM when it
> >>>                   triggers CPU device removal and calls _EJ0 method
> >>> -            4-7: reserved, OSPM must clear them before writing to register
> >>> +            4: if set to 1 OSPM hands over device eject to firmware,
> >>> +               Firmware shall issue device eject request as described above
> >>> +               (bit #3) and OSPM should not touch device eject bit (#3),
> >>> +            5-7: reserved, OSPM must clear them before writing to register
> >>>        [0x5] Command field: (1 byte access)
> >>>              value:
> >>>                0: selects a CPU device with inserting/removing events and
> >>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> >>> index f099b50927..09d2f20dae 100644
> >>> --- a/hw/acpi/cpu.c
> >>> +++ b/hw/acpi/cpu.c
> >>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >>>            val |= cdev->cpu ? 1 : 0;
> >>>            val |= cdev->is_inserting ? 2 : 0;
> >>>            val |= cdev->is_removing  ? 4 : 0;
> >>> +        val |= cdev->fw_remove  ? 16 : 0;  
> >>
> >> I might be missing something but I don't see where cdev->fw_remove is being
> >> set.  
> > 
> > See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
> > which happens through the ACPI code change --, fw_remove is inverted.  
> Thanks, that makes sense. I was reading the AML building code all wrong.
> 
> > 
> >   
> >> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
> >> we would always end up setting this bit:  
> >>>            val |= cdev->is_removing  ? 4 : 0;  
> >>
> >> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
> >> (4 | 16). I'm guessing that in that case the AML determines which case gets
> >> handled but it might make sense to set just one of these?  
> > 
> > "is_removing" is set directly in response to the device_del QMP command.
> > That QMP command is asynchronous to the execution of the guest OS.
> >
> > "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
> > executed by the guest OS's ACPI interpreter, after the guest OS has
> > de-scheduled all processes from the CPU being removed (= basically after
> > the OS has willfully forgotten about the CPU).
> > 
> > Therefore, considering the bitmask (is_removing, fw_remove), three
> > variations make sense:  
> 
> Just annotating these with the corresponding ACPI code to make sure
> I have it straight. Please correct if my interpretation is wrong. Also,
> a few questions inline:
> 
> > 
> > #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
> > 
> > #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
> >                                     is processing the request  
> 
> Guest executes the CSCN method and reads rm_evt (bit 2) (thus noticing
> the is_removing=1), and then notifies the CPU to be removed via the
> CTFY method.
> 
>     ifctx = aml_if(aml_equal(rm_evt, one));
>     {
>             aml_append(ifctx,
>                        aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
>             aml_append(ifctx, aml_store(one, rm_evt));
>             aml_append(ifctx, aml_store(one, has_event));
>     }
> 
> Then it does a store to rm_evt (bit 2). That would result in clearing
> of is_removing. (Igor mentions that in a separate mail.)
> 
> 1. Do we need to clear is_removing at all? AFAICS, it's only useful as
> an ack to QEMU and I can't think of why that's useful. OTOH it
> doesn't serve any useful purpose once the guest OS has seen the request.
No, firmware doesn't need to care about it; it's consumed by OSPM only.
 
> 2. Would it make sense to clear it first and then call CPU_NOTIFY_METHOD?
> CPU_NOTIFY_METHOD (or _EJ0, COST) don't depend on is_removing but
> that might change in the future.

All methods are protected by the same mutex, so if _EJ0 is called while CSCN
is in progress, it will wait until CSCN has finished.
But clearing bit #2 before Notify should work too.

 
> The notify would end up calling acpi_hotplug_schedule(), which would be
> responsible for queuing work (on CPU0) to detach+unplug the CPU.
>
> Once the OS-level detach succeeds, the worker evaluates the "_EJ0" method,
> which does the actual CPU_EJECT_METHOD work.
>
> If the detach fails, it evaluates the CPU_OST_METHOD instead, which records
> the event and its status.
> 
> At this point the state is back to:
> 
> (is_removing=0, fw_remove=0)
If OSPM fails to release the CPU for whatever reason, that's a valid
state; we just notify the user via an OST event that the requested unplug wasn't successful.

> 
> > #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
> >                                     the CPU, firmware is permitted /
> >                                     required to forget about the CPU as
> >                                     well, and then unplug the CPU  
> 
> CPU_EJECT_METHOD will do a store to bit 4, which would invert (and
> thus set) fw_remove and then do the SMI.
> 
> So, this would be
> > #3 (is_removing=0, fw_remove=1)  
> 
> At this point the firmware calls QemuCPUhpCollectApicIds() which
> (after changes) notices CPU(s) with fw_remove set.
> 
> Collects them and does a store to bit 4, which would clear fw_remove.

I'd skip this step on the firmware side and make QEMU clear it
when the CPU is ejected.
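Here is a minimal model of the register semantics proposed here. OSPM's _EJ0 write to bit #4 marks the CPU. The firmware writes bit #3 to eject it. QEMU itself drops the bit #4 mark when the eject happens. This is a sketch only. CpuSlot and the cpuhp_* helpers are hypothetical stand-ins for AcpiCpuStatus and for cpu_hotplug_rd()/cpu_hotplug_wr() in hw/acpi/cpu.c. Unlike the RFC as posted, the sketch sets fw_remove on a bit #4 write instead of toggling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the flags byte at offset 0x4 (docs/specs/acpi_cpu_hotplug.txt). */
#define CPUHP_ENABLED     (1u << 0) /* read: CPU device present/enabled */
#define CPUHP_INSERT_EVT  (1u << 1) /* read: insert event; write 1: clear */
#define CPUHP_REMOVE_EVT  (1u << 2) /* read: remove event; write 1: clear */
#define CPUHP_EJECT       (1u << 3) /* write 1: eject selected CPU */
#define CPUHP_FW_REMOVE   (1u << 4) /* new bit #4 from this RFC */

/* Hypothetical, stripped-down stand-in for AcpiCpuStatus. */
typedef struct {
    bool present;      /* models cdev->cpu != NULL */
    bool is_inserting;
    bool is_removing;
    bool fw_remove;
} CpuSlot;

/* Models the flags read in cpu_hotplug_rd(). */
static uint8_t cpuhp_read_flags(const CpuSlot *s)
{
    uint8_t val = 0;

    val |= s->present ? CPUHP_ENABLED : 0;
    val |= s->is_inserting ? CPUHP_INSERT_EVT : 0;
    val |= s->is_removing ? CPUHP_REMOVE_EVT : 0;
    val |= s->fw_remove ? CPUHP_FW_REMOVE : 0;
    return val;
}

/* Models the flags write in cpu_hotplug_wr(), with the proposed tweak. */
static void cpuhp_write_flags(CpuSlot *s, uint8_t data)
{
    if (data & CPUHP_INSERT_EVT) {
        s->is_inserting = false;
    } else if (data & CPUHP_REMOVE_EVT) {
        s->is_removing = false;
    } else if (data & CPUHP_EJECT) {
        s->present = false;    /* stands in for hotplug_handler_unplug() */
        s->fw_remove = false;  /* QEMU, not the firmware, drops the mark */
    } else if (data & CPUHP_FW_REMOVE) {
        s->fw_remove = true;   /* set by _EJ0 (the RFC toggles it instead) */
    }
}

/* Walks the unplug sequence discussed in this thread. */
static void cpuhp_example_unplug_flow(void)
{
    CpuSlot s = { .present = true };

    s.is_removing = true;                      /* device_del via QMP */
    assert(cpuhp_read_flags(&s) & CPUHP_REMOVE_EVT);
    cpuhp_write_flags(&s, CPUHP_REMOVE_EVT);   /* CSCN acks bit #2 */
    cpuhp_write_flags(&s, CPUHP_FW_REMOVE);    /* _EJ0, then SMI */
    assert(cpuhp_read_flags(&s) == (CPUHP_ENABLED | CPUHP_FW_REMOVE));
    cpuhp_write_flags(&s, CPUHP_EJECT);        /* firmware ejects */
    assert(cpuhp_read_flags(&s) == 0);
}
```

If QEMU clears the mark on eject, the firmware never has to write bit #4. Each bit then has a single writer from the guest side: OSPM writes bit #4 and the firmware writes bit #3.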

> 
> > 
> > #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
> > 
> > #5 (is_removing=0, fw_remove=0) -- firmware performing unplug  
> Firmware does an unplug and writes to bit 3, thus clearing is_removing.
> 
> On return from the firmware the guest evaluates the COST again.
It's optional and depends on the OSPM implementation (some do not call it on success).


> And, eventually goes back to the CSCN where it processes more
> hotplug or unplug events.
In the unplug case, CSCN finishes first, and only after that are the _EJ0
calls processed.

> > The variation (is_removing=0, fw_remove=1) is invalid / unused.  
> 
> /nods
> > 
> > 
> > The firmware may be investigating the CPU register block between steps
> > #2 and #3 -- in other words, the firmware may see a CPU for which
> > is_removing is set (unplug requested via QMP), but the OS has not vacated
> > it yet (fw_remove=0). In that case, the firmware must just skip the CPU --
> > once the OS is done, it will set fw_remove too, and raise another SMI.  
> Yeah, it makes sense for the firmware to only care about a CPU once it
> sees fw_remove=1. (And as currently situated, the firmware would never
> see is_removing=1 at all.)
> 
> 
> Thanks
> Ankur
> 
> > 
> >   
> >>
> >>  
> >>>            trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >>>            break;
> >>>        case ACPI_CPU_CMD_DATA_OFFSET_RW:
> >>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >>>                hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >>>                hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >>>                object_unparent(OBJECT(dev));
> >>> +        } else if (data & 16) {
> >>> +            cdev->fw_remove = !cdev->fw_remove;
> >>>            }
> >>>            break;
> >>>        case ACPI_CPU_CMD_OFFSET_WR:
> >>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>>    #define CPU_INSERT_EVENT  "CINS"
> >>>    #define CPU_REMOVE_EVENT  "CRMV"
> >>>    #define CPU_EJECT_EVENT   "CEJ0"
> >>> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>>
> >>>    void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>                        hwaddr io_base,
> >>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>            aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>>            /* initiates device eject, write only */
> >>>            aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >>> -        aml_append(field, aml_reserved_field(4));
> >>> +        aml_append(field, aml_reserved_field(1));
> >>> +        /* tell firmware to do device eject, write only */
> >>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >>> +        aml_append(field, aml_reserved_field(2));
> >>>            aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>>            aml_append(cpu_ctrl_dev, field);
> >>>
> >>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>            Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>>            Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>>            Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>>
> >>>            aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>>            aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>
> >>>                aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>>                aml_append(method, aml_store(idx, cpu_selector));
> >>> -            aml_append(method, aml_store(one, ej_evt));
> >>> +            if (opts.fw_unplugs_cpu) {
> >>> +                aml_append(method, aml_store(one, fw_ej_evt));
> >>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >>> +                           aml_name("%s", opts.smi_path)));
> >>> +            } else {
> >>> +                aml_append(method, aml_store(one, ej_evt));
> >>> +            }  
> >> My knowledge of AML is rather rudimentary but this looks mostly
> >> reasonable to me.
> >>
> >> One question: the corresponding code for CPU hotplug does not send an
> >> SMI_CMD. Why the difference?
> > 
> > This code (on eject) is executing *after* the OS kernel has processed
> > the event. But on hotplug, the ordering is different (it must be): in
> > that case, the CSCN (scan) method first notifies the firmware, and then
> > the OS.
> > 
> > Thanks
> > Laszlo
> >   
> >>
> >>                      aml_append(while_ctx,
> >>                          aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
> >>                                    uid));
> >>                      aml_append(while_ctx,
> >>                          aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
> >>                      aml_append(while_ctx, aml_store(uid, cpu_selector));
> >>                      aml_append(while_ctx, aml_store(one, ins_evt));
> >>                      aml_append(while_ctx, aml_increment(cpu_idx));
> >>
> >>  
> >>>                aml_append(method, aml_release(ctrl_lock));
> >>>            }
> >>>            aml_append(cpus_dev, method);
> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>> index 1f5c211245..475e76f514 100644
> >>> --- a/hw/i386/acpi-build.c
> >>> +++ b/hw/i386/acpi-build.c
> >>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >>>        bool s4_disabled;
> >>>        bool pcihp_bridge_en;
> >>>        bool smi_on_cpuhp;
> >>> +    bool smi_on_cpu_unplug;
> >>>        bool pcihp_root_en;
> >>>        uint8_t s4_val;
> >>>        AcpiFadtData fadt;
> >>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>        pm->pcihp_io_base = 0;
> >>>        pm->pcihp_io_len = 0;
> >>>        pm->smi_on_cpuhp = false;
> >>> +    pm->smi_on_cpu_unplug = false;
> >>>
> >>>        assert(obj);
> >>>        init_common_fadt_data(machine, obj, &pm->fadt);
> >>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>            pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >>>            pm->smi_on_cpuhp =
> >>>                !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> >>> +        pm->smi_on_cpu_unplug =
> >>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>>        }
> >>>
> >>>        /* The above need not be conditional on machine type because the reset port
> >>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>>            CPUHotplugFeatures opts = {
> >>>                .acpi_1_compatible = true, .has_legacy_cphp = true,
> >>>                .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> >>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >>>            };
> >>>            build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >>>                           "\\_SB.PCI0", "\\_GPE._E02");
> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>> index 17b514d1da..2952a00fe6 100644
> >>> --- a/hw/i386/pc.c
> >>> +++ b/hw/i386/pc.c
> >>> @@ -99,6 +99,7 @@
> >>>
> >>>    GlobalProperty pc_compat_5_1[] = {
> >>>        { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> >>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >>>    };
> >>>    const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >>>
> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >>> index 087a18d04d..8c667b7166 100644
> >>> --- a/hw/isa/lpc_ich9.c
> >>> +++ b/hw/isa/lpc_ich9.c
> >>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
> >>>        DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >>>                          ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >>>        DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> >>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> >>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >>>        DEFINE_PROP_END_OF_LIST(),
> >>>    };
> >>>     
> >>
> >> Thanks for sending out the patch, btw. This helped me crystallize some of the
> >> corresponding OVMF code.
> >>
> >> Ankur
> >>  
> >
Igor Mammedov Nov. 27, 2020, 11:47 a.m. UTC | #12
On Thu, 26 Nov 2020 20:10:59 -0800
Ankur Arora <ankur.a.arora@oracle.com> wrote:

> On 2020-11-26 12:38 p.m., Igor Mammedov wrote:
> > On Thu, 26 Nov 2020 12:17:27 +0100
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> On 11/24/20 13:25, Igor Mammedov wrote:  
> >>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
> >>> OSPM on CPU eject will set bit #4 in the CPU hotplug block for the
> >>> to-be-ejected CPU to mark it for removal by firmware and trigger an SMI
> >>> upcall to let firmware do the actual eject.
> >>>
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>> PS:
> >>>    - abuse 5.1 machine type for now to turn off unplug feature
> >>>      (it will be moved to 5.2 machine type once new merge window is open)
> >>> ---
> >>>   include/hw/acpi/cpu.h           |  2 ++
> >>>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >>>   hw/acpi/cpu.c                   | 18 ++++++++++++++++--
> >>>   hw/i386/acpi-build.c            |  5 +++++
> >>>   hw/i386/pc.c                    |  1 +
> >>>   hw/isa/lpc_ich9.c               |  2 +-
> >>>   6 files changed, 34 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> >>> index 0eeedaa491..999caaf510 100644
> >>> --- a/include/hw/acpi/cpu.h
> >>> +++ b/include/hw/acpi/cpu.h
> >>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >>>       uint64_t arch_id;
> >>>       bool is_inserting;
> >>>       bool is_removing;
> >>> +    bool fw_remove;
> >>>       uint32_t ost_event;
> >>>       uint32_t ost_status;
> >>>   } AcpiCpuStatus;
> >>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >>>   typedef struct CPUHotplugFeatures {
> >>>       bool acpi_1_compatible;
> >>>       bool has_legacy_cphp;
> >>> +    bool fw_unplugs_cpu;
> >>>       const char *smi_path;
> >>>   } CPUHotplugFeatures;
> >>>
> >>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >>> index 9bb22d1270..f68ef6e06c 100644
> >>> --- a/docs/specs/acpi_cpu_hotplug.txt
> >>> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >>> @@ -57,7 +57,11 @@ read access:
> >>>                 It's valid only when bit 0 is set.
> >>>              2: Device remove event, used to distinguish device for which
> >>>                 no device eject request to OSPM was issued.
> >>> -           3-7: reserved and should be ignored by OSPM
> >>> +           3: reserved and should be ignored by OSPM
> >>> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >>> +              firmware shall clear this event by writing 1 into it before  
> >>
> >> (1) s/clear this event/clear this event bit/
> >>  
> >>> +              performing device eject.  
> >>
> >> (2) move the second and third lines ("firmware shall clear....") over to
> >> the write documentation, below? In particular:
> >>  
> >>> +           5-7: reserved and should be ignored by OSPM
> >>>       [0x5-0x7] reserved
> >>>       [0x8] Command data: (DWORD access)
> >>>             contains 0 unless value last stored in 'Command field' is one of:
> >>> @@ -82,7 +86,10 @@ write access:
> >>>                  selected CPU device
> >>>               3: if set to 1 initiates device eject, set by OSPM when it
> >>>                  triggers CPU device removal and calls _EJ0 method
> >>> -            4-7: reserved, OSPM must clear them before writing to register
> >>> +            4: if set to 1 OSPM hands over device eject to firmware,
> >>> +               Firmware shall issue device eject request as described above
> >>> +               (bit #3) and OSPM should not touch device eject bit (#3),  
> >>
> >> (3) it would be clearer if we documented the exact bit writing order
> >> here:
> >> - clear bit#4, *then* set bit#3 (two write accesses)
> >> - versus clear bit#4 *and* set bit#3 (single access)  
> > 
> > I was thinking that FW should not bother with clearing bit #4,
> > and QEMU should clear it when handling a write to bit #3.
> > (it looks like I forgot to actually do that)  
> 
> Why involve the firmware with bit #3 at all? If the firmware only reads bit #4
> to detect fw_remove and then write (and thus reset) bit #4, isn't that
> good enough?

That would needlessly complicate the code on the QEMU side, and I don't want to
overload bit #4 with additional semantics when we already have bit #3, which
does the eject. So unless there are issues with that, I'd stick to using
bit #3 for eject.
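On the firmware side, the split described here (firmware only reads bit #4 and only writes bit #3) might look roughly like this. This is a sketch, not OVMF code. FakeCpuhpBlock, fake_read_flags(), fake_write_flags() and fw_eject_released_cpus() are hypothetical, and real firmware would access the selector and flags registers through port I/O instead of an array. The sketch also follows the earlier point that a CPU with only the remove event pending (the OS has not vacated it yet) must be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUHP_EJECT      (1u << 3) /* write: eject the selected CPU */
#define CPUHP_FW_REMOVE  (1u << 4) /* read: OSPM handed eject to firmware */
#define FAKE_MAX_CPUS    8

/* Hypothetical in-memory stand-in for the CPU hotplug register block. */
typedef struct {
    uint8_t flags[FAKE_MAX_CPUS];    /* flags byte per selector */
    size_t  ejected[FAKE_MAX_CPUS];  /* selectors that got an eject write */
    size_t  n_ejected;
} FakeCpuhpBlock;

static uint8_t fake_read_flags(const FakeCpuhpBlock *b, size_t sel)
{
    return b->flags[sel];
}

static void fake_write_flags(FakeCpuhpBlock *b, size_t sel, uint8_t data)
{
    if (data & CPUHP_EJECT) {
        if (b->n_ejected < FAKE_MAX_CPUS) {
            b->ejected[b->n_ejected++] = sel;
        }
        b->flags[sel] = 0;  /* CPU is gone and QEMU also clears bit #4 */
    }
}

/*
 * Hypothetical SMI handler step: only *read* bit #4 to find CPUs that
 * OSPM has released, and only *write* bit #3 to eject them.  CPUs with
 * just the remove event (bit #2) pending are skipped.
 */
static size_t fw_eject_released_cpus(FakeCpuhpBlock *b, size_t n_cpus)
{
    size_t count = 0;

    assert(n_cpus <= FAKE_MAX_CPUS);
    for (size_t sel = 0; sel < n_cpus; sel++) {
        if (fake_read_flags(b, sel) & CPUHP_FW_REMOVE) {
            /* real firmware would first drop the CPU from its own tables */
            fake_write_flags(b, sel, CPUHP_EJECT);
            count++;
        }
    }
    return count;
}
```

A second scan after the ejects finds nothing to do, because QEMU cleared bit #4 as part of the eject. The firmware therefore needs no separate acknowledgement write.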

> 
> 
> >   
> >>
> >>
> >>  
> >>> +            5-7: reserved, OSPM must clear them before writing to register
> >>>       [0x5] Command field: (1 byte access)
> >>>             value:
> >>>               0: selects a CPU device with inserting/removing events and
> >>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> >>> index f099b50927..09d2f20dae 100644
> >>> --- a/hw/acpi/cpu.c
> >>> +++ b/hw/acpi/cpu.c
> >>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >>>           val |= cdev->cpu ? 1 : 0;
> >>>           val |= cdev->is_inserting ? 2 : 0;
> >>>           val |= cdev->is_removing  ? 4 : 0;
> >>> +        val |= cdev->fw_remove  ? 16 : 0;
> >>>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >>>           break;
> >>>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
> >>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >>>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >>>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >>>               object_unparent(OBJECT(dev));
> >>> +        } else if (data & 16) {
> >>> +            cdev->fw_remove = !cdev->fw_remove;  
> >>
> >> hm... so I guess the ACPI code will first write bit#4 to flip
> >> "fw_remove" from "off" to "on". Then the firmware will write bit#4 to
> >> flip "fw_remove" back  to "off". And finally, the firmware will write
> >> bit#3 (strictly as a separate access) to unplug the CPU.  
> > Sorry for the confusion between the doc and the implementation: FW should only read bit #4 and only write bit #3.
> >     
> >> (4) But anyway, taking a step back: what do we need the new bit for?
> >>  
> >>>           }
> >>>           break;
> >>>       case ACPI_CPU_CMD_OFFSET_WR:
> >>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>>   #define CPU_INSERT_EVENT  "CINS"
> >>>   #define CPU_REMOVE_EVENT  "CRMV"
> >>>   #define CPU_EJECT_EVENT   "CEJ0"
> >>> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>>
> >>>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>                       hwaddr io_base,
> >>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>>           /* initiates device eject, write only */
> >>>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >>> -        aml_append(field, aml_reserved_field(4));
> >>> +        aml_append(field, aml_reserved_field(1));
> >>> +        /* tell firmware to do device eject, write only */
> >>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >>> +        aml_append(field, aml_reserved_field(2));
> >>>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>>           aml_append(cpu_ctrl_dev, field);
> >>>
> >>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>>
> >>>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>
> >>>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>>               aml_append(method, aml_store(idx, cpu_selector));
> >>> -            aml_append(method, aml_store(one, ej_evt));
> >>> +            if (opts.fw_unplugs_cpu) {
> >>> +                aml_append(method, aml_store(one, fw_ej_evt));
> >>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >>> +                           aml_name("%s", opts.smi_path)));
> >>> +            } else {
> >>> +                aml_append(method, aml_store(one, ej_evt));
> >>> +            }
> >>>               aml_append(method, aml_release(ctrl_lock));
> >>>           }
> >>>           aml_append(cpus_dev, method);  
> >>
> >> Hmmm, OK, let me parse this.
> >>
> >> Assume there is a big bunch of device_del QMP commands, QEMU marks the
> >> "remove" event pending on the corresponding set of CPUs, plus also makes
> >> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
> >> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
> >> event is pending, notifies the OS one by one. The OS in turn forgets
> >> about the subject CPU, and calls the _EJ0 method on the affected CPU
> >> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
> >> in the affected CPU's identifier.
> >>
> >> The above hunk modifies the CEJ0 method.
> >>
> >> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
> >> acquire the CPLK lock, but CEJ0 is actually called within CSCN
> >> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?  
> > Theoretically the spec supports recursive mutexes, but I don't think that's the case here.
> > 
> > Considering it currently works, I think the OS implements Notify events asynchronously,
> > hence no clash w.r.t. the mutex. If EJ0 were handled within the CSCN context,
> > EJ0 would clobber the cpu_selector value that CSCN is also using.
> 
> From my read of the Linux code, yeah, the EJ0 execution happens in an
> async worker on CPU 0 which first detaches the CPU and then executes EJ0.  
> >> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
> >> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
> >> sets the new bit#4 in the register block, and raises an SMI.
> >>
> >> (6) So that's one SMI per CPU being removed. Is that OK?  
> > 
> > I guess it has a performance penalty, but there is nothing we can do about it;
> > OSPM makes the EJ0 calls asynchronously.
> >     
> >> (7) What if there are asynchronous plugs going on, and the firmware
> >> notices them in the register block? ... Hm, I hope that should be OK,
> >> because ultimately the CSCN method will learn about those too, and
> >> inform the OS. On plug, the firmware doesn't modify the register block.  
> > It shouldn't be an issue (modulo bugs; I haven't tried to hot add and hot remove
> > the same CPU at the same time)
> 
> Yeah, I was wondering what would happen with simultaneous hot add and remove.
> I guess we would always do the remove first and then the add, unless we hit
> the break due to max_cpus_per_pass and switch to hot-add mode.
> 
> > 
> > i.e.
> > (QEMU) pause
> > (QEMU) device_add
> > (QEMU) device_del
> > (QEMU) cont

Looking at the current CPU_SCAN_METHOD, it will notice and process the insert
event only. The remove event will stay pending until the next CPU hotplug SCI
(i.e. the next time the user hot(un)plugs a CPU).

I'm not sure such a use case is worth fixing, though.
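A simplified C model of the behaviour described above, assuming the scan method checks the insert event before the remove event in an if/else-if chain for each selected CPU: when both events are pending on the same CPU, one pass notifies only the insert (Device Check) and leaves the remove pending for the next SCI. `struct sim_cpu_events` and `sim_scan_cpu()` are hypothetical stand-ins for the generated AML, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU event state mirroring status bits 1 and 2. */
struct sim_cpu_events {
    bool is_inserting; /* bit 1: insert event pending */
    bool is_removing;  /* bit 2: remove event pending */
};

enum sim_notify {
    SIM_NOTIFY_NONE = 0,
    SIM_NOTIFY_DEVICE_CHECK = 1,  /* ACPI Notify value: Device Check */
    SIM_NOTIFY_EJECT_REQUEST = 3, /* ACPI Notify value: Eject Request */
};

/* One scan of one CPU: the else-if leaves a remove event pending
 * whenever an insert event is handled in the same pass. */
static enum sim_notify sim_scan_cpu(struct sim_cpu_events *ev)
{
    assert(ev);
    if (ev->is_inserting) {
        ev->is_inserting = false; /* the scan method clears the event */
        return SIM_NOTIFY_DEVICE_CHECK;
    } else if (ev->is_removing) {
        ev->is_removing = false;
        return SIM_NOTIFY_EJECT_REQUEST;
    }
    return SIM_NOTIFY_NONE;
}
```

Calling `sim_scan_cpu()` a second time, as the next SCI would, then yields the Eject Request.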

> >   
> >> Ah! OK. I think I understand why bit#4 is important. The firmware may
> >> notice pending remove events, but it must not act upon them -- it must
> >> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
> >> clear means the event is pending (QEMU got a device_del), but the OS has
> >> not forgotten about the CPU yet -- so the firmware must not unplug it
> >> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
> >> the already set bit#2, advertising that the OS has *already* abandoned
> >> the CPU.  
> > Firmware should ignore bit #2; it doesn't mean anything to it, since OSPM might
> > ignore or not support CPU removal. What firmware must care about is bit #4,
> > which tells it that OSPM is done with the CPU and asks for it to be removed by firmware.
> 
> In my other mail, I suggested that the guest OS not reset bit #2, but
> on second thought, this makes sense.
> 
> 
> Thanks
> Ankur
> 
> >   
> >>
> >> This means we'll have to modify the QemuCpuhpCollectApicIds() function
> >> in OVMF as well -- for collecting a CPU for unplug, just bit#2
> >> (QEMU_CPUHP_STAT_REMOVE) is insufficient -- on such CPUs, the OS may
> >> still be executing threads.
> >>
> >> OK, this approach sounds plausible to me.
> >>
> >> (8) Please extend the description of bit#2 in the "status flags read
> >> access" section: "firmware must ignore bit#2 unless bit#4 is set".
> >>
> >>
> >>  
> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>> index 1f5c211245..475e76f514 100644
> >>> --- a/hw/i386/acpi-build.c
> >>> +++ b/hw/i386/acpi-build.c
> >>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >>>       bool s4_disabled;
> >>>       bool pcihp_bridge_en;
> >>>       bool smi_on_cpuhp;
> >>> +    bool smi_on_cpu_unplug;
> >>>       bool pcihp_root_en;
> >>>       uint8_t s4_val;
> >>>       AcpiFadtData fadt;
> >>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>       pm->pcihp_io_base = 0;
> >>>       pm->pcihp_io_len = 0;
> >>>       pm->smi_on_cpuhp = false;
> >>> +    pm->smi_on_cpu_unplug = false;
> >>>
> >>>       assert(obj);
> >>>       init_common_fadt_data(machine, obj, &pm->fadt);
> >>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >>>           pm->smi_on_cpuhp =
> >>>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> >>> +        pm->smi_on_cpu_unplug =
> >>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>>       }
> >>>
> >>>       /* The above need not be conditional on machine type because the reset port
> >>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>>           CPUHotplugFeatures opts = {
> >>>               .acpi_1_compatible = true, .has_legacy_cphp = true,
> >>>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> >>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >>>           };
> >>>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >>>                          "\\_SB.PCI0", "\\_GPE._E02");
> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>> index 17b514d1da..2952a00fe6 100644
> >>> --- a/hw/i386/pc.c
> >>> +++ b/hw/i386/pc.c
> >>> @@ -99,6 +99,7 @@
> >>>
> >>>   GlobalProperty pc_compat_5_1[] = {
> >>>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> >>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >>>   };
> >>>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >>>
> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >>> index 087a18d04d..8c667b7166 100644
> >>> --- a/hw/isa/lpc_ich9.c
> >>> +++ b/hw/isa/lpc_ich9.c
> >>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
> >>>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >>>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >>>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> >>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> >>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >>>       DEFINE_PROP_END_OF_LIST(),
> >>>   };
> >>>
> >>>     
> >>
> >> (9) You have to extend smi_features_ok_callback() as well -- it is
> >> invalid for the firmware to negotiate unplug, without negotiating plug.
> >>
> >> In fact, as far as I can tell, that would even crash QEMU, given this
> >> patch. Because, "opts.smi_path" would be set to NULL, but
> >> "opts.fw_unplugs_cpu" would be set to "true". As a consequence, the
> >> CPU_EJECT_METHOD change above would call aml_name("%s", NULL).
> >>
> >> So something like the following looks necessary:  
> > 
> > Thanks for suggestions,
> > I'll respin v2 with your feedback included.
> >   
> >>  
> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >>> index 8c667b7166c7..5bc3f212fe77 100644
> >>> --- a/hw/isa/lpc_ich9.c
> >>> +++ b/hw/isa/lpc_ich9.c
> >>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
> >>>   {
> >>>       ICH9LPCState *lpc = opaque;
> >>>       uint64_t guest_features;
> >>> +    uint64_t guest_cpu_hotplug_features;
> >>>
> >>>       if (lpc->smi_features_ok) {
> >>>           /* negotiation already complete, features locked */
> >>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
> >>>           /* guest requests invalid features, leave @features_ok at zero */
> >>>           return;
> >>>       }
> >>> +    guest_cpu_hotplug_features = guest_features &
> >>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> >>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>>       if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> >>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> >>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> >>> +        guest_cpu_hotplug_features) {
> >>>           /*
> >>>            * cpu hot-[un]plug with SMI requires SMI broadcast,
> >>>            * leave @features_ok at zero
> >>> @@ -388,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
> >>>           return;
> >>>       }
> >>>
> >>> +    if (guest_cpu_hotplug_features ==
> >>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> >>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> >>> +        return;
> >>> +    }
> >>> +
> >>>       /* valid feature subset requested, lock it down, report success */
> >>>       lpc->smi_negotiated_features = guest_features;
> >>>       lpc->smi_features_ok = 1;  
> >>
> >>
> >> (10) It would be nice to separate this work into multiple patches. I
> >> propose:
> >>
> >> - [PATCH 1/5] x86: ich9: factor out "guest_cpu_hotplug_features"
> >>  
> >>>   hw/isa/lpc_ich9.c | 7 +++++--
> >>>   1 file changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >>> index 087a18d04de4..c46eefd13fd4 100644
> >>> --- a/hw/isa/lpc_ich9.c
> >>> +++ b/hw/isa/lpc_ich9.c
> >>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
> >>>   {
> >>>       ICH9LPCState *lpc = opaque;
> >>>       uint64_t guest_features;
> >>> +    uint64_t guest_cpu_hotplug_features;
> >>>
> >>>       if (lpc->smi_features_ok) {
> >>>           /* negotiation already complete, features locked */
> >>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
> >>>           /* guest requests invalid features, leave @features_ok at zero */
> >>>           return;
> >>>       }
> >>> +    guest_cpu_hotplug_features = guest_features &
> >>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> >>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>>       if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
> >>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
> >>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
> >>> +        guest_cpu_hotplug_features) {
> >>>           /*
> >>>            * cpu hot-[un]plug with SMI requires SMI broadcast,
> >>>            * leave @features_ok at zero  
> >>
> >>
> >> - [PATCH 2/5] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature
> >>  
> >>>   hw/i386/pc.c      | 1 +
> >>>   hw/isa/lpc_ich9.c | 8 +++++++-
> >>>   2 files changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>> index 17b514d1da50..2952a00fe694 100644
> >>> --- a/hw/i386/pc.c
> >>> +++ b/hw/i386/pc.c
> >>> @@ -99,6 +99,7 @@
> >>>
> >>>   GlobalProperty pc_compat_5_1[] = {
> >>>       { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
> >>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
> >>>   };
> >>>   const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
> >>>
> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >>> index c46eefd13fd4..5bc3f212fe77 100644
> >>> --- a/hw/isa/lpc_ich9.c
> >>> +++ b/hw/isa/lpc_ich9.c
> >>> @@ -391,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
> >>>           return;
> >>>       }
> >>>
> >>> +    if (guest_cpu_hotplug_features ==
> >>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
> >>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
> >>> +        return;
> >>> +    }
> >>> +
> >>>       /* valid feature subset requested, lock it down, report success */
> >>>       lpc->smi_negotiated_features = guest_features;
> >>>       lpc->smi_features_ok = 1;
> >>> @@ -773,7 +779,7 @@ static Property ich9_lpc_properties[] = {
> >>>       DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> >>>                         ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
> >>>       DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
> >>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
> >>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
> >>>       DEFINE_PROP_END_OF_LIST(),
> >>>   };
> >>>     
> >>
> >>
> >> - [PATCH 3/5] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug
> >>  
> >>>   hw/i386/acpi-build.c | 4 ++++
> >>>   1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>> index 1f5c2112452a..9036e5594c92 100644
> >>> --- a/hw/i386/acpi-build.c
> >>> +++ b/hw/i386/acpi-build.c
> >>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >>>       bool s4_disabled;
> >>>       bool pcihp_bridge_en;
> >>>       bool smi_on_cpuhp;
> >>> +    bool smi_on_cpu_unplug;
> >>>       bool pcihp_root_en;
> >>>       uint8_t s4_val;
> >>>       AcpiFadtData fadt;
> >>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>       pm->pcihp_io_base = 0;
> >>>       pm->pcihp_io_len = 0;
> >>>       pm->smi_on_cpuhp = false;
> >>> +    pm->smi_on_cpu_unplug = false;
> >>>
> >>>       assert(obj);
> >>>       init_common_fadt_data(machine, obj, &pm->fadt);
> >>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
> >>>           pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
> >>>           pm->smi_on_cpuhp =
> >>>               !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
> >>> +        pm->smi_on_cpu_unplug =
> >>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
> >>>       }
> >>>
> >>>       /* The above need not be conditional on machine type because the reset port  
> >>
> >>
> >> - [PATCH 4/5] acpi: cpuhp: introduce 'firmware performs eject' status/control bits
> >>  
> >>>   docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
> >>>   include/hw/acpi/cpu.h           |  1 +
> >>>   hw/acpi/cpu.c                   |  3 +++
> >>>   3 files changed, 13 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >>> index 9bb22d1270a9..f68ef6e06c7a 100644
> >>> --- a/docs/specs/acpi_cpu_hotplug.txt
> >>> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >>> @@ -57,7 +57,11 @@ read access:
> >>>                 It's valid only when bit 0 is set.
> >>>              2: Device remove event, used to distinguish device for which
> >>>                 no device eject request to OSPM was issued.
> >>> -           3-7: reserved and should be ignored by OSPM
> >>> +           3: reserved and should be ignored by OSPM
> >>> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >>> +              firmware shall clear this event by writing 1 into it before
> >>> +              performing device eject.
> >>> +           5-7: reserved and should be ignored by OSPM
> >>>       [0x5-0x7] reserved
> >>>       [0x8] Command data: (DWORD access)
> >>>             contains 0 unless value last stored in 'Command field' is one of:
> >>> @@ -82,7 +86,10 @@ write access:
> >>>                  selected CPU device
> >>>               3: if set to 1 initiates device eject, set by OSPM when it
> >>>                  triggers CPU device removal and calls _EJ0 method
> >>> -            4-7: reserved, OSPM must clear them before writing to register
> >>> +            4: if set to 1 OSPM hands over device eject to firmware,
> >>> +               Firmware shall issue device eject request as described above
> >>> +               (bit #3) and OSPM should not touch device eject bit (#3),
> >>> +            5-7: reserved, OSPM must clear them before writing to register
> >>>       [0x5] Command field: (1 byte access)
> >>>             value:
> >>>               0: selects a CPU device with inserting/removing events and
> >>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> >>> index 0eeedaa491c1..d71edde456f2 100644
> >>> --- a/include/hw/acpi/cpu.h
> >>> +++ b/include/hw/acpi/cpu.h
> >>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
> >>>       uint64_t arch_id;
> >>>       bool is_inserting;
> >>>       bool is_removing;
> >>> +    bool fw_remove;
> >>>       uint32_t ost_event;
> >>>       uint32_t ost_status;
> >>>   } AcpiCpuStatus;
> >>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> >>> index f099b5092730..3dc83d73e20b 100644
> >>> --- a/hw/acpi/cpu.c
> >>> +++ b/hw/acpi/cpu.c
> >>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
> >>>           val |= cdev->cpu ? 1 : 0;
> >>>           val |= cdev->is_inserting ? 2 : 0;
> >>>           val |= cdev->is_removing  ? 4 : 0;
> >>> +        val |= cdev->fw_remove  ? 16 : 0;
> >>>           trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
> >>>           break;
> >>>       case ACPI_CPU_CMD_DATA_OFFSET_RW:
> >>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> >>>               hotplug_ctrl = qdev_get_hotplug_handler(dev);
> >>>               hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
> >>>               object_unparent(OBJECT(dev));
> >>> +        } else if (data & 16) {
> >>> +            cdev->fw_remove = !cdev->fw_remove;
> >>>           }
> >>>           break;
> >>>       case ACPI_CPU_CMD_OFFSET_WR:  
> >>
> >>
> >> - [PATCH 5/5] x86: acpi: let the firmware handle pending "CPU remove" events in SMM
> >>  
> >>>   include/hw/acpi/cpu.h |  1 +
> >>>   hw/acpi/cpu.c         | 15 +++++++++++++--
> >>>   hw/i386/acpi-build.c  |  1 +
> >>>   3 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> >>> index d71edde456f2..999caaf51060 100644
> >>> --- a/include/hw/acpi/cpu.h
> >>> +++ b/include/hw/acpi/cpu.h
> >>> @@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >>>   typedef struct CPUHotplugFeatures {
> >>>       bool acpi_1_compatible;
> >>>       bool has_legacy_cphp;
> >>> +    bool fw_unplugs_cpu;
> >>>       const char *smi_path;
> >>>   } CPUHotplugFeatures;
> >>>
> >>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> >>> index 3dc83d73e20b..09d2f20daec0 100644
> >>> --- a/hw/acpi/cpu.c
> >>> +++ b/hw/acpi/cpu.c
> >>> @@ -335,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>>   #define CPU_INSERT_EVENT  "CINS"
> >>>   #define CPU_REMOVE_EVENT  "CRMV"
> >>>   #define CPU_EJECT_EVENT   "CEJ0"
> >>> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>>
> >>>   void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>                       hwaddr io_base,
> >>> @@ -387,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>           aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>>           /* initiates device eject, write only */
> >>>           aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >>> -        aml_append(field, aml_reserved_field(4));
> >>> +        aml_append(field, aml_reserved_field(1));
> >>> +        /* tell firmware to do device eject, write only */
> >>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >>> +        aml_append(field, aml_reserved_field(2));
> >>>           aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>>           aml_append(cpu_ctrl_dev, field);
> >>>
> >>> @@ -422,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>           Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>>           Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>>           Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>>
> >>>           aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>>           aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >>> @@ -464,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>
> >>>               aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>>               aml_append(method, aml_store(idx, cpu_selector));
> >>> -            aml_append(method, aml_store(one, ej_evt));
> >>> +            if (opts.fw_unplugs_cpu) {
> >>> +                aml_append(method, aml_store(one, fw_ej_evt));
> >>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >>> +                           aml_name("%s", opts.smi_path)));
> >>> +            } else {
> >>> +                aml_append(method, aml_store(one, ej_evt));
> >>> +            }
> >>>               aml_append(method, aml_release(ctrl_lock));
> >>>           }
> >>>           aml_append(cpus_dev, method);
> >>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>> index 9036e5594c92..475e76f514ff 100644
> >>> --- a/hw/i386/acpi-build.c
> >>> +++ b/hw/i386/acpi-build.c
> >>> @@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >>>           CPUHotplugFeatures opts = {
> >>>               .acpi_1_compatible = true, .has_legacy_cphp = true,
> >>>               .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
> >>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
> >>>           };
> >>>           build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
> >>>                          "\\_SB.PCI0", "\\_GPE._E02");  
> >>
> >> Thanks!
> >> Laszlo
> >>
> >>  
> >   
>
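The negotiation rule proposed in (9) can be checked in isolation with a small C model of smi_features_ok_callback(). It is a sketch, not the QEMU function: the bit positions are illustrative stand-ins for the ICH9_LPC_SMI_F_* constants, and the hypothetical `sim_smi_features_ok()` returns a bool instead of setting `smi_features_ok`. It rejects features the host does not offer (assumed to be what the "invalid features" check covers), CPU hot-[un]plug without SMI broadcast, and hot-unplug without hotplug, the combination that would otherwise lead to `aml_name("%s", NULL)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; QEMU defines its own ICH9_LPC_SMI_F_* values. */
#define SIM_F_BROADCAST_BIT      0
#define SIM_F_CPU_HOTPLUG_BIT    1
#define SIM_F_CPU_HOT_UNPLUG_BIT 2
#define SIM_BIT(n) (UINT64_C(1) << (n))

/* Returns true if the guest-requested feature set may be locked down. */
static bool sim_smi_features_ok(uint64_t host_features, uint64_t guest_features)
{
    uint64_t cpuhp = guest_features & (SIM_BIT(SIM_F_CPU_HOTPLUG_BIT) |
                                       SIM_BIT(SIM_F_CPU_HOT_UNPLUG_BIT));

    /* The guest may only request features the host offers. */
    if (guest_features & ~host_features) {
        return false;
    }
    /* CPU hot-[un]plug with SMI requires SMI broadcast. */
    if (!(guest_features & SIM_BIT(SIM_F_BROADCAST_BIT)) && cpuhp) {
        return false;
    }
    /* Unplug alone would leave smi_path NULL while fw_unplugs_cpu is set. */
    if (cpuhp == SIM_BIT(SIM_F_CPU_HOT_UNPLUG_BIT)) {
        return false;
    }
    return true;
}
```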
Laszlo Ersek Nov. 27, 2020, 2:48 p.m. UTC | #13
On 11/26/20 21:38, Igor Mammedov wrote:
> On Thu, 26 Nov 2020 12:17:27 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 11/24/20 13:25, Igor Mammedov wrote:

>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>> index 9bb22d1270..f68ef6e06c 100644
>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>> @@ -57,7 +57,11 @@ read access:
>>>                It's valid only when bit 0 is set.
>>>             2: Device remove event, used to distinguish device for which
>>>                no device eject request to OSPM was issued.
>>> -           3-7: reserved and should be ignored by OSPM
>>> +           3: reserved and should be ignored by OSPM
>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>> +              firmware shall clear this event by writing 1 into it before  
>>
>> (1) s/clear this event/clear this event bit/
>>
>>> +              performing device eject.  
>>
>> (2) move the second and third lines ("firmware shall clear....") over to
>> the write documentation, below? In particular:
>>
>>> +           5-7: reserved and should be ignored by OSPM
>>>      [0x5-0x7] reserved
>>>      [0x8] Command data: (DWORD access)
>>>            contains 0 unless value last stored in 'Command field' is one of:
>>> @@ -82,7 +86,10 @@ write access:
>>>                 selected CPU device
>>>              3: if set to 1 initiates device eject, set by OSPM when it
>>>                 triggers CPU device removal and calls _EJ0 method
>>> -            4-7: reserved, OSPM must clear them before writing to register
>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>> +               Firmware shall issue device eject request as described above
>>> +               (bit #3) and OSPM should not touch device eject bit (#3),  
>>
>> (3) it would be clearer if we documented the exact bit writing order
>> here:
>> - clear bit#4, *then* set bit#3 (two write accesses)
>> - versus clear bit#4 *and* set bit#3 (single access)
> 
> I was thinking that FW should not bother with clearing bit #4,
> and QEMU should clear it when handling a write to bit #3.
> (it looks like I forgot to actually do that)

That should work fine too, as long as it's clearly documented.
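One way the "QEMU clears bit #4" alternative could look on the QEMU side is sketched below in C: a write of bit #3 ejects the selected CPU and also clears the hand-over bit, so firmware never has to write bit #4. As a further assumption, a write of bit #4 here sets `fw_remove` rather than toggling it as the RFC does, which makes repeated writes idempotent. `struct sim_cpu_status` and the `sim_cpuhp_*` helpers are hypothetical, trimmed-down stand-ins for AcpiCpuStatus and a subset of cpu_hotplug_wr()/cpu_hotplug_rd(); the flag values (1, 4, 8, 16) follow the register layout in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trimmed-down stand-in for AcpiCpuStatus. */
struct sim_cpu_status {
    bool present;     /* stands in for cdev->cpu != NULL */
    bool is_removing;
    bool fw_remove;
};

/* Flags-register write for the selected CPU (one action per write). */
static void sim_cpuhp_write_flags(struct sim_cpu_status *cdev, uint8_t data)
{
    assert(cdev);
    if (data & 4) {              /* bit 2: clear the remove event */
        cdev->is_removing = false;
    } else if (data & 8) {       /* bit 3: eject the CPU ... */
        cdev->present = false;
        cdev->fw_remove = false; /* ... and clear the hand-over bit with it */
    } else if (data & 16) {      /* bit 4: OSPM hands the eject to firmware */
        cdev->fw_remove = true;  /* set, not toggle: repeated writes are harmless */
    }
}

/* Flags-register read, using the same bit layout as cpu_hotplug_rd(). */
static uint8_t sim_cpuhp_read_flags(const struct sim_cpu_status *cdev)
{
    uint8_t val = 0;

    assert(cdev);
    val |= cdev->present ? 1 : 0;
    val |= cdev->is_removing ? 4 : 0;
    val |= cdev->fw_remove ? 16 : 0;
    return val;
}
```

With this design, a read after the eject returns 0 for the modelled bits, so bit #4 cannot linger on a removed CPU.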


>>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>  #define CPU_INSERT_EVENT  "CINS"
>>>  #define CPU_REMOVE_EVENT  "CRMV"
>>>  #define CPU_EJECT_EVENT   "CEJ0"
>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>
>>>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>                      hwaddr io_base,
>>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>          /* initiates device eject, write only */
>>>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>> -        aml_append(field, aml_reserved_field(4));
>>> +        aml_append(field, aml_reserved_field(1));
>>> +        /* tell firmware to do device eject, write only */
>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>> +        aml_append(field, aml_reserved_field(2));
>>>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>          aml_append(cpu_ctrl_dev, field);
>>>
>>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>
>>>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>
>>>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>              aml_append(method, aml_store(idx, cpu_selector));
>>> -            aml_append(method, aml_store(one, ej_evt));
>>> +            if (opts.fw_unplugs_cpu) {
>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>> +                           aml_name("%s", opts.smi_path)));
>>> +            } else {
>>> +                aml_append(method, aml_store(one, ej_evt));
>>> +            }
>>>              aml_append(method, aml_release(ctrl_lock));
>>>          }
>>>          aml_append(cpus_dev, method);  
>>
>> Hmmm, OK, let me parse this.
>>
>> Assume there is a big bunch of device_del QMP commands, QEMU marks the
>> "remove" event pending on the corresponding set of CPUs, plus also makes
>> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
>> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
>> event is pending, notifies the OS one by one. The OS in turn forgets
>> about the subject CPU, and calls the _EJ0 method on the affected CPU
>> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
>> in the affected CPU's identifier.
>>
>> The above hunk modifies the CEJ0 method.
>>
>> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
>> acquire the CPLK lock, but CEJ0 is actually called within CSCN
>> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?
> Theoretically the spec supports recursive mutexes, but I don't think that's the case here.
> 
> Considering it works currently, I think the OS implements the Notify event
> asynchronously, hence no clash wrt the mutex. If EJ0 were handled within
> CSCN context, EJ0 would mess up the cpu_selector value that CSCN is also using.

Ah indeed. Yes, making Notify pending at first, and then delivering it
inside the kernel only after the current AML call stack returns -- that
seems to make sense. Otherwise we could get unbounded recursion (the
notify handler calls another AML method, which could contain another
notify ...)
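To make that ordering concrete, here is a toy single-threaded model in C. It is a sketch under stated assumptions, not ACPICA or QEMU code: cscn(), cej0(), dispatch_notifies() and the notify queue are hypothetical stand-ins. Notify() inside CSCN only queues the CPU, and the queued ejects run after CSCN has released CPLK. As a result, a non-recursive lock is never re-acquired, and bit #2 is already clear by the time the eject runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not ACPICA code): Notify() inside CSCN only
 * queues the request; the queued _EJ0 evaluations run after CSCN has
 * returned and released the (non-recursive) CPLK lock.
 */
#define MAX_CPUS 8

static bool cplk_held;              /* models CPLK, assumed non-recursive */
static bool is_removing[MAX_CPUS];  /* models bit #2 (rm_evt) per CPU */
static size_t notify_queue[MAX_CPUS];
static size_t notify_count;
static size_t ejected_count;

static void acquire_cplk(void)
{
    assert(!cplk_held);  /* a recursive acquire would trip here */
    cplk_held = true;
}

static void release_cplk(void)
{
    assert(cplk_held);
    cplk_held = false;
}

/* Models CSCN: queue one Notify per removing CPU, then ack bit #2. */
static void cscn(void)
{
    acquire_cplk();
    for (size_t i = 0; i < MAX_CPUS; i++) {
        if (is_removing[i]) {
            notify_queue[notify_count++] = i;  /* Notify is only queued */
            is_removing[i] = false;            /* store one to rm_evt */
        }
    }
    release_cplk();
}

/* Models _EJ0 -> CEJ0: takes the same lock that CSCN used. */
static void cej0(size_t cpu)
{
    acquire_cplk();
    assert(!is_removing[cpu]);  /* CSCN already cleared bit #2 */
    ejected_count++;
    release_cplk();
}

/* Models the OS draining queued notifications after CSCN returns. */
static void dispatch_notifies(void)
{
    for (size_t i = 0; i < notify_count; i++) {
        cej0(notify_queue[i]);
    }
    notify_count = 0;
}
```

Setting is_removing[1] and is_removing[3], then calling cscn() followed by dispatch_notifies(), ejects both CPUs without ever taking the lock recursively. Calling cej0() from inside the cscn() loop would instead trip the assertion in acquire_cplk().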


>> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
>> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
>> sets the new bit#4 in the register block, and raises an SMI.
>>
>> (6) So that's one SMI per CPU being removed. Is that OK?
> 
> I guess it has a performance penalty, but there is nothing we can do about
> it; OSPM does EJ0 calls asynchronously.

OK. Hot-unplug is not a frequent operation.


>  
>> (7) What if there are asynchronous plugs going on, and the firmware
>> notices them in the register block? ... Hm, I hope that should be OK,
>> because ultimately the CSCN method will learn about those too, and
>> inform the OS. On plug, the firmware doesn't modify the register block.
> It shouldn't be an issue (modulo bugs; I haven't tried to hot add and hot
> remove the same CPU at the same time),
> 
> i.e. 
> (QEMU) pause
> (QEMU) device_add
> (QEMU) device_del
> (QEMU) cont
> 
>> Ah! OK. I think I understand why bit#4 is important. The firmware may
>> notice pending remove events, but it must not act upon them -- it must
>> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
>> clear means the event is pending (QEMU got a device_del), but the OS has
>> not forgotten about the CPU yet -- so the firmware must not unplug it
>> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
>> the already set bit#2, advertising that the OS has *already* abandoned
>> the CPU.
> Firmware should ignore bit #2; it doesn't mean anything to it, and OSPM
> might ignore or not support CPU removal. What firmware must care about is
> bit #4, which tells it that OSPM is done with the CPU and asks for it to be
> removed by firmware.

Makes sense, especially in combination with the idea that clearing the
fw_remove bit should clear is_removing too.

The firmware logic needs to be aware of is_removing though, or at least
understand that this bit exists, as the "get pending" command will also
report CPUs that only have is_removing set. That shouldn't be a problem;
we just have to recognize it.
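For illustration, here is a C sketch of how the status byte could be encoded and decoded. The bit values come from the cpu_hotplug_rd() hunk and the spec text in the patch. The macro names, encode_status() and classify() are hypothetical. The precedence (insert, then bit #4, then bit #2) assumes the rule discussed here: only bit #4 authorizes an unplug, and a CPU with bit #2 alone is recognized and skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the [0x4] status byte, per the patch's spec text. */
#define CPUHP_STAT_ENABLED   (1u << 0)
#define CPUHP_STAT_INSERT    (1u << 1)
#define CPUHP_STAT_REMOVE    (1u << 2)  /* meant for OSPM only */
#define CPUHP_STAT_FW_REMOVE (1u << 4)  /* OSPM handed the eject to firmware */

static_assert(CPUHP_STAT_FW_REMOVE == 16, "bit #4 is the firmware-remove flag");

/* Mirrors the read-side encoding shown in the cpu_hotplug_rd() hunk. */
static uint8_t encode_status(bool present, bool is_inserting,
                             bool is_removing, bool fw_remove)
{
    uint8_t val = 0;
    val |= present      ? CPUHP_STAT_ENABLED   : 0;
    val |= is_inserting ? CPUHP_STAT_INSERT    : 0;
    val |= is_removing  ? CPUHP_STAT_REMOVE    : 0;
    val |= fw_remove    ? CPUHP_STAT_FW_REMOVE : 0;
    return val;
}

enum fw_action { FW_PLUG, FW_UNPLUG, FW_SKIP, FW_NO_EVENT };

/*
 * Hypothetical firmware-side classification: bit #4 decides the unplug;
 * a CPU reported only because of bit #2 is recognized and skipped.
 */
static enum fw_action classify(uint8_t status)
{
    if (status & CPUHP_STAT_INSERT) {
        return FW_PLUG;
    }
    if (status & CPUHP_STAT_FW_REMOVE) {
        return FW_UNPLUG;
    }
    if (status & CPUHP_STAT_REMOVE) {
        return FW_SKIP;
    }
    return FW_NO_EVENT;
}
```

With this ordering, a CPU that reports 0x05 (present, remove requested) is skipped, while 0x15 or 0x11 is queued for unplug.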

[...]


Thanks!
Laszlo
Laszlo Ersek Nov. 27, 2020, 3:02 p.m. UTC | #14
On 11/27/20 12:33, Igor Mammedov wrote:
> On Thu, 26 Nov 2020 19:35:30 -0800
> Ankur Arora <ankur.a.arora@oracle.com> wrote:
> 
>> On 2020-11-26 4:46 a.m., Laszlo Ersek wrote:
>>> On 11/26/20 11:24, Ankur Arora wrote:  
>>>> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:  
>>>>> If firmware negotiates the ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>>>> then on CPU eject OSPM will set bit #4 in the CPU hotplug block for the
>>>>> CPU to be ejected, to mark it for removal by firmware, and trigger an SMI
>>>>> upcall to let firmware do the actual eject.
>>>>>
>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>>> ---
>>>>> PS:
>>>>>     - abuse 5.1 machine type for now to turn off unplug feature
>>>>>       (it will be moved to 5.2 machine type once new merge window is open)
>>>>> ---
>>>>>    include/hw/acpi/cpu.h           |  2 ++
>>>>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>>>    hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>>>    hw/i386/acpi-build.c            |  5 +++++
>>>>>    hw/i386/pc.c                    |  1 +
>>>>>    hw/isa/lpc_ich9.c               |  2 +-
>>>>>    6 files changed, 34 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>>> index 0eeedaa491..999caaf510 100644
>>>>> --- a/include/hw/acpi/cpu.h
>>>>> +++ b/include/hw/acpi/cpu.h
>>>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>>>        uint64_t arch_id;
>>>>>        bool is_inserting;
>>>>>        bool is_removing;
>>>>> +    bool fw_remove;
>>>>>        uint32_t ost_event;
>>>>>        uint32_t ost_status;
>>>>>    } AcpiCpuStatus;
>>>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>>>    typedef struct CPUHotplugFeatures {
>>>>>        bool acpi_1_compatible;
>>>>>        bool has_legacy_cphp;
>>>>> +    bool fw_unplugs_cpu;
>>>>>        const char *smi_path;
>>>>>    } CPUHotplugFeatures;
>>>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>>>> index 9bb22d1270..f68ef6e06c 100644
>>>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>>>> @@ -57,7 +57,11 @@ read access:
>>>>>                  It's valid only when bit 0 is set.
>>>>>               2: Device remove event, used to distinguish device for which
>>>>>                  no device eject request to OSPM was issued.
>>>>> -           3-7: reserved and should be ignored by OSPM
>>>>> +           3: reserved and should be ignored by OSPM
>>>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>>>> +              firmware shall clear this event by writing 1 into it before
>>>>> +              performing device eject.
>>>>> +           5-7: reserved and should be ignored by OSPM
>>>>>        [0x5-0x7] reserved
>>>>>        [0x8] Command data: (DWORD access)
>>>>>              contains 0 unless value last stored in 'Command field' is one of:
>>>>> @@ -82,7 +86,10 @@ write access:
>>>>>                   selected CPU device
>>>>>                3: if set to 1 initiates device eject, set by OSPM when it
>>>>>                   triggers CPU device removal and calls _EJ0 method
>>>>> -            4-7: reserved, OSPM must clear them before writing to register
>>>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>>>> +               Firmware shall issue device eject request as described above
>>>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>>>        [0x5] Command field: (1 byte access)
>>>>>              value:
>>>>>                0: selects a CPU device with inserting/removing events and
>>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>>> index f099b50927..09d2f20dae 100644
>>>>> --- a/hw/acpi/cpu.c
>>>>> +++ b/hw/acpi/cpu.c
>>>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>>>            val |= cdev->cpu ? 1 : 0;
>>>>>            val |= cdev->is_inserting ? 2 : 0;
>>>>>            val |= cdev->is_removing  ? 4 : 0;
>>>>> +        val |= cdev->fw_remove  ? 16 : 0;  
>>>>
>>>> I might be missing something but I don't see where cdev->fw_remove is being
>>>> set.  
>>>
>>> See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
>>> which happens through the ACPI code change --, fw_remove is inverted.  
>> Thanks, that makes sense. I was reading the AML building code all wrong.
>>
>>>
>>>   
>>>> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
>>>> we would always end up setting this bit:  
>>>>>            val |= cdev->is_removing  ? 4 : 0;  
>>>>
>>>> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
>>>> (4 | 16). I'm guessing that in that case the AML determines which case gets
>>>> handled but it might make sense to set just one of these?  
>>>
>>> "is_removing" is set directly in response to the device_del QMP command.
>>> That QMP command is asynchronous to the execution of the guest OS.
>>>
>>> "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
>>> executed by the guest OS's ACPI interpreter, after the guest OS has
>>> de-scheduled all processes from the CPU being removed (= basically after
>>> the OS has willfully forgotten about the CPU).
>>>
>>> Therefore, considering the bitmask (is_removing, fw_remove), three
>>> variations make sense:  
>>
>> Just annotating these with the corresponding ACPI code to make sure
>> I have it straight. Please correct if my interpretation is wrong. Also,
>> a few questions inline:
>>
>>>
>>> #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
>>>
>>> #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
>>>                                     is processing the request  
>>
>> Guest executes the CSCN method and reads rm_evt (bit 2) (thus noticing
>> the is_removing=1), and then notifies the CPU to be removed via the
>> CTFY method.
>>
>>     ifctx = aml_if(aml_equal(rm_evt, one));
>>     {
>>             aml_append(ifctx,
>>                        aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
>>             aml_append(ifctx, aml_store(one, rm_evt));
>>             aml_append(ifctx, aml_store(one, has_event));
>>     }
>>
>> Then it does a store to rm_evt (bit 2). That would result in clearing
>> of is_removing. (Igor mentions that in a separate mail.)
>>
>> 1. Do we need to clear is_removing at all? AFAICS, it only serves as
>> an ack to QEMU, and I can't think of why that's useful; it doesn't
>> serve any purpose once the guest OS has seen the request.
> No, firmware doesn't need to care about it; it's consumed by OSPM only.
>  
>> 2. Would it make sense to clear it first and then call CPU_NOTIFY_METHOD?
>> CPU_NOTIFY_METHOD (or _EJ0, COST) don't depend on is_removing but
>> that might change in the future.
> 
> All methods are protected by the same mutex, so if _EJ0 is called while CSCN
> is in progress, it will wait until CSCN is finished.
> But clearing bit #2 before Notify should work too.

I'd suggest not reordering existing code unless we really have to; the
firmware will have to deal with "is_removing" being the *only* status
flag set anyway, as QMP "device_del" command(s) may set that bit for
another CPU (or multiple other CPUs) while the SMI handler is running,
and the "get pending" command will return such CPUs as well.

I wouldn't complicate the patches just in order to "hide" is_removing --
that's not a goal, so let's just keep as much AML untouched as we can.

(BTW I now understand why "is_removing" is clear when the eject method
runs for the same CPU -- because Notify (and so the eject method) is not
entered synchronously, it's only queued asynchronously. So it's actually
dispatched after the is_removing flag has been cleared.)


> 
>  
>> The notify would end up in calling acpi_hotplug_schedule() which would be
>> responsible for queuing work (on CPU0) to detach+unplug the CPU.
>>
>> Once the OS level detach succeeds, the worker evaluates the "_EJ0" method
>> which would do the actual CPU_EJECT_METHOD work.
>>
>> If the detach fails, then it evaluates the CPU_OST_METHOD, which reports
>> the event and its status.
>>
>> At this point the state is back to:
>>
>> (is_removing=0, fw_remove=0)
> If OSPM fails to release the CPU for whatever reason, that's a valid
> state; we just notify the user via an OST event that the requested unplug wasn't successful.
> 
>>
>>> #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
>>>                                     the CPU, firmware is permitted /
>>>                                     required to forget about the CPU as
>>>                                     well, and then unplug the CPU  
>>
>> CPU_EJECT_METHOD will do a store to bit 4, which would invert (and
>> thus set) fw_remove and then do the SMI.
>>
>> So, this would be
>>> #3 (is_removing=0, fw_remove=1)  
>>
>> At this point the firmware calls QemuCpuhpCollectApicIds(), which
>> (after changes) notices CPU(s) with fw_remove set.
>>
>> Collects them and does a store to bit 4, which would clear fw_remove.

I don't think we should clear fw_remove as soon as we collect the CPU,
in the firmware. Status flag modifications should be kept out of
QemuCpuhpCollectApicIds().
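Suppose QEMU, rather than the firmware, drops fw_remove when it performs the eject. The register write side could then look roughly like the following. This is a hypothetical C model: struct cpu_state, the WR_* macros and cpu_status_write() are invented for illustration and are not QEMU's cpu_hotplug_wr(). Note also that it *sets* fw_remove on a bit #4 write, whereas the RFC hunk inverts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state, loosely modeled on AcpiCpuStatus. */
struct cpu_state {
    bool present;
    bool is_removing;
    bool fw_remove;
};

#define WR_EJECT    (1u << 3)  /* write bit #3: eject the selected CPU */
#define WR_FW_EJECT (1u << 4)  /* write bit #4: OSPM hands eject to firmware */

static_assert((WR_EJECT & WR_FW_EJECT) == 0, "distinct control bits");

/*
 * Sketch of the proposed semantics (assumption): firmware never clears
 * bit #4 itself; QEMU drops both removal flags when it performs the
 * eject requested through bit #3.
 */
static void cpu_status_write(struct cpu_state *cpu, uint8_t val)
{
    if (val & WR_FW_EJECT) {
        cpu->fw_remove = true;    /* written by CEJ0 in the patched AML */
    }
    if (val & WR_EJECT) {
        cpu->present = false;     /* the actual unplug */
        cpu->is_removing = false;
        cpu->fw_remove = false;   /* cleared by QEMU, not by firmware */
    }
}
```

Under this model, the firmware completes the unplug with a single bit #3 write and never writes bit #4. The scanning code can therefore stay free of status-flag modifications.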


> 
> I'd skip this step on the firmware side and make QEMU clear it
> when the CPU is ejected.
> 
>>
>>>
>>> #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
>>>
>>> #5 (is_removing=0, fw_remove=0) -- firmware performing unplug  
>> Firmware does an unplug and writes to bit 3, thus clearing is_removing.
>>
>> On return from the firmware the guest evaluates the COST again.
> It's optional and depends on the OSPM implementation (some do not call it on success).
> 
> 
>> And, eventually goes back to the CSCN where it processes more
>> hotplug or unplug events.
> In the unplug case, CSCN finishes first, and only after that are the
> EJ0 calls processed.
> 
>>> The variation (is_removing=0, fw_remove=1) is invalid / unused.  
>>
>> /nods
>>>
>>>
>>> The firmware may be investigating the CPU register block between steps
>>> #2 and #3 -- in other words, the firmware may see a CPU for which
>>> is_removing is set (unplug requested via QMP), but the OS has not vacated
>>> yet (fw_remove=0). In that case, the firmware must just skip the CPU --
>>> once the OS is done, it will set fw_remove too, and raise another SMI.  
>> Yeah, it makes sense for the firmware to only care about a CPU once it
>> sees fw_remove=1. (And as currently situated, the firmware would never
>> see is_removing=1 at all.)

The firmware may well see is_removing=1, as the QMP device_del command
may set that bit on some other CPU, asynchronously to the firmware's
execution. The firmware needs to recognize if the "get pending" command
returns a CPU because of this, and just continue scanning.
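A read-only scanning loop along those lines might look like the following C sketch. get_pending() is a hypothetical stand-in for the "get pending" command: it returns the first selector at or after the start with any event bit set, or -1. collect() only loosely mirrors QemuCpuhpCollectApicIds(). It advances the selector manually, leaves every status bit untouched, and passes over a CPU that was reported only because of bit #2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAT_INSERT    (1u << 1)
#define STAT_REMOVE    (1u << 2)
#define STAT_FW_REMOVE (1u << 4)
#define STAT_EVENTS    (STAT_INSERT | STAT_REMOVE | STAT_FW_REMOVE)

static_assert((STAT_EVENTS & 1u) == 0, "bit #0 (enabled) is not an event bit");

/* Hypothetical "get pending": first selector >= start with an event, or -1. */
static long get_pending(const uint8_t *status, size_t count, size_t start)
{
    for (size_t i = start; i < count; i++) {
        if (status[i] & STAT_EVENTS) {
            return (long)i;
        }
    }
    return -1;
}

/*
 * Collect selectors to plug and to unplug. A CPU reported only because
 * of bit #2 (device_del seen, OSPM not done yet) is skipped. No status
 * bits are modified here.
 */
static void collect(const uint8_t *status, size_t count,
                    size_t *plug, size_t *nplug,
                    size_t *unplug, size_t *nunplug)
{
    *nplug = 0;
    *nunplug = 0;
    for (size_t cur = 0; cur < count; ) {
        long sel = get_pending(status, count, cur);
        if (sel < 0) {
            break;                      /* no more pending events */
        }
        uint8_t st = status[sel];
        if (st & STAT_INSERT) {
            plug[(*nplug)++] = (size_t)sel;
        } else if (st & STAT_FW_REMOVE) {
            unplug[(*nunplug)++] = (size_t)sel;
        }
        /* else: bit #2 only; ignore it and keep scanning */
        cur = (size_t)sel + 1;          /* advance manually; events stay set */
    }
}
```

For status bytes {0x01, 0x05, 0x03, 0x11, 0x00}, selector 2 lands in the plug list, selector 3 lands in the unplug list, and selector 1 (bit #2 only) is skipped.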

Thanks
Laszlo
Igor Mammedov Nov. 27, 2020, 3:07 p.m. UTC | #15
On Fri, 27 Nov 2020 15:48:34 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/26/20 21:38, Igor Mammedov wrote:
> > On Thu, 26 Nov 2020 12:17:27 +0100
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> On 11/24/20 13:25, Igor Mammedov wrote:  
> 
> >>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
> >>> index 9bb22d1270..f68ef6e06c 100644
> >>> --- a/docs/specs/acpi_cpu_hotplug.txt
> >>> +++ b/docs/specs/acpi_cpu_hotplug.txt
> >>> @@ -57,7 +57,11 @@ read access:
> >>>                It's valid only when bit 0 is set.
> >>>             2: Device remove event, used to distinguish device for which
> >>>                no device eject request to OSPM was issued.
> >>> -           3-7: reserved and should be ignored by OSPM
> >>> +           3: reserved and should be ignored by OSPM
> >>> +           4: if set to 1, OSPM requests firmware to perform device eject,
> >>> +              firmware shall clear this event by writing 1 into it before    
> >>
> >> (1) s/clear this event/clear this event bit/
> >>  
> >>> +              performing device eject.    
> >>
> >> (2) move the second and third lines ("firmware shall clear....") over to
> >> the write documentation, below? In particular:
> >>  
> >>> +           5-7: reserved and should be ignored by OSPM
> >>>      [0x5-0x7] reserved
> >>>      [0x8] Command data: (DWORD access)
> >>>            contains 0 unless value last stored in 'Command field' is one of:
> >>> @@ -82,7 +86,10 @@ write access:
> >>>                 selected CPU device
> >>>              3: if set to 1 initiates device eject, set by OSPM when it
> >>>                 triggers CPU device removal and calls _EJ0 method
> >>> -            4-7: reserved, OSPM must clear them before writing to register
> >>> +            4: if set to 1 OSPM hands over device eject to firmware,
> >>> +               Firmware shall issue device eject request as described above
> >>> +               (bit #3) and OSPM should not touch device eject bit (#3),    
> >>
> >> (3) it would be clearer if we documented the exact bit writing order
> >> here:
> >> - clear bit#4, *then* set bit#3 (two write accesses)
> >> - versus clear bit#4 *and* set bit#3 (single access)  
> > 
> > I was thinking that FW should not bother with clearing bit #4,
> > and QEMU should clear it when handling write to bit #3.
> > (it looks like I forgot to actually do that)  
> 
> That should work fine too, as long as it's clearly documented.
> 
> 
> >>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >>>  #define CPU_INSERT_EVENT  "CINS"
> >>>  #define CPU_REMOVE_EVENT  "CRMV"
> >>>  #define CPU_EJECT_EVENT   "CEJ0"
> >>> +#define CPU_FW_EJECT_EVENT "CEJF"
> >>>
> >>>  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>                      hwaddr io_base,
> >>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>          aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
> >>>          /* initiates device eject, write only */
> >>>          aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
> >>> -        aml_append(field, aml_reserved_field(4));
> >>> +        aml_append(field, aml_reserved_field(1));
> >>> +        /* tell firmware to do device eject, write only */
> >>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
> >>> +        aml_append(field, aml_reserved_field(2));
> >>>          aml_append(field, aml_named_field(CPU_COMMAND, 8));
> >>>          aml_append(cpu_ctrl_dev, field);
> >>>
> >>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>          Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
> >>>          Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
> >>>          Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
> >>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
> >>>
> >>>          aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
> >>>          aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
> >>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >>>
> >>>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> >>>              aml_append(method, aml_store(idx, cpu_selector));
> >>> -            aml_append(method, aml_store(one, ej_evt));
> >>> +            if (opts.fw_unplugs_cpu) {
> >>> +                aml_append(method, aml_store(one, fw_ej_evt));
> >>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> >>> +                           aml_name("%s", opts.smi_path)));
> >>> +            } else {
> >>> +                aml_append(method, aml_store(one, ej_evt));
> >>> +            }
> >>>              aml_append(method, aml_release(ctrl_lock));
> >>>          }
> >>>          aml_append(cpus_dev, method);    
> >>
> >> Hmmm, OK, let me parse this.
> >>
> >> Assume there is a big bunch of device_del QMP commands, QEMU marks the
> >> "remove" event pending on the corresponding set of CPUs, plus also makes
> >> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
> >> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
> >> event is pending, notifies the OS one by one. The OS in turn forgets
> >> about the subject CPU, and calls the _EJ0 method on the affected CPU
> >> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
> >> in the affected CPU's identifier.
> >>
> >> The above hunk modifies the CEJ0 method.
> >>
> >> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
> >> acquire the CPLK lock, but CEJ0 is actually called within CSCN
> >> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?  
> > Theoretically the spec supports recursive mutexes, but I don't think that's the case here.
> > 
> > Considering it works currently, I think the OS implements the Notify event
> > asynchronously, hence no clash wrt the mutex. If EJ0 were handled within
> > CSCN context, EJ0 would mess up the cpu_selector value that CSCN is also using.  
> 
> Ah indeed. Yes, making Notify pending at first, and then delivering it
> inside the kernel only after the current AML call stack returns -- that
> seems to make sense. Otherwise we could get unbounded recursion (the
> notify handler calls another AML method, which could contain another
> notify ...)
> 
> 
> >> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
> >> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
> >> sets the new bit#4 in the register block, and raises an SMI.
> >>
> >> (6) So that's one SMI per CPU being removed. Is that OK?  
> > 
> > I guess it has a performance penalty, but there is nothing we can do about
> > it; OSPM does EJ0 calls asynchronously.  
> 
> OK. Hot-unplug is not a frequent operation.
> 
> 
> >    
> >> (7) What if there are asynchronous plugs going on, and the firmware
> >> notices them in the register block? ... Hm, I hope that should be OK,
> >> because ultimately the CSCN method will learn about those too, and
> >> inform the OS. On plug, the firmware doesn't modify the register block.  
> > It shouldn't be an issue (modulo bugs; I haven't tried to hot add and hot
> > remove the same CPU at the same time),
> > 
> > i.e. 
> > (QEMU) pause
> > (QEMU) device_add
> > (QEMU) device_del
> > (QEMU) cont
> >   
> >> Ah! OK. I think I understand why bit#4 is important. The firmware may
> >> notice pending remove events, but it must not act upon them -- it must
> >> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
> >> clear means the event is pending (QEMU got a device_del), but the OS has
> >> not forgotten about the CPU yet -- so the firmware must not unplug it
> >> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
> >> the already set bit#2, advertising that the OS has *already* abandoned
> >> the CPU.  
> > Firmware should ignore bit #2; it doesn't mean anything to it, and OSPM
> > might ignore or not support CPU removal. What firmware must care about is
> > bit #4, which tells it that OSPM is done with the CPU and asks for it to be
> > removed by firmware.  
> 
> Makes sense, especially in combination with the idea that clearing the
> fw_remove bit should clear is_removing too.
> 
> The firmware logic needs to be aware of is_removing though, or at least
> understand that this bit exists, as the "get pending" command will also
> report CPUs that only have is_removing set. That shouldn't be a problem;
> we just have to recognize it.

Firmware shouldn't normally see bit #2; it's cleared in AML during CSCN,
right after the remove Notify is sent to OSPM. I don't see a reason for
firmware to use it; I'd just mask it out on the firmware side if it messes
up the logic.

Potentially, if we have concurrent plug/unplug for several CPUs, firmware
might see bit #2, which it should ignore; it's for OSPM consumption only.


> 
> [...]
> 
> 
> Thanks!
> Laszlo
Laszlo Ersek Nov. 27, 2020, 3:19 p.m. UTC | #16
On 11/27/20 05:10, Ankur Arora wrote:

> Yeah I was wondering what would happen for simultaneous hot add and remove.
> I guess we would always do remove first and then the add, unless we hit
> the break due to max_cpus_per_pass and switch to hot-add mode.

Considering the firmware only, I disagree with remove-then-add.

EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and
EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() (implemented in
SmmAddProcessor() and SmmRemoveProcessor() in
"UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c", respectively) only mark the
processors for addition/removal. The actual processing is done only
later, in BSPHandler() --> SmmCpuUpdate(), when "all SMI handlers are
finished" (see the comment in SmmRemoveProcessor()).

Consequently, I would not suggest replacing a valid APIC ID in a
particular mCpuHotPlugData.ApicId[Index] slot with INVALID_APIC_ID
(corresponding to the unplug operation), and then possibly replacing
INVALID_APIC_ID in the *same slot* with the APIC ID of the newly plugged
CPU, in the exact same SMI invocation (= in the same execution of
CpuHotplugMmi()). That might cause some component in edk2 to see the
APIC ID in mCpuHotPlugData.ApicId[Index] to change from one valid APIC
ID to another valid APIC ID, and I don't even want to think about what
kind of mess that could cause.

So no, please handle plugs first, for which unused slots in
mCpuHotPlugData.ApicId will be populated, and *then* handle removals (in
the same invocation of CpuHotplugMmi()).

By the way, for unplug, you will not have to re-set
mCpuHotPlugData.ApicId[Index] to INVALID_APIC_ID, as
SmmRemoveProcessor() does that internally. You just have to locate the
Index for the APIC ID being removed, for calling
EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor().
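Here is a toy C model of why the order matters. The slot array stands in for mCpuHotPlugData.ApicId[]. add_processor() and remove_processor() are hypothetical simplifications of the AddProcessor()/RemoveProcessor() behavior described above: removal invalidates the slot at once, and the real teardown is deferred until the SMI handlers finish. The actual edk2 code is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_APIC_ID UINT64_MAX
#define NUM_SLOTS 4

/* Toy stand-in for one mCpuHotPlugData.ApicId[] entry plus a removal mark. */
struct slot {
    uint64_t apic_id;
    bool     remove_pending;  /* acted upon only after the SMI handler ends */
};

/* Models AddProcessor(): take the first slot whose APIC ID is invalid. */
static long add_processor(struct slot *s, uint64_t apic_id)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        if (s[i].apic_id == INVALID_APIC_ID) {
            s[i].apic_id = apic_id;
            return (long)i;
        }
    }
    return -1;  /* no free slot in this pass */
}

/* Models RemoveProcessor(): invalidate the slot, defer the real work. */
static void remove_processor(struct slot *s, size_t index)
{
    assert(s[index].apic_id != INVALID_APIC_ID);
    s[index].apic_id = INVALID_APIC_ID;
    s[index].remove_pending = true;
}

/* One SMI pass: plugs first, then removals. */
static long handle_smi(struct slot *s, uint64_t new_apic_id, size_t remove_index)
{
    long added = add_processor(s, new_apic_id);
    remove_processor(s, remove_index);
    return added;
}
```

Running handle_smi() on slots {0, 1, 2, invalid} with new APIC ID 7 and removal index 1 places the new CPU in slot 3. If the two calls inside handle_smi() were swapped, add_processor() would pick slot 1, which is still marked remove_pending. The same slot would then go from one valid APIC ID to another within a single SMI.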


Thanks
Laszlo
Laszlo Ersek Nov. 27, 2020, 4:52 p.m. UTC | #17
On 11/27/20 16:07, Igor Mammedov wrote:
> On Fri, 27 Nov 2020 15:48:34 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:

>> The firmware logic needs to be aware of is_removing though, or at
>> least understand that this bit exists, as the "get pending" command
>> will also report CPUs that only have is_removing set. That shouldn't
>> be a problem; we just have to recognize it.
>
> Firmware shouldn't normally see bit #2; it's cleared in AML during
> CSCN, right after the remove Notify is sent to OSPM. I don't see a
> reason for firmware to use it; I'd just mask it out on the firmware
> side if it messes up the logic.
>
> Potentially, if we have concurrent plug/unplug for several CPUs,
> firmware might see bit #2, which it should ignore; it's for OSPM
> consumption only.

Yes, that's what I meant.

Currently, inside the scanning loop of the QemuCpuhpCollectApicIds()
function in "OvmfPkg/CpuHotplugSmm/QemuCpuhp.c", there is no branch that
simply skips a CPU that was reported by QEMU_CPUHP_CMD_GET_PENDING. Now,
such a branch will be necessary.

This is what I mean (just for illustration):

$ git diff -b -U5

> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> index a34a6d3fae61..ddeef047c517 100644
> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
> @@ -32,10 +32,11 @@
>
>  #define QEMU_CPUHP_R_CPU_STAT                0x4
>  #define QEMU_CPUHP_STAT_ENABLED                BIT0
>  #define QEMU_CPUHP_STAT_INSERT                 BIT1
>  #define QEMU_CPUHP_STAT_REMOVE                 BIT2
> +#define QEMU_CPUHP_STAT_FIRMWARE_REMOVE        BIT4
>
>  #define QEMU_CPUHP_RW_CMD_DATA               0x8
>
>  #define QEMU_CPUHP_W_CPU_SEL                 0x0
>
> diff --git a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
> index 8d4a6693c8d6..9bff31628e61 100644
> --- a/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
> +++ b/OvmfPkg/CpuHotplugSmm/QemuCpuhp.c
> @@ -258,35 +258,44 @@ QemuCpuhpCollectApicIds (
>        DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: insert\n", __FUNCTION__,
>          CurrentSelector));
>
>        ExtendIds   = PluggedApicIds;
>        ExtendCount = PluggedCount;
> -    } else if ((CpuStatus & QEMU_CPUHP_STAT_REMOVE) != 0) {
> -      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: remove\n", __FUNCTION__,
> -        CurrentSelector));
> +    } else if ((CpuStatus & QEMU_CPUHP_STAT_FIRMWARE_REMOVE) != 0) {
> +      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: firmware remove\n",
> +        __FUNCTION__, CurrentSelector));
>
>        ExtendIds   = ToUnplugApicIds;
>        ExtendCount = ToUnplugCount;
> +    } else if ((CpuStatus & QEMU_CPUHP_STAT_REMOVE) != 0) {
> +      DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: remove\n", __FUNCTION__,
> +        CurrentSelector));
> +
> +      ExtendIds   = NULL;
> +      ExtendCount = NULL;
>      } else {
>        DEBUG ((DEBUG_VERBOSE, "%a: CurrentSelector=%u: no event\n",
>          __FUNCTION__, CurrentSelector));
>        break;
>      }
>
> +    ASSERT ((ExtendIds == NULL) == (ExtendCount == NULL));
> +    if (ExtendIds != NULL) {
>        //
> -    // Save the APIC ID of the CPU with the pending event, to the corresponding
> -    // APIC ID array.
> +      // Save the APIC ID of the CPU with the pending event, to the
> +      // corresponding APIC ID array.
>        //
>        if (*ExtendCount == ApicIdCount) {
>          DEBUG ((DEBUG_ERROR, "%a: APIC ID array too small\n", __FUNCTION__));
>          return EFI_BUFFER_TOO_SMALL;
>        }
>        QemuCpuhpWriteCommand (MmCpuIo, QEMU_CPUHP_CMD_GET_ARCH_ID);
>        NewApicId = QemuCpuhpReadCommandData (MmCpuIo);
>        DEBUG ((DEBUG_VERBOSE, "%a: ApicId=" FMT_APIC_ID "\n", __FUNCTION__,
>          NewApicId));
>        ExtendIds[(*ExtendCount)++] = NewApicId;
> +    }
>
>      //
>      // We've processed the CPU with (known) pending events, but we must never
>      // clear events. Therefore we need to advance past this CPU manually;
>      // otherwise, QEMU_CPUHP_CMD_GET_PENDING would stick to the currently

Thanks
Laszlo
Ankur Arora Nov. 27, 2020, 11:48 p.m. UTC | #18
On 2020-11-27 7:02 a.m., Laszlo Ersek wrote:
> On 11/27/20 12:33, Igor Mammedov wrote:
>> On Thu, 26 Nov 2020 19:35:30 -0800
>> Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>>> On 2020-11-26 4:46 a.m., Laszlo Ersek wrote:
>>>> On 11/26/20 11:24, Ankur Arora wrote:
>>>>> On 2020-11-24 4:25 a.m., Igor Mammedov wrote:
>>>>>> If firmware negotiates the ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>>>>> then on CPU eject OSPM will set bit #4 in the CPU hotplug block for the
>>>>>> CPU to be ejected, to mark it for removal by firmware, and trigger an SMI
>>>>>> upcall to let firmware do the actual eject.
>>>>>>
>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>>>> ---
>>>>>> PS:
>>>>>>      - abuse 5.1 machine type for now to turn off unplug feature
>>>>>>        (it will be moved to 5.2 machine type once new merge window is open)
>>>>>> ---
>>>>>>     include/hw/acpi/cpu.h           |  2 ++
>>>>>>     docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>>>>     hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>>>>     hw/i386/acpi-build.c            |  5 +++++
>>>>>>     hw/i386/pc.c                    |  1 +
>>>>>>     hw/isa/lpc_ich9.c               |  2 +-
>>>>>>     6 files changed, 34 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>>>> index 0eeedaa491..999caaf510 100644
>>>>>> --- a/include/hw/acpi/cpu.h
>>>>>> +++ b/include/hw/acpi/cpu.h
>>>>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>>>>         uint64_t arch_id;
>>>>>>         bool is_inserting;
>>>>>>         bool is_removing;
>>>>>> +    bool fw_remove;
>>>>>>         uint32_t ost_event;
>>>>>>         uint32_t ost_status;
>>>>>>     } AcpiCpuStatus;
>>>>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>>>>     typedef struct CPUHotplugFeatures {
>>>>>>         bool acpi_1_compatible;
>>>>>>         bool has_legacy_cphp;
>>>>>> +    bool fw_unplugs_cpu;
>>>>>>         const char *smi_path;
>>>>>>     } CPUHotplugFeatures;
>>>>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>>>>> index 9bb22d1270..f68ef6e06c 100644
>>>>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>>>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>>>>> @@ -57,7 +57,11 @@ read access:
>>>>>>                   It's valid only when bit 0 is set.
>>>>>>                2: Device remove event, used to distinguish device for which
>>>>>>                   no device eject request to OSPM was issued.
>>>>>> -           3-7: reserved and should be ignored by OSPM
>>>>>> +           3: reserved and should be ignored by OSPM
>>>>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>>>>> +              firmware shall clear this event by writing 1 into it before
>>>>>> +              performing device eject.
>>>>>> +           5-7: reserved and should be ignored by OSPM
>>>>>>         [0x5-0x7] reserved
>>>>>>         [0x8] Command data: (DWORD access)
>>>>>>               contains 0 unless value last stored in 'Command field' is one of:
>>>>>> @@ -82,7 +86,10 @@ write access:
>>>>>>                    selected CPU device
>>>>>>                 3: if set to 1 initiates device eject, set by OSPM when it
>>>>>>                    triggers CPU device removal and calls _EJ0 method
>>>>>> -            4-7: reserved, OSPM must clear them before writing to register
>>>>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>>>>> +               Firmware shall issue device eject request as described above
>>>>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>>>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>>>>         [0x5] Command field: (1 byte access)
>>>>>>               value:
>>>>>>                 0: selects a CPU device with inserting/removing events and
>>>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>>>> index f099b50927..09d2f20dae 100644
>>>>>> --- a/hw/acpi/cpu.c
>>>>>> +++ b/hw/acpi/cpu.c
>>>>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>>>>             val |= cdev->cpu ? 1 : 0;
>>>>>>             val |= cdev->is_inserting ? 2 : 0;
>>>>>>             val |= cdev->is_removing  ? 4 : 0;
>>>>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>>>
>>>>> I might be missing something but I don't see where cdev->fw_remove is being
>>>>> set.
>>>>
>>>> See just below, in the cpu_hotplug_wr() hunk. When bit#4 is written --
>>>> which happens through the ACPI code change --, fw_remove is inverted.
>>> Thanks that makes sense. I was reading the AML building code all wrong.
>>>
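For anyone following the bit#4 discussion, here is a minimal, self-contained C model of the status/flags byte as this RFC defines it. The struct `cpu_status` and the `CPUHP_*` names are hypothetical stand-ins and do not exist in QEMU. The bit positions come from the spec hunk and from cpu_hotplug_rd() above. The if/else-if chain for bits 1-3 is an assumption: it is taken to mirror the part of cpu_hotplug_wr() that the hunk does not show, and the real unplug is reduced to clearing `present`. The point the model illustrates is that writing 1 to bit 4 inverts fw_remove rather than setting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags byte of the CPU hotplug register block, per the RFC's spec text. */
#define CPUHP_ENABLED    (1u << 0) /* read: CPU device is present */
#define CPUHP_INSERT_EVT (1u << 1) /* read: insert event; write 1: clear it */
#define CPUHP_REMOVE_EVT (1u << 2) /* read: remove event; write 1: clear it */
#define CPUHP_EJECT      (1u << 3) /* write 1: eject the selected CPU */
#define CPUHP_FW_REMOVE  (1u << 4) /* new: OSPM hands eject over to firmware */
#define CPUHP_RESERVED   0xE0u     /* bits 5-7 */

/* Hypothetical cut-down stand-in for AcpiCpuStatus. */
typedef struct {
    bool present;
    bool is_inserting;
    bool is_removing;
    bool fw_remove;
} cpu_status;

/* Mirrors the flag composition in cpu_hotplug_rd(). */
static uint8_t cpuhp_read_flags(const cpu_status *s)
{
    uint8_t val = 0;

    val |= s->present      ? CPUHP_ENABLED    : 0;
    val |= s->is_inserting ? CPUHP_INSERT_EVT : 0;
    val |= s->is_removing  ? CPUHP_REMOVE_EVT : 0;
    val |= s->fw_remove    ? CPUHP_FW_REMOVE  : 0;
    return val;
}

/* One action per write; the first matching bit wins (assumed to follow
 * the if/else-if chain of cpu_hotplug_wr()). */
static void cpuhp_write_flags(cpu_status *s, uint8_t data)
{
    assert((data & CPUHP_RESERVED) == 0); /* OSPM must clear bits 5-7 */

    if (data & CPUHP_INSERT_EVT) {
        s->is_inserting = false;
    } else if (data & CPUHP_REMOVE_EVT) {
        s->is_removing = false;
    } else if (data & CPUHP_EJECT) {
        s->present = false;           /* stand-in for the real unplug */
    } else if (data & CPUHP_FW_REMOVE) {
        s->fw_remove = !s->fw_remove; /* toggles, does not set */
    }
}
```

Because bit 4 toggles, a second write of it (for example by firmware) turns fw_remove back off.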
>>>>
>>>>    
>>>>> We do set cdev->is_removing in acpi_cpu_unplug_request_cb() so AFAICS
>>>>> we would always end up setting this bit:
>>>>>>             val |= cdev->is_removing  ? 4 : 0;
>>>>>
>>>>> Also, if cdev->fw_remove and cdev->is_removing are both true, val would be
>>>>> (4 | 16). I'm guessing that in that case the AML determines which case gets
>>>>> handled but it might make sense to set just one of these?
>>>>
>>>> "is_removing" is set directly in response to the device_del QMP command.
>>>> That QMP command is asynchronous to the execution of the guest OS.
>>>>
>>>> "fw_remove" is set (by virtue of inverting) by ACPI CEJ0, which is
>>>> executed by the guest OS's ACPI interpreter, after the guest OS has
>>>> de-scheduled all processes from the CPU being removed (= basically after
>>>> the OS has willfully forgotten about the CPU).
>>>>
>>>> Therefore, considering the bitmask (is_removing, fw_remove), three
>>>> variations make sense:
>>>
>>> Just annotating these with the corresponding ACPI code to make sure
>>> I have it straight. Please correct if my interpretation is wrong. Also,
>>> a few questions inline:
>>>
>>>>
>>>> #1 (is_removing=0, fw_remove=0) -- normal status; no unplug requested
>>>>
>>>> #2 (is_removing=1, fw_remove=0) -- unplug requested via QMP, guest OS
>>>>                                      is processing the request
>>>
>>> Guest executes the CSCN method and reads rm_evt (bit 2) (thus noticing
>>> the is_removing=1), and then notifies the CPU to be removed via the
>>> CTFY method.
>>>
>>>      ifctx = aml_if(aml_equal(rm_evt, one));
>>>      {
>>>              aml_append(ifctx,
>>>                         aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
>>>              aml_append(ifctx, aml_store(one, rm_evt));
>>>              aml_append(ifctx, aml_store(one, has_event));
>>>      }
>>>
>>> Then it does a store to rm_evt (bit 2). That would result in clearing
>>> of is_removing. (Igor mentions that in a separate mail.)
>>>
>>> 1. Do we need to clear is_removing at all? AFAICS, it's only useful as
>>> an ack to QEMU and I can't think of why that's useful. OTOH it
>>> doesn't serve any useful purpose once the guest OS has seen the request.
>> No, firmware doesn't need to care about it; it's consumed by OSPM only.
>>   
>>> 2. Would it make sense to clear it first and then call CPU_NOTIFY_METHOD?
>>> CPU_NOTIFY_METHOD (or _EJ0, COST) don't depend on is_removing but
>>> that might change in the future.
>>
>> All methods are protected by the same mutex, so if _EJ0 is called while CSCN
>> is in progress, it will wait till CSCN is finished.
>> But clearing bit #2 before Notify should work too.
> 
> I'd suggest not reordering existent stuff unless we really have to; the
> firmware will have to deal with "is_removing" being the *only* status
> flag set anyway, as QMP "device_del" command(s) may set that bit for
> another CPU (or multiple other CPUs) while the SMI handler is running,
> and the "get pending" method will return such CPUs as well.

Yeah, I was just making sure my understanding of these related pieces of
code was correct. And anyway, as Igor mentioned, that bit of AML is protected
by the mutex, so the ordering doesn't even matter.

> 
> I wouldn't complicate the patches just in order to "hide" is_removing --
> that's not a goal, so let's just keep as much AML untouched as we can.
> 
> (BTW I now understand why "is_removing" is clear when the eject method
> runs for the same CPU -- because Notify (and so the eject method) is not
> entered synchronously, it's only queued asynchronously. So it's actually
> dispatched after the is_removing flag has been cleared.)
> 
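The ordering point above can be shown with a toy model. It assumes, as noted in this thread, that the OS only queues the Notify and runs _EJ0 after CSCN has returned. Everything here (`model`, `cscn()`, `run_queued_ej0()`) is hypothetical. It mimics the order of operations only, not real AML or Linux code.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 8

/* Hypothetical model: per-CPU remove-event flags plus a FIFO standing in
 * for the OS's deferred hotplug work queue. */
typedef struct {
    bool is_removing[NCPUS];
    int queue[NCPUS];
    int head, tail;
} model;

/* Notify does not run _EJ0 synchronously; it only queues the work. */
static void notify_eject_request(model *m, int cpu)
{
    assert(m->tail < NCPUS);
    m->queue[m->tail++] = cpu;
}

/* CSCN-like scan: for each CPU with a remove event, Notify, then write 1
 * to the remove-event bit (which clears is_removing). */
static void cscn(model *m)
{
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (m->is_removing[cpu]) {
            notify_eject_request(m, cpu);
            m->is_removing[cpu] = false;
        }
    }
}

/* Runs the queued _EJ0 calls after CSCN has returned. Returns how many of
 * them still observed is_removing set for their CPU. */
static int run_queued_ej0(model *m)
{
    int saw_flag = 0;

    while (m->head < m->tail) {
        int cpu = m->queue[m->head++];
        if (m->is_removing[cpu]) {
            saw_flag++;
        }
    }
    return saw_flag;
}
```

In this model no queued _EJ0 ever observes is_removing set, which matches the behaviour described above.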
> 
>>
>>   
>>> The notify would end up in calling acpi_hotplug_schedule() which would be
>>> responsible for queuing work (on CPU0) to detach+unplug the CPU.
>>>
>>> Once the OS level detach succeeds, the worker evaluates the "_EJ0" method
>>> which would do the actual CPU_EJECT_METHOD work.
>>>
>>> If the detach fails, it evaluates the CPU_OST_METHOD instead, which updates
>>> the OST event and status for the CPU.
>>>
>>> At this point the state is back to:
>>>
>>> (is_removing=0, fw_remove=0)
>> If OSPM fails to release the CPU for whatever reason, that's a valid
>> state; we just notify the user via an OST event that the requested unplug wasn't successful.
>>
>>>
>>>> #3 (is_removing=1, fw_remove=1) -- guest OS removed all references from
>>>>                                      the CPU, firmware is permitted /
>>>>                                      required to forget about the CPU as
>>>>                                      well, and then unplug the CPU
>>>
>>> CPU_EJECT_METHOD will do a store to bit 4, which would invert (and
>>> thus set) fw_remove and then do the SMI.
>>>
>>> So, this would be
>>>> #3 (is_removing=0, fw_remove=1)
>>>
>>> At this point the firmware calls QemuCpuhpCollectApicIds(), which
>>> (after changes) notices CPU(s) with fw_remove set.
>>>
>>> Collects them and does a store to bit 4, which would clear fw_remove.
> 
> I don't think we should clear fw_remove as soon as we collect the CPU,
> in the firmware. Status flag modifications should be kept out of
> QemuCpuhpCollectApicIds().
> 
>>
>> I'd skip this step on the firmware side and make QEMU clear it
>> when the CPU is ejected.

Makes sense.
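Here is a minimal sketch of the proposal that QEMU, not the firmware, clears fw_remove while handling the bit#3 eject write. The names are hypothetical. The real unplug (hotplug_handler_unplug() plus object_unparent()) is reduced to clearing a `present` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUHP_EJECT     (1u << 3)
#define CPUHP_FW_REMOVE (1u << 4)

/* Hypothetical stand-in for the relevant AcpiCpuStatus state. */
typedef struct {
    bool present;
    bool fw_remove;
} cpu_status;

/* Write path with the proposed change: the eject write (bit 3) also drops
 * fw_remove, so firmware never needs to write bit 4. */
static void cpuhp_write_flags(cpu_status *s, uint8_t data)
{
    if (data & CPUHP_EJECT) {
        assert(s->present);
        s->present = false;           /* stand-in for the real unplug */
        s->fw_remove = false;         /* proposed: QEMU clears bit 4 on eject */
    } else if (data & CPUHP_FW_REMOVE) {
        s->fw_remove = !s->fw_remove; /* RFC semantics: toggle */
    }
}
```

One plausible benefit is that the slot reads back with bit 4 clear after the eject. A later hot-add into the same slot then presumably does not start out marked for firmware removal.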

>>
>>>
>>>>
>>>> #4 (is_removing=1, fw_remove=0) -- firmware is about to unplug the CPU
>>>>
>>>> #5 (is_removing=0, fw_remove=0) -- firmware performing unplug
>>> Firmware does an unplug and writes to bit 3, thus clearing is_removing.
>>>
>>> On return from the firmware the guest evaluates the COST again.
>> It's optional and depends on the OSPM implementation (some do not call it on success).
>>
>>
>>> And, eventually goes back to the CSCN where it processes more
>>> hotplug or unplug events.
>> In the unplug case, CSCN finishes first, and only after that are the EJ0
>> calls processed.
>>
>>>> The variation (is_removing=0, fw_remove=1) is invalid / unused.
>>>
>>> /nods
>>>>
>>>>
>>>> The firmware may be investigating the CPU register block between steps
>>>> #2 and #3 -- in other words, the firmware may see a CPU for which
>>>> is_removing is set (unplug requested via QMP), but the OS has not vacated
>>>> yet (fw_remove=0). In that case, the firmware must just skip the CPU --
>>>> once the OS is done, it will set fw_remove too, and raise another SMI.
>>> Yeah, it makes sense for the firmware to only care about a CPU once it
>>> sees fw_remove=1. (And as currently situated, the firmware would never
>>> see is_removing=1 at all.)
> 
> The firmware may well see is_removing=1, as the QMP device_del command
> may set that bit on some other CPU, asynchronously to the firmware's
> execution. The firmware needs to recognize if the "get pending" command
> returns a CPU because of this, and just continue scanning.
Right. Processing (for unplug) any pending CPUs which are not OSPM sanctioned
(so do not have fw_remove set) could be catastrophic for the guest and so
the unplug path should ignore them.

It is possible that there are CPUs with bits for both is_inserting and
is_removing. In that case QemuCpuhpCollectApicIds() would put them in the
PluggedApicIds array and the unplug eventually happens in the next
firmware invocation.

If a CPU has both is_inserting and fw_remove set, the firmware processes the
hotplug in that invocation and the unplug happens whenever the OSPM triggers
the firmware next.
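On the firmware side, the collection rules described above could look roughly like this. This is a sketch with invented names and is not OVMF's QemuCpuhpCollectApicIds(). A remove event (bit 2) on its own is skipped, and bit 4 authorizes an unplug. A CPU with both bit 1 and bit 4 set is collected only for hotplug in this pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUHP_INSERT_EVT (1u << 1)
#define CPUHP_REMOVE_EVT (1u << 2)
#define CPUHP_FW_REMOVE  (1u << 4)
#define MAX_CPUS 8

/* Hypothetical result of one firmware scan (one SMI). */
typedef struct {
    uint32_t plugged[MAX_CPUS];
    int nplugged;
    uint32_t to_unplug[MAX_CPUS];
    int nunplug;
} collected;

/* Classify one pending CPU by its flags byte. */
static void collect_cpu(collected *c, uint32_t apic_id, uint8_t flags)
{
    if (flags & CPUHP_INSERT_EVT) {
        /* Hotplug wins; a pending bit-4 request is left for a later SMI. */
        assert(c->nplugged < MAX_CPUS);
        c->plugged[c->nplugged++] = apic_id;
    } else if (flags & CPUHP_FW_REMOVE) {
        /* OSPM has released the CPU, so it is safe to unplug. */
        assert(c->nunplug < MAX_CPUS);
        c->to_unplug[c->nunplug++] = apic_id;
    }
    /* CPUHP_REMOVE_EVT alone: device_del was issued, but the OS may still
     * be running on this CPU, so skip it and keep scanning. */
}
```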

Thanks
Ankur

> 
> Thanks
> Laszlo
>
Ankur Arora Nov. 27, 2020, 11:49 p.m. UTC | #19
On 2020-11-27 3:47 a.m., Igor Mammedov wrote:
> On Thu, 26 Nov 2020 20:10:59 -0800
> Ankur Arora <ankur.a.arora@oracle.com> wrote:
> 
>> On 2020-11-26 12:38 p.m., Igor Mammedov wrote:
>>> On Thu, 26 Nov 2020 12:17:27 +0100
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>    
>>>> On 11/24/20 13:25, Igor Mammedov wrote:
>>>>> If firmware negotiates ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT feature,
>>>>> OSPM on CPU eject will set bit #4 in CPU hotplug block for to be
>>>>> ejected CPU to mark it for removal by firmware and trigger SMI
>>>>> upcall to let firmware do actual eject.
>>>>>
>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>>> ---
>>>>> PS:
>>>>>     - abuse 5.1 machine type for now to turn off unplug feature
>>>>>       (it will be moved to 5.2 machine type once new merge window is open)
>>>>> ---
>>>>>    include/hw/acpi/cpu.h           |  2 ++
>>>>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>>>    hw/acpi/cpu.c                   | 18 ++++++++++++++++--
>>>>>    hw/i386/acpi-build.c            |  5 +++++
>>>>>    hw/i386/pc.c                    |  1 +
>>>>>    hw/isa/lpc_ich9.c               |  2 +-
>>>>>    6 files changed, 34 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>>> index 0eeedaa491..999caaf510 100644
>>>>> --- a/include/hw/acpi/cpu.h
>>>>> +++ b/include/hw/acpi/cpu.h
>>>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>>>        uint64_t arch_id;
>>>>>        bool is_inserting;
>>>>>        bool is_removing;
>>>>> +    bool fw_remove;
>>>>>        uint32_t ost_event;
>>>>>        uint32_t ost_status;
>>>>>    } AcpiCpuStatus;
>>>>> @@ -50,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>>>    typedef struct CPUHotplugFeatures {
>>>>>        bool acpi_1_compatible;
>>>>>        bool has_legacy_cphp;
>>>>> +    bool fw_unplugs_cpu;
>>>>>        const char *smi_path;
>>>>>    } CPUHotplugFeatures;
>>>>>
>>>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>>>> index 9bb22d1270..f68ef6e06c 100644
>>>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>>>> @@ -57,7 +57,11 @@ read access:
>>>>>                  It's valid only when bit 0 is set.
>>>>>               2: Device remove event, used to distinguish device for which
>>>>>                  no device eject request to OSPM was issued.
>>>>> -           3-7: reserved and should be ignored by OSPM
>>>>> +           3: reserved and should be ignored by OSPM
>>>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>>>> +              firmware shall clear this event by writing 1 into it before
>>>>
>>>> (1) s/clear this event/clear this event bit/
>>>>   
>>>>> +              performing device eject.
>>>>
>>>> (2) move the second and third lines ("firmware shall clear....") over to
>>>> the write documentation, below? In particular:
>>>>   
>>>>> +           5-7: reserved and should be ignored by OSPM
>>>>>        [0x5-0x7] reserved
>>>>>        [0x8] Command data: (DWORD access)
>>>>>              contains 0 unless value last stored in 'Command field' is one of:
>>>>> @@ -82,7 +86,10 @@ write access:
>>>>>                   selected CPU device
>>>>>                3: if set to 1 initiates device eject, set by OSPM when it
>>>>>                   triggers CPU device removal and calls _EJ0 method
>>>>> -            4-7: reserved, OSPM must clear them before writing to register
>>>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>>>> +               Firmware shall issue device eject request as described above
>>>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>>>
>>>> (3) it would be clearer if we documented the exact bit writing order
>>>> here:
>>>> - clear bit#4, *then* set bit#3 (two write accesses)
>>>> - versus clear bit#4 *and* set bit#3 (single access)
>>>
>>> I was thinking that FW should not bother with clearing bit #4,
>>> and QEMU should clear it when handling write to bit #3.
>>> (it looks like I forgot to actually do that)
>>
>> Why involve the firmware with bit #3 at all? If the firmware only reads bit #4
>> to detect fw_remove and then writes (and thus resets) bit #4, isn't that
>> good enough?
> 
> That would needlessly complicate the code on the QEMU side, and I don't want to
> overload bit #4 with other semantics; we already have bit #3, which
> does eject. So unless there are issues with that, I'd stick to using
> bit #3 for eject.

Sounds good.
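Put together, the firmware side of the scheme agreed here would only ever read bit 4 and only ever write bit 3. It relies on QEMU to clear fw_remove on eject. The register block below is a hypothetical in-memory stand-in (a selector plus per-CPU flags), not the real I/O-port interface. Any firmware bookkeeping for the CPU would come before the eject write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUHP_ENABLED   (1u << 0)
#define CPUHP_EJECT     (1u << 3)
#define CPUHP_FW_REMOVE (1u << 4)
#define NCPUS 4

/* Hypothetical in-memory stand-in for the register block: a CPU selector
 * plus the per-CPU state that the flags byte reports. */
typedef struct {
    uint32_t selector;
    bool present[NCPUS];
    bool fw_remove[NCPUS];
} regblock;

static uint8_t rd_flags(const regblock *rb)
{
    assert(rb->selector < NCPUS);
    return (rb->present[rb->selector] ? CPUHP_ENABLED : 0) |
           (rb->fw_remove[rb->selector] ? CPUHP_FW_REMOVE : 0);
}

/* QEMU side, as agreed: the eject write also clears fw_remove. */
static void wr_flags(regblock *rb, uint8_t data)
{
    assert(rb->selector < NCPUS);
    if (data & CPUHP_EJECT) {
        rb->present[rb->selector] = false;
        rb->fw_remove[rb->selector] = false;
    }
}

/* Firmware side: read bit 4, write bit 3, never write bit 4.
 * Returns the number of CPUs ejected. */
static int fw_eject_released_cpus(regblock *rb)
{
    int ejected = 0;

    for (uint32_t i = 0; i < NCPUS; i++) {
        rb->selector = i;
        if (rd_flags(rb) & CPUHP_FW_REMOVE) {
            wr_flags(rb, CPUHP_EJECT);
            ejected++;
        }
    }
    return ejected;
}
```

A second pass finds nothing to do, because the eject write already cleared bit 4.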

> 
>>
>>
>>>    
>>>>
>>>>
>>>>   
>>>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>>>        [0x5] Command field: (1 byte access)
>>>>>              value:
>>>>>                0: selects a CPU device with inserting/removing events and
>>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>>> index f099b50927..09d2f20dae 100644
>>>>> --- a/hw/acpi/cpu.c
>>>>> +++ b/hw/acpi/cpu.c
>>>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>>>            val |= cdev->cpu ? 1 : 0;
>>>>>            val |= cdev->is_inserting ? 2 : 0;
>>>>>            val |= cdev->is_removing  ? 4 : 0;
>>>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>>>            trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>>>            break;
>>>>>        case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>>>                hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>>>                hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>>>                object_unparent(OBJECT(dev));
>>>>> +        } else if (data & 16) {
>>>>> +            cdev->fw_remove = !cdev->fw_remove;
>>>>
>>>> hm... so I guess the ACPI code will first write bit#4 to flip
>>>> "fw_remove" from "off" to "on". Then the firmware will write bit#4 to
>>>> flip "fw_remove" back to "off". And finally, the firmware will write
>>>> bit#3 (strictly as a separate access) to unplug the CPU.
>>> Sorry for the confusion between the doc and the implementation: FW should only read bit #4, and only write bit #3.
>>>      
>>>> (4) But anyway, taking a step back: what do we need the new bit for?
>>>>   
>>>>>            }
>>>>>            break;
>>>>>        case ACPI_CPU_CMD_OFFSET_WR:
>>>>> @@ -332,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>>>    #define CPU_INSERT_EVENT  "CINS"
>>>>>    #define CPU_REMOVE_EVENT  "CRMV"
>>>>>    #define CPU_EJECT_EVENT   "CEJ0"
>>>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>>>
>>>>>    void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>                        hwaddr io_base,
>>>>> @@ -384,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>            aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>>>            /* initiates device eject, write only */
>>>>>            aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>>>> -        aml_append(field, aml_reserved_field(4));
>>>>> +        aml_append(field, aml_reserved_field(1));
>>>>> +        /* tell firmware to do device eject, write only */
>>>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>>>> +        aml_append(field, aml_reserved_field(2));
>>>>>            aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>>>            aml_append(cpu_ctrl_dev, field);
>>>>>
>>>>> @@ -419,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>            Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>>>            Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>>>            Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>>>
>>>>>            aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>>>            aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>>>> @@ -461,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>
>>>>>                aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>>>                aml_append(method, aml_store(idx, cpu_selector));
>>>>> -            aml_append(method, aml_store(one, ej_evt));
>>>>> +            if (opts.fw_unplugs_cpu) {
>>>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>>>> +                           aml_name("%s", opts.smi_path)));
>>>>> +            } else {
>>>>> +                aml_append(method, aml_store(one, ej_evt));
>>>>> +            }
>>>>>                aml_append(method, aml_release(ctrl_lock));
>>>>>            }
>>>>>            aml_append(cpus_dev, method);
>>>>
>>>> Hmmm, OK, let me parse this.
>>>>
>>>> Assume there is a big bunch of device_del QMP commands, QEMU marks the
>>>> "remove" event pending on the corresponding set of CPUs, plus also makes
>>>> the ACPI interrupt pending. The ACPI interrupt handler in the OS runs,
>>>> and calls CSCN. CSCN runs a loop, and for each CPU where the remove
>>>> event is pending, notifies the OS one by one. The OS in turn forgets
>>>> about the subject CPU, and calls the _EJ0 method on the affected CPU
>>>> ACPI object. The _EJ0 method on the CPU ACPI object calls CEJ0, passing
>>>> in the affected CPU's identifier.
>>>>
>>>> The above hunk modifies the CEJ0 method.
>>>>
>>>> (5) Question: pre-patch, both the CSCN method and the CEJ0 method
>>>> acquire the CPLK lock, but CEJ0 is actually called within CSCN
>>>> (indirectly, with the OS's cooperation). Is CPLK a recursive lock?
>>> Theoretically the spec supports recursive mutexes, but I don't think that's the case here.
>>>
>>> Considering it currently works, I think the OS implements the Notify event as async,
>>> hence no clash wrt the mutex. If EJ0 were handled within the CSCN context,
>>> EJ0 would mess up the cpu_selector value that CSCN is also using.
>>
>> From my read of the Linux code, yeah, the EJ0 execution happens in an
>> async worker on CPU 0 which first detaches the CPU and then executes EJ0.
>>>
>>>> Anyway, let's see the CEJ0 modification. After the OS is done forgetting
>>>> about the CPU, the CEJ0 method no longer unplugs the CPU, instead it
>>>> sets the new bit#4 in the register block, and raises an SMI.
>>>>
>>>> (6) So that's one SMI per CPU being removed. Is that OK?
>>>
>>> I guess it has a performance penalty, but there is nothing we can do about it;
>>> OSPM does EJ0 calls asynchronously.
>>>      
>>>> (7) What if there are asynchronous plugs going on, and the firmware
>>>> notices them in the register block? ... Hm, I hope that should be OK,
>>>> because ultimately the CSCN method will learn about those too, and
>>>> inform the OS. On plug, the firmware doesn't modify the register block.
>>> Shouldn't be an issue (modulo bugs; I haven't tried to hot-add and hot-remove
>>> the same CPU at the same time).
>>
>> Yeah I was wondering what would happen for simultaneous hot add and remove.
>> I guess we would always do remove first and then the add, unless we hit
>> the break due to max_cpus_per_pass and switch to hot-add mode.
>>
>>>
>>> i.e.
>>> (QEMU) pause
>>> (QEMU) device_add
>>> (QEMU) device_del
>>> (QEMU) cont
> 
> Looking at the current CPU_SCAN_METHOD,
> it will notice and process the insert event only.
> The remove event will stay pending till the next CPU hotplug SCI
> (i.e. the next time the user hot(un)plugs a CPU).
> 
> Not sure that such a use case is worth fixing, though.

Yeah this is probably only useful as a test-case.

Thanks
Ankur

> 
>>>    
>>>> Ah! OK. I think I understand why bit#4 is important. The firmware may
>>>> notice pending remove events, but it must not act upon them -- it must
>>>> simply ignore them -- unless bit#4 is also set. Bit#2 set with bit#4
>>>> clear means the event is pending (QEMU got a device_del), but the OS has
>>>> not forgotten about the CPU yet -- so the firmware must not unplug it
>>>> yet. When the modified CEJ0 method runs, it sets bit#4 in addition to
>>>> the already set bit#2, advertising that the OS has *already* abandoned
>>>> the CPU.
>>> Firmware should ignore bit #2; it doesn't mean anything to it, and OSPM might
>>> ignore or not support CPU removal. What firmware must care about is bit #4,
>>> which tells it that OSPM is done with the CPU and asks for it to be removed by firmware.
>>
>> In my other mail, I was suggesting that the guest OS not reset bit #2, but
>> on second thought, this makes sense.
>>
>>
>> Thanks
>> Ankur
>>
>>>    
>>>>
>>>> This means we'll have to modify the QemuCpuhpCollectApicIds() function
>>>> in OVMF as well -- for collecting a CPU for unplug, just bit#2
>>>> (QEMU_CPUHP_STAT_REMOVE) is insufficient -- on such CPUs, the OS may
>>>> still be executing threads.
>>>>
>>>> OK, this approach sounds plausible to me.
>>>>
>>>> (8) Please extend the description of bit#2 in the "status flags read
>>>> access" section: "firmware must ignore bit#2 unless bit#4 is set".
>>>>
>>>>
>>>>   
>>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>>> index 1f5c211245..475e76f514 100644
>>>>> --- a/hw/i386/acpi-build.c
>>>>> +++ b/hw/i386/acpi-build.c
>>>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>>>        bool s4_disabled;
>>>>>        bool pcihp_bridge_en;
>>>>>        bool smi_on_cpuhp;
>>>>> +    bool smi_on_cpu_unplug;
>>>>>        bool pcihp_root_en;
>>>>>        uint8_t s4_val;
>>>>>        AcpiFadtData fadt;
>>>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>>        pm->pcihp_io_base = 0;
>>>>>        pm->pcihp_io_len = 0;
>>>>>        pm->smi_on_cpuhp = false;
>>>>> +    pm->smi_on_cpu_unplug = false;
>>>>>
>>>>>        assert(obj);
>>>>>        init_common_fadt_data(machine, obj, &pm->fadt);
>>>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>>            pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>>>            pm->smi_on_cpuhp =
>>>>>                !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>>>> +        pm->smi_on_cpu_unplug =
>>>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>>>        }
>>>>>
>>>>>        /* The above need not be conditional on machine type because the reset port
>>>>> @@ -1582,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>>>            CPUHotplugFeatures opts = {
>>>>>                .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>>>                .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>>>            };
>>>>>            build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>>>                           "\\_SB.PCI0", "\\_GPE._E02");
>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>>> index 17b514d1da..2952a00fe6 100644
>>>>> --- a/hw/i386/pc.c
>>>>> +++ b/hw/i386/pc.c
>>>>> @@ -99,6 +99,7 @@
>>>>>
>>>>>    GlobalProperty pc_compat_5_1[] = {
>>>>>        { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>>>    };
>>>>>    const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>>>
>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>>> index 087a18d04d..8c667b7166 100644
>>>>> --- a/hw/isa/lpc_ich9.c
>>>>> +++ b/hw/isa/lpc_ich9.c
>>>>> @@ -770,7 +770,7 @@ static Property ich9_lpc_properties[] = {
>>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>>>                          ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>>>        DEFINE_PROP_END_OF_LIST(),
>>>>>    };
>>>>>
>>>>>      
>>>>
>>>> (9) You have to extend smi_features_ok_callback() as well -- it is
>>>> invalid for the firmware to negotiate unplug, without negotiating plug.
>>>>
>>>> In fact, as far as I can tell, that would even crash QEMU, given this
>>>> patch. Because, "opts.smi_path" would be set to NULL, but
>>>> "opts.fw_unplugs_cpu" would be set to "true". As a consequence, the
>>>> CPU_EJECT_METHOD change above would call aml_name("%s", NULL).
>>>>
>>>> So something like the following looks necessary:
>>>
>>> Thanks for suggestions,
>>> I'll respin v2 with your feedback included.
>>>    
>>>>   
>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>>> index 8c667b7166c7..5bc3f212fe77 100644
>>>>> --- a/hw/isa/lpc_ich9.c
>>>>> +++ b/hw/isa/lpc_ich9.c
>>>>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>>>>>    {
>>>>>        ICH9LPCState *lpc = opaque;
>>>>>        uint64_t guest_features;
>>>>> +    uint64_t guest_cpu_hotplug_features;
>>>>>
>>>>>        if (lpc->smi_features_ok) {
>>>>>            /* negotiation already complete, features locked */
>>>>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>>>>>            /* guest requests invalid features, leave @features_ok at zero */
>>>>>            return;
>>>>>        }
>>>>> +    guest_cpu_hotplug_features = guest_features &
>>>>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>>>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>>>        if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
>>>>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>>>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
>>>>> +        guest_cpu_hotplug_features) {
>>>>>            /*
>>>>>             * cpu hot-[un]plug with SMI requires SMI broadcast,
>>>>>             * leave @features_ok at zero
>>>>> @@ -388,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>>>>>            return;
>>>>>        }
>>>>>
>>>>> +    if (guest_cpu_hotplug_features ==
>>>>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
>>>>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>        /* valid feature subset requested, lock it down, report success */
>>>>>        lpc->smi_negotiated_features = guest_features;
>>>>>        lpc->smi_features_ok = 1;
>>>>
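The resulting negotiation rules can be sketched as a standalone predicate, and `smi_features_valid()` itself is hypothetical. The bit numbers 0, 1 and 2 are assumed to match QEMU's ICH9_LPC_SMI_F_BROADCAST_BIT, ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT. The first check, which rejects features the host does not offer, is assumed to be what the "guest requests invalid features" branch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers assumed to match QEMU's ICH9_LPC_SMI_F_* definitions. */
#define SMI_F_BROADCAST_BIT      0
#define SMI_F_CPU_HOTPLUG_BIT    1
#define SMI_F_CPU_HOT_UNPLUG_BIT 2
#define BIT_ULL(n) (1ULL << (n))

/* Simplified version of the checks in smi_features_ok_callback():
 * returns whether the guest's requested feature set would be accepted. */
static bool smi_features_valid(uint64_t guest, uint64_t host)
{
    uint64_t cpuhp = guest & (BIT_ULL(SMI_F_CPU_HOTPLUG_BIT) |
                              BIT_ULL(SMI_F_CPU_HOT_UNPLUG_BIT));

    if (guest & ~host) {
        return false; /* requests features the host does not offer */
    }
    if (!(guest & BIT_ULL(SMI_F_BROADCAST_BIT)) && cpuhp) {
        return false; /* CPU hot-[un]plug with SMI requires SMI broadcast */
    }
    if (cpuhp == BIT_ULL(SMI_F_CPU_HOT_UNPLUG_BIT)) {
        return false; /* hot-unplug without hotplug would leave smi_path NULL */
    }
    return true;
}
```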
>>>>
>>>> (10) It would be nice to separate this work into multiple patches. I
>>>> propose:
>>>>
>>>> - [PATCH 1/5] x86: ich9: factor out "guest_cpu_hotplug_features"
>>>>   
>>>>>    hw/isa/lpc_ich9.c | 7 +++++--
>>>>>    1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>>> index 087a18d04de4..c46eefd13fd4 100644
>>>>> --- a/hw/isa/lpc_ich9.c
>>>>> +++ b/hw/isa/lpc_ich9.c
>>>>> @@ -366,6 +366,7 @@ static void smi_features_ok_callback(void *opaque)
>>>>>    {
>>>>>        ICH9LPCState *lpc = opaque;
>>>>>        uint64_t guest_features;
>>>>> +    uint64_t guest_cpu_hotplug_features;
>>>>>
>>>>>        if (lpc->smi_features_ok) {
>>>>>            /* negotiation already complete, features locked */
>>>>> @@ -378,9 +379,11 @@ static void smi_features_ok_callback(void *opaque)
>>>>>            /* guest requests invalid features, leave @features_ok at zero */
>>>>>            return;
>>>>>        }
>>>>> +    guest_cpu_hotplug_features = guest_features &
>>>>> +                                 (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>>>> +                                  BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>>>        if (!(guest_features & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT)) &&
>>>>> -        guest_features & (BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT) |
>>>>> -                          BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT))) {
>>>>> +        guest_cpu_hotplug_features) {
>>>>>            /*
>>>>>             * cpu hot-[un]plug with SMI requires SMI broadcast,
>>>>>             * leave @features_ok at zero
>>>>
>>>>
>>>> - [PATCH 2/5] x86: ich9: let firmware negotiate 'CPU hot-unplug with SMI' feature
>>>>   
>>>>>    hw/i386/pc.c      | 1 +
>>>>>    hw/isa/lpc_ich9.c | 8 +++++++-
>>>>>    2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>>> index 17b514d1da50..2952a00fe694 100644
>>>>> --- a/hw/i386/pc.c
>>>>> +++ b/hw/i386/pc.c
>>>>> @@ -99,6 +99,7 @@
>>>>>
>>>>>    GlobalProperty pc_compat_5_1[] = {
>>>>>        { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
>>>>> +    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
>>>>>    };
>>>>>    const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
>>>>>
>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>>> index c46eefd13fd4..5bc3f212fe77 100644
>>>>> --- a/hw/isa/lpc_ich9.c
>>>>> +++ b/hw/isa/lpc_ich9.c
>>>>> @@ -391,6 +391,12 @@ static void smi_features_ok_callback(void *opaque)
>>>>>            return;
>>>>>        }
>>>>>
>>>>> +    if (guest_cpu_hotplug_features ==
>>>>> +        BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT)) {
>>>>> +        /* cpu hot-unplug is unsupported without cpu-hotplug */
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>        /* valid feature subset requested, lock it down, report success */
>>>>>        lpc->smi_negotiated_features = guest_features;
>>>>>        lpc->smi_features_ok = 1;
>>>>> @@ -773,7 +779,7 @@ static Property ich9_lpc_properties[] = {
>>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
>>>>>                          ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
>>>>>        DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
>>>>> -                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
>>>>> +                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
>>>>>        DEFINE_PROP_END_OF_LIST(),
>>>>>    };
>>>>>      
>>>>
>>>>
>>>> - [PATCH 3/5] x86: acpi: introduce AcpiPmInfo::smi_on_cpu_unplug
>>>>   
>>>>>    hw/i386/acpi-build.c | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>>> index 1f5c2112452a..9036e5594c92 100644
>>>>> --- a/hw/i386/acpi-build.c
>>>>> +++ b/hw/i386/acpi-build.c
>>>>> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>>>>>        bool s4_disabled;
>>>>>        bool pcihp_bridge_en;
>>>>>        bool smi_on_cpuhp;
>>>>> +    bool smi_on_cpu_unplug;
>>>>>        bool pcihp_root_en;
>>>>>        uint8_t s4_val;
>>>>>        AcpiFadtData fadt;
>>>>> @@ -197,6 +198,7 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>>        pm->pcihp_io_base = 0;
>>>>>        pm->pcihp_io_len = 0;
>>>>>        pm->smi_on_cpuhp = false;
>>>>> +    pm->smi_on_cpu_unplug = false;
>>>>>
>>>>>        assert(obj);
>>>>>        init_common_fadt_data(machine, obj, &pm->fadt);
>>>>> @@ -220,6 +222,8 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>>>>>            pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
>>>>>            pm->smi_on_cpuhp =
>>>>>                !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
>>>>> +        pm->smi_on_cpu_unplug =
>>>>> +            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
>>>>>        }
>>>>>
>>>>>        /* The above need not be conditional on machine type because the reset port
>>>>
>>>>
>>>> - [PATCH 4/5] acpi: cpuhp: introduce 'firmware performs eject' status/control bits
>>>>   
>>>>>    docs/specs/acpi_cpu_hotplug.txt | 11 +++++++++--
>>>>>    include/hw/acpi/cpu.h           |  1 +
>>>>>    hw/acpi/cpu.c                   |  3 +++
>>>>>    3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
>>>>> index 9bb22d1270a9..f68ef6e06c7a 100644
>>>>> --- a/docs/specs/acpi_cpu_hotplug.txt
>>>>> +++ b/docs/specs/acpi_cpu_hotplug.txt
>>>>> @@ -57,7 +57,11 @@ read access:
>>>>>                  It's valid only when bit 0 is set.
>>>>>               2: Device remove event, used to distinguish device for which
>>>>>                  no device eject request to OSPM was issued.
>>>>> -           3-7: reserved and should be ignored by OSPM
>>>>> +           3: reserved and should be ignored by OSPM
>>>>> +           4: if set to 1, OSPM requests firmware to perform device eject,
>>>>> +              firmware shall clear this event by writing 1 into it before
>>>>> +              performing device eject.
>>>>> +           5-7: reserved and should be ignored by OSPM
>>>>>        [0x5-0x7] reserved
>>>>>        [0x8] Command data: (DWORD access)
>>>>>              contains 0 unless value last stored in 'Command field' is one of:
>>>>> @@ -82,7 +86,10 @@ write access:
>>>>>                   selected CPU device
>>>>>                3: if set to 1 initiates device eject, set by OSPM when it
>>>>>                   triggers CPU device removal and calls _EJ0 method
>>>>> -            4-7: reserved, OSPM must clear them before writing to register
>>>>> +            4: if set to 1 OSPM hands over device eject to firmware,
>>>>> +               Firmware shall issue device eject request as described above
>>>>> +               (bit #3) and OSPM should not touch device eject bit (#3),
>>>>> +            5-7: reserved, OSPM must clear them before writing to register
>>>>>        [0x5] Command field: (1 byte access)
>>>>>              value:
>>>>>                0: selects a CPU device with inserting/removing events and
>>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>>> index 0eeedaa491c1..d71edde456f2 100644
>>>>> --- a/include/hw/acpi/cpu.h
>>>>> +++ b/include/hw/acpi/cpu.h
>>>>> @@ -22,6 +22,7 @@ typedef struct AcpiCpuStatus {
>>>>>        uint64_t arch_id;
>>>>>        bool is_inserting;
>>>>>        bool is_removing;
>>>>> +    bool fw_remove;
>>>>>        uint32_t ost_event;
>>>>>        uint32_t ost_status;
>>>>>    } AcpiCpuStatus;
>>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>>> index f099b5092730..3dc83d73e20b 100644
>>>>> --- a/hw/acpi/cpu.c
>>>>> +++ b/hw/acpi/cpu.c
>>>>> @@ -71,6 +71,7 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
>>>>>            val |= cdev->cpu ? 1 : 0;
>>>>>            val |= cdev->is_inserting ? 2 : 0;
>>>>>            val |= cdev->is_removing  ? 4 : 0;
>>>>> +        val |= cdev->fw_remove  ? 16 : 0;
>>>>>            trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
>>>>>            break;
>>>>>        case ACPI_CPU_CMD_DATA_OFFSET_RW:
>>>>> @@ -148,6 +149,8 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
>>>>>                hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>>>>                hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
>>>>>                object_unparent(OBJECT(dev));
>>>>> +        } else if (data & 16) {
>>>>> +            cdev->fw_remove = !cdev->fw_remove;
>>>>>            }
>>>>>            break;
>>>>>        case ACPI_CPU_CMD_OFFSET_WR:
>>>>
>>>>
>>>> - [PATCH 5/5] x86: acpi: let the firmware handle pending "CPU remove" events in SMM
>>>>   
>>>>>    include/hw/acpi/cpu.h |  1 +
>>>>>    hw/acpi/cpu.c         | 15 +++++++++++++--
>>>>>    hw/i386/acpi-build.c  |  1 +
>>>>>    3 files changed, 15 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>>>>> index d71edde456f2..999caaf51060 100644
>>>>> --- a/include/hw/acpi/cpu.h
>>>>> +++ b/include/hw/acpi/cpu.h
>>>>> @@ -51,6 +51,7 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>>>>    typedef struct CPUHotplugFeatures {
>>>>>        bool acpi_1_compatible;
>>>>>        bool has_legacy_cphp;
>>>>> +    bool fw_unplugs_cpu;
>>>>>        const char *smi_path;
>>>>>    } CPUHotplugFeatures;
>>>>>
>>>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>>>> index 3dc83d73e20b..09d2f20daec0 100644
>>>>> --- a/hw/acpi/cpu.c
>>>>> +++ b/hw/acpi/cpu.c
>>>>> @@ -335,6 +335,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>>>>>    #define CPU_INSERT_EVENT  "CINS"
>>>>>    #define CPU_REMOVE_EVENT  "CRMV"
>>>>>    #define CPU_EJECT_EVENT   "CEJ0"
>>>>> +#define CPU_FW_EJECT_EVENT "CEJF"
>>>>>
>>>>>    void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>                        hwaddr io_base,
>>>>> @@ -387,7 +388,10 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>            aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
>>>>>            /* initiates device eject, write only */
>>>>>            aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
>>>>> -        aml_append(field, aml_reserved_field(4));
>>>>> +        aml_append(field, aml_reserved_field(1));
>>>>> +        /* tell firmware to do device eject, write only */
>>>>> +        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
>>>>> +        aml_append(field, aml_reserved_field(2));
>>>>>            aml_append(field, aml_named_field(CPU_COMMAND, 8));
>>>>>            aml_append(cpu_ctrl_dev, field);
>>>>>
>>>>> @@ -422,6 +426,7 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>            Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
>>>>>            Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
>>>>>            Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
>>>>> +        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
>>>>>
>>>>>            aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
>>>>>            aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
>>>>> @@ -464,7 +469,13 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>>>>
>>>>>                aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
>>>>>                aml_append(method, aml_store(idx, cpu_selector));
>>>>> -            aml_append(method, aml_store(one, ej_evt));
>>>>> +            if (opts.fw_unplugs_cpu) {
>>>>> +                aml_append(method, aml_store(one, fw_ej_evt));
>>>>> +                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
>>>>> +                           aml_name("%s", opts.smi_path)));
>>>>> +            } else {
>>>>> +                aml_append(method, aml_store(one, ej_evt));
>>>>> +            }
>>>>>                aml_append(method, aml_release(ctrl_lock));
>>>>>            }
>>>>>            aml_append(cpus_dev, method);
>>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>>>> index 9036e5594c92..475e76f514ff 100644
>>>>> --- a/hw/i386/acpi-build.c
>>>>> +++ b/hw/i386/acpi-build.c
>>>>> @@ -1586,6 +1586,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>>>>            CPUHotplugFeatures opts = {
>>>>>                .acpi_1_compatible = true, .has_legacy_cphp = true,
>>>>>                .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
>>>>> +            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
>>>>>            };
>>>>>            build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
>>>>>                           "\\_SB.PCI0", "\\_GPE._E02");
>>>>
>>>> Thanks!
>>>> Laszlo
>>>>
>>>>   
>>>    
>>
>
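The negotiation rule added to smi_features_ok_callback() in the quoted hunk can be written as a pure function. This is a simplified model, not QEMU code. `smi_features_acceptable()`, `BIT64()` and the `SMI_F_*` bit positions are assumptions made for illustration. Only two checks are modelled: that the guest requests a subset of the host's features, and the new rule that hot-unplug requires hotplug. Any other checks the real callback performs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, modelled on the ICH9 LPC SMI feature bits
 * (the numeric values are assumptions for this sketch). */
#define SMI_F_BROADCAST_BIT      0
#define SMI_F_CPU_HOTPLUG_BIT    1
#define SMI_F_CPU_HOT_UNPLUG_BIT 2

#define BIT64(n) (UINT64_C(1) << (n))

/* Hypothetical helper: is 'guest' an acceptable feature set given 'host'? */
static bool smi_features_acceptable(uint64_t host, uint64_t guest)
{
    const uint64_t cpuhp_mask = BIT64(SMI_F_CPU_HOTPLUG_BIT) |
                                BIT64(SMI_F_CPU_HOT_UNPLUG_BIT);
    uint64_t guest_cpuhp = guest & cpuhp_mask;

    /* The guest may only request features the host offers. */
    if (guest & ~host) {
        return false;
    }
    /* CPU hot-unplug is unsupported without CPU hotplug. */
    if (guest_cpuhp == BIT64(SMI_F_CPU_HOT_UNPLUG_BIT)) {
        return false;
    }
    return true;
}
```

If the host offers all three bits, the model rejects a guest that requests broadcast plus hot-unplug on its own. It accepts hotplug alone, and it accepts hotplug together with hot-unplug.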
Ankur Arora Nov. 28, 2020, 12:43 a.m. UTC | #20
On 2020-11-27 7:19 a.m., Laszlo Ersek wrote:
> On 11/27/20 05:10, Ankur Arora wrote:
> 
>> Yeah I was wondering what would happen for simultaneous hot add and remove.
>> I guess we would always do remove first and then the add, unless we hit
>> the break due to max_cpus_per_pass and switch to hot-add mode.
> 
> Considering the firmware only, I disagree with remove-then-add.
> 
> EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and
> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() (implemented in
> SmmAddProcessor() and SmmRemoveProcessor() in
> "UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c", respectively) only mark the
> processors for addition/removal. The actual processing is done only
> later, in BSPHandler() --> SmmCpuUpdate(), when "all SMI handlers are
> finished" (see the comment in SmmRemoveProcessor()).
> 
> Consequently, I would not suggest replacing a valid APIC ID in a
> particular mCpuHotPlugData.ApicId[Index] slot with INVALID_APIC_ID
> (corresponding to the unplug operation), and then possibly replacing
> INVALID_APIC_ID in the *same slot* with the APIC ID of the newly plugged
> CPU, in the exact same SMI invocation (= in the same execution of
> CpuHotplugMmi()). That might cause some component in edk2 to see the
> APIC ID in mCpuHotPlugData.ApicId[Index] to change from one valid APIC
> ID to another valid APIC ID, and I don't even want to think about what
> kind of mess that could cause.

Shudders.

> 
> So no, please handle plugs first, for which unused slots in
> mCpuHotPlugData.ApicId will be populated, and *then* handle removals (in
> the same invocation of CpuHotplugMmi()).

Yeah, that ordering makes complete sense.

> 
> By the way, for unplug, you will not have to re-set
> mCpuHotPlugData.ApicId[Index] to INVALID_APIC_ID, as
> SmmRemoveProcessor() does that internally. You just have to locate the
> Index for the APIC ID being removed, for calling
> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor().

Right. The hotplug is more involved (given the need to pen the new CPU)
but for the unplug, AFAICS all the actual handling for removal is in
.RemoveProcessor() and at SMI exit in SmmCpuUpdate().


Thanks
Ankur

> 
> 
> Thanks
> Laszlo
>
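A toy model of the slot array shows why plugs should be handled before removals. This is a simplified sketch, not edk2 code. `apic_id[]`, `model_add_processor()`, `model_remove_processor()` and `model_smi_pass()` are hypothetical names. The model makes two assumptions. First, the add path claims the first slot holding INVALID_APIC_ID. Second, the remove path invalidates its slot immediately, while the real processing is deferred to SmmCpuUpdate().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 4
/* Modelled on edk2's INVALID_APIC_ID (all ones). */
#define INVALID_APIC_ID UINT64_MAX

/* Hypothetical stand-in for mCpuHotPlugData.ApicId[]. */
static uint64_t apic_id[MAX_CPUS];

/* Models AddProcessor(): claim the first unused slot, or -1 if full. */
static int model_add_processor(uint64_t id)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        if (apic_id[i] == INVALID_APIC_ID) {
            apic_id[i] = id;
            return i;
        }
    }
    return -1;
}

/* Models RemoveProcessor(): it invalidates the slot itself. */
static void model_remove_processor(int index)
{
    assert(index >= 0 && index < MAX_CPUS);
    apic_id[index] = INVALID_APIC_ID;
}

/* Two CPUs present (slots 0 and 1); slots 2 and 3 are free. */
static void model_reset(void)
{
    apic_id[0] = 0;
    apic_id[1] = 1;
    apic_id[2] = INVALID_APIC_ID;
    apic_id[3] = INVALID_APIC_ID;
}

/*
 * One simulated CpuHotplugMmi() pass: plug 'new_id' and unplug the CPU in
 * slot 'old_index'. Returns the slot the new CPU landed in.
 */
static int model_smi_pass(uint64_t new_id, int old_index, int plug_first)
{
    int slot;

    if (plug_first) {
        slot = model_add_processor(new_id);   /* only slots free at SMI entry */
        model_remove_processor(old_index);
    } else {
        model_remove_processor(old_index);
        slot = model_add_processor(new_id);   /* may grab the slot just freed */
    }
    return slot;
}
```

Start from two present CPUs. With remove-first, the new CPU lands in slot 1, so that slot changes from one valid APIC ID to another within a single pass. With plug-first, the new CPU lands in slot 2, and slot 1 is only invalidated.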
Laszlo Ersek Nov. 30, 2020, 4:58 p.m. UTC | #21
On 11/28/20 00:48, Ankur Arora wrote:

> It is possible that there are CPUs with bits for both is_inserting and
> is_removing. In that case QemuCpuhpCollectApicIds() would put them in the
> PluggedApicIds array and the unplug eventually happens in the next
> firmware invocation.
> 
> If a CPU has both is_inserting and fw_remove set, the firmware processes
> the hotplug in that invocation and the unplug happens whenever the OSPM
> triggers the firmware next.

If these corner cases will actually work (I'm somewhat doubtful), that
will be really great.

Thanks!
Laszlo
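The precedence Ankur describes can be written as a small classifier over the status flags reported by cpu_hotplug_rd(). Per the patch, bit 1 is is_inserting, bit 2 is is_removing and bit 4 is fw_remove. This is a hedged sketch and does not reproduce edk2's QemuCpuhpCollectApicIds(). `classify_cpu()` and `enum cpuhp_action` are hypothetical. Treating a bare remove event as "nothing for firmware to do yet" is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as read from the CPU hotplug register after this patch. */
#define CPUHP_F_INSERTING  0x02u /* bit 1: device insert event */
#define CPUHP_F_REMOVING   0x04u /* bit 2: device remove event */
#define CPUHP_F_FW_REMOVE  0x10u /* bit 4: OSPM asks firmware to eject */

enum cpuhp_action { CPUHP_NONE, CPUHP_PLUG, CPUHP_UNPLUG };

/*
 * Hypothetical per-pass classification: a pending insert wins, so a CPU
 * with both an insert event and a firmware-eject request is plugged in
 * this SMI, and its unplug is left for a later SMI.
 */
static enum cpuhp_action classify_cpu(uint8_t flags)
{
    if (flags & CPUHP_F_INSERTING) {
        return CPUHP_PLUG;
    }
    if (flags & CPUHP_F_FW_REMOVE) {
        return CPUHP_UNPLUG;
    }
    /* Assumption: a bare remove event is still OSPM's to handle. */
    return CPUHP_NONE;
}
```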
Laszlo Ersek Nov. 30, 2020, 5 p.m. UTC | #22
On 11/28/20 01:43, Ankur Arora wrote:
> On 2020-11-27 7:19 a.m., Laszlo Ersek wrote:
>> On 11/27/20 05:10, Ankur Arora wrote:
>>
>>> Yeah I was wondering what would happen for simultaneous hot add and
>>> remove. I guess we would always do remove first and then the add, unless
>>> we hit the break due to max_cpus_per_pass and switch to hot-add mode.
>>
>> Considering the firmware only, I disagree with remove-then-add.
>>
>> EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and
>> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() (implemented in
>> SmmAddProcessor() and SmmRemoveProcessor() in
>> "UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c", respectively) only mark the
>> processors for addition/removal. The actual processing is done only
>> later, in BSPHandler() --> SmmCpuUpdate(), when "all SMI handlers are
>> finished" (see the comment in SmmRemoveProcessor()).
>>
>> Consequently, I would not suggest replacing a valid APIC ID in a
>> particular mCpuHotPlugData.ApicId[Index] slot with INVALID_APIC_ID
>> (corresponding to the unplug operation), and then possibly replacing
>> INVALID_APIC_ID in the *same slot* with the APIC ID of the newly plugged
>> CPU, in the exact same SMI invocation (= in the same execution of
>> CpuHotplugMmi()). That might cause some component in edk2 to see the
>> APIC ID in mCpuHotPlugData.ApicId[Index] to change from one valid APIC
>> ID to another valid APIC ID, and I don't even want to think about what
>> kind of mess that could cause.
> 
> Shudders.
> 
>>
>> So no, please handle plugs first, for which unused slots in
>> mCpuHotPlugData.ApicId will be populated, and *then* handle removals (in
>> the same invocation of CpuHotplugMmi()).
> 
> Yeah, that ordering makes complete sense.
> 
>>
>> By the way, for unplug, you will not have to re-set
>> mCpuHotPlugData.ApicId[Index] to INVALID_APIC_ID, as
>> SmmRemoveProcessor() does that internally. You just have to locate the
>> Index for the APIC ID being removed, for calling
>> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor().
> 
> Right. The hotplug is more involved (given the need to pen the new CPU)
> but for the unplug, AFAICS all the actual handling for removal is in
> .RemoveProcessor() and at SMI exit in SmmCpuUpdate().

Yes, I got the same impression (without having tried to implement it, of
course).

Laszlo
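The unplug side, as described in this exchange, only needs to map the APIC ID being removed to its slot index. That index is then handed to RemoveProcessor(), which the PI specification defines as taking a processor number rather than an APIC ID. Resetting the slot is left to the callee. This is a minimal sketch under those assumptions. `model_remove_processor()` and `unplug_by_apic_id()` are hypothetical stand-ins, not edk2 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modelled on edk2's INVALID_APIC_ID (all ones). */
#define INVALID_APIC_ID UINT64_MAX

/* Hypothetical stand-in for RemoveProcessor(): it resets the slot itself. */
static void model_remove_processor(uint64_t *apic_ids, size_t index)
{
    apic_ids[index] = INVALID_APIC_ID;
}

/*
 * Unplug path: locate the slot index for 'apic_id' and pass it on; the
 * caller never writes INVALID_APIC_ID into apic_ids[] directly.
 * Returns the index used, or -1 if the APIC ID is not present.
 */
static long unplug_by_apic_id(uint64_t *apic_ids, size_t count, uint64_t apic_id)
{
    assert(apic_id != INVALID_APIC_ID);
    for (size_t i = 0; i < count; i++) {
        if (apic_ids[i] == apic_id) {
            model_remove_processor(apic_ids, i);
            return (long)i;
        }
    }
    return -1;
}
```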
Ankur Arora Nov. 30, 2020, 7:45 p.m. UTC | #23
On 2020-11-30 8:58 a.m., Laszlo Ersek wrote:
> On 11/28/20 00:48, Ankur Arora wrote:
> 
>> It is possible that there are CPUs with bits for both is_inserting and
>> is_removing. In that case QemuCpuhpCollectApicIds() would put them in the
>> PluggedApicIds array and the unplug eventually happens in the next
>> firmware invocation.
>>
>> If a CPU has both is_inserting and fw_remove set, the firmware processes
>> the hotplug in that invocation and the unplug happens whenever the OSPM
>> triggers the firmware next.
> 
> If these corner cases will actually work (I'm somewhat doubtful), that
> will be really great.

Heh, yeah. That's a big if. I'll see if I can hit it in my testing.

Thanks
Ankur

Patch

diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 0eeedaa491..999caaf510 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -22,6 +22,7 @@  typedef struct AcpiCpuStatus {
     uint64_t arch_id;
     bool is_inserting;
     bool is_removing;
+    bool fw_remove;
     uint32_t ost_event;
     uint32_t ost_status;
 } AcpiCpuStatus;
@@ -50,6 +51,7 @@  void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 typedef struct CPUHotplugFeatures {
     bool acpi_1_compatible;
     bool has_legacy_cphp;
+    bool fw_unplugs_cpu;
     const char *smi_path;
 } CPUHotplugFeatures;
 
diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 9bb22d1270..f68ef6e06c 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -57,7 +57,11 @@  read access:
               It's valid only when bit 0 is set.
            2: Device remove event, used to distinguish device for which
               no device eject request to OSPM was issued.
-           3-7: reserved and should be ignored by OSPM
+           3: reserved and should be ignored by OSPM
+           4: if set to 1, OSPM requests firmware to perform device eject,
+              firmware shall clear this event by writing 1 into it before
+              performing device eject.
+           5-7: reserved and should be ignored by OSPM
     [0x5-0x7] reserved
     [0x8] Command data: (DWORD access)
           contains 0 unless value last stored in 'Command field' is one of:
@@ -82,7 +86,10 @@  write access:
                selected CPU device
             3: if set to 1 initiates device eject, set by OSPM when it
                triggers CPU device removal and calls _EJ0 method
-            4-7: reserved, OSPM must clear them before writing to register
+            4: if set to 1 OSPM hands over device eject to firmware,
+               Firmware shall issue device eject request as described above
+               (bit #3) and OSPM should not touch device eject bit (#3),
+            5-7: reserved, OSPM must clear them before writing to register
     [0x5] Command field: (1 byte access)
           value:
             0: selects a CPU device with inserting/removing events and
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index f099b50927..09d2f20dae 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -71,6 +71,7 @@  static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
         val |= cdev->cpu ? 1 : 0;
         val |= cdev->is_inserting ? 2 : 0;
         val |= cdev->is_removing  ? 4 : 0;
+        val |= cdev->fw_remove  ? 16 : 0;
         trace_cpuhp_acpi_read_flags(cpu_st->selector, val);
         break;
     case ACPI_CPU_CMD_DATA_OFFSET_RW:
@@ -148,6 +149,8 @@  static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
             hotplug_ctrl = qdev_get_hotplug_handler(dev);
             hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
             object_unparent(OBJECT(dev));
+        } else if (data & 16) {
+            cdev->fw_remove = !cdev->fw_remove;
         }
         break;
     case ACPI_CPU_CMD_OFFSET_WR:
@@ -332,6 +335,7 @@  const VMStateDescription vmstate_cpu_hotplug = {
 #define CPU_INSERT_EVENT  "CINS"
 #define CPU_REMOVE_EVENT  "CRMV"
 #define CPU_EJECT_EVENT   "CEJ0"
+#define CPU_FW_EJECT_EVENT "CEJF"
 
 void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
                     hwaddr io_base,
@@ -384,7 +388,10 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
         aml_append(field, aml_named_field(CPU_REMOVE_EVENT, 1));
         /* initiates device eject, write only */
         aml_append(field, aml_named_field(CPU_EJECT_EVENT, 1));
-        aml_append(field, aml_reserved_field(4));
+        aml_append(field, aml_reserved_field(1));
+        /* tell firmware to do device eject, write only */
+        aml_append(field, aml_named_field(CPU_FW_EJECT_EVENT, 1));
+        aml_append(field, aml_reserved_field(2));
         aml_append(field, aml_named_field(CPU_COMMAND, 8));
         aml_append(cpu_ctrl_dev, field);
 
@@ -419,6 +426,7 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
         Aml *ins_evt = aml_name("%s.%s", cphp_res_path, CPU_INSERT_EVENT);
         Aml *rm_evt = aml_name("%s.%s", cphp_res_path, CPU_REMOVE_EVENT);
         Aml *ej_evt = aml_name("%s.%s", cphp_res_path, CPU_EJECT_EVENT);
+        Aml *fw_ej_evt = aml_name("%s.%s", cphp_res_path, CPU_FW_EJECT_EVENT);
 
         aml_append(cpus_dev, aml_name_decl("_HID", aml_string("ACPI0010")));
         aml_append(cpus_dev, aml_name_decl("_CID", aml_eisaid("PNP0A05")));
@@ -461,7 +469,13 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
             aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
             aml_append(method, aml_store(idx, cpu_selector));
-            aml_append(method, aml_store(one, ej_evt));
+            if (opts.fw_unplugs_cpu) {
+                aml_append(method, aml_store(one, fw_ej_evt));
+                aml_append(method, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
+                           aml_name("%s", opts.smi_path)));
+            } else {
+                aml_append(method, aml_store(one, ej_evt));
+            }
             aml_append(method, aml_release(ctrl_lock));
         }
         aml_append(cpus_dev, method);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1f5c211245..475e76f514 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -96,6 +96,7 @@  typedef struct AcpiPmInfo {
     bool s4_disabled;
     bool pcihp_bridge_en;
     bool smi_on_cpuhp;
+    bool smi_on_cpu_unplug;
     bool pcihp_root_en;
     uint8_t s4_val;
     AcpiFadtData fadt;
@@ -197,6 +198,7 @@  static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
     pm->pcihp_io_base = 0;
     pm->pcihp_io_len = 0;
     pm->smi_on_cpuhp = false;
+    pm->smi_on_cpu_unplug = false;
 
     assert(obj);
     init_common_fadt_data(machine, obj, &pm->fadt);
@@ -220,6 +222,8 @@  static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
         pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
         pm->smi_on_cpuhp =
             !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT));
+        pm->smi_on_cpu_unplug =
+            !!(smi_features & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT));
     }
 
     /* The above need not be conditional on machine type because the reset port
@@ -1582,6 +1586,7 @@  build_dsdt(GArray *table_data, BIOSLinker *linker,
         CPUHotplugFeatures opts = {
             .acpi_1_compatible = true, .has_legacy_cphp = true,
             .smi_path = pm->smi_on_cpuhp ? "\\_SB.PCI0.SMI0.SMIC" : NULL,
+            .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
         };
         build_cpus_aml(dsdt, machine, opts, pm->cpu_hp_io_base,
                        "\\_SB.PCI0", "\\_GPE._E02");
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 17b514d1da..2952a00fe6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -99,6 +99,7 @@ 
 
 GlobalProperty pc_compat_5_1[] = {
     { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
+    { "ICH9-LPC", "x-smi-cpu-hotunplug", "off" },
 };
 const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
 
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 087a18d04d..8c667b7166 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -770,7 +770,7 @@  static Property ich9_lpc_properties[] = {
     DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
                       ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT, true),
     DEFINE_PROP_BIT64("x-smi-cpu-hotunplug", ICH9LPCState, smi_host_features,
-                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, false),
+                      ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, true),
     DEFINE_PROP_END_OF_LIST(),
 };
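The new bit 4 behaves as follows on the register side. As written, cpu_hotplug_wr() toggles fw_remove. An OSPM write of 1 therefore sets it, and the firmware's later write of 1 clears it, which matches the spec text "firmware shall clear this event by writing 1 into it". Because the handler is an else-if chain, bit 4 is only acted on when bits 1-3 are clear in the same write. The sketch below is a trimmed-down model, not QEMU code. `struct cpu_status`, `flags_read()` and `flags_write()` are hypothetical, and the eject branch is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of one AcpiCpuStatus entry. */
struct cpu_status {
    bool present;       /* stands in for cdev->cpu != NULL */
    bool is_inserting;
    bool is_removing;
    bool fw_remove;
    bool ejected;       /* stands in for the real unplug path */
};

/* Modelled on the flags read in cpu_hotplug_rd() after this patch. */
static uint8_t flags_read(const struct cpu_status *c)
{
    uint8_t val = 0;

    val |= c->present      ? 1 : 0;
    val |= c->is_inserting ? 2 : 0;
    val |= c->is_removing  ? 4 : 0;
    val |= c->fw_remove    ? 16 : 0;
    return val;
}

/* Modelled on the else-if chain in cpu_hotplug_wr(): one action per write. */
static void flags_write(struct cpu_status *c, uint8_t data)
{
    if (data & 2) {
        c->is_inserting = false;        /* clear insert event */
    } else if (data & 4) {
        c->is_removing = false;         /* clear remove event */
    } else if (data & 8) {
        c->ejected = true;              /* eject (simplified) */
    } else if (data & 16) {
        c->fw_remove = !c->fw_remove;   /* OSPM sets it, firmware clears it */
    }
}
```

A write of 4 | 16 only clears the remove event, and the fw_remove bit in that write is ignored. OSPM's _EJ0 therefore writes CEJF on its own, as the generated AML does.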