diff mbox series

[v5,09/10] x68: acpi: trigger SMI before sending hotplug Notify event to OSPM

Message ID 20200907112348.530921-10-imammedo@redhat.com
State New
Headers show
Series x86: fix cpu hotplug with secure boot | expand

Commit Message

Igor Mammedov Sept. 7, 2020, 11:23 a.m. UTC
In case firmware has negotiated CPU hotplug SMI feature, generate
AML to describe SMI IO port region and send SMI to firmware
on each CPU hotplug SCI in case new CPUs were hotplugged.

Since new CPUs can be hotplugged while CPU_SCAN_METHOD is running
we can't send SMI before new CPUs are fetched from QEMU as it
could cause sending Notify to a CPU that firmware hasn't seen yet.
So fetch new CPUs into local cache first, then send SMI and
after that send Notify events to cached CPUs. This should ensure
that Notify is sent only to CPUs which were processed by firmware
first.
Any CPUs that were hotplugged after caching will be processed
by the next CPU_SCAN_METHOD, when pending SCI is handled.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v5:
 - fix hotplug on Windows when there is more than 256 possible CPUs
   (Windows isn't able to handle VarPackage over 255 elements
    so process CPUs in batches)
 - (Laszlo Ersek <lersek@redhat.com>)
   - fix off-by-one in package length
   - fix not selecting CPU before clearing insert event
   - use aml_lgreater() instead of aml_lnot(aml_equal(num_added_cpus, zero)
   - split off in separate patches:
     - making smi_negotiated_features a property
     - introduction of AcpiPmInfo.smi_on_cpuhp
     - introduction of PCI0.SMI0 ACPI device
v2:
 - clear insert event after firmware has returned
   control from SMI. (Laszlo Ersek <lersek@redhat.com>)
v1:
 - make sure that Notify is sent only to CPUs seen by FW
 - (Laszlo Ersek <lersek@redhat.com>)
    - use macro instead of x-smi-negotiated-features
    - style fixes
    - reserve whole SMI block (0xB2, 0xB3)
v0:
 - s/aml_string/aml_eisaid/
   /fixes Windows BSOD, on nonsense value/ (Laszlo Ersek <lersek@redhat.com>)
 - put SMI device under PCI0 like the rest of hotplug IO port
 - do not generate SMI AML if CPU hotplug SMI feature hasn't been negotiated
---
 hw/acpi/cpu.c | 156 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 129 insertions(+), 27 deletions(-)

Comments

Laszlo Ersek Sept. 7, 2020, 3:17 p.m. UTC | #1
Hi Igor,

On 09/07/20 13:23, Igor Mammedov wrote:
> In case firmware has negotiated CPU hotplug SMI feature, generate
> AML to describe SMI IO port region and send SMI to firmware
> on each CPU hotplug SCI in case new CPUs were hotplugged.
>
> Since new CPUs can be hotplugged while CPU_SCAN_METHOD is running
> we can't send SMI before new CPUs are fetched from QEMU as it
> could cause sending Notify to a CPU that firmware hasn't seen yet.
> So fetch new CPUs into local cache first, then send SMI and
> after that send Notify events to cached CPUs. This should ensure
> that Notify is sent only to CPUs which were processed by firmware
> first.
> Any CPUs that were hotplugged after caching will be processed
> by the next CPU_SCAN_METHOD, when pending SCI is handled.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v5:
>  - fix hotplug on Windows when there is more than 256 possible CPUs
>    (Windows isn't able to handle VarPackage over 255 elements
>     so process CPUs in batches)
>  - (Laszlo Ersek <lersek@redhat.com>)
>    - fix off-by-one in package length
>    - fix not selecting CPU before clearing insert event
>    - use aml_lgreater() instead of aml_lnot(aml_equal(num_added_cpus,
>      zero)
>    - split off in separate patches:
>      - making smi_negotiated_features a property
>      - introduction of AcpiPmInfo.smi_on_cpuhp
>      - introduction of PCI0.SMI0 ACPI device
> v2:
>  - clear insert event after firmware has returned
>    control from SMI. (Laszlo Ersek <lersek@redhat.com>)
> v1:
>  - make sure that Notify is sent only to CPUs seen by FW
>  - (Laszlo Ersek <lersek@redhat.com>)
>     - use macro instead of x-smi-negotiated-features
>     - style fixes
>     - reserve whole SMI block (0xB2, 0xB3)
> v0:
>  - s/aml_string/aml_eisaid/
>    /fixes Windows BSOD, on nonsense value/ (Laszlo Ersek
>    <lersek@redhat.com>)
>  - put SMI device under PCI0 like the rest of hotplug IO port
>  - do not generate SMI AML if CPU hotplug SMI feature hasn't been
>    negotiated
> ---
>  hw/acpi/cpu.c | 156 +++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 129 insertions(+), 27 deletions(-)
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 3d6a500fb7..1283972001 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -14,6 +14,8 @@
>  #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
>  #define ACPI_CPU_CMD_DATA2_OFFSET_R 0
>
> +#define OVMF_CPUHP_SMI_CMD 4
> +
>  enum {
>      CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
>      CPHP_OST_EVENT_CMD = 1,
> @@ -321,6 +323,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
>  #define CPU_NOTIFY_METHOD "CTFY"
>  #define CPU_EJECT_METHOD  "CEJ0"
>  #define CPU_OST_METHOD    "COST"
> +#define CPU_ADDED_LIST    "CNEW"
>
>  #define CPU_ENABLED       "CPEN"
>  #define CPU_SELECTOR      "CSEL"
> @@ -465,42 +468,141 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>
>          method = aml_method(CPU_SCAN_METHOD, 0, AML_SERIALIZED);
>          {
> +            const uint8_t max_cpus_per_pass = 255;
>              Aml *else_ctx;
> -            Aml *while_ctx;
> +            Aml *while_ctx, *while_ctx2;
>              Aml *has_event = aml_local(0);
>              Aml *dev_chk = aml_int(1);
>              Aml *eject_req = aml_int(3);
>              Aml *next_cpu_cmd = aml_int(CPHP_GET_NEXT_CPU_WITH_EVENT_CMD);
> +            Aml *num_added_cpus = aml_local(1);
> +            Aml *cpu_idx = aml_local(2);
> +            Aml *uid = aml_local(3);
> +            Aml *has_job = aml_local(4);
> +            Aml *new_cpus = aml_name(CPU_ADDED_LIST);
>
>              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> -            aml_append(method, aml_store(one, has_event));
> -            while_ctx = aml_while(aml_equal(has_event, one));
> +
> +            /*
> +             * Windows versions newer than XP (including Windows 10/Windows
> +             * Server 2019), do support* VarPackageOp but, it is cripled to hold
> +             * the same elements number as old PackageOp.
> +             * For compatibility with Windows XP (so it won't crash) use ACPI1.0
> +             * PackageOp which can hold max 255 elements.
> +             *
> +             * use named package as old Windows don't support it in local var
> +             */
> +            aml_append(method, aml_name_decl(CPU_ADDED_LIST,
> +                                             aml_package(max_cpus_per_pass)));
> +
> +            aml_append(method, aml_store(zero, uid));
> +            aml_append(method, aml_store(one, has_job));
> +            /*
> +             * CPU_ADDED_LIST can hold limited number of elements, outer loop
> +             * allows to process CPUs in batches which let us to handle more
> +             * CPUs than CPU_ADDED_LIST can hold.
> +             */
> +            while_ctx2 = aml_while(aml_equal(has_job, one));
>              {
> -                 /* clear loop exit condition, ins_evt/rm_evt checks
> -                  * will set it to 1 while next_cpu_cmd returns a CPU
> -                  * with events */
> -                 aml_append(while_ctx, aml_store(zero, has_event));
> -                 aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
> -                 ifctx = aml_if(aml_equal(ins_evt, one));
> -                 {
> -                     aml_append(ifctx,
> -                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, dev_chk));
> -                     aml_append(ifctx, aml_store(one, ins_evt));
> -                     aml_append(ifctx, aml_store(one, has_event));
> -                 }
> -                 aml_append(while_ctx, ifctx);
> -                 else_ctx = aml_else();
> -                 ifctx = aml_if(aml_equal(rm_evt, one));
> -                 {
> -                     aml_append(ifctx,
> -                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, eject_req));
> -                     aml_append(ifctx, aml_store(one, rm_evt));
> -                     aml_append(ifctx, aml_store(one, has_event));
> -                 }
> -                 aml_append(else_ctx, ifctx);
> -                 aml_append(while_ctx, else_ctx);
> +                aml_append(while_ctx2, aml_store(zero, has_job));
> +
> +                aml_append(while_ctx2, aml_store(one, has_event));
> +                aml_append(while_ctx2, aml_store(zero, num_added_cpus));
> +
> +                /*
> +                 * Scan CPUs, till there are CPUs with events or
> +                 * CPU_ADDED_LIST capacity is exhausted
> +                 */
> +                while_ctx = aml_while(aml_land(aml_equal(has_event, one),
> +                                      aml_lless(uid, aml_int(arch_ids->len))));
> +                {
> +                     /*
> +                      * clear loop exit condition, ins_evt/rm_evt checks will
> +                      * set it to 1 while next_cpu_cmd returns a CPU with events
> +                      */
> +                     aml_append(while_ctx, aml_store(zero, has_event));
> +
> +                     aml_append(while_ctx, aml_store(uid, cpu_selector));
> +                     aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
> +
> +                     /*
> +                      * wrap around case, scan is complete, exit loop.
> +                      * It happens since events are not cleared in scan loop,
> +                      * so next_cpu_cmd continues to find already processed CPUs
> +                      */
> +                     ifctx = aml_if(aml_lless(cpu_data, uid));
> +                     {
> +                         aml_append(ifctx, aml_break());
> +                     }
> +                     aml_append(while_ctx, ifctx);
> +
> +                     /*
> +                      * if CPU_ADDED_LIST is full, exit inner loop and process
> +                      * collected CPUs
> +                      */
> +                     ifctx = aml_if(
> +                         aml_equal(num_added_cpus, aml_int(max_cpus_per_pass)));
> +                     {
> +                         aml_append(ifctx, aml_store(one, has_job));
> +                         aml_append(ifctx, aml_break());
> +                     }
> +                     aml_append(while_ctx, ifctx);
> +
> +                     aml_append(while_ctx, aml_store(cpu_data, uid));
> +                     ifctx = aml_if(aml_equal(ins_evt, one));
> +                     {
> +                         /* cache added CPUs to Notify/Wakeup later */
> +                         aml_append(ifctx, aml_store(uid,
> +                             aml_index(new_cpus, num_added_cpus)));
> +                         aml_append(ifctx, aml_increment(num_added_cpus));
> +                         aml_append(ifctx, aml_store(one, has_event));
> +                     }
> +                     aml_append(while_ctx, ifctx);
> +                     else_ctx = aml_else();
> +                     ifctx = aml_if(aml_equal(rm_evt, one));
> +                     {
> +                         aml_append(ifctx,
> +                             aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
> +                         aml_append(ifctx, aml_store(one, rm_evt));
> +                         aml_append(ifctx, aml_store(one, has_event));
> +                     }
> +                     aml_append(else_ctx, ifctx);
> +                     aml_append(while_ctx, else_ctx);
> +                     aml_append(while_ctx, aml_increment(uid));
> +                }
> +                aml_append(while_ctx2, while_ctx);
> +
> +                /*
> +                 * in case FW negotiated ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT,
> +                 * make upcall to FW, so it can pull in new CPUs before
> +                 * OS is notified and wakes them up
> +                 */
> +                if (opts.smi_path) {
> +                    ifctx = aml_if(aml_lgreater(num_added_cpus, zero));
> +                    {
> +                        aml_append(ifctx, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> +                            aml_name("%s", opts.smi_path)));
> +                    }
> +                    aml_append(while_ctx2, ifctx);
> +                }
> +
> +                /* Notify OSPM about new CPUs and clear insert events */
> +                aml_append(while_ctx2, aml_store(zero, cpu_idx));
> +                while_ctx = aml_while(aml_lless(cpu_idx, num_added_cpus));
> +                {
> +                    aml_append(while_ctx,
> +                        aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
> +                                  uid));
> +                    aml_append(while_ctx,
> +                        aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
> +                    aml_append(while_ctx, aml_store(uid, aml_debug()));
> +                    aml_append(while_ctx, aml_store(uid, cpu_selector));
> +                    aml_append(while_ctx, aml_store(one, ins_evt));
> +                    aml_append(while_ctx, aml_increment(cpu_idx));
> +                }
> +                aml_append(while_ctx2, while_ctx);
>              }
> -            aml_append(method, while_ctx);
> +            aml_append(method, while_ctx2);
>              aml_append(method, aml_release(ctrl_lock));
>          }
>          aml_append(cpus_dev, method);
>

Here's the ASL decompiled, and using the meaningful local variable names
(and 8 possible VCPUs):

   1              Method (CSCN, 0, Serialized)
   2              {
   3                  Acquire (\_SB.PCI0.PRES.CPLK, 0xFFFF)
   4                  Name (CNEW, Package (0xFF){})
   5                  Local_Uid = Zero
   6                  Local_HasJob = One
   7                  While ((Local_HasJob == One))
   8                  {
   9                      Local_HasJob = Zero
  10                      Local_HasEvent = One
  11                      Local_NumAddedCpus = Zero
  12                      While (((Local_HasEvent == One) && (Local_Uid < 0x08)))
  13                      {
  14                          Local_HasEvent = Zero
  15                          \_SB.PCI0.PRES.CSEL = Local_Uid
  16                          \_SB.PCI0.PRES.CCMD = Zero
  17                          If ((\_SB.PCI0.PRES.CDAT < Local_Uid))
  18                          {
  19                              Break
  20                          }
  21
  22                          If ((Local_NumAddedCpus == 0xFF))
  23                          {
  24                              Local_HasJob = One
  25                              Break
  26                          }
  27
  28                          Local_Uid = \_SB.PCI0.PRES.CDAT
  29                          If ((\_SB.PCI0.PRES.CINS == One))
  30                          {
  31                              CNEW [Local_NumAddedCpus] = Local_Uid
  32                              Local_NumAddedCpus++
  33                              Local_HasEvent = One
  34                          }
  35                          ElseIf ((\_SB.PCI0.PRES.CRMV == One))
  36                          {
  37                              CTFY (Local_Uid, 0x03)
  38                              \_SB.PCI0.PRES.CRMV = One
  39                              Local_HasEvent = One
  40                          }
  41
  42                          Local_Uid++
  43                      }
  44
  45                      If ((Local_NumAddedCpus > Zero))
  46                      {
  47                          \_SB.PCI0.SMI0.SMIC = 0x04
  48                      }
  49
  50                      Local_CpuIdx = Zero
  51                      While ((Local_CpuIdx < Local_NumAddedCpus))
  52                      {
  53                          Local_Uid = DerefOf (CNEW [Local_CpuIdx])
  54                          CTFY (Local_Uid, One)
  55                          Debug = Local_Uid
  56                          \_SB.PCI0.PRES.CSEL = Local_Uid
  57                          \_SB.PCI0.PRES.CINS = One
  58                          Local_CpuIdx++
  59                      }
  60                  }
  61
  62                  Release (\_SB.PCI0.PRES.CPLK)
  63              }

When we take the Break on line 25, then:

(a) on line 25, the following equality holds:

  Local_Uid == CNEW[Local_NumAddedCpus - 1] + 1

(b) on line 60, the following equality holds:

  Local_Uid == CNEW[Local_NumAddedCpus - 1]

This means that, when we write Local_Uid to CSEL on line 15 again, then:

- we effectively re-investigate the last-cached CPU (with selector value
  CNEW[Local_NumAddedCpus - 1])

- rather than resuming the scanning right after it -- that is, with
  selector value (CNEW[Local_NumAddedCpus - 1] + 1) --, in the spirit of
  line 42.

My question is: is this "rewind" intentional?

Now, I don't see a functionality problem with this, as on line 57, we
clear the pending insert event for the last-cached CPU, so when we
re-check it, the "get pending" command will simply seek past it.

But I'd like to understand if this is *precisely* your intent, or if
it's an oversight and it just ends up working OK.

Basically my question is whether we should add, on top:

> --- cscn.v5     2020-09-07 15:02:13.401663487 +0200
> +++ cscn.v5.incr        2020-09-07 16:52:22.133843710 +0200
> @@ -57,6 +57,10 @@
>                          \_SB.PCI0.PRES.CINS = One
>                          Local_CpuIdx++
>                      }
> +
> +                    if ((Local_HasJob == One)) {
> +                        Local_Uid++
> +                    }
>                  }
>
>                  Release (\_SB.PCI0.PRES.CPLK)

If not -- that is, the currently proposed patch is intentional --, then
I think we should add a comment, about the effective "rewind" being
intentional & okay.

(Note: it's certainly valid and necessary to re-write CSEL on line 15
after raising the SMI on line 47; the question is not whether we should
or should not re-write CSEL (we must!), but the specific value that we
write to CSEL.)

So:

- If the outer loop body is entered only once, then the patch looks
  fine, from both the review side, and the testing side (I tested it
  with 4-8 possible VCPUs).

- If the outer loop body is entered twice or more, then from the review
  side, please my question above. From the testing side: do you have an
  environment where I could test this with OVMF?

(I expect it to work OK. Upon the first SMI, the firmware will likely
pick up more VCPUs than what's in CNEW. But edk2 commit 020bb4b46d6f
("OvmfPkg/CpuHotplugSmm: fix CPU hotplug race just before SMI
broadcast", 2020-08-27) should deal with that.)

Hmm, actually, there's no need for a special environment: I can patch
QEMU and lower "max_cpus_per_pass" to something small, such as "3" or
whatever, for testing the outer loop multiple times. But first I'd like
to know your thoughts on the "rewind".

Thanks!
Laszlo
Igor Mammedov Sept. 8, 2020, 7:39 a.m. UTC | #2
On Mon, 7 Sep 2020 17:17:52 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> Hi Igor,
> 
> On 09/07/20 13:23, Igor Mammedov wrote:
> > In case firmware has negotiated CPU hotplug SMI feature, generate
> > AML to describe SMI IO port region and send SMI to firmware
> > on each CPU hotplug SCI in case new CPUs were hotplugged.
> >
> > Since new CPUs can be hotplugged while CPU_SCAN_METHOD is running
> > we can't send SMI before new CPUs are fetched from QEMU as it
> > could cause sending Notify to a CPU that firmware hasn't seen yet.
> > So fetch new CPUs into local cache first, then send SMI and
> > after that send Notify events to cached CPUs. This should ensure
> > that Notify is sent only to CPUs which were processed by firmware
> > first.
> > Any CPUs that were hotplugged after caching will be processed
> > by the next CPU_SCAN_METHOD, when pending SCI is handled.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v5:
> >  - fix hotplug on Windows when there is more than 256 possible CPUs
> >    (Windows isn't able to handle VarPackage over 255 elements
> >     so process CPUs in batches)
> >  - (Laszlo Ersek <lersek@redhat.com>)
> >    - fix off-by-one in package length
> >    - fix not selecting CPU before clearing insert event
> >    - use aml_lgreater() instead of aml_lnot(aml_equal(num_added_cpus,
> >      zero)
> >    - split off in separate patches:
> >      - making smi_negotiated_features a property
> >      - introduction of AcpiPmInfo.smi_on_cpuhp
> >      - introduction of PCI0.SMI0 ACPI device
> > v2:
> >  - clear insert event after firmware has returned
> >    control from SMI. (Laszlo Ersek <lersek@redhat.com>)
> > v1:
> >  - make sure that Notify is sent only to CPUs seen by FW
> >  - (Laszlo Ersek <lersek@redhat.com>)
> >     - use macro instead of x-smi-negotiated-features
> >     - style fixes
> >     - reserve whole SMI block (0xB2, 0xB3)
> > v0:
> >  - s/aml_string/aml_eisaid/
> >    /fixes Windows BSOD, on nonsense value/ (Laszlo Ersek
> >    <lersek@redhat.com>)
> >  - put SMI device under PCI0 like the rest of hotplug IO port
> >  - do not generate SMI AML if CPU hotplug SMI feature hasn't been
> >    negotiated
> > ---
> >  hw/acpi/cpu.c | 156 +++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 129 insertions(+), 27 deletions(-)
> >
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 3d6a500fb7..1283972001 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -14,6 +14,8 @@
> >  #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
> >  #define ACPI_CPU_CMD_DATA2_OFFSET_R 0
> >
> > +#define OVMF_CPUHP_SMI_CMD 4
> > +
> >  enum {
> >      CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
> >      CPHP_OST_EVENT_CMD = 1,
> > @@ -321,6 +323,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
> >  #define CPU_NOTIFY_METHOD "CTFY"
> >  #define CPU_EJECT_METHOD  "CEJ0"
> >  #define CPU_OST_METHOD    "COST"
> > +#define CPU_ADDED_LIST    "CNEW"
> >
> >  #define CPU_ENABLED       "CPEN"
> >  #define CPU_SELECTOR      "CSEL"
> > @@ -465,42 +468,141 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> >
> >          method = aml_method(CPU_SCAN_METHOD, 0, AML_SERIALIZED);
> >          {
> > +            const uint8_t max_cpus_per_pass = 255;
> >              Aml *else_ctx;
> > -            Aml *while_ctx;
> > +            Aml *while_ctx, *while_ctx2;
> >              Aml *has_event = aml_local(0);
> >              Aml *dev_chk = aml_int(1);
> >              Aml *eject_req = aml_int(3);
> >              Aml *next_cpu_cmd = aml_int(CPHP_GET_NEXT_CPU_WITH_EVENT_CMD);
> > +            Aml *num_added_cpus = aml_local(1);
> > +            Aml *cpu_idx = aml_local(2);
> > +            Aml *uid = aml_local(3);
> > +            Aml *has_job = aml_local(4);
> > +            Aml *new_cpus = aml_name(CPU_ADDED_LIST);
> >
> >              aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
> > -            aml_append(method, aml_store(one, has_event));
> > -            while_ctx = aml_while(aml_equal(has_event, one));
> > +
> > +            /*
> > +             * Windows versions newer than XP (including Windows 10/Windows
> > +             * Server 2019), do support* VarPackageOp but, it is cripled to hold
> > +             * the same elements number as old PackageOp.
> > +             * For compatibility with Windows XP (so it won't crash) use ACPI1.0
> > +             * PackageOp which can hold max 255 elements.
> > +             *
> > +             * use named package as old Windows don't support it in local var
> > +             */
> > +            aml_append(method, aml_name_decl(CPU_ADDED_LIST,
> > +                                             aml_package(max_cpus_per_pass)));
> > +
> > +            aml_append(method, aml_store(zero, uid));
> > +            aml_append(method, aml_store(one, has_job));
> > +            /*
> > +             * CPU_ADDED_LIST can hold limited number of elements, outer loop
> > +             * allows to process CPUs in batches which let us to handle more
> > +             * CPUs than CPU_ADDED_LIST can hold.
> > +             */
> > +            while_ctx2 = aml_while(aml_equal(has_job, one));
> >              {
> > -                 /* clear loop exit condition, ins_evt/rm_evt checks
> > -                  * will set it to 1 while next_cpu_cmd returns a CPU
> > -                  * with events */
> > -                 aml_append(while_ctx, aml_store(zero, has_event));
> > -                 aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
> > -                 ifctx = aml_if(aml_equal(ins_evt, one));
> > -                 {
> > -                     aml_append(ifctx,
> > -                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, dev_chk));
> > -                     aml_append(ifctx, aml_store(one, ins_evt));
> > -                     aml_append(ifctx, aml_store(one, has_event));
> > -                 }
> > -                 aml_append(while_ctx, ifctx);
> > -                 else_ctx = aml_else();
> > -                 ifctx = aml_if(aml_equal(rm_evt, one));
> > -                 {
> > -                     aml_append(ifctx,
> > -                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, eject_req));
> > -                     aml_append(ifctx, aml_store(one, rm_evt));
> > -                     aml_append(ifctx, aml_store(one, has_event));
> > -                 }
> > -                 aml_append(else_ctx, ifctx);
> > -                 aml_append(while_ctx, else_ctx);
> > +                aml_append(while_ctx2, aml_store(zero, has_job));
> > +
> > +                aml_append(while_ctx2, aml_store(one, has_event));
> > +                aml_append(while_ctx2, aml_store(zero, num_added_cpus));
> > +
> > +                /*
> > +                 * Scan CPUs, till there are CPUs with events or
> > +                 * CPU_ADDED_LIST capacity is exhausted
> > +                 */
> > +                while_ctx = aml_while(aml_land(aml_equal(has_event, one),
> > +                                      aml_lless(uid, aml_int(arch_ids->len))));
> > +                {
> > +                     /*
> > +                      * clear loop exit condition, ins_evt/rm_evt checks will
> > +                      * set it to 1 while next_cpu_cmd returns a CPU with events
> > +                      */
> > +                     aml_append(while_ctx, aml_store(zero, has_event));
> > +
> > +                     aml_append(while_ctx, aml_store(uid, cpu_selector));
> > +                     aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
> > +
> > +                     /*
> > +                      * wrap around case, scan is complete, exit loop.
> > +                      * It happens since events are not cleared in scan loop,
> > +                      * so next_cpu_cmd continues to find already processed CPUs
> > +                      */
> > +                     ifctx = aml_if(aml_lless(cpu_data, uid));
> > +                     {
> > +                         aml_append(ifctx, aml_break());
> > +                     }
> > +                     aml_append(while_ctx, ifctx);
> > +
> > +                     /*
> > +                      * if CPU_ADDED_LIST is full, exit inner loop and process
> > +                      * collected CPUs
> > +                      */
> > +                     ifctx = aml_if(
> > +                         aml_equal(num_added_cpus, aml_int(max_cpus_per_pass)));
> > +                     {
> > +                         aml_append(ifctx, aml_store(one, has_job));
> > +                         aml_append(ifctx, aml_break());
> > +                     }
> > +                     aml_append(while_ctx, ifctx);
> > +
> > +                     aml_append(while_ctx, aml_store(cpu_data, uid));
> > +                     ifctx = aml_if(aml_equal(ins_evt, one));
> > +                     {
> > +                         /* cache added CPUs to Notify/Wakeup later */
> > +                         aml_append(ifctx, aml_store(uid,
> > +                             aml_index(new_cpus, num_added_cpus)));
> > +                         aml_append(ifctx, aml_increment(num_added_cpus));
> > +                         aml_append(ifctx, aml_store(one, has_event));
> > +                     }
> > +                     aml_append(while_ctx, ifctx);
> > +                     else_ctx = aml_else();
> > +                     ifctx = aml_if(aml_equal(rm_evt, one));
> > +                     {
> > +                         aml_append(ifctx,
> > +                             aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
> > +                         aml_append(ifctx, aml_store(one, rm_evt));
> > +                         aml_append(ifctx, aml_store(one, has_event));
> > +                     }
> > +                     aml_append(else_ctx, ifctx);
> > +                     aml_append(while_ctx, else_ctx);
> > +                     aml_append(while_ctx, aml_increment(uid));
> > +                }
> > +                aml_append(while_ctx2, while_ctx);
> > +
> > +                /*
> > +                 * in case FW negotiated ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT,
> > +                 * make upcall to FW, so it can pull in new CPUs before
> > +                 * OS is notified and wakes them up
> > +                 */
> > +                if (opts.smi_path) {
> > +                    ifctx = aml_if(aml_lgreater(num_added_cpus, zero));
> > +                    {
> > +                        aml_append(ifctx, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
> > +                            aml_name("%s", opts.smi_path)));
> > +                    }
> > +                    aml_append(while_ctx2, ifctx);
> > +                }
> > +
> > +                /* Notify OSPM about new CPUs and clear insert events */
> > +                aml_append(while_ctx2, aml_store(zero, cpu_idx));
> > +                while_ctx = aml_while(aml_lless(cpu_idx, num_added_cpus));
> > +                {
> > +                    aml_append(while_ctx,
> > +                        aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
> > +                                  uid));
> > +                    aml_append(while_ctx,
> > +                        aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
> > +                    aml_append(while_ctx, aml_store(uid, aml_debug()));
> > +                    aml_append(while_ctx, aml_store(uid, cpu_selector));
> > +                    aml_append(while_ctx, aml_store(one, ins_evt));
> > +                    aml_append(while_ctx, aml_increment(cpu_idx));
> > +                }
> > +                aml_append(while_ctx2, while_ctx);
> >              }
> > -            aml_append(method, while_ctx);
> > +            aml_append(method, while_ctx2);
> >              aml_append(method, aml_release(ctrl_lock));
> >          }
> >          aml_append(cpus_dev, method);
> >  
> 
> Here's the ASL decompiled, and using the meaningful local variable names
> (and 8 possible VCPUs):
> 
>    1              Method (CSCN, 0, Serialized)
>    2              {
>    3                  Acquire (\_SB.PCI0.PRES.CPLK, 0xFFFF)
>    4                  Name (CNEW, Package (0xFF){})
>    5                  Local_Uid = Zero
>    6                  Local_HasJob = One
>    7                  While ((Local_HasJob == One))
>    8                  {
>    9                      Local_HasJob = Zero
>   10                      Local_HasEvent = One
>   11                      Local_NumAddedCpus = Zero
>   12                      While (((Local_HasEvent == One) && (Local_Uid < 0x08)))
>   13                      {
>   14                          Local_HasEvent = Zero
>   15                          \_SB.PCI0.PRES.CSEL = Local_Uid
>   16                          \_SB.PCI0.PRES.CCMD = Zero
>   17                          If ((\_SB.PCI0.PRES.CDAT < Local_Uid))
>   18                          {
>   19                              Break
>   20                          }
>   21
>   22                          If ((Local_NumAddedCpus == 0xFF))
>   23                          {
>   24                              Local_HasJob = One
>   25                              Break
>   26                          }
>   27
>   28                          Local_Uid = \_SB.PCI0.PRES.CDAT
>   29                          If ((\_SB.PCI0.PRES.CINS == One))
>   30                          {
>   31                              CNEW [Local_NumAddedCpus] = Local_Uid
>   32                              Local_NumAddedCpus++
>   33                              Local_HasEvent = One
>   34                          }
>   35                          ElseIf ((\_SB.PCI0.PRES.CRMV == One))
>   36                          {
>   37                              CTFY (Local_Uid, 0x03)
>   38                              \_SB.PCI0.PRES.CRMV = One
>   39                              Local_HasEvent = One
>   40                          }
>   41
>   42                          Local_Uid++
>   43                      }
>   44
>   45                      If ((Local_NumAddedCpus > Zero))
>   46                      {
>   47                          \_SB.PCI0.SMI0.SMIC = 0x04
>   48                      }
>   49
>   50                      Local_CpuIdx = Zero
>   51                      While ((Local_CpuIdx < Local_NumAddedCpus))
>   52                      {
>   53                          Local_Uid = DerefOf (CNEW [Local_CpuIdx])
>   54                          CTFY (Local_Uid, One)
>   55                          Debug = Local_Uid
>   56                          \_SB.PCI0.PRES.CSEL = Local_Uid
>   57                          \_SB.PCI0.PRES.CINS = One
>   58                          Local_CpuIdx++
>   59                      }
>   60                  }
>   61
>   62                  Release (\_SB.PCI0.PRES.CPLK)
>   63              }
> 
> When we take the Break on line 25, then:
> 
> (a) on line 25, the following equality holds:
> 
>   Local_Uid == CNEW[Local_NumAddedCpus - 1] + 1
> 
> (b) on line 60, the following equality holds:
> 
>   Local_Uid == CNEW[Local_NumAddedCpus - 1]
> 
> This means that, when we write Local_Uid to CSEL on line 15 again, then:
> 
> - we effectively re-investigate the last-cached CPU (with selector value
>   CNEW[Local_NumAddedCpus - 1])
> 
> - rather than resuming the scanning right after it -- that is, with
>   selector value (CNEW[Local_NumAddedCpus - 1] + 1) --, in the spirit of
>   line 42.
> 
> My question is: is this "rewind" intentional?
> 
> Now, I don't see a functionality problem with this, as on line 57, we
> clear the pending insert event for the last-cached CPU, so when we
> re-check it, the "get pending" command will simply seek past it.
> 
> But I'd like to understand if this is *precisely* your intent, or if
> it's an oversight and it just ends up working OK.
it's the later (it should not have any adverse effects) so I didn't care
much about restarting from the last processed CPU.

how about moving

  22                          If ((Local_NumAddedCpus == 0xFF))
  23                          {
  24                              Local_HasJob = One
  25                              Break
  26                          }

right after
  40                          }
  41
  42                          Local_Uid++

instead of adding extra 'if' at the end of outer loop?

> 
> Basically my question is whether we should add, on top:
> 
> > --- cscn.v5     2020-09-07 15:02:13.401663487 +0200
> > +++ cscn.v5.incr        2020-09-07 16:52:22.133843710 +0200
> > @@ -57,6 +57,10 @@
> >                          \_SB.PCI0.PRES.CINS = One
> >                          Local_CpuIdx++
> >                      }
> > +
> > +                    if ((Local_HasJob == One)) {
> > +                        Local_Uid++
> > +                    }
> >                  }
> >
> >                  Release (\_SB.PCI0.PRES.CPLK)  
> 
> If not -- that is, the currently proposed patch is intentional --, then
> I think we should add a comment, about the effective "rewind" being
> intentional & okay.
> 
> (Note: it's certainly valid and necessary to re-write CSEL on line 15
> after raising the SMI on line 47; the question is not whether we should
> or should not re-write CSEL (we must!), but the specific value that we
> write to CSEL.)
> 
> So:
> 
> - If the outer loop body is entered only once, then the patch looks
>   fine, from both the review side, and the testing side (I tested it
>   with 4-8 possible VCPUs).
> 
> - If the outer loop body is entered twice or more, then from the review
>   side, please my question above. From the testing side: do you have an
>   environment where I could test this with OVMF?
> 
> (I expect it to work OK. Upon the first SMI, the firmware will likely
> pick up more VCPUs than what's in CNEW. But edk2 commit 020bb4b46d6f
> ("OvmfPkg/CpuHotplugSmm: fix CPU hotplug race just before SMI
> broadcast", 2020-08-27) should deal with that.)
it should work, as firmware doesn't have to jump over hoops to
accomodate Windows quirks.

> Hmm, actually, there's no need for a special environment: I can patch
> QEMU and lower "max_cpus_per_pass" to something small, such as "3" or
> whatever, for testing the outer loop multiple times. But first I'd like
> to know your thoughts on the "rewind".

modulo OVMF, I've tested both:
 - hacking max_cpus_per_pass to simulate multiple passes of the outer loop
 - with maxcpus=288 to cross 255 thresold (2 passes of outer loop), to check
   that WS2019 is still working.

 
> Thanks!
> Laszlo
> 
>
Laszlo Ersek Sept. 8, 2020, 9:35 a.m. UTC | #3
On 09/08/20 09:39, Igor Mammedov wrote:
> On Mon, 7 Sep 2020 17:17:52 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:

>>    1              Method (CSCN, 0, Serialized)
>>    2              {
>>    3                  Acquire (\_SB.PCI0.PRES.CPLK, 0xFFFF)
>>    4                  Name (CNEW, Package (0xFF){})
>>    5                  Local_Uid = Zero
>>    6                  Local_HasJob = One
>>    7                  While ((Local_HasJob == One))
>>    8                  {
>>    9                      Local_HasJob = Zero
>>   10                      Local_HasEvent = One
>>   11                      Local_NumAddedCpus = Zero
>>   12                      While (((Local_HasEvent == One) && (Local_Uid < 0x08)))
>>   13                      {
>>   14                          Local_HasEvent = Zero
>>   15                          \_SB.PCI0.PRES.CSEL = Local_Uid
>>   16                          \_SB.PCI0.PRES.CCMD = Zero
>>   17                          If ((\_SB.PCI0.PRES.CDAT < Local_Uid))
>>   18                          {
>>   19                              Break
>>   20                          }
>>   21
>>   22                          If ((Local_NumAddedCpus == 0xFF))
>>   23                          {
>>   24                              Local_HasJob = One
>>   25                              Break
>>   26                          }
>>   27
>>   28                          Local_Uid = \_SB.PCI0.PRES.CDAT
>>   29                          If ((\_SB.PCI0.PRES.CINS == One))
>>   30                          {
>>   31                              CNEW [Local_NumAddedCpus] = Local_Uid
>>   32                              Local_NumAddedCpus++
>>   33                              Local_HasEvent = One
>>   34                          }
>>   35                          ElseIf ((\_SB.PCI0.PRES.CRMV == One))
>>   36                          {
>>   37                              CTFY (Local_Uid, 0x03)
>>   38                              \_SB.PCI0.PRES.CRMV = One
>>   39                              Local_HasEvent = One
>>   40                          }
>>   41
>>   42                          Local_Uid++
>>   43                      }
>>   44
>>   45                      If ((Local_NumAddedCpus > Zero))
>>   46                      {
>>   47                          \_SB.PCI0.SMI0.SMIC = 0x04
>>   48                      }
>>   49
>>   50                      Local_CpuIdx = Zero
>>   51                      While ((Local_CpuIdx < Local_NumAddedCpus))
>>   52                      {
>>   53                          Local_Uid = DerefOf (CNEW [Local_CpuIdx])
>>   54                          CTFY (Local_Uid, One)
>>   55                          Debug = Local_Uid
>>   56                          \_SB.PCI0.PRES.CSEL = Local_Uid
>>   57                          \_SB.PCI0.PRES.CINS = One
>>   58                          Local_CpuIdx++
>>   59                      }
>>   60                  }
>>   61
>>   62                  Release (\_SB.PCI0.PRES.CPLK)
>>   63              }
>>
>> When we take the Break on line 25, then:
>>
>> (a) on line 25, the following equality holds:
>>
>>   Local_Uid == CNEW[Local_NumAddedCpus - 1] + 1
>>
>> (b) on line 60, the following equality holds:
>>
>>   Local_Uid == CNEW[Local_NumAddedCpus - 1]
>>
>> This means that, when we write Local_Uid to CSEL on line 15 again, then:
>>
>> - we effectively re-investigate the last-cached CPU (with selector value
>>   CNEW[Local_NumAddedCpus - 1])
>>
>> - rather than resuming the scanning right after it -- that is, with
>>   selector value (CNEW[Local_NumAddedCpus - 1] + 1) --, in the spirit of
>>   line 42.
>>
>> My question is: is this "rewind" intentional?
>>
>> Now, I don't see a functionality problem with this, as on line 57, we
>> clear the pending insert event for the last-cached CPU, so when we
>> re-check it, the "get pending" command will simply seek past it.
>>
>> But I'd like to understand if this is *precisely* your intent, or if
>> it's an oversight and it just ends up working OK.
> it's the later (it should not have any adverse effects) so I didn't care
> much about restarting from the last processed CPU.
>
> how about moving
>
>   22                          If ((Local_NumAddedCpus == 0xFF))
>   23                          {
>   24                              Local_HasJob = One
>   25                              Break
>   26                          }
>
> right after
>   40                          }
>   41
>   42                          Local_Uid++
>
> instead of adding extra 'if' at the end of outer loop?

That only seems to save a CSEL write on line 15, during the first
iteration of the outer loop. And we would still re-select the last
selector from CNEW, in the second iteration of the outer loop.

But, again, there's no bug; I just wanted to understand your intent.

Can you please squash the following patch:

> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 12839720018e..8dd4d8ebbf55 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -601,6 +601,15 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>                      aml_append(while_ctx, aml_increment(cpu_idx));
>                  }
>                  aml_append(while_ctx2, while_ctx);
> +                /*
> +                 * If another batch is needed, then it will resume scanning
> +                 * exactly at -- and not after -- the last CPU that's currently
> +                 * in CPU_ADDED_LIST. In other words, the last CPU in
> +                 * CPU_ADDED_LIST is going to be re-checked. That's OK: we've
> +                 * just cleared the insert event for *all* CPUs in
> +                 * CPU_ADDED_LIST, including the last one. So the scan will
> +                 * simply seek past it.
> +                 */
>              }
>              aml_append(method, while_ctx2);
>              aml_append(method, aml_release(ctrl_lock));

With that:

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

I'll also follow up with test results for this patch (meaning a lowered
"max_cpus_per_pass").

Thanks!
Laszlo
Laszlo Ersek Sept. 8, 2020, 11 a.m. UTC | #4
On 09/08/20 11:35, Laszlo Ersek wrote:
> On 09/08/20 09:39, Igor Mammedov wrote:
>> On Mon, 7 Sep 2020 17:17:52 +0200
>> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>>>    1              Method (CSCN, 0, Serialized)
>>>    2              {
>>>    3                  Acquire (\_SB.PCI0.PRES.CPLK, 0xFFFF)
>>>    4                  Name (CNEW, Package (0xFF){})
>>>    5                  Local_Uid = Zero
>>>    6                  Local_HasJob = One
>>>    7                  While ((Local_HasJob == One))
>>>    8                  {
>>>    9                      Local_HasJob = Zero
>>>   10                      Local_HasEvent = One
>>>   11                      Local_NumAddedCpus = Zero
>>>   12                      While (((Local_HasEvent == One) && (Local_Uid < 0x08)))
>>>   13                      {
>>>   14                          Local_HasEvent = Zero
>>>   15                          \_SB.PCI0.PRES.CSEL = Local_Uid
>>>   16                          \_SB.PCI0.PRES.CCMD = Zero
>>>   17                          If ((\_SB.PCI0.PRES.CDAT < Local_Uid))
>>>   18                          {
>>>   19                              Break
>>>   20                          }
>>>   21
>>>   22                          If ((Local_NumAddedCpus == 0xFF))
>>>   23                          {
>>>   24                              Local_HasJob = One
>>>   25                              Break
>>>   26                          }
>>>   27
>>>   28                          Local_Uid = \_SB.PCI0.PRES.CDAT
>>>   29                          If ((\_SB.PCI0.PRES.CINS == One))
>>>   30                          {
>>>   31                              CNEW [Local_NumAddedCpus] = Local_Uid
>>>   32                              Local_NumAddedCpus++
>>>   33                              Local_HasEvent = One
>>>   34                          }
>>>   35                          ElseIf ((\_SB.PCI0.PRES.CRMV == One))
>>>   36                          {
>>>   37                              CTFY (Local_Uid, 0x03)
>>>   38                              \_SB.PCI0.PRES.CRMV = One
>>>   39                              Local_HasEvent = One
>>>   40                          }
>>>   41
>>>   42                          Local_Uid++
>>>   43                      }
>>>   44
>>>   45                      If ((Local_NumAddedCpus > Zero))
>>>   46                      {
>>>   47                          \_SB.PCI0.SMI0.SMIC = 0x04
>>>   48                      }
>>>   49
>>>   50                      Local_CpuIdx = Zero
>>>   51                      While ((Local_CpuIdx < Local_NumAddedCpus))
>>>   52                      {
>>>   53                          Local_Uid = DerefOf (CNEW [Local_CpuIdx])
>>>   54                          CTFY (Local_Uid, One)
>>>   55                          Debug = Local_Uid
>>>   56                          \_SB.PCI0.PRES.CSEL = Local_Uid
>>>   57                          \_SB.PCI0.PRES.CINS = One
>>>   58                          Local_CpuIdx++
>>>   59                      }
>>>   60                  }
>>>   61
>>>   62                  Release (\_SB.PCI0.PRES.CPLK)
>>>   63              }
>>>
>>> When we take the Break on line 25, then:
>>>
>>> (a) on line 25, the following equality holds:
>>>
>>>   Local_Uid == CNEW[Local_NumAddedCpus - 1] + 1
>>>
>>> (b) on line 60, the following equality holds:
>>>
>>>   Local_Uid == CNEW[Local_NumAddedCpus - 1]
>>>
>>> This means that, when we write Local_Uid to CSEL on line 15 again, then:
>>>
>>> - we effectively re-investigate the last-cached CPU (with selector value
>>>   CNEW[Local_NumAddedCpus - 1])
>>>
>>> - rather than resuming the scanning right after it -- that is, with
>>>   selector value (CNEW[Local_NumAddedCpus - 1] + 1) --, in the spirit of
>>>   line 42.
>>>
>>> My question is: is this "rewind" intentional?
>>>
>>> Now, I don't see a functionality problem with this, as on line 57, we
>>> clear the pending insert event for the last-cached CPU, so when we
>>> re-check it, the "get pending" command will simply seek past it.
>>>
>>> But I'd like to understand if this is *precisely* your intent, or if
>>> it's an oversight and it just ends up working OK.
>> it's the later (it should not have any adverse effects) so I didn't care
>> much about restarting from the last processed CPU.
>>
>> how about moving
>>
>>   22                          If ((Local_NumAddedCpus == 0xFF))
>>   23                          {
>>   24                              Local_HasJob = One
>>   25                              Break
>>   26                          }
>>
>> right after
>>   40                          }
>>   41
>>   42                          Local_Uid++
>>
>> instead of adding extra 'if' at the end of outer loop?
> 
> That only seems to save a CSEL write on line 15, during the first
> iteration of the outer loop. And we would still re-select the last
> selector from CNEW, in the second iteration of the outer loop.
> 
> But, again, there's no bug; I just wanted to understand your intent.
> 
> Can you please squash the following patch:
> 
>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>> index 12839720018e..8dd4d8ebbf55 100644
>> --- a/hw/acpi/cpu.c
>> +++ b/hw/acpi/cpu.c
>> @@ -601,6 +601,15 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>                      aml_append(while_ctx, aml_increment(cpu_idx));
>>                  }
>>                  aml_append(while_ctx2, while_ctx);
>> +                /*
>> +                 * If another batch is needed, then it will resume scanning
>> +                 * exactly at -- and not after -- the last CPU that's currently
>> +                 * in CPU_ADDED_LIST. In other words, the last CPU in
>> +                 * CPU_ADDED_LIST is going to be re-checked. That's OK: we've
>> +                 * just cleared the insert event for *all* CPUs in
>> +                 * CPU_ADDED_LIST, including the last one. So the scan will
>> +                 * simply seek past it.
>> +                 */
>>              }
>>              aml_append(method, while_ctx2);
>>              aml_append(method, aml_release(ctrl_lock));
> 
> With that:
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> 
> I'll also follow up with test results for this patch (meaning a lowered
> "max_cpus_per_pass").

Tested-by: Laszlo Ersek <lersek@redhat.com>
diff mbox series

Patch

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 3d6a500fb7..1283972001 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -14,6 +14,8 @@ 
 #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
 #define ACPI_CPU_CMD_DATA2_OFFSET_R 0
 
+#define OVMF_CPUHP_SMI_CMD 4
+
 enum {
     CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
     CPHP_OST_EVENT_CMD = 1,
@@ -321,6 +323,7 @@  const VMStateDescription vmstate_cpu_hotplug = {
 #define CPU_NOTIFY_METHOD "CTFY"
 #define CPU_EJECT_METHOD  "CEJ0"
 #define CPU_OST_METHOD    "COST"
+#define CPU_ADDED_LIST    "CNEW"
 
 #define CPU_ENABLED       "CPEN"
 #define CPU_SELECTOR      "CSEL"
@@ -465,42 +468,141 @@  void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
         method = aml_method(CPU_SCAN_METHOD, 0, AML_SERIALIZED);
         {
+            const uint8_t max_cpus_per_pass = 255;
             Aml *else_ctx;
-            Aml *while_ctx;
+            Aml *while_ctx, *while_ctx2;
             Aml *has_event = aml_local(0);
             Aml *dev_chk = aml_int(1);
             Aml *eject_req = aml_int(3);
             Aml *next_cpu_cmd = aml_int(CPHP_GET_NEXT_CPU_WITH_EVENT_CMD);
+            Aml *num_added_cpus = aml_local(1);
+            Aml *cpu_idx = aml_local(2);
+            Aml *uid = aml_local(3);
+            Aml *has_job = aml_local(4);
+            Aml *new_cpus = aml_name(CPU_ADDED_LIST);
 
             aml_append(method, aml_acquire(ctrl_lock, 0xFFFF));
-            aml_append(method, aml_store(one, has_event));
-            while_ctx = aml_while(aml_equal(has_event, one));
+
+            /*
+             * Windows versions newer than XP (including Windows 10/Windows
+             * Server 2019), do support* VarPackageOp but, it is cripled to hold
+             * the same elements number as old PackageOp.
+             * For compatibility with Windows XP (so it won't crash) use ACPI1.0
+             * PackageOp which can hold max 255 elements.
+             *
+             * use named package as old Windows don't support it in local var
+             */
+            aml_append(method, aml_name_decl(CPU_ADDED_LIST,
+                                             aml_package(max_cpus_per_pass)));
+
+            aml_append(method, aml_store(zero, uid));
+            aml_append(method, aml_store(one, has_job));
+            /*
+             * CPU_ADDED_LIST can hold limited number of elements, outer loop
+             * allows to process CPUs in batches which let us to handle more
+             * CPUs than CPU_ADDED_LIST can hold.
+             */
+            while_ctx2 = aml_while(aml_equal(has_job, one));
             {
-                 /* clear loop exit condition, ins_evt/rm_evt checks
-                  * will set it to 1 while next_cpu_cmd returns a CPU
-                  * with events */
-                 aml_append(while_ctx, aml_store(zero, has_event));
-                 aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
-                 ifctx = aml_if(aml_equal(ins_evt, one));
-                 {
-                     aml_append(ifctx,
-                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, dev_chk));
-                     aml_append(ifctx, aml_store(one, ins_evt));
-                     aml_append(ifctx, aml_store(one, has_event));
-                 }
-                 aml_append(while_ctx, ifctx);
-                 else_ctx = aml_else();
-                 ifctx = aml_if(aml_equal(rm_evt, one));
-                 {
-                     aml_append(ifctx,
-                         aml_call2(CPU_NOTIFY_METHOD, cpu_data, eject_req));
-                     aml_append(ifctx, aml_store(one, rm_evt));
-                     aml_append(ifctx, aml_store(one, has_event));
-                 }
-                 aml_append(else_ctx, ifctx);
-                 aml_append(while_ctx, else_ctx);
+                aml_append(while_ctx2, aml_store(zero, has_job));
+
+                aml_append(while_ctx2, aml_store(one, has_event));
+                aml_append(while_ctx2, aml_store(zero, num_added_cpus));
+
+                /*
+                 * Scan CPUs, till there are CPUs with events or
+                 * CPU_ADDED_LIST capacity is exhausted
+                 */
+                while_ctx = aml_while(aml_land(aml_equal(has_event, one),
+                                      aml_lless(uid, aml_int(arch_ids->len))));
+                {
+                     /*
+                      * clear loop exit condition, ins_evt/rm_evt checks will
+                      * set it to 1 while next_cpu_cmd returns a CPU with events
+                      */
+                     aml_append(while_ctx, aml_store(zero, has_event));
+
+                     aml_append(while_ctx, aml_store(uid, cpu_selector));
+                     aml_append(while_ctx, aml_store(next_cpu_cmd, cpu_cmd));
+
+                     /*
+                      * wrap around case, scan is complete, exit loop.
+                      * It happens since events are not cleared in scan loop,
+                      * so next_cpu_cmd continues to find already processed CPUs
+                      */
+                     ifctx = aml_if(aml_lless(cpu_data, uid));
+                     {
+                         aml_append(ifctx, aml_break());
+                     }
+                     aml_append(while_ctx, ifctx);
+
+                     /*
+                      * if CPU_ADDED_LIST is full, exit inner loop and process
+                      * collected CPUs
+                      */
+                     ifctx = aml_if(
+                         aml_equal(num_added_cpus, aml_int(max_cpus_per_pass)));
+                     {
+                         aml_append(ifctx, aml_store(one, has_job));
+                         aml_append(ifctx, aml_break());
+                     }
+                     aml_append(while_ctx, ifctx);
+
+                     aml_append(while_ctx, aml_store(cpu_data, uid));
+                     ifctx = aml_if(aml_equal(ins_evt, one));
+                     {
+                         /* cache added CPUs to Notify/Wakeup later */
+                         aml_append(ifctx, aml_store(uid,
+                             aml_index(new_cpus, num_added_cpus)));
+                         aml_append(ifctx, aml_increment(num_added_cpus));
+                         aml_append(ifctx, aml_store(one, has_event));
+                     }
+                     aml_append(while_ctx, ifctx);
+                     else_ctx = aml_else();
+                     ifctx = aml_if(aml_equal(rm_evt, one));
+                     {
+                         aml_append(ifctx,
+                             aml_call2(CPU_NOTIFY_METHOD, uid, eject_req));
+                         aml_append(ifctx, aml_store(one, rm_evt));
+                         aml_append(ifctx, aml_store(one, has_event));
+                     }
+                     aml_append(else_ctx, ifctx);
+                     aml_append(while_ctx, else_ctx);
+                     aml_append(while_ctx, aml_increment(uid));
+                }
+                aml_append(while_ctx2, while_ctx);
+
+                /*
+                 * in case FW negotiated ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT,
+                 * make upcall to FW, so it can pull in new CPUs before
+                 * OS is notified and wakes them up
+                 */
+                if (opts.smi_path) {
+                    ifctx = aml_if(aml_lgreater(num_added_cpus, zero));
+                    {
+                        aml_append(ifctx, aml_store(aml_int(OVMF_CPUHP_SMI_CMD),
+                            aml_name("%s", opts.smi_path)));
+                    }
+                    aml_append(while_ctx2, ifctx);
+                }
+
+                /* Notify OSPM about new CPUs and clear insert events */
+                aml_append(while_ctx2, aml_store(zero, cpu_idx));
+                while_ctx = aml_while(aml_lless(cpu_idx, num_added_cpus));
+                {
+                    aml_append(while_ctx,
+                        aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)),
+                                  uid));
+                    aml_append(while_ctx,
+                        aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
+                    aml_append(while_ctx, aml_store(uid, aml_debug()));
+                    aml_append(while_ctx, aml_store(uid, cpu_selector));
+                    aml_append(while_ctx, aml_store(one, ins_evt));
+                    aml_append(while_ctx, aml_increment(cpu_idx));
+                }
+                aml_append(while_ctx2, while_ctx);
             }
-            aml_append(method, while_ctx);
+            aml_append(method, while_ctx2);
             aml_append(method, aml_release(ctrl_lock));
         }
         aml_append(cpus_dev, method);