Patchwork [RFC,seabios] enumerate APIC IDs directly from CPUs

login
register
mail settings
Submitter Eduardo Habkost
Date July 17, 2012, 9:56 p.m.
Message ID <1342562190-25456-1-git-send-email-ehabkost@redhat.com>
Download mbox | patch
Permalink /patch/171565/
State New
Headers show

Comments

Eduardo Habkost - July 17, 2012, 9:56 p.m.
This patch is an attempt to fix the non-continguous-APIC-ID problem without the
FW_CFG_LAPIC_INFO approach I have sent proposed last week.

Basically, this changes Seabios to probe for APIC IDs directly from the
CPUs on boot, instead of getting it using fw_cfg, store the found APIC
IDs on a bitmap, and use that information whe building the MADT, SRAT,
and SSDT ACPI tables.

To do this properly, we have to decide the meaning of "CPU IDs" in the
QEMU<->Seabios interfaces, too. I see two possible approaches:

1) Have Seabios and QEMU agree on a a "CPU identifier", that is
   independent from the APIC ID.
2) Always use the Initial APIC ID on all communication between QEMU and
   Seabios.

This patch implements the second approach. That means:

- QEMU won't know anything about the ACPI Processor IDs chosen by
  Seabios;
- The NUMA CPU affinity table in QEMU_CFG_NUMA is APIC-ID-based, not
  Processor-ID-based;
- The CPU hotplug bitmaps on I/O port 0xaf00 will be APIC-ID-based, too.

Internally, the following changes were made on the ACPI code:

- The CPON array is now APIC-ID-based
- The NTFY method now gets the CPU APIC ID as parameter
- The CPMA method now gets two parameters: the Processor ID and the
  APIC ID
- The CPST method now gets the CPU APIC ID as parameter

To try to make things less likely to break on the common, non-hotplug
case, this patch makes the Processor IDs be chosen this way:

- The CPUs present on boot get contiguous Processor IDs (even if the
  APIC IDs are not contiguous);
- The remaining Processor declarations are going to associated to the
  remaining APIC IDs (immediately after the last present APIC ID),
  sequentially. This means that hotplugged CPUs may not get contiguous
  Processor declarations if their APIC IDs are not contiguous.

I would like to know what others think about this approach. I think it
is a much simpler approach than relying on some APIC ID <=> CPU ID
translation to make Seabios and QEMU agree. This way, Seabios has
absolute freedom to choose the ACPI Processor IDs.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 src/acpi-dsdt.dsl |   11 ++++--
 src/acpi.c        |  109 ++++++++++++++++++++++++++++++++++++++++++++---------
 src/smp.c         |   17 +++++++++
 src/ssdt-proc.dsl |   12 ++++--
 src/util.h        |    3 ++
 5 files changed, 127 insertions(+), 25 deletions(-)
Gleb Natapov - July 19, 2012, 9:58 a.m.
On Tue, Jul 17, 2012 at 06:56:30PM -0300, Eduardo Habkost wrote:
> This patch is an attempt to fix the non-continguous-APIC-ID problem without the
> FW_CFG_LAPIC_INFO approach I have sent proposed last week.
> 
> Basically, this changes Seabios to probe for APIC IDs directly from the
> CPUs on boot, instead of getting it using fw_cfg, store the found APIC
> IDs on a bitmap, and use that information whe building the MADT, SRAT,
> and SSDT ACPI tables.
> 
> To do this properly, we have to decide the meaning of "CPU IDs" in the
> QEMU<->Seabios interfaces, too. I see two possible approaches:
> 
> 1) Have Seabios and QEMU agree on a a "CPU identifier", that is
>    independent from the APIC ID.
> 2) Always use the Initial APIC ID on all communication between QEMU and
>    Seabios.
> 
We need to be prepared to support more than 255 cpus. With > 255 cpus
comes x2apic and x2apic has 32bit apic ids. HW does not have to support
all of the bits though, but potentially all the bitmasks can grow
prohibitedly large.


> This patch implements the second approach. That means:
> 
> - QEMU won't know anything about the ACPI Processor IDs chosen by
>   Seabios;
> - The NUMA CPU affinity table in QEMU_CFG_NUMA is APIC-ID-based, not
>   Processor-ID-based;
> - The CPU hotplug bitmaps on I/O port 0xaf00 will be APIC-ID-based, too.
> 
> Internally, the following changes were made on the ACPI code:
> 
> - The CPON array is now APIC-ID-based
> - The NTFY method now gets the CPU APIC ID as parameter
> - The CPMA method now gets two parameters: the Processor ID and the
>   APIC ID
> - The CPST method now gets the CPU APIC ID as parameter
> 
> To try to make things less likely to break on the common, non-hotplug
> case, this patch makes the Processor IDs be chosen this way:
> 
> - The CPUs present on boot get contiguous Processor IDs (even if the
>   APIC IDs are not contiguous);
> - The remaining Processor declarations are going to associated to the
>   remaining APIC IDs (immediately after the last present APIC ID),
>   sequentially. This means that hotplugged CPUs may not get contiguous
>   Processor declarations if their APIC IDs are not contiguous.
> 
I am curious what will happen if cpu will be hot plugged, than hibernate
and resume is done. After resume hot plugged cpu will have different
Processor ID in ACPI. This may or may not be a problem.

Some comment below.

> I would like to know what others think about this approach. I think it
> is a much simpler approach than relying on some APIC ID <=> CPU ID
> translation to make Seabios and QEMU agree. This way, Seabios has
> absolute freedom to choose the ACPI Processor IDs.
> 
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  src/acpi-dsdt.dsl |   11 ++++--
>  src/acpi.c        |  109 ++++++++++++++++++++++++++++++++++++++++++++---------
>  src/smp.c         |   17 +++++++++
>  src/ssdt-proc.dsl |   12 ++++--
>  src/util.h        |    3 ++
>  5 files changed, 127 insertions(+), 25 deletions(-)
> 
> diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> index 2060686..51ca037 100644
> --- a/src/acpi-dsdt.dsl
> +++ b/src/acpi-dsdt.dsl
> @@ -674,20 +674,23 @@ DefinitionBlock (
>          External(CPON, PkgObj)
>  
>          /* Methods called by run-time generated SSDT Processor objects */
> -        Method (CPMA, 1, NotSerialized) {
> +        Method (CPMA, 2, NotSerialized) {
>              // _MAT method - create an madt apic buffer
> +            // Arg0 = Processor ID
> +            // Arg1 = Local APIC ID
>              // Local0 = CPON flag for this cpu
> -            Store(DerefOf(Index(CPON, Arg0)), Local0)
> +            Store(DerefOf(Index(CPON, Arg1)), Local0)
>              // Local1 = Buffer (in madt apic form) to return
>              Store(Buffer(8) {0x00, 0x08, 0x00, 0x00, 0x00, 0, 0, 0}, Local1)
>              // Update the processor id, lapic id, and enable/disable status
>              Store(Arg0, Index(Local1, 2))
> -            Store(Arg0, Index(Local1, 3))
> +            Store(Arg1, Index(Local1, 3))
>              Store(Local0, Index(Local1, 4))
>              Return (Local1)
>          }
>          Method (CPST, 1, NotSerialized) {
>              // _STA method - return ON status of cpu
> +            // Arg0 = Local APIC ID
>              // Local0 = CPON flag for this cpu
>              Store(DerefOf(Index(CPON, Arg0)), Local0)
>              If (Local0) { Return(0xF) } Else { Return(0x0) }
> @@ -708,7 +711,7 @@ DefinitionBlock (
>              Store (PRS, Local5)
>              // Local2 = last read byte from bitmap
>              Store (Zero, Local2)
> -            // Local0 = cpuid iterator
> +            // Local0 = APIC ID iterator
>              Store (Zero, Local0)
>              While (LLess(Local0, SizeOf(CPON))) {
>                  // Local1 = CPON flag for this cpu
> diff --git a/src/acpi.c b/src/acpi.c
> index 55e4607..149ff42 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -302,6 +302,63 @@ build_fadt(struct pci_device *pci)
>      return fadt;
>  }
>  
> +
> +/* APIC IDs of each Processor indexed by ACPI Processor ID.
> + * Used to make the MADT entries and SSDT Processor declarations
> + * match each other.
> + */
> +static u8 *processor_apic_ids;
> +
> +/* Number of Processor IDs that have a valid 8-bit APIC ID */
> +static int count_valid_apic_ids;
> +
> +/* Build a ACPI Processor ID -> Local APIC ID table, used for the MADT
> + * and SSDT Processor declarations.
> + * The CPUs present at boot will be assigned to the first entries.
> + * The remaining entries will get sequential APIC IDs.
> + */
> +static void
> +check_apic_ids(void)
> +{
> +    processor_apic_ids = malloc_high(MaxCountCPUs);
> +    if (!processor_apic_ids) {
> +        warn_noalloc();
> +        count_valid_apic_ids = 0;
> +        return;
> +    }
> +
> +    int i;
> +    int apic_id = 0;
> +    int last_present_apic_id = 0;
> +
> +    // the first entries are for the already-present CPUs:
> +    for (i = 0; i < MaxCountCPUs; i++) {
> +        // look for next present APIC ID:
> +        while (apic_id <= MAX_APIC_ID && !apic_id_is_present(apic_id))
> +            apic_id++;
> +
> +        if (apic_id > MAX_APIC_ID)
> +            break; // no more valid APIC IDs
> +
> +        processor_apic_ids[i] = apic_id;
> +        last_present_apic_id = apic_id;
> +        apic_id++;
> +    }
> +
> +    // the remaining entries will get sequential APIC IDs assigned, up to
> +    // MAX_APIC_ID
> +    apic_id = last_present_apic_id;
> +    for ( ; i < MaxCountCPUs; i++) {
> +        apic_id++;
> +        if (apic_id > MAX_APIC_ID)
> +            break;
> +
> +        processor_apic_ids[i] = apic_id;
> +    }
> +
> +    count_valid_apic_ids = i;
> +}
> +
What if there is only one apic id present and it is 0xfe? It looks like
the above code will result in count_valid_apic_ids been equal 1, so
instead of MaxCountCPUs only one will be created.

>  static void*
>  build_madt(void)
>  {
> @@ -321,15 +378,16 @@ build_madt(void)
>      madt->flags = cpu_to_le32(1);
>      struct madt_processor_apic *apic = (void*)&madt[1];
>      int i;
> -    for (i=0; i<MaxCountCPUs; i++) {
> +    for (i = 0; i < count_valid_apic_ids; i++) {
>          apic->type = APIC_PROCESSOR;
>          apic->length = sizeof(*apic);
>          apic->processor_id = i;
> -        apic->local_apic_id = i;
> -        if (i < CountCPUs)
> +        apic->local_apic_id = processor_apic_ids[i];
> +        if (apic_id_is_present(apic->local_apic_id)) {
>              apic->flags = cpu_to_le32(1);
> -        else
> +        } else {
>              apic->flags = cpu_to_le32(0);
> +        }
I do not think Seabios codding style require braces here.

>          apic++;
>      }
>      struct madt_io_apic *io_apic = (void*)apic;
> @@ -402,6 +460,7 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
>  #define SD_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
>  #define SD_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
>  #define SD_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
> +#define SD_OFFSET_APICID (*ssdt_proc_apic_id - *ssdt_proc_start)
>  #define SD_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
>  #define SD_PROC (ssdp_proc_aml + *ssdt_proc_start)
>  
> @@ -410,12 +469,13 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
>  static void*
>  build_ssdt(void)
>  {
> -    int acpi_cpus = MaxCountCPUs > 0xff ? 0xff : MaxCountCPUs;
> +    int acpi_cpus = count_valid_apic_ids;
> +    int cpon_size = count_valid_apic_ids ? processor_apic_ids[count_valid_apic_ids - 1] + 1 : 0;
>      // length = ScopeOp + procs + NTYF method + CPON package
>      int length = ((1+3+4)
>                    + (acpi_cpus * SD_SIZEOF)
>                    + (1+2+5+(12*acpi_cpus))
> -                  + (6+2+1+(1*acpi_cpus))
> +                  + (6+2+1+(1*cpon_size))
>                    + 17);
>      u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
>      if (! ssdt) {
> @@ -440,10 +500,12 @@ build_ssdt(void)
>          ssdt_ptr[SD_OFFSET_CPUHEX+1] = getHex(i);
>          ssdt_ptr[SD_OFFSET_CPUID1] = i;
>          ssdt_ptr[SD_OFFSET_CPUID2] = i;
> +        ssdt_ptr[SD_OFFSET_APICID] = processor_apic_ids[i];
>          ssdt_ptr += SD_SIZEOF;
>      }
>  
>      // build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}"
> +    // Arg0 = APIC ID
>      *(ssdt_ptr++) = 0x14; // MethodOp
>      ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*acpi_cpus), 2);
>      *(ssdt_ptr++) = 'N';
> @@ -457,7 +519,7 @@ build_ssdt(void)
>          *(ssdt_ptr++) = 0x93; // LEqualOp
>          *(ssdt_ptr++) = 0x68; // Arg0Op
>          *(ssdt_ptr++) = 0x0A; // BytePrefix
> -        *(ssdt_ptr++) = i;
> +        *(ssdt_ptr++) = processor_apic_ids[i];
>          *(ssdt_ptr++) = 0x86; // NotifyOp
>          *(ssdt_ptr++) = 'C';
>          *(ssdt_ptr++) = 'P';
> @@ -467,6 +529,7 @@ build_ssdt(void)
>      }
>  
>      // build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })"
> +    // CPON array is APIC-ID based, not Processor ID based
>      *(ssdt_ptr++) = 0x08; // NameOp
>      *(ssdt_ptr++) = 'C';
>      *(ssdt_ptr++) = 'P';
> @@ -474,9 +537,9 @@ build_ssdt(void)
>      *(ssdt_ptr++) = 'N';
>      *(ssdt_ptr++) = 0x12; // PackageOp
>      ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*acpi_cpus), 2);
> -    *(ssdt_ptr++) = acpi_cpus;
> -    for (i=0; i<acpi_cpus; i++)
> -        *(ssdt_ptr++) = (i < CountCPUs) ? 0x01 : 0x00;
> +    *(ssdt_ptr++) = cpon_size;
> +    for (i=0; i<cpon_size; i++)
> +        *(ssdt_ptr++) = (apic_id_is_present(i)) ? 0x01 : 0x00;
>  
>      // store pci io windows: start, end, length
>      // this way we don't have to do the math in the dsdt
> @@ -644,30 +707,40 @@ build_srat(void)
>      memset(srat, 0, srat_size);
>      srat->reserved1=1;
>      struct srat_processor_affinity *core = (void*)(srat + 1);
> -    int i;
> +    int apic_id;
>      u64 curnode;
> +    u64 *cpudata = numadata;
>  
> -    for (i = 0; i < MaxCountCPUs; ++i) {
> +    /* Note that the fw_cfg NUMA CPU affinity data is APIC-ID-based, not
> +     * Processor ID based, but the size of the tablesble is based on MaxCountCPUs
> +     */
> +    for (apic_id = 0; apic_id < MaxCountCPUs; ++apic_id) {
> +        if (apic_id > MAX_APIC_ID)
> +            break;
And here you omit braces.

> +        curnode = *cpudata++;
>          core->type = SRAT_PROCESSOR;
>          core->length = sizeof(*core);
> -        core->local_apic_id = i;
> -        curnode = *numadata++;
>          core->proximity_lo = curnode;
>          memset(core->proximity_hi, 0, 3);
>          core->local_sapic_eid = 0;
> -        if (i < CountCPUs)
> +        core->local_apic_id = apic_id;
> +        if (apic_id_is_present(apic_id)) {
>              core->flags = cpu_to_le32(1);
> -        else
> -            core->flags = 0;
> +        } else {
> +            core->flags = cpu_to_le32(0);
> +        }
And here you add them again :)

>          core++;
>      }
>  
> +    numadata += MaxCountCPUs;
> +
>  
>      /* the memory map is a bit tricky, it contains at least one hole
>       * from 640k-1M and possibly another one from 3.5G-4G.
>       */
>      struct srat_memory_affinity *numamem = (void*)core;
>      int slots = 0;
> +    int i;
>      u64 mem_len, mem_base, next_base = 0;
>  
>      acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
> @@ -742,6 +815,8 @@ acpi_bios_init(void)
>              tbl_idx++;                                       \
>      } while(0)
>  
> +    check_apic_ids();
> +
>      struct fadt_descriptor_rev1 *fadt = build_fadt(pci);
>      ACPI_INIT_TABLE(fadt);
>      ACPI_INIT_TABLE(build_ssdt());
> diff --git a/src/smp.c b/src/smp.c
> index 8c077a1..af92e4a 100644
> --- a/src/smp.c
> +++ b/src/smp.c
> @@ -36,6 +36,8 @@ wrmsr_smp(u32 index, u64 val)
>  
>  u32 CountCPUs VAR16VISIBLE;
>  u32 MaxCountCPUs VAR16VISIBLE;
> +// 256 bits for the found APIC IDs
> +u32 FoundAPICIDs[256/32] VAR16VISIBLE;
>  extern void smp_ap_boot_code(void);
>  ASM16(
>      "  .global smp_ap_boot_code\n"
> @@ -59,6 +61,12 @@ ASM16(
>      "  jmp 1b\n"
>      "2:\n"
>  
> +    // get apic ID on EBX, set bit on FoundAPICIDs
> +    "  mov $1, %eax\n"
> +    "  cpuid\n"
> +    "  shrl $24, %ebx\n"
> +    "  lock bts %ebx, FoundAPICIDs\n"
> +
>      // Increment the cpu counter
>      "  lock incl CountCPUs\n"
>  
> @@ -67,6 +75,11 @@ ASM16(
>      "  jmp 1b\n"
>      );
>  
> +int apic_id_is_present(u8 apic_id)
> +{
> +    return FoundAPICIDs[apic_id/32] & (1 << (apic_id % 32));
> +}
> +
>  // find and initialize the CPUs by launching a SIPI to them
>  void
>  smp_probe(void)
> @@ -82,6 +95,10 @@ smp_probe(void)
>          return;
>      }
>  
> +    // mark the BSP initial APIC ID as found, too:
> +    u8 apic_id = ebx>>24;
> +    FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
> +
>      // Init the counter.
>      writel(&CountCPUs, 1);
>  
> diff --git a/src/ssdt-proc.dsl b/src/ssdt-proc.dsl
> index 0339422..7e7e855 100644
> --- a/src/ssdt-proc.dsl
> +++ b/src/ssdt-proc.dsl
> @@ -5,13 +5,15 @@
>   * be placed in the \_SB_ namespace.
>   *
>   * In addition to the aml code generated from this file, the
> - * src/acpi.c file creates a NTFY method with an entry for each cpu:
> + * src/acpi.c file creates a NTFY method with an entry for each cpu,
> + * that checks for the APIC ID:
>   *     Method(NTFY, 2) {
>   *         If (LEqual(Arg0, 0x00)) { Notify(CP00, Arg1) }
>   *         If (LEqual(Arg0, 0x01)) { Notify(CP01, Arg1) }
>   *         ...
>   *     }
> - * and a CPON array with the list of active and inactive cpus:
> + * and a CPON array with the list of active and inactive cpus,
> + * indexed by the APIC ID:
>   *     Name(CPON, Package() { One, One, ..., Zero, Zero, ... })
>   */
>  
> @@ -25,6 +27,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
>      Processor (CPAA, 0xAA, 0x0000b010, 0x06) {
>          ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_id
>          Name (ID, 0xAA)
> +        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_apic_id
> +        Name (APIC, 0xAA)
>  /*
>   * The src/acpi.c code requires the above ACP_EXTRACT tags so that it can update
>   * CPAA and 0xAA with the appropriate CPU id (see
> @@ -36,10 +40,10 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
>          External(CPST, MethodObj)
>          External(CPEJ, MethodObj)
>          Method(_MAT, 0) {
> -            Return(CPMA(ID))
> +            Return(CPMA(ID, APIC))
>          }
>          Method (_STA, 0) {
> -            Return(CPST(ID))
> +            Return(CPST(APIC))
>          }
>          Method (_EJ0, 1, NotSerialized) {
>              CPEJ(ID, Arg0)
> diff --git a/src/util.h b/src/util.h
> index ef8ec7c..a68f56b 100644
> --- a/src/util.h
> +++ b/src/util.h
> @@ -351,10 +351,13 @@ void pci_setup(void);
>  void smm_init(void);
>  
>  // smp.c
> +#define MAX_APIC_ID 0xFE
> +
>  extern u32 CountCPUs;
>  extern u32 MaxCountCPUs;
>  void wrmsr_smp(u32 index, u64 val);
>  void smp_probe(void);
> +int apic_id_is_present(u8 apic_id);
>  
>  // coreboot.c
>  extern const char *CBvendor, *CBpart;
> -- 
> 1.7.10.4

--
			Gleb.
Eduardo Habkost - July 19, 2012, 2:28 p.m.
On Thu, Jul 19, 2012 at 12:58:46PM +0300, Gleb Natapov wrote:
> On Tue, Jul 17, 2012 at 06:56:30PM -0300, Eduardo Habkost wrote:
> > This patch is an attempt to fix the non-continguous-APIC-ID problem without the
> > FW_CFG_LAPIC_INFO approach I have sent proposed last week.
> > 
> > Basically, this changes Seabios to probe for APIC IDs directly from the
> > CPUs on boot, instead of getting it using fw_cfg, store the found APIC
> > IDs on a bitmap, and use that information whe building the MADT, SRAT,
> > and SSDT ACPI tables.
> > 
> > To do this properly, we have to decide the meaning of "CPU IDs" in the
> > QEMU<->Seabios interfaces, too. I see two possible approaches:
> > 
> > 1) Have Seabios and QEMU agree on a a "CPU identifier", that is
> >    independent from the APIC ID.
> > 2) Always use the Initial APIC ID on all communication between QEMU and
> >    Seabios.
> > 
> We need to be prepared to support more than 255 cpus. With > 255 cpus
> comes x2apic and x2apic has 32bit apic ids. HW does not have to support
> all of the bits though, but potentially all the bitmasks can grow
> prohibitedly large.

I see only two solutions:

- Specify an interface/convention for QEMU and Seabios agree upon a "CPU
  identifier" <=> x2apic <=> LAPIC ID mapping for all CPUs.
- Specify new NUMA-information and CPU hotplug interfaces (or extend the
  existing ones) based on x2apic ID, when Seabios start supporting
  x2apic.

I am not particularly inclined towards any of those two solutions. I
dislike them equally.  :-)

Note that I am more worried about the QEMU<->Seabios interfaces. The
APIC ID bitmap on smp.c, for example, is just an implementation detail:
if we make Seabios support x2apic, that code can be changed to use a
different data structure instead.



> 
> 
> > This patch implements the second approach. That means:
> > 
> > - QEMU won't know anything about the ACPI Processor IDs chosen by
> >   Seabios;
> > - The NUMA CPU affinity table in QEMU_CFG_NUMA is APIC-ID-based, not
> >   Processor-ID-based;
> > - The CPU hotplug bitmaps on I/O port 0xaf00 will be APIC-ID-based, too.
> > 
> > Internally, the following changes were made on the ACPI code:
> > 
> > - The CPON array is now APIC-ID-based
> > - The NTFY method now gets the CPU APIC ID as parameter
> > - The CPMA method now gets two parameters: the Processor ID and the
> >   APIC ID
> > - The CPST method now gets the CPU APIC ID as parameter
> > 
> > To try to make things less likely to break on the common, non-hotplug
> > case, this patch makes the Processor IDs be chosen this way:
> > 
> > - The CPUs present on boot get contiguous Processor IDs (even if the
> >   APIC IDs are not contiguous);
> > - The remaining Processor declarations are going to associated to the
> >   remaining APIC IDs (immediately after the last present APIC ID),
> >   sequentially. This means that hotplugged CPUs may not get contiguous
> >   Processor declarations if their APIC IDs are not contiguous.
> > 
> I am curious what will happen if cpu will be hot plugged, than hibernate
> and resume is done. After resume hot plugged cpu will have different
> Processor ID in ACPI. This may or may not be a problem.

True. Keeping those tables stable after hotplug and hibernate may be a
challenge.

Maybe it would be easier to just leave holes on the MADT and SSDT tables
(making APIC ID and Processor ID always match), and hope no OS will be
confused by the holes.

> 
> Some comment below.
> 
> > I would like to know what others think about this approach. I think it
> > is a much simpler approach than relying on some APIC ID <=> CPU ID
> > translation to make Seabios and QEMU agree. This way, Seabios has
> > absolute freedom to choose the ACPI Processor IDs.
> > 
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> >  src/acpi-dsdt.dsl |   11 ++++--
> >  src/acpi.c        |  109 ++++++++++++++++++++++++++++++++++++++++++++---------
> >  src/smp.c         |   17 +++++++++
> >  src/ssdt-proc.dsl |   12 ++++--
> >  src/util.h        |    3 ++
> >  5 files changed, 127 insertions(+), 25 deletions(-)
> > 
> > diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> > index 2060686..51ca037 100644
> > --- a/src/acpi-dsdt.dsl
> > +++ b/src/acpi-dsdt.dsl
> > @@ -674,20 +674,23 @@ DefinitionBlock (
> >          External(CPON, PkgObj)
> >  
> >          /* Methods called by run-time generated SSDT Processor objects */
> > -        Method (CPMA, 1, NotSerialized) {
> > +        Method (CPMA, 2, NotSerialized) {
> >              // _MAT method - create an madt apic buffer
> > +            // Arg0 = Processor ID
> > +            // Arg1 = Local APIC ID
> >              // Local0 = CPON flag for this cpu
> > -            Store(DerefOf(Index(CPON, Arg0)), Local0)
> > +            Store(DerefOf(Index(CPON, Arg1)), Local0)
> >              // Local1 = Buffer (in madt apic form) to return
> >              Store(Buffer(8) {0x00, 0x08, 0x00, 0x00, 0x00, 0, 0, 0}, Local1)
> >              // Update the processor id, lapic id, and enable/disable status
> >              Store(Arg0, Index(Local1, 2))
> > -            Store(Arg0, Index(Local1, 3))
> > +            Store(Arg1, Index(Local1, 3))
> >              Store(Local0, Index(Local1, 4))
> >              Return (Local1)
> >          }
> >          Method (CPST, 1, NotSerialized) {
> >              // _STA method - return ON status of cpu
> > +            // Arg0 = Local APIC ID
> >              // Local0 = CPON flag for this cpu
> >              Store(DerefOf(Index(CPON, Arg0)), Local0)
> >              If (Local0) { Return(0xF) } Else { Return(0x0) }
> > @@ -708,7 +711,7 @@ DefinitionBlock (
> >              Store (PRS, Local5)
> >              // Local2 = last read byte from bitmap
> >              Store (Zero, Local2)
> > -            // Local0 = cpuid iterator
> > +            // Local0 = APIC ID iterator
> >              Store (Zero, Local0)
> >              While (LLess(Local0, SizeOf(CPON))) {
> >                  // Local1 = CPON flag for this cpu
> > diff --git a/src/acpi.c b/src/acpi.c
> > index 55e4607..149ff42 100644
> > --- a/src/acpi.c
> > +++ b/src/acpi.c
> > @@ -302,6 +302,63 @@ build_fadt(struct pci_device *pci)
> >      return fadt;
> >  }
> >  
> > +
> > +/* APIC IDs of each Processor indexed by ACPI Processor ID.
> > + * Used to make the MADT entries and SSDT Processor declarations
> > + * match each other.
> > + */
> > +static u8 *processor_apic_ids;
> > +
> > +/* Number of Processor IDs that have a valid 8-bit APIC ID */
> > +static int count_valid_apic_ids;
> > +
> > +/* Build a ACPI Processor ID -> Local APIC ID table, used for the MADT
> > + * and SSDT Processor declarations.
> > + * The CPUs present at boot will be assigned to the first entries.
> > + * The remaining entries will get sequential APIC IDs.
> > + */
> > +static void
> > +check_apic_ids(void)
> > +{
> > +    processor_apic_ids = malloc_high(MaxCountCPUs);
> > +    if (!processor_apic_ids) {
> > +        warn_noalloc();
> > +        count_valid_apic_ids = 0;
> > +        return;
> > +    }
> > +
> > +    int i;
> > +    int apic_id = 0;
> > +    int last_present_apic_id = 0;
> > +
> > +    // the first entries are for the already-present CPUs:
> > +    for (i = 0; i < MaxCountCPUs; i++) {
> > +        // look for next present APIC ID:
> > +        while (apic_id <= MAX_APIC_ID && !apic_id_is_present(apic_id))
> > +            apic_id++;
> > +
> > +        if (apic_id > MAX_APIC_ID)
> > +            break; // no more valid APIC IDs
> > +
> > +        processor_apic_ids[i] = apic_id;
> > +        last_present_apic_id = apic_id;
> > +        apic_id++;
> > +    }
> > +
> > +    // the remaining entries will get sequential APIC IDs assigned, up to
> > +    // MAX_APIC_ID
> > +    apic_id = last_present_apic_id;
> > +    for ( ; i < MaxCountCPUs; i++) {
> > +        apic_id++;
> > +        if (apic_id > MAX_APIC_ID)
> > +            break;
> > +
> > +        processor_apic_ids[i] = apic_id;
> > +    }
> > +
> > +    count_valid_apic_ids = i;
> > +}
> > +
> What if there is only one apic id present and it is 0xfe? It looks like
> the above code will result in count_valid_apic_ids been equal 1, so
> instead of MaxCountCPUs only one will be created.

True. There's a big constraint/assumption for CPU hotplug, here:
hotplugged CPUs should always get APIC IDs _higher_ than the CPUs
existing at boot.

We could remove this constraint, but then it would be even more
difficult to keep the ACPI Processor IDs stable after CPU hotplug &
hibernate (the problem you pointed out above).

> 
> >  static void*
> >  build_madt(void)
> >  {
> > @@ -321,15 +378,16 @@ build_madt(void)
> >      madt->flags = cpu_to_le32(1);
> >      struct madt_processor_apic *apic = (void*)&madt[1];
> >      int i;
> > -    for (i=0; i<MaxCountCPUs; i++) {
> > +    for (i = 0; i < count_valid_apic_ids; i++) {
> >          apic->type = APIC_PROCESSOR;
> >          apic->length = sizeof(*apic);
> >          apic->processor_id = i;
> > -        apic->local_apic_id = i;
> > -        if (i < CountCPUs)
> > +        apic->local_apic_id = processor_apic_ids[i];
> > +        if (apic_id_is_present(apic->local_apic_id)) {
> >              apic->flags = cpu_to_le32(1);
> > -        else
> > +        } else {
> >              apic->flags = cpu_to_le32(0);
> > +        }
> I do not think Seabios codding style require braces here.

Too much time dealing with QEMU code got my head confused.  :)

> 
> >          apic++;
> >      }
> >      struct madt_io_apic *io_apic = (void*)apic;
> > @@ -402,6 +460,7 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
> >  #define SD_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
> >  #define SD_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
> >  #define SD_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
> > +#define SD_OFFSET_APICID (*ssdt_proc_apic_id - *ssdt_proc_start)
> >  #define SD_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
> >  #define SD_PROC (ssdp_proc_aml + *ssdt_proc_start)
> >  
> > @@ -410,12 +469,13 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
> >  static void*
> >  build_ssdt(void)
> >  {
> > -    int acpi_cpus = MaxCountCPUs > 0xff ? 0xff : MaxCountCPUs;
> > +    int acpi_cpus = count_valid_apic_ids;
> > +    int cpon_size = count_valid_apic_ids ? processor_apic_ids[count_valid_apic_ids - 1] + 1 : 0;
> >      // length = ScopeOp + procs + NTYF method + CPON package
> >      int length = ((1+3+4)
> >                    + (acpi_cpus * SD_SIZEOF)
> >                    + (1+2+5+(12*acpi_cpus))
> > -                  + (6+2+1+(1*acpi_cpus))
> > +                  + (6+2+1+(1*cpon_size))
> >                    + 17);
> >      u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
> >      if (! ssdt) {
> > @@ -440,10 +500,12 @@ build_ssdt(void)
> >          ssdt_ptr[SD_OFFSET_CPUHEX+1] = getHex(i);
> >          ssdt_ptr[SD_OFFSET_CPUID1] = i;
> >          ssdt_ptr[SD_OFFSET_CPUID2] = i;
> > +        ssdt_ptr[SD_OFFSET_APICID] = processor_apic_ids[i];
> >          ssdt_ptr += SD_SIZEOF;
> >      }
> >  
> >      // build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}"
> > +    // Arg0 = APIC ID
> >      *(ssdt_ptr++) = 0x14; // MethodOp
> >      ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*acpi_cpus), 2);
> >      *(ssdt_ptr++) = 'N';
> > @@ -457,7 +519,7 @@ build_ssdt(void)
> >          *(ssdt_ptr++) = 0x93; // LEqualOp
> >          *(ssdt_ptr++) = 0x68; // Arg0Op
> >          *(ssdt_ptr++) = 0x0A; // BytePrefix
> > -        *(ssdt_ptr++) = i;
> > +        *(ssdt_ptr++) = processor_apic_ids[i];
> >          *(ssdt_ptr++) = 0x86; // NotifyOp
> >          *(ssdt_ptr++) = 'C';
> >          *(ssdt_ptr++) = 'P';
> > @@ -467,6 +529,7 @@ build_ssdt(void)
> >      }
> >  
> >      // build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })"
> > +    // CPON array is APIC-ID based, not Processor ID based
> >      *(ssdt_ptr++) = 0x08; // NameOp
> >      *(ssdt_ptr++) = 'C';
> >      *(ssdt_ptr++) = 'P';
> > @@ -474,9 +537,9 @@ build_ssdt(void)
> >      *(ssdt_ptr++) = 'N';
> >      *(ssdt_ptr++) = 0x12; // PackageOp
> >      ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*acpi_cpus), 2);
> > -    *(ssdt_ptr++) = acpi_cpus;
> > -    for (i=0; i<acpi_cpus; i++)
> > -        *(ssdt_ptr++) = (i < CountCPUs) ? 0x01 : 0x00;
> > +    *(ssdt_ptr++) = cpon_size;
> > +    for (i=0; i<cpon_size; i++)
> > +        *(ssdt_ptr++) = (apic_id_is_present(i)) ? 0x01 : 0x00;
> >  
> >      // store pci io windows: start, end, length
> >      // this way we don't have to do the math in the dsdt
> > @@ -644,30 +707,40 @@ build_srat(void)
> >      memset(srat, 0, srat_size);
> >      srat->reserved1=1;
> >      struct srat_processor_affinity *core = (void*)(srat + 1);
> > -    int i;
> > +    int apic_id;
> >      u64 curnode;
> > +    u64 *cpudata = numadata;
> >  
> > -    for (i = 0; i < MaxCountCPUs; ++i) {
> > +    /* Note that the fw_cfg NUMA CPU affinity data is APIC-ID-based, not
> > +     * Processor ID based, but the size of the tablesble is based on MaxCountCPUs
> > +     */
> > +    for (apic_id = 0; apic_id < MaxCountCPUs; ++apic_id) {
> > +        if (apic_id > MAX_APIC_ID)
> > +            break;
> And here you omit braces.
> 
> > +        curnode = *cpudata++;
> >          core->type = SRAT_PROCESSOR;
> >          core->length = sizeof(*core);
> > -        core->local_apic_id = i;
> > -        curnode = *numadata++;
> >          core->proximity_lo = curnode;
> >          memset(core->proximity_hi, 0, 3);
> >          core->local_sapic_eid = 0;
> > -        if (i < CountCPUs)
> > +        core->local_apic_id = apic_id;
> > +        if (apic_id_is_present(apic_id)) {
> >              core->flags = cpu_to_le32(1);
> > -        else
> > -            core->flags = 0;
> > +        } else {
> > +            core->flags = cpu_to_le32(0);
> > +        }
> And here you add them again :)

Let's say I was unsure by the Seabios coding style, so I was trying to
make sure I broke all possible coding styles at the same time.  ;-)


> 
> >          core++;
> >      }
> >  
> > +    numadata += MaxCountCPUs;
> > +
> >  
> >      /* the memory map is a bit tricky, it contains at least one hole
> >       * from 640k-1M and possibly another one from 3.5G-4G.
> >       */
> >      struct srat_memory_affinity *numamem = (void*)core;
> >      int slots = 0;
> > +    int i;
> >      u64 mem_len, mem_base, next_base = 0;
> >  
> >      acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
> > @@ -742,6 +815,8 @@ acpi_bios_init(void)
> >              tbl_idx++;                                       \
> >      } while(0)
> >  
> > +    check_apic_ids();
> > +
> >      struct fadt_descriptor_rev1 *fadt = build_fadt(pci);
> >      ACPI_INIT_TABLE(fadt);
> >      ACPI_INIT_TABLE(build_ssdt());
> > diff --git a/src/smp.c b/src/smp.c
> > index 8c077a1..af92e4a 100644
> > --- a/src/smp.c
> > +++ b/src/smp.c
> > @@ -36,6 +36,8 @@ wrmsr_smp(u32 index, u64 val)
> >  
> >  u32 CountCPUs VAR16VISIBLE;
> >  u32 MaxCountCPUs VAR16VISIBLE;
> > +// 256 bits for the found APIC IDs
> > +u32 FoundAPICIDs[256/32] VAR16VISIBLE;
> >  extern void smp_ap_boot_code(void);
> >  ASM16(
> >      "  .global smp_ap_boot_code\n"
> > @@ -59,6 +61,12 @@ ASM16(
> >      "  jmp 1b\n"
> >      "2:\n"
> >  
> > +    // get apic ID on EBX, set bit on FoundAPICIDs
> > +    "  mov $1, %eax\n"
> > +    "  cpuid\n"
> > +    "  shrl $24, %ebx\n"
> > +    "  lock bts %ebx, FoundAPICIDs\n"
> > +
> >      // Increment the cpu counter
> >      "  lock incl CountCPUs\n"
> >  
> > @@ -67,6 +75,11 @@ ASM16(
> >      "  jmp 1b\n"
> >      );
> >  
> > +int apic_id_is_present(u8 apic_id)
> > +{
> > +    return FoundAPICIDs[apic_id/32] & (1 << (apic_id % 32));
> > +}
> > +
> >  // find and initialize the CPUs by launching a SIPI to them
> >  void
> >  smp_probe(void)
> > @@ -82,6 +95,10 @@ smp_probe(void)
> >          return;
> >      }
> >  
> > +    // mark the BSP initial APIC ID as found, too:
> > +    u8 apic_id = ebx>>24;
> > +    FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
> > +
> >      // Init the counter.
> >      writel(&CountCPUs, 1);
> >  
> > diff --git a/src/ssdt-proc.dsl b/src/ssdt-proc.dsl
> > index 0339422..7e7e855 100644
> > --- a/src/ssdt-proc.dsl
> > +++ b/src/ssdt-proc.dsl
> > @@ -5,13 +5,15 @@
> >   * be placed in the \_SB_ namespace.
> >   *
> >   * In addition to the aml code generated from this file, the
> > - * src/acpi.c file creates a NTFY method with an entry for each cpu:
> > + * src/acpi.c file creates a NTFY method with an entry for each cpu,
> > + * that checks for the APIC ID:
> >   *     Method(NTFY, 2) {
> >   *         If (LEqual(Arg0, 0x00)) { Notify(CP00, Arg1) }
> >   *         If (LEqual(Arg0, 0x01)) { Notify(CP01, Arg1) }
> >   *         ...
> >   *     }
> > - * and a CPON array with the list of active and inactive cpus:
> > + * and a CPON array with the list of active and inactive cpus,
> > + * indexed by the APIC ID:
> >   *     Name(CPON, Package() { One, One, ..., Zero, Zero, ... })
> >   */
> >  
> > @@ -25,6 +27,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
> >      Processor (CPAA, 0xAA, 0x0000b010, 0x06) {
> >          ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_id
> >          Name (ID, 0xAA)
> > +        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_apic_id
> > +        Name (APIC, 0xAA)
> >  /*
> >   * The src/acpi.c code requires the above ACP_EXTRACT tags so that it can update
> >   * CPAA and 0xAA with the appropriate CPU id (see
> > @@ -36,10 +40,10 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
> >          External(CPST, MethodObj)
> >          External(CPEJ, MethodObj)
> >          Method(_MAT, 0) {
> > -            Return(CPMA(ID))
> > +            Return(CPMA(ID, APIC))
> >          }
> >          Method (_STA, 0) {
> > -            Return(CPST(ID))
> > +            Return(CPST(APIC))
> >          }
> >          Method (_EJ0, 1, NotSerialized) {
> >              CPEJ(ID, Arg0)
> > diff --git a/src/util.h b/src/util.h
> > index ef8ec7c..a68f56b 100644
> > --- a/src/util.h
> > +++ b/src/util.h
> > @@ -351,10 +351,13 @@ void pci_setup(void);
> >  void smm_init(void);
> >  
> >  // smp.c
> > +#define MAX_APIC_ID 0xFE
> > +
> >  extern u32 CountCPUs;
> >  extern u32 MaxCountCPUs;
> >  void wrmsr_smp(u32 index, u64 val);
> >  void smp_probe(void);
> > +int apic_id_is_present(u8 apic_id);
> >  
> >  // coreboot.c
> >  extern const char *CBvendor, *CBpart;
> > -- 
> > 1.7.10.4
> 
> --
> 			Gleb.
>
Eduardo Habkost - July 19, 2012, 7:46 p.m.
On Thu, Jul 19, 2012 at 11:28:54AM -0300, Eduardo Habkost wrote:
> On Thu, Jul 19, 2012 at 12:58:46PM +0300, Gleb Natapov wrote:
> > On Tue, Jul 17, 2012 at 06:56:30PM -0300, Eduardo Habkost wrote:
> > > This patch is an attempt to fix the non-continguous-APIC-ID problem without the
> > > FW_CFG_LAPIC_INFO approach I have sent proposed last week.
> > > 
> > > Basically, this changes Seabios to probe for APIC IDs directly from the
> > > CPUs on boot, instead of getting it using fw_cfg, store the found APIC
> > > IDs on a bitmap, and use that information whe building the MADT, SRAT,
> > > and SSDT ACPI tables.
> > > 
> > > To do this properly, we have to decide the meaning of "CPU IDs" in the
> > > QEMU<->Seabios interfaces, too. I see two possible approaches:
> > > 
> > > 1) Have Seabios and QEMU agree on a a "CPU identifier", that is
> > >    independent from the APIC ID.
> > > 2) Always use the Initial APIC ID on all communication between QEMU and
> > >    Seabios.
> > > 
> > We need to be prepared to support more than 255 cpus. With > 255 cpus
> > comes x2apic and x2apic has 32bit apic ids. HW does not have to support
> > all of the bits though, but potentially all the bitmasks can grow
> > prohibitedly large.
> 
> I see only two solutions:
> 
> - Specify an interface/convention for QEMU and Seabios agree upon a "CPU
>   identifier" <=> x2apic <=> LAPIC ID mapping for all CPUs.
> - Specify new NUMA-information and CPU hotplug interfaces (or extend the
>   existing ones) based on x2apic ID, when Seabios start supporting
>   x2apic.
> 
> I am not particularly inclined towards any of those two solutions. I
> dislike them equally.  :-)

Oh, it is simpler than I have expected. x2APIC specification:

"The local APIC ID is initialized by hardware with a 32 bit ID (x2APIC
ID). The lowest 8 bits of the x2APIC ID is the legacy local xAPIC ID,
and is stored in the upper 8 bits of the APIC register for access in
xAPIC mode."

And the ACPI specification:

"Logical processors with APIC ID values of 255 and greater are required
to have a Processor Device object and must convey the processor’s APIC
information to OSPM using the Processor Local X2APIC structure. Logical
processors with APIC ID values less than 255 must use the Processor
Local APIC structure to convey their APIC information to OSPM."

That means the x2APIC ID and the xAPIC ID are interchangeable, for
values <= 255. That means the QEMU<=>Seabios communication can be safely
based on "APIC IDs" without any ambiguity.

The CPU hotplug interface is a bit of a problem because it is based on a
256-bit bitmap. But on the day it gets extended to support more than 256
CPUs, it can safely be based on APIC IDs and still keep compatibility
with systems without x2APIC.

So, now I am strongly inclined towards the second option from the list
above: just use APIC IDs everywhere to identify CPUs when QEMU and
Seabios communicate with each other, and QEMU can completely ignore the
"Processor ID" used by Seabios.

> 
> Note that I am more worried about the QEMU<->Seabios interfaces. The
> APIC ID bitmap on smp.c, for example, is just an implementation detail:
> if we make Seabios support x2apic, that code can be changed to use a
> different data structure instead.
> 
[...]
> > > To try to make things less likely to break on the common, non-hotplug
> > > case, this patch makes the Processor IDs be chosen this way:
> > > 
> > > - The CPUs present on boot get contiguous Processor IDs (even if the
> > >   APIC IDs are not contiguous);
> > > - The remaining Processor declarations are going to associated to the
> > >   remaining APIC IDs (immediately after the last present APIC ID),
> > >   sequentially. This means that hotplugged CPUs may not get contiguous
> > >   Processor declarations if their APIC IDs are not contiguous.
> > > 
> > I am curious what will happen if cpu will be hot plugged, than hibernate
> > and resume is done. After resume hot plugged cpu will have different
> > Processor ID in ACPI. This may or may not be a problem.
> 
> True. Keeping those tables stable after hotplug and hibernate may be a
> challenge.
> 
> Maybe it would be easier to just leave holes on the MADT and SSDT tables
> (making APIC ID and Processor ID always match), and hope no OS will be
> confused by the holes.

I am inclined to try this approach first (keep APIC ID == ACPI Processor
ID), to keep things simple in Seabios. I am hoping no OS will have
problems with the holes in the list of enabled Processor IDs.
Gleb Natapov - July 23, 2012, 8:58 a.m.
On Thu, Jul 19, 2012 at 04:46:35PM -0300, Eduardo Habkost wrote:
> On Thu, Jul 19, 2012 at 11:28:54AM -0300, Eduardo Habkost wrote:
> > On Thu, Jul 19, 2012 at 12:58:46PM +0300, Gleb Natapov wrote:
> > > On Tue, Jul 17, 2012 at 06:56:30PM -0300, Eduardo Habkost wrote:
> > > > This patch is an attempt to fix the non-continguous-APIC-ID problem without the
> > > > FW_CFG_LAPIC_INFO approach I have sent proposed last week.
> > > > 
> > > > Basically, this changes Seabios to probe for APIC IDs directly from the
> > > > CPUs on boot, instead of getting it using fw_cfg, store the found APIC
> > > > IDs on a bitmap, and use that information whe building the MADT, SRAT,
> > > > and SSDT ACPI tables.
> > > > 
> > > > To do this properly, we have to decide the meaning of "CPU IDs" in the
> > > > QEMU<->Seabios interfaces, too. I see two possible approaches:
> > > > 
> > > > 1) Have Seabios and QEMU agree on a a "CPU identifier", that is
> > > >    independent from the APIC ID.
> > > > 2) Always use the Initial APIC ID on all communication between QEMU and
> > > >    Seabios.
> > > > 
> > > We need to be prepared to support more than 255 cpus. With > 255 cpus
> > > comes x2apic and x2apic has 32bit apic ids. HW does not have to support
> > > all of the bits though, but potentially all the bitmasks can grow
> > > prohibitedly large.
> > 
> > I see only two solutions:
> > 
> > - Specify an interface/convention for QEMU and Seabios agree upon a "CPU
> >   identifier" <=> x2apic <=> LAPIC ID mapping for all CPUs.
> > - Specify new NUMA-information and CPU hotplug interfaces (or extend the
> >   existing ones) based on x2apic ID, when Seabios start supporting
> >   x2apic.
> > 
> > I am not particularly inclined towards any of those two solutions. I
> > dislike them equally.  :-)
> 
> Oh, it is simpler than I have expected. x2APIC specification:
> 
> "The local APIC ID is initialized by hardware with a 32 bit ID (x2APIC
> ID). The lowest 8 bits of the x2APIC ID is the legacy local xAPIC ID,
> and is stored in the upper 8 bits of the APIC register for access in
> xAPIC mode."
> 
> And the ACPI specification:
> 
> "Logical processors with APIC ID values of 255 and greater are required
> to have a Processor Device object and must convey the processor’s APIC
> information to OSPM using the Processor Local X2APIC structure. Logical
> processors with APIC ID values less than 255 must use the Processor
> Local APIC structure to convey their APIC information to OSPM."
> 
> That means the x2APIC ID and the xAPIC ID are interchangeable, for
> values <= 255. That means the QEMU<=>Seabios communication can be safely
> based on "APIC IDs" without any ambiguity.
> 
Yes for <= 255 they interchangeable. That's why we can add +x2apic to
our cpu models without changes to the BIOS.

> The CPU hotplug interface is a bit of a problem because it is based on a
> 256-bit bitmap. But on the day it gets extended to support more than 256
> CPUs, it can safely be based on APIC IDs and still keep compatibility
> with systems without x2APIC.
The bitmap will have to be extended if we will go beyond 256 cpus. Using
apic-id to index the bitmap means that the size of the bitmap is a
function of max apic-id we want to support, not max number of cpus.

> 
> So, now I am strongly inclined towards the second option from the list
> above: just use APIC IDs everywhere to identify CPUs when QEMU and
> Seabios communicate with each other, and QEMU can completely ignore the
> "Processor ID" used by Seabios.
I agree with making "Processor ID" Seabios internal thing.

> 
> > 
> > Note that I am more worried about the QEMU<->Seabios interfaces. The
> > APIC ID bitmap on smp.c, for example, is just an implementation detail:
> > if we make Seabios support x2apic, that code can be changed to use a
> > different data structure instead.
> > 
> [...]
> > > > To try to make things less likely to break on the common, non-hotplug
> > > > case, this patch makes the Processor IDs be chosen this way:
> > > > 
> > > > - The CPUs present on boot get contiguous Processor IDs (even if the
> > > >   APIC IDs are not contiguous);
> > > > - The remaining Processor declarations are going to associated to the
> > > >   remaining APIC IDs (immediately after the last present APIC ID),
> > > >   sequentially. This means that hotplugged CPUs may not get contiguous
> > > >   Processor declarations if their APIC IDs are not contiguous.
> > > > 
> > > I am curious what will happen if cpu will be hot plugged, than hibernate
> > > and resume is done. After resume hot plugged cpu will have different
> > > Processor ID in ACPI. This may or may not be a problem.
> > 
> > True. Keeping those tables stable after hotplug and hibernate may be a
> > challenge.
> > 
> > Maybe it would be easier to just leave holes on the MADT and SSDT tables
> > (making APIC ID and Processor ID always match), and hope no OS will be
> > confused by the holes.
> 
> I am inclined to try this approach first (keep APIC ID == ACPI Processor
> ID), to keep things simple in Seabios. I am hoping no OS will have
> problems with the holes in the list of enabled Processor IDs.
> 
They shouldn't.

--
			Gleb.

Patch

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 2060686..51ca037 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -674,20 +674,23 @@  DefinitionBlock (
         External(CPON, PkgObj)
 
         /* Methods called by run-time generated SSDT Processor objects */
-        Method (CPMA, 1, NotSerialized) {
+        Method (CPMA, 2, NotSerialized) {
             // _MAT method - create an madt apic buffer
+            // Arg0 = Processor ID
+            // Arg1 = Local APIC ID
             // Local0 = CPON flag for this cpu
-            Store(DerefOf(Index(CPON, Arg0)), Local0)
+            Store(DerefOf(Index(CPON, Arg1)), Local0)
             // Local1 = Buffer (in madt apic form) to return
             Store(Buffer(8) {0x00, 0x08, 0x00, 0x00, 0x00, 0, 0, 0}, Local1)
             // Update the processor id, lapic id, and enable/disable status
             Store(Arg0, Index(Local1, 2))
-            Store(Arg0, Index(Local1, 3))
+            Store(Arg1, Index(Local1, 3))
             Store(Local0, Index(Local1, 4))
             Return (Local1)
         }
         Method (CPST, 1, NotSerialized) {
             // _STA method - return ON status of cpu
+            // Arg0 = Local APIC ID
             // Local0 = CPON flag for this cpu
             Store(DerefOf(Index(CPON, Arg0)), Local0)
             If (Local0) { Return(0xF) } Else { Return(0x0) }
@@ -708,7 +711,7 @@  DefinitionBlock (
             Store (PRS, Local5)
             // Local2 = last read byte from bitmap
             Store (Zero, Local2)
-            // Local0 = cpuid iterator
+            // Local0 = APIC ID iterator
             Store (Zero, Local0)
             While (LLess(Local0, SizeOf(CPON))) {
                 // Local1 = CPON flag for this cpu
diff --git a/src/acpi.c b/src/acpi.c
index 55e4607..149ff42 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -302,6 +302,63 @@  build_fadt(struct pci_device *pci)
     return fadt;
 }
 
+
+/* APIC IDs of each Processor indexed by ACPI Processor ID.
+ * Used to make the MADT entries and SSDT Processor declarations
+ * match each other.
+ */
+static u8 *processor_apic_ids;
+
+/* Number of Processor IDs that have a valid 8-bit APIC ID */
+static int count_valid_apic_ids;
+
+/* Build a ACPI Processor ID -> Local APIC ID table, used for the MADT
+ * and SSDT Processor declarations.
+ * The CPUs present at boot will be assigned to the first entries.
+ * The remaining entries will get sequential APIC IDs.
+ */
+static void
+check_apic_ids(void)
+{
+    processor_apic_ids = malloc_high(MaxCountCPUs);
+    if (!processor_apic_ids) {
+        warn_noalloc();
+        count_valid_apic_ids = 0;
+        return;
+    }
+
+    int i;
+    int apic_id = 0;
+    int last_present_apic_id = 0;
+
+    // the first entries are for the already-present CPUs:
+    for (i = 0; i < MaxCountCPUs; i++) {
+        // look for next present APIC ID:
+        while (apic_id <= MAX_APIC_ID && !apic_id_is_present(apic_id))
+            apic_id++;
+
+        if (apic_id > MAX_APIC_ID)
+            break; // no more valid APIC IDs
+
+        processor_apic_ids[i] = apic_id;
+        last_present_apic_id = apic_id;
+        apic_id++;
+    }
+
+    // the remaining entries will get sequential APIC IDs assigned, up to
+    // MAX_APIC_ID
+    apic_id = last_present_apic_id;
+    for ( ; i < MaxCountCPUs; i++) {
+        apic_id++;
+        if (apic_id > MAX_APIC_ID)
+            break;
+
+        processor_apic_ids[i] = apic_id;
+    }
+
+    count_valid_apic_ids = i;
+}
+
 static void*
 build_madt(void)
 {
@@ -321,15 +378,16 @@  build_madt(void)
     madt->flags = cpu_to_le32(1);
     struct madt_processor_apic *apic = (void*)&madt[1];
     int i;
-    for (i=0; i<MaxCountCPUs; i++) {
+    for (i = 0; i < count_valid_apic_ids; i++) {
         apic->type = APIC_PROCESSOR;
         apic->length = sizeof(*apic);
         apic->processor_id = i;
-        apic->local_apic_id = i;
-        if (i < CountCPUs)
+        apic->local_apic_id = processor_apic_ids[i];
+        if (apic_id_is_present(apic->local_apic_id)) {
             apic->flags = cpu_to_le32(1);
-        else
+        } else {
             apic->flags = cpu_to_le32(0);
+        }
         apic++;
     }
     struct madt_io_apic *io_apic = (void*)apic;
@@ -402,6 +460,7 @@  encodeLen(u8 *ssdt_ptr, int length, int bytes)
 #define SD_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
 #define SD_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
 #define SD_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
+#define SD_OFFSET_APICID (*ssdt_proc_apic_id - *ssdt_proc_start)
 #define SD_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
 #define SD_PROC (ssdp_proc_aml + *ssdt_proc_start)
 
@@ -410,12 +469,13 @@  encodeLen(u8 *ssdt_ptr, int length, int bytes)
 static void*
 build_ssdt(void)
 {
-    int acpi_cpus = MaxCountCPUs > 0xff ? 0xff : MaxCountCPUs;
+    int acpi_cpus = count_valid_apic_ids;
+    int cpon_size = count_valid_apic_ids ? processor_apic_ids[count_valid_apic_ids - 1] + 1 : 0;
     // length = ScopeOp + procs + NTYF method + CPON package
     int length = ((1+3+4)
                   + (acpi_cpus * SD_SIZEOF)
                   + (1+2+5+(12*acpi_cpus))
-                  + (6+2+1+(1*acpi_cpus))
+                  + (6+2+1+(1*cpon_size))
                   + 17);
     u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
     if (! ssdt) {
@@ -440,10 +500,12 @@  build_ssdt(void)
         ssdt_ptr[SD_OFFSET_CPUHEX+1] = getHex(i);
         ssdt_ptr[SD_OFFSET_CPUID1] = i;
         ssdt_ptr[SD_OFFSET_CPUID2] = i;
+        ssdt_ptr[SD_OFFSET_APICID] = processor_apic_ids[i];
         ssdt_ptr += SD_SIZEOF;
     }
 
     // build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}"
+    // Arg0 = APIC ID
     *(ssdt_ptr++) = 0x14; // MethodOp
     ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*acpi_cpus), 2);
     *(ssdt_ptr++) = 'N';
@@ -457,7 +519,7 @@  build_ssdt(void)
         *(ssdt_ptr++) = 0x93; // LEqualOp
         *(ssdt_ptr++) = 0x68; // Arg0Op
         *(ssdt_ptr++) = 0x0A; // BytePrefix
-        *(ssdt_ptr++) = i;
+        *(ssdt_ptr++) = processor_apic_ids[i];
         *(ssdt_ptr++) = 0x86; // NotifyOp
         *(ssdt_ptr++) = 'C';
         *(ssdt_ptr++) = 'P';
@@ -467,6 +529,7 @@  build_ssdt(void)
     }
 
     // build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })"
+    // CPON array is APIC-ID based, not Processor ID based
     *(ssdt_ptr++) = 0x08; // NameOp
     *(ssdt_ptr++) = 'C';
     *(ssdt_ptr++) = 'P';
@@ -474,9 +537,9 @@  build_ssdt(void)
     *(ssdt_ptr++) = 'N';
     *(ssdt_ptr++) = 0x12; // PackageOp
     ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*acpi_cpus), 2);
-    *(ssdt_ptr++) = acpi_cpus;
-    for (i=0; i<acpi_cpus; i++)
-        *(ssdt_ptr++) = (i < CountCPUs) ? 0x01 : 0x00;
+    *(ssdt_ptr++) = cpon_size;
+    for (i=0; i<cpon_size; i++)
+        *(ssdt_ptr++) = (apic_id_is_present(i)) ? 0x01 : 0x00;
 
     // store pci io windows: start, end, length
     // this way we don't have to do the math in the dsdt
@@ -644,30 +707,40 @@  build_srat(void)
     memset(srat, 0, srat_size);
     srat->reserved1=1;
     struct srat_processor_affinity *core = (void*)(srat + 1);
-    int i;
+    int apic_id;
     u64 curnode;
+    u64 *cpudata = numadata;
 
-    for (i = 0; i < MaxCountCPUs; ++i) {
+    /* Note that the fw_cfg NUMA CPU affinity data is APIC-ID-based, not
+     * Processor ID based, but the size of the tablesble is based on MaxCountCPUs
+     */
+    for (apic_id = 0; apic_id < MaxCountCPUs; ++apic_id) {
+        if (apic_id > MAX_APIC_ID)
+            break;
+        curnode = *cpudata++;
         core->type = SRAT_PROCESSOR;
         core->length = sizeof(*core);
-        core->local_apic_id = i;
-        curnode = *numadata++;
         core->proximity_lo = curnode;
         memset(core->proximity_hi, 0, 3);
         core->local_sapic_eid = 0;
-        if (i < CountCPUs)
+        core->local_apic_id = apic_id;
+        if (apic_id_is_present(apic_id)) {
             core->flags = cpu_to_le32(1);
-        else
-            core->flags = 0;
+        } else {
+            core->flags = cpu_to_le32(0);
+        }
         core++;
     }
 
+    numadata += MaxCountCPUs;
+
 
     /* the memory map is a bit tricky, it contains at least one hole
      * from 640k-1M and possibly another one from 3.5G-4G.
      */
     struct srat_memory_affinity *numamem = (void*)core;
     int slots = 0;
+    int i;
     u64 mem_len, mem_base, next_base = 0;
 
     acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
@@ -742,6 +815,8 @@  acpi_bios_init(void)
             tbl_idx++;                                       \
     } while(0)
 
+    check_apic_ids();
+
     struct fadt_descriptor_rev1 *fadt = build_fadt(pci);
     ACPI_INIT_TABLE(fadt);
     ACPI_INIT_TABLE(build_ssdt());
diff --git a/src/smp.c b/src/smp.c
index 8c077a1..af92e4a 100644
--- a/src/smp.c
+++ b/src/smp.c
@@ -36,6 +36,8 @@  wrmsr_smp(u32 index, u64 val)
 
 u32 CountCPUs VAR16VISIBLE;
 u32 MaxCountCPUs VAR16VISIBLE;
+// 256 bits for the found APIC IDs
+u32 FoundAPICIDs[256/32] VAR16VISIBLE;
 extern void smp_ap_boot_code(void);
 ASM16(
     "  .global smp_ap_boot_code\n"
@@ -59,6 +61,12 @@  ASM16(
     "  jmp 1b\n"
     "2:\n"
 
+    // get apic ID on EBX, set bit on FoundAPICIDs
+    "  mov $1, %eax\n"
+    "  cpuid\n"
+    "  shrl $24, %ebx\n"
+    "  lock bts %ebx, FoundAPICIDs\n"
+
     // Increment the cpu counter
     "  lock incl CountCPUs\n"
 
@@ -67,6 +75,11 @@  ASM16(
     "  jmp 1b\n"
     );
 
+int apic_id_is_present(u8 apic_id)
+{
+    return FoundAPICIDs[apic_id/32] & (1 << (apic_id % 32));
+}
+
 // find and initialize the CPUs by launching a SIPI to them
 void
 smp_probe(void)
@@ -82,6 +95,10 @@  smp_probe(void)
         return;
     }
 
+    // mark the BSP initial APIC ID as found, too:
+    u8 apic_id = ebx>>24;
+    FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32));
+
     // Init the counter.
     writel(&CountCPUs, 1);
 
diff --git a/src/ssdt-proc.dsl b/src/ssdt-proc.dsl
index 0339422..7e7e855 100644
--- a/src/ssdt-proc.dsl
+++ b/src/ssdt-proc.dsl
@@ -5,13 +5,15 @@ 
  * be placed in the \_SB_ namespace.
  *
  * In addition to the aml code generated from this file, the
- * src/acpi.c file creates a NTFY method with an entry for each cpu:
+ * src/acpi.c file creates a NTFY method with an entry for each cpu,
+ * that checks for the APIC ID:
  *     Method(NTFY, 2) {
  *         If (LEqual(Arg0, 0x00)) { Notify(CP00, Arg1) }
  *         If (LEqual(Arg0, 0x01)) { Notify(CP01, Arg1) }
  *         ...
  *     }
- * and a CPON array with the list of active and inactive cpus:
+ * and a CPON array with the list of active and inactive cpus,
+ * indexed by the APIC ID:
  *     Name(CPON, Package() { One, One, ..., Zero, Zero, ... })
  */
 
@@ -25,6 +27,8 @@  DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
     Processor (CPAA, 0xAA, 0x0000b010, 0x06) {
         ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_id
         Name (ID, 0xAA)
+        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_apic_id
+        Name (APIC, 0xAA)
 /*
  * The src/acpi.c code requires the above ACP_EXTRACT tags so that it can update
  * CPAA and 0xAA with the appropriate CPU id (see
@@ -36,10 +40,10 @@  DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
         External(CPST, MethodObj)
         External(CPEJ, MethodObj)
         Method(_MAT, 0) {
-            Return(CPMA(ID))
+            Return(CPMA(ID, APIC))
         }
         Method (_STA, 0) {
-            Return(CPST(ID))
+            Return(CPST(APIC))
         }
         Method (_EJ0, 1, NotSerialized) {
             CPEJ(ID, Arg0)
diff --git a/src/util.h b/src/util.h
index ef8ec7c..a68f56b 100644
--- a/src/util.h
+++ b/src/util.h
@@ -351,10 +351,13 @@  void pci_setup(void);
 void smm_init(void);
 
 // smp.c
+#define MAX_APIC_ID 0xFE
+
 extern u32 CountCPUs;
 extern u32 MaxCountCPUs;
 void wrmsr_smp(u32 index, u64 val);
 void smp_probe(void);
+int apic_id_is_present(u8 apic_id);
 
 // coreboot.c
 extern const char *CBvendor, *CBpart;