diff mbox

[RFC] spapr: add initial ibm, client-architecture-support rtas call support

Message ID 1378289952-32703-1-git-send-email-aik@ozlabs.ru
State New
Headers show

Commit Message

Alexey Kardashevskiy Sept. 4, 2013, 10:19 a.m. UTC
This is an RFC patch.

The modern Linux kernel supports every known POWERPC CPU so when
it boots, it can always find a matching cpu_spec from the cpu_specs array.
However if the kernel is quite old, it may be missing the definition of
the actual CPU. To provide ability for old kernels to work on modern
hardware, a Logical Processor Version concept was introduced in PowerISA.
From the hardware prospective, it is supported by PCR (Processor
Compatibility Register) which is defined in PowerISA. The register
enables compatibility mode which can be set to PowerISA 2.05 or 2.06.

PAPR+ specification defines a Logical Processor Version per every
version of PowerISA specification. PAPR+ also defines
a ibm,client-architecture-support rtas call which purpose is to provide
a negotiation mechanism for the guest and the hypervisor to work out
the best Logical Processor Version to continue with.

At the moment, the Linux kernel calls the ibm,client-architecture-support
method and only then reads the device. The current RTAS's handler checks
the capabilities from the array supplied by the guest kernel, analyses
if QEMU can or cannot provide with the requested features.
If QEMU supports everything the guest has requested, it returns from rtas
call and the guest continues booting.
If some parameter changes, QEMU fixes the device tree and reboots
the guest with a new tree.

In this version, the ibm,client-architecture-support handler checks
if the current CPU is in the list from the guest and if it is not, QEMU
adds a "cpu-version" property to a cpu node with the best of logical PVRs
supported by the guest.

Technically QEMU reboots and as a part of reboot, it fixes the tree and
this is when the cpu-version property is actually added.

Although it seems possible to add a custom interface between SLOF and QEMU
and implement device tree update on the fly to avoid a guest reboot,
there still may be cases when device tree change would not be enough.
As an example, the guest may ask for a bigger RMA area than QEMU allocates
by default.

The patch depends on "[PATCH v5] powerpc: add PVR mask support".

Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c              | 10 ++++++
 hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h      |  7 ++++-
 target-ppc/cpu-models.h     | 13 ++++++++
 target-ppc/cpu-qom.h        |  1 +
 target-ppc/translate_init.c |  3 ++
 6 files changed, 109 insertions(+), 1 deletion(-)

Comments

Alexander Graf Sept. 4, 2013, 10:42 a.m. UTC | #1
On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:

>   This is an RFC patch.
> 
> The modern Linux kernel supports every known POWERPC CPU so when
> it boots, it can always find a matching cpu_spec from the cpu_specs array.
> However if the kernel is quite old, it may be missing the definition of
> the actual CPU. To provide ability for old kernels to work on modern
> hardware, a Logical Processor Version concept was introduced in PowerISA.
> From the hardware prospective, it is supported by PCR (Processor
> Compatibility Register) which is defined in PowerISA. The register
> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
> 
> PAPR+ specification defines a Logical Processor Version per every
> version of PowerISA specification. PAPR+ also defines
> a ibm,client-architecture-support rtas call which purpose is to provide
> a negotiation mechanism for the guest and the hypervisor to work out
> the best Logical Processor Version to continue with.
> 
> At the moment, the Linux kernel calls the ibm,client-architecture-support
> method and only then reads the device. The current RTAS's handler checks
> the capabilities from the array supplied by the guest kernel, analyses
> if QEMU can or cannot provide with the requested features.
> If QEMU supports everything the guest has requested, it returns from rtas
> call and the guest continues booting.
> If some parameter changes, QEMU fixes the device tree and reboots
> the guest with a new tree.
> 
> In this version, the ibm,client-architecture-support handler checks
> if the current CPU is in the list from the guest and if it is not, QEMU
> adds a "cpu-version" property to a cpu node with the best of logical PVRs
> supported by the guest.
> 
> Technically QEMU reboots and as a part of reboot, it fixes the tree and
> this is when the cpu-version property is actually added.
> 
> Although it seems possible to add a custom interface between SLOF and QEMU
> and implement device tree update on the fly to avoid a guest reboot,
> there still may be cases when device tree change would not be enough.
> As an example, the guest may ask for a bigger RMA area than QEMU allocates
> by default.
> 
> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
> 
> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> Cc: Andreas Färber <afaerber@suse.de>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> hw/ppc/spapr.c              | 10 ++++++
> hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
> include/hw/ppc/spapr.h      |  7 ++++-
> target-ppc/cpu-models.h     | 13 ++++++++
> target-ppc/cpu-qom.h        |  1 +
> target-ppc/translate_init.c |  3 ++
> 6 files changed, 109 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 13574bf..5adf53c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
>         if (ret < 0) {
>             return ret;
>         }
> +
> +        if (spapr->pvr_new) {
> +            ret = fdt_setprop(fdt, offset, "cpu-version",
> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            /* Reset as the guest after reboot may give other PVR set */
> +            spapr->pvr_new = 0;
> +        }
>     }
>     return ret;
> }
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 9f6e7b8..509de89 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -3,6 +3,7 @@
> #include "helper_regs.h"
> #include "hw/ppc/spapr.h"
> #include "mmu-hash64.h"
> +#include "cpu-models.h"
> 
> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                            target_ulong opcode, target_ulong *args)
> @@ -792,6 +793,78 @@ out:
>     return ret;
> }
> 
> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> +                                                  sPAPREnvironment *spapr,
> +                                                  target_ulong opcode,
> +                                                  target_ulong *args)
> +{
> +    target_ulong list = args[0];
> +    int i, number_of_option_vectors;
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +    bool cpu_match = false;
> +    unsigned compat_cpu_level = 0, pvr_new;
> +
> +    /* Parse PVR list */
> +    for ( ; ; ) {
> +        uint32_t pvr, pvr_mask;
> +
> +        pvr_mask = ldl_phys(list);
> +        list += 4;
> +        pvr = ldl_phys(list);
> +        list += 4;
> +
> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
> +            cpu_match = true;
> +            pvr_new = cpu->env.spr[SPR_PVR];
> +        }
> +
> +        /* Is it a logical PVR? */
> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
> +            switch (pvr) {
> +            case CPU_POWERPC_LOGICAL_2_05:
> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
> +                    (compat_cpu_level < 2050)) {
> +                    compat_cpu_level = 2050;
> +                    pvr_new = pvr;
> +                }
> +                break;
> +            case CPU_POWERPC_LOGICAL_2_06:
> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
> +                    (compat_cpu_level < 2060)) {
> +                    compat_cpu_level = 2060;
> +                    pvr_new = pvr;
> +                }
> +                break;
> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
> +                    (compat_cpu_level < 2061)) {
> +                    compat_cpu_level = 2061;
> +                    pvr_new = pvr;
> +                }
> +                break;
> +            }
> +        }
> +
> +        if (~pvr_mask & pvr) {
> +            break;
> +        }
> +    }
> +
> +    /* Parsing finished. Making decision whether to reboot or not */
> +    if (cpu_match) {
> +        return H_SUCCESS;
> +    }
> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
> +        return H_SUCCESS;
> +    }
> +
> +    cpu->env.spr[SPR_PVR] = pvr_new;

This would change the SPR value the guest sees. IIUC this is not what is happening with this call.

> +    spapr->pvr_new = pvr_new;
> +    qemu_system_reset_request();

I think there should be a way to define the compat mode from the beginning without the need to reboot the machine from itself. That way management tools can straight on create POWER6 compat machines without jumping through reboot hoops.


Alex

> +
> +    return H_FUNCTION;
> +}
> +
> static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
> 
> @@ -879,6 +952,9 @@ static void hypercall_register_types(void)
>     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
> 
>     spapr_register_hypercall(H_SET_MODE, h_set_mode);
> +
> +    /* ibm,client-architecture-support support */
> +    spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
> }
> 
> type_init(hypercall_register_types)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 995ddcf..e2158ff 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -44,6 +44,9 @@ typedef struct sPAPREnvironment {
>         uint64_t middle_iterations, middle_chunks, middle_valid, middle_invalid;
>         uint64_t final_chunks, final_valid, final_invalid;
>     } migstats;
> +
> +    uint32_t pvr_new;
> +
> } sPAPREnvironment;
> 
> #define H_SUCCESS         0
> @@ -304,7 +307,9 @@ typedef struct sPAPREnvironment {
> #define KVMPPC_HCALL_BASE       0xf000
> #define KVMPPC_H_RTAS           (KVMPPC_HCALL_BASE + 0x0)
> #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
> -#define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
> +/* Client Architecture support */
> +#define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
> +#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
> 
> extern sPAPREnvironment *spapr;
> 
> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> index 49ba4a4..a4e8763 100644
> --- a/target-ppc/cpu-models.h
> +++ b/target-ppc/cpu-models.h
> @@ -583,6 +583,13 @@ enum {
>     CPU_POWERPC_RS64II             = 0x00340000,
>     CPU_POWERPC_RS64III            = 0x00360000,
>     CPU_POWERPC_RS64IV             = 0x00370000,
> +
> +    /* Logical CPUs */
> +    CPU_POWERPC_LOGICAL_MASK       = 0x0F000000,
> +    CPU_POWERPC_LOGICAL_2_05       = 0x0F000002,
> +    CPU_POWERPC_LOGICAL_2_06       = 0x0F000003,
> +    CPU_POWERPC_LOGICAL_2_06_PLUS  = 0x0F100003,
> +
> #endif /* defined(TARGET_PPC64) */
>     /* Original POWER */
>     /* XXX: should be POWER (RIOS), RSC3308, RSC4608,
> @@ -745,4 +752,10 @@ enum {
>     POWERPC_SVR_8641D              = 0x80900121,
> };
> 
> +/* Processor Compatibility Register (PCR) */
> +enum {
> +    POWERPC_ISA_COMPAT_2_05 = 1ULL << 62,
> +    POWERPC_ISA_COMPAT_2_06 = 1ULL << 61,
> +};
> +
> #endif
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index 0ae8b09..702ae96 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -55,6 +55,7 @@ typedef struct PowerPCCPUClass {
> 
>     uint32_t pvr;
>     uint32_t pvr_mask;
> +    uint64_t pcr;
>     uint32_t svr;
>     uint64_t insns_flags;
>     uint64_t insns_flags2;
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index e8cc3c8..645f54d 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -7249,6 +7249,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>     dc->desc = "POWER7";
>     pcc->pvr = CPU_POWERPC_POWER7_BASE;
>     pcc->pvr_mask = CPU_POWERPC_POWER7_MASK;
> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>     pcc->init_proc = init_proc_POWER7;
>     pcc->check_pow = check_pow_nocheck;
>     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
> @@ -7286,6 +7287,7 @@ POWERPC_FAMILY(POWER7P)(ObjectClass *oc, void *data)
>     dc->desc = "POWER7+";
>     pcc->pvr = CPU_POWERPC_POWER7P_BASE;
>     pcc->pvr_mask = CPU_POWERPC_POWER7P_MASK;
> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>     pcc->init_proc = init_proc_POWER7;
>     pcc->check_pow = check_pow_nocheck;
>     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
> @@ -7323,6 +7325,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>     dc->desc = "POWER8";
>     pcc->pvr = CPU_POWERPC_POWER8_BASE;
>     pcc->pvr_mask = CPU_POWERPC_POWER8_MASK;
> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>     pcc->init_proc = init_proc_POWER7;
>     pcc->check_pow = check_pow_nocheck;
>     pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
> -- 
> 1.8.4.rc4
>
Alexey Kardashevskiy Sept. 4, 2013, 11:40 a.m. UTC | #2
On 09/04/2013 08:42 PM, Alexander Graf wrote:
> 
> On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:
> 
>>   This is an RFC patch.
>>
>> The modern Linux kernel supports every known POWERPC CPU so when
>> it boots, it can always find a matching cpu_spec from the cpu_specs array.
>> However if the kernel is quite old, it may be missing the definition of
>> the actual CPU. To provide ability for old kernels to work on modern
>> hardware, a Logical Processor Version concept was introduced in PowerISA.
>> From the hardware prospective, it is supported by PCR (Processor
>> Compatibility Register) which is defined in PowerISA. The register
>> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
>>
>> PAPR+ specification defines a Logical Processor Version per every
>> version of PowerISA specification. PAPR+ also defines
>> a ibm,client-architecture-support rtas call which purpose is to provide
>> a negotiation mechanism for the guest and the hypervisor to work out
>> the best Logical Processor Version to continue with.
>>
>> At the moment, the Linux kernel calls the ibm,client-architecture-support
>> method and only then reads the device. The current RTAS's handler checks
>> the capabilities from the array supplied by the guest kernel, analyses
>> if QEMU can or cannot provide with the requested features.
>> If QEMU supports everything the guest has requested, it returns from rtas
>> call and the guest continues booting.
>> If some parameter changes, QEMU fixes the device tree and reboots
>> the guest with a new tree.
>>
>> In this version, the ibm,client-architecture-support handler checks
>> if the current CPU is in the list from the guest and if it is not, QEMU
>> adds a "cpu-version" property to a cpu node with the best of logical PVRs
>> supported by the guest.
>>
>> Technically QEMU reboots and as a part of reboot, it fixes the tree and
>> this is when the cpu-version property is actually added.
>>
>> Although it seems possible to add a custom interface between SLOF and QEMU
>> and implement device tree update on the fly to avoid a guest reboot,
>> there still may be cases when device tree change would not be enough.
>> As an example, the guest may ask for a bigger RMA area than QEMU allocates
>> by default.
>>
>> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
>>
>> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> Cc: Andreas Färber <afaerber@suse.de>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> hw/ppc/spapr.c              | 10 ++++++
>> hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
>> include/hw/ppc/spapr.h      |  7 ++++-
>> target-ppc/cpu-models.h     | 13 ++++++++
>> target-ppc/cpu-qom.h        |  1 +
>> target-ppc/translate_init.c |  3 ++
>> 6 files changed, 109 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 13574bf..5adf53c 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
>>         if (ret < 0) {
>>             return ret;
>>         }
>> +
>> +        if (spapr->pvr_new) {
>> +            ret = fdt_setprop(fdt, offset, "cpu-version",
>> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            /* Reset as the guest after reboot may give other PVR set */
>> +            spapr->pvr_new = 0;
>> +        }
>>     }
>>     return ret;
>> }
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 9f6e7b8..509de89 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -3,6 +3,7 @@
>> #include "helper_regs.h"
>> #include "hw/ppc/spapr.h"
>> #include "mmu-hash64.h"
>> +#include "cpu-models.h"
>>
>> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>                            target_ulong opcode, target_ulong *args)
>> @@ -792,6 +793,78 @@ out:
>>     return ret;
>> }
>>
>> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>> +                                                  sPAPREnvironment *spapr,
>> +                                                  target_ulong opcode,
>> +                                                  target_ulong *args)
>> +{
>> +    target_ulong list = args[0];
>> +    int i, number_of_option_vectors;
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    bool cpu_match = false;
>> +    unsigned compat_cpu_level = 0, pvr_new;
>> +
>> +    /* Parse PVR list */
>> +    for ( ; ; ) {
>> +        uint32_t pvr, pvr_mask;
>> +
>> +        pvr_mask = ldl_phys(list);
>> +        list += 4;
>> +        pvr = ldl_phys(list);
>> +        list += 4;
>> +
>> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
>> +            cpu_match = true;
>> +            pvr_new = cpu->env.spr[SPR_PVR];
>> +        }
>> +
>> +        /* Is it a logical PVR? */
>> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
>> +            switch (pvr) {
>> +            case CPU_POWERPC_LOGICAL_2_05:
>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
>> +                    (compat_cpu_level < 2050)) {
>> +                    compat_cpu_level = 2050;
>> +                    pvr_new = pvr;
>> +                }
>> +                break;
>> +            case CPU_POWERPC_LOGICAL_2_06:
>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>> +                    (compat_cpu_level < 2060)) {
>> +                    compat_cpu_level = 2060;
>> +                    pvr_new = pvr;
>> +                }
>> +                break;
>> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>> +                    (compat_cpu_level < 2061)) {
>> +                    compat_cpu_level = 2061;
>> +                    pvr_new = pvr;
>> +                }
>> +                break;
>> +            }
>> +        }
>> +
>> +        if (~pvr_mask & pvr) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    /* Parsing finished. Making decision whether to reboot or not */
>> +    if (cpu_match) {
>> +        return H_SUCCESS;
>> +    }
>> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
>> +        return H_SUCCESS;
>> +    }
>> +
>> +    cpu->env.spr[SPR_PVR] = pvr_new;
> 

> This would change the SPR value the guest sees. IIUC this is not what is
> happening with this call.

I am always trying to avoid adding new variables but yes, this is not the
case, I'll fix it.

> 
>> +    spapr->pvr_new = pvr_new;
>> +    qemu_system_reset_request();
> 

> I think there should be a way to define the compat mode from the
> beginning without the need to reboot the machine from itself. 

That is a different problem which I do not how to solve in a way to make
everybody happy. Add logical CPUs to every CPU family (at least for
POWER7/7+/8)?


> That way
> management tools can straight on create POWER6 compat machines without
> jumping through reboot hoops.

One of the examples (came from Paul) is:
the host runs on POWER8, the guest boots a kernel which is capable of
POWER7 only + POWER7-compat. We do this reboot tweak and boot in
POWER7-compat mode. Then the guest does "yum update" and gets POWER8 kernel
installed so when it reboots, it will boot in normal POWER8 mode. Everybody
is happy.

Having POWER7-compat mode set from the very beginning will break this
behaviour.


> 
> Alex
> 
>> +
>> +    return H_FUNCTION;
>> +}
>> +
>> static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
>> static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
>>
>> @@ -879,6 +952,9 @@ static void hypercall_register_types(void)
>>     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
>>
>>     spapr_register_hypercall(H_SET_MODE, h_set_mode);
>> +
>> +    /* ibm,client-architecture-support support */
>> +    spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>> }
>>
>> type_init(hypercall_register_types)
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 995ddcf..e2158ff 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -44,6 +44,9 @@ typedef struct sPAPREnvironment {
>>         uint64_t middle_iterations, middle_chunks, middle_valid, middle_invalid;
>>         uint64_t final_chunks, final_valid, final_invalid;
>>     } migstats;
>> +
>> +    uint32_t pvr_new;
>> +
>> } sPAPREnvironment;
>>
>> #define H_SUCCESS         0
>> @@ -304,7 +307,9 @@ typedef struct sPAPREnvironment {
>> #define KVMPPC_HCALL_BASE       0xf000
>> #define KVMPPC_H_RTAS           (KVMPPC_HCALL_BASE + 0x0)
>> #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>> -#define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
>> +/* Client Architecture support */
>> +#define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
>> +#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
>>
>> extern sPAPREnvironment *spapr;
>>
>> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
>> index 49ba4a4..a4e8763 100644
>> --- a/target-ppc/cpu-models.h
>> +++ b/target-ppc/cpu-models.h
>> @@ -583,6 +583,13 @@ enum {
>>     CPU_POWERPC_RS64II             = 0x00340000,
>>     CPU_POWERPC_RS64III            = 0x00360000,
>>     CPU_POWERPC_RS64IV             = 0x00370000,
>> +
>> +    /* Logical CPUs */
>> +    CPU_POWERPC_LOGICAL_MASK       = 0x0F000000,
>> +    CPU_POWERPC_LOGICAL_2_05       = 0x0F000002,
>> +    CPU_POWERPC_LOGICAL_2_06       = 0x0F000003,
>> +    CPU_POWERPC_LOGICAL_2_06_PLUS  = 0x0F100003,
>> +
>> #endif /* defined(TARGET_PPC64) */
>>     /* Original POWER */
>>     /* XXX: should be POWER (RIOS), RSC3308, RSC4608,
>> @@ -745,4 +752,10 @@ enum {
>>     POWERPC_SVR_8641D              = 0x80900121,
>> };
>>
>> +/* Processor Compatibility Register (PCR) */
>> +enum {
>> +    POWERPC_ISA_COMPAT_2_05 = 1ULL << 62,
>> +    POWERPC_ISA_COMPAT_2_06 = 1ULL << 61,
>> +};
>> +
>> #endif
>> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
>> index 0ae8b09..702ae96 100644
>> --- a/target-ppc/cpu-qom.h
>> +++ b/target-ppc/cpu-qom.h
>> @@ -55,6 +55,7 @@ typedef struct PowerPCCPUClass {
>>
>>     uint32_t pvr;
>>     uint32_t pvr_mask;
>> +    uint64_t pcr;
>>     uint32_t svr;
>>     uint64_t insns_flags;
>>     uint64_t insns_flags2;
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index e8cc3c8..645f54d 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -7249,6 +7249,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>>     dc->desc = "POWER7";
>>     pcc->pvr = CPU_POWERPC_POWER7_BASE;
>>     pcc->pvr_mask = CPU_POWERPC_POWER7_MASK;
>> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>>     pcc->init_proc = init_proc_POWER7;
>>     pcc->check_pow = check_pow_nocheck;
>>     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
>> @@ -7286,6 +7287,7 @@ POWERPC_FAMILY(POWER7P)(ObjectClass *oc, void *data)
>>     dc->desc = "POWER7+";
>>     pcc->pvr = CPU_POWERPC_POWER7P_BASE;
>>     pcc->pvr_mask = CPU_POWERPC_POWER7P_MASK;
>> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>>     pcc->init_proc = init_proc_POWER7;
>>     pcc->check_pow = check_pow_nocheck;
>>     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
>> @@ -7323,6 +7325,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>>     dc->desc = "POWER8";
>>     pcc->pvr = CPU_POWERPC_POWER8_BASE;
>>     pcc->pvr_mask = CPU_POWERPC_POWER8_MASK;
>> +    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
>>     pcc->init_proc = init_proc_POWER7;
>>     pcc->check_pow = check_pow_nocheck;
>>     pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
>> -- 
>> 1.8.4.rc4
>>
>
Benjamin Herrenschmidt Sept. 4, 2013, 11:54 a.m. UTC | #3
On Wed, 2013-09-04 at 21:40 +1000, Alexey Kardashevskiy wrote:
> One of the examples (came from Paul) is:
> the host runs on POWER8, the guest boots a kernel which is capable of
> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
> POWER7-compat mode. Then the guest does "yum update" and gets POWER8
> kernel
> installed so when it reboots, it will boot in normal POWER8 mode.
> Everybody
> is happy.
> 
> Having POWER7-compat mode set from the very beginning will break this
> behaviour.

But it will allow migrating that partition to a P7 machine. It's
important to be able to specify the compat mode for that reason, to
create guests that can be migrated to older machines.

Cheers,
Ben.
Alexander Graf Sept. 4, 2013, 12:13 p.m. UTC | #4
On 04.09.2013, at 13:40, Alexey Kardashevskiy wrote:

> On 09/04/2013 08:42 PM, Alexander Graf wrote:
>> 
>> On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:
>> 
>>>  This is an RFC patch.
>>> 
>>> The modern Linux kernel supports every known POWERPC CPU so when
>>> it boots, it can always find a matching cpu_spec from the cpu_specs array.
>>> However if the kernel is quite old, it may be missing the definition of
>>> the actual CPU. To provide ability for old kernels to work on modern
>>> hardware, a Logical Processor Version concept was introduced in PowerISA.
>>> From the hardware prospective, it is supported by PCR (Processor
>>> Compatibility Register) which is defined in PowerISA. The register
>>> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
>>> 
>>> PAPR+ specification defines a Logical Processor Version per every
>>> version of PowerISA specification. PAPR+ also defines
>>> a ibm,client-architecture-support rtas call which purpose is to provide
>>> a negotiation mechanism for the guest and the hypervisor to work out
>>> the best Logical Processor Version to continue with.
>>> 
>>> At the moment, the Linux kernel calls the ibm,client-architecture-support
>>> method and only then reads the device. The current RTAS's handler checks
>>> the capabilities from the array supplied by the guest kernel, analyses
>>> if QEMU can or cannot provide with the requested features.
>>> If QEMU supports everything the guest has requested, it returns from rtas
>>> call and the guest continues booting.
>>> If some parameter changes, QEMU fixes the device tree and reboots
>>> the guest with a new tree.
>>> 
>>> In this version, the ibm,client-architecture-support handler checks
>>> if the current CPU is in the list from the guest and if it is not, QEMU
>>> adds a "cpu-version" property to a cpu node with the best of logical PVRs
>>> supported by the guest.
>>> 
>>> Technically QEMU reboots and as a part of reboot, it fixes the tree and
>>> this is when the cpu-version property is actually added.
>>> 
>>> Although it seems possible to add a custom interface between SLOF and QEMU
>>> and implement device tree update on the fly to avoid a guest reboot,
>>> there still may be cases when device tree change would not be enough.
>>> As an example, the guest may ask for a bigger RMA area than QEMU allocates
>>> by default.
>>> 
>>> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
>>> 
>>> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>> Cc: Andreas Färber <afaerber@suse.de>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> hw/ppc/spapr.c              | 10 ++++++
>>> hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>> include/hw/ppc/spapr.h      |  7 ++++-
>>> target-ppc/cpu-models.h     | 13 ++++++++
>>> target-ppc/cpu-qom.h        |  1 +
>>> target-ppc/translate_init.c |  3 ++
>>> 6 files changed, 109 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 13574bf..5adf53c 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
>>>        if (ret < 0) {
>>>            return ret;
>>>        }
>>> +
>>> +        if (spapr->pvr_new) {
>>> +            ret = fdt_setprop(fdt, offset, "cpu-version",
>>> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
>>> +            if (ret < 0) {
>>> +                return ret;
>>> +            }
>>> +            /* Reset as the guest after reboot may give other PVR set */
>>> +            spapr->pvr_new = 0;
>>> +        }
>>>    }
>>>    return ret;
>>> }
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 9f6e7b8..509de89 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -3,6 +3,7 @@
>>> #include "helper_regs.h"
>>> #include "hw/ppc/spapr.h"
>>> #include "mmu-hash64.h"
>>> +#include "cpu-models.h"
>>> 
>>> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>                           target_ulong opcode, target_ulong *args)
>>> @@ -792,6 +793,78 @@ out:
>>>    return ret;
>>> }
>>> 
>>> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>> +                                                  sPAPREnvironment *spapr,
>>> +                                                  target_ulong opcode,
>>> +                                                  target_ulong *args)
>>> +{
>>> +    target_ulong list = args[0];
>>> +    int i, number_of_option_vectors;
>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>> +    bool cpu_match = false;
>>> +    unsigned compat_cpu_level = 0, pvr_new;
>>> +
>>> +    /* Parse PVR list */
>>> +    for ( ; ; ) {
>>> +        uint32_t pvr, pvr_mask;
>>> +
>>> +        pvr_mask = ldl_phys(list);
>>> +        list += 4;
>>> +        pvr = ldl_phys(list);
>>> +        list += 4;
>>> +
>>> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
>>> +            cpu_match = true;
>>> +            pvr_new = cpu->env.spr[SPR_PVR];
>>> +        }
>>> +
>>> +        /* Is it a logical PVR? */
>>> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
>>> +            switch (pvr) {
>>> +            case CPU_POWERPC_LOGICAL_2_05:
>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
>>> +                    (compat_cpu_level < 2050)) {
>>> +                    compat_cpu_level = 2050;
>>> +                    pvr_new = pvr;
>>> +                }
>>> +                break;
>>> +            case CPU_POWERPC_LOGICAL_2_06:
>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>> +                    (compat_cpu_level < 2060)) {
>>> +                    compat_cpu_level = 2060;
>>> +                    pvr_new = pvr;
>>> +                }
>>> +                break;
>>> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>> +                    (compat_cpu_level < 2061)) {
>>> +                    compat_cpu_level = 2061;
>>> +                    pvr_new = pvr;
>>> +                }
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>> +        if (~pvr_mask & pvr) {
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    /* Parsing finished. Making decision whether to reboot or not */
>>> +    if (cpu_match) {
>>> +        return H_SUCCESS;
>>> +    }
>>> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
>>> +        return H_SUCCESS;
>>> +    }
>>> +
>>> +    cpu->env.spr[SPR_PVR] = pvr_new;
>> 
> 
>> This would change the SPR value the guest sees. IIUC this is not what is
>> happening with this call.
> 
> I am always trying to avoid adding new variables but yes, this is not the
> case, I'll fix it.
> 
>> 
>>> +    spapr->pvr_new = pvr_new;
>>> +    qemu_system_reset_request();
>> 
> 
>> I think there should be a way to define the compat mode from the
>> beginning without the need to reboot the machine from itself. 
> 
> That is a different problem which I do not how to solve in a way to make
> everybody happy. Add logical CPUs to every CPU family (at least for
> POWER7/7+/8)?

I'm not sure the CPU is really the right place to put this information. After all, you are not changing the CPU itself, you're only changing a few bits inside of that CPU that make it appear closer to the legacy CPU plus a few device tree changes.

So IMHO this whole thing should be orthogonal to -cpu.

> 
> 
>> That way
>> management tools can straight on create POWER6 compat machines without
>> jumping through reboot hoops.
> 
> One of the examples (came from Paul) is:
> the host runs on POWER8, the guest boots a kernel which is capable of
> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
> POWER7-compat mode. Then the guest does "yum update" and gets POWER8 kernel
> installed so when it reboots, it will boot in normal POWER8 mode. Everybody
> is happy.
> 
> Having POWER7-compat mode set from the very beginning will break this
> behaviour.

Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?


Alex
Alexey Kardashevskiy Sept. 4, 2013, 12:59 p.m. UTC | #5
On 09/04/2013 09:54 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2013-09-04 at 21:40 +1000, Alexey Kardashevskiy wrote:
>> One of the examples (came from Paul) is:
>> the host runs on POWER8, the guest boots a kernel which is capable of
>> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
>> POWER7-compat mode. Then the guest does "yum update" and gets POWER8
>> kernel
>> installed so when it reboots, it will boot in normal POWER8 mode.
>> Everybody
>> is happy.
>>
>> Having POWER7-compat mode set from the very beginning will break this
>> behaviour.
> 
> But it will allow migrating that partition to a P7 machine. It's
> important to be able to specify the compat mode for that reason, to
> create guests that can be migrated to older machines.

I am not saying we do not need this feature, I am saying we do not always
want it.
Alexey Kardashevskiy Sept. 4, 2013, 1:08 p.m. UTC | #6
On 09/04/2013 10:13 PM, Alexander Graf wrote:
> 
> On 04.09.2013, at 13:40, Alexey Kardashevskiy wrote:
> 
>> On 09/04/2013 08:42 PM, Alexander Graf wrote:
>>>
>>> On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:
>>>
>>>>  This is an RFC patch.
>>>>
>>>> The modern Linux kernel supports every known POWERPC CPU so when
>>>> it boots, it can always find a matching cpu_spec from the cpu_specs array.
>>>> However if the kernel is quite old, it may be missing the definition of
>>>> the actual CPU. To provide ability for old kernels to work on modern
>>>> hardware, a Logical Processor Version concept was introduced in PowerISA.
>>>> From the hardware prospective, it is supported by PCR (Processor
>>>> Compatibility Register) which is defined in PowerISA. The register
>>>> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
>>>>
>>>> PAPR+ specification defines a Logical Processor Version per every
>>>> version of PowerISA specification. PAPR+ also defines
>>>> a ibm,client-architecture-support rtas call which purpose is to provide
>>>> a negotiation mechanism for the guest and the hypervisor to work out
>>>> the best Logical Processor Version to continue with.
>>>>
>>>> At the moment, the Linux kernel calls the ibm,client-architecture-support
>>>> method and only then reads the device. The current RTAS's handler checks
>>>> the capabilities from the array supplied by the guest kernel, analyses
>>>> if QEMU can or cannot provide with the requested features.
>>>> If QEMU supports everything the guest has requested, it returns from rtas
>>>> call and the guest continues booting.
>>>> If some parameter changes, QEMU fixes the device tree and reboots
>>>> the guest with a new tree.
>>>>
>>>> In this version, the ibm,client-architecture-support handler checks
>>>> if the current CPU is in the list from the guest and if it is not, QEMU
>>>> adds a "cpu-version" property to a cpu node with the best of logical PVRs
>>>> supported by the guest.
>>>>
>>>> Technically QEMU reboots and as a part of reboot, it fixes the tree and
>>>> this is when the cpu-version property is actually added.
>>>>
>>>> Although it seems possible to add a custom interface between SLOF and QEMU
>>>> and implement device tree update on the fly to avoid a guest reboot,
>>>> there still may be cases when device tree change would not be enough.
>>>> As an example, the guest may ask for a bigger RMA area than QEMU allocates
>>>> by default.
>>>>
>>>> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
>>>>
>>>> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>>> Cc: Andreas Färber <afaerber@suse.de>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>> hw/ppc/spapr.c              | 10 ++++++
>>>> hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>>> include/hw/ppc/spapr.h      |  7 ++++-
>>>> target-ppc/cpu-models.h     | 13 ++++++++
>>>> target-ppc/cpu-qom.h        |  1 +
>>>> target-ppc/translate_init.c |  3 ++
>>>> 6 files changed, 109 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 13574bf..5adf53c 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
>>>>        if (ret < 0) {
>>>>            return ret;
>>>>        }
>>>> +
>>>> +        if (spapr->pvr_new) {
>>>> +            ret = fdt_setprop(fdt, offset, "cpu-version",
>>>> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
>>>> +            if (ret < 0) {
>>>> +                return ret;
>>>> +            }
>>>> +            /* Reset as the guest after reboot may give other PVR set */
>>>> +            spapr->pvr_new = 0;
>>>> +        }
>>>>    }
>>>>    return ret;
>>>> }
>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>>> index 9f6e7b8..509de89 100644
>>>> --- a/hw/ppc/spapr_hcall.c
>>>> +++ b/hw/ppc/spapr_hcall.c
>>>> @@ -3,6 +3,7 @@
>>>> #include "helper_regs.h"
>>>> #include "hw/ppc/spapr.h"
>>>> #include "mmu-hash64.h"
>>>> +#include "cpu-models.h"
>>>>
>>>> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>                           target_ulong opcode, target_ulong *args)
>>>> @@ -792,6 +793,78 @@ out:
>>>>    return ret;
>>>> }
>>>>
>>>> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>>> +                                                  sPAPREnvironment *spapr,
>>>> +                                                  target_ulong opcode,
>>>> +                                                  target_ulong *args)
>>>> +{
>>>> +    target_ulong list = args[0];
>>>> +    int i, number_of_option_vectors;
>>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>>> +    bool cpu_match = false;
>>>> +    unsigned compat_cpu_level = 0, pvr_new;
>>>> +
>>>> +    /* Parse PVR list */
>>>> +    for ( ; ; ) {
>>>> +        uint32_t pvr, pvr_mask;
>>>> +
>>>> +        pvr_mask = ldl_phys(list);
>>>> +        list += 4;
>>>> +        pvr = ldl_phys(list);
>>>> +        list += 4;
>>>> +
>>>> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
>>>> +            cpu_match = true;
>>>> +            pvr_new = cpu->env.spr[SPR_PVR];
>>>> +        }
>>>> +
>>>> +        /* Is it a logical PVR? */
>>>> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
>>>> +            switch (pvr) {
>>>> +            case CPU_POWERPC_LOGICAL_2_05:
>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
>>>> +                    (compat_cpu_level < 2050)) {
>>>> +                    compat_cpu_level = 2050;
>>>> +                    pvr_new = pvr;
>>>> +                }
>>>> +                break;
>>>> +            case CPU_POWERPC_LOGICAL_2_06:
>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>> +                    (compat_cpu_level < 2060)) {
>>>> +                    compat_cpu_level = 2060;
>>>> +                    pvr_new = pvr;
>>>> +                }
>>>> +                break;
>>>> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>> +                    (compat_cpu_level < 2061)) {
>>>> +                    compat_cpu_level = 2061;
>>>> +                    pvr_new = pvr;
>>>> +                }
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        if (~pvr_mask & pvr) {
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* Parsing finished. Making decision whether to reboot or not */
>>>> +    if (cpu_match) {
>>>> +        return H_SUCCESS;
>>>> +    }
>>>> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
>>>> +        return H_SUCCESS;
>>>> +    }
>>>> +
>>>> +    cpu->env.spr[SPR_PVR] = pvr_new;
>>>
>>
>>> This would change the SPR value the guest sees. IIUC this is not what is
>>> happening with this call.
>>
>> I am always trying to avoid adding new variables but yes, this is not the
>> case, I'll fix it.
>>
>>>
>>>> +    spapr->pvr_new = pvr_new;
>>>> +    qemu_system_reset_request();
>>>
>>
>>> I think there should be a way to define the compat mode from the
>>> beginning without the need to reboot the machine from itself. 
>>
>> That is a different problem which I do not how to solve in a way to make
>> everybody happy. Add logical CPUs to every CPU family (at least for
>> POWER7/7+/8)?
> 

> I'm not sure the CPU is really the right place to put this information.
> After all, you are not changing the CPU itself, you're only changing a
> few bits inside of that CPU that make it appear closer to the legacy CPU
> plus a few device tree changes.

> So IMHO this whole thing should be orthogonal to -cpu.

Well, since we cannot change CPU class on the fly, yes, it should be a
"compatibility" flags/properties/methods/whatever of the default CPU for
the specific family.


>>> That way
>>> management tools can straight on create POWER6 compat machines without
>>> jumping through reboot hoops.
>>
>> One of the examples (came from Paul) is:
>> the host runs on POWER8, the guest boots a kernel which is capable of
>> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
>> POWER7-compat mode. Then the guest does "yum update" and gets POWER8 kernel
>> installed so when it reboots, it will boot in normal POWER8 mode. Everybody
>> is happy.
>>
>> Having POWER7-compat mode set from the very beginning will break this
>> behaviour.
> 
> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?

Bump when exactly? And it won't be a new compat, it will be a native CPU. I
thought you are totally against reboots and you want libvirt (for example)
to detect that this was ibm,client-architecture-support and reboot the
guest with new CPU type (or machine option).


> 
> 
> Alex
>
Alexander Graf Sept. 4, 2013, 1:37 p.m. UTC | #7
On 04.09.2013, at 15:08, Alexey Kardashevskiy wrote:

> On 09/04/2013 10:13 PM, Alexander Graf wrote:
>> 
>> On 04.09.2013, at 13:40, Alexey Kardashevskiy wrote:
>> 
>>> On 09/04/2013 08:42 PM, Alexander Graf wrote:
>>>> 
>>>> On 04.09.2013, at 12:19, Alexey Kardashevskiy wrote:
>>>> 
>>>>> This is an RFC patch.
>>>>> 
>>>>> The modern Linux kernel supports every known POWERPC CPU so when
>>>>> it boots, it can always find a matching cpu_spec from the cpu_specs array.
>>>>> However if the kernel is quite old, it may be missing the definition of
>>>>> the actual CPU. To provide ability for old kernels to work on modern
>>>>> hardware, a Logical Processor Version concept was introduced in PowerISA.
>>>>> From the hardware prospective, it is supported by PCR (Processor
>>>>> Compatibility Register) which is defined in PowerISA. The register
>>>>> enables compatibility mode which can be set to PowerISA 2.05 or 2.06.
>>>>> 
>>>>> PAPR+ specification defines a Logical Processor Version per every
>>>>> version of PowerISA specification. PAPR+ also defines
>>>>> a ibm,client-architecture-support rtas call which purpose is to provide
>>>>> a negotiation mechanism for the guest and the hypervisor to work out
>>>>> the best Logical Processor Version to continue with.
>>>>> 
>>>>> At the moment, the Linux kernel calls the ibm,client-architecture-support
>>>>> method and only then reads the device. The current RTAS's handler checks
>>>>> the capabilities from the array supplied by the guest kernel, analyses
>>>>> if QEMU can or cannot provide with the requested features.
>>>>> If QEMU supports everything the guest has requested, it returns from rtas
>>>>> call and the guest continues booting.
>>>>> If some parameter changes, QEMU fixes the device tree and reboots
>>>>> the guest with a new tree.
>>>>> 
>>>>> In this version, the ibm,client-architecture-support handler checks
>>>>> if the current CPU is in the list from the guest and if it is not, QEMU
>>>>> adds a "cpu-version" property to a cpu node with the best of logical PVRs
>>>>> supported by the guest.
>>>>> 
>>>>> Technically QEMU reboots and as a part of reboot, it fixes the tree and
>>>>> this is when the cpu-version property is actually added.
>>>>> 
>>>>> Although it seems possible to add a custom interface between SLOF and QEMU
>>>>> and implement device tree update on the fly to avoid a guest reboot,
>>>>> there still may be cases when device tree change would not be enough.
>>>>> As an example, the guest may ask for a bigger RMA area than QEMU allocates
>>>>> by default.
>>>>> 
>>>>> The patch depends on "[PATCH v5] powerpc: add PVR mask support".
>>>>> 
>>>>> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>>>> Cc: Andreas Färber <afaerber@suse.de>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>> hw/ppc/spapr.c              | 10 ++++++
>>>>> hw/ppc/spapr_hcall.c        | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>>>> include/hw/ppc/spapr.h      |  7 ++++-
>>>>> target-ppc/cpu-models.h     | 13 ++++++++
>>>>> target-ppc/cpu-qom.h        |  1 +
>>>>> target-ppc/translate_init.c |  3 ++
>>>>> 6 files changed, 109 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index 13574bf..5adf53c 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -238,6 +238,16 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
>>>>>       if (ret < 0) {
>>>>>           return ret;
>>>>>       }
>>>>> +
>>>>> +        if (spapr->pvr_new) {
>>>>> +            ret = fdt_setprop(fdt, offset, "cpu-version",
>>>>> +                              &spapr->pvr_new, sizeof(spapr->pvr_new));
>>>>> +            if (ret < 0) {
>>>>> +                return ret;
>>>>> +            }
>>>>> +            /* Reset as the guest after reboot may give other PVR set */
>>>>> +            spapr->pvr_new = 0;
>>>>> +        }
>>>>>   }
>>>>>   return ret;
>>>>> }
>>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>>>> index 9f6e7b8..509de89 100644
>>>>> --- a/hw/ppc/spapr_hcall.c
>>>>> +++ b/hw/ppc/spapr_hcall.c
>>>>> @@ -3,6 +3,7 @@
>>>>> #include "helper_regs.h"
>>>>> #include "hw/ppc/spapr.h"
>>>>> #include "mmu-hash64.h"
>>>>> +#include "cpu-models.h"
>>>>> 
>>>>> static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>>>                          target_ulong opcode, target_ulong *args)
>>>>> @@ -792,6 +793,78 @@ out:
>>>>>   return ret;
>>>>> }
>>>>> 
>>>>> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>>>> +                                                  sPAPREnvironment *spapr,
>>>>> +                                                  target_ulong opcode,
>>>>> +                                                  target_ulong *args)
>>>>> +{
>>>>> +    target_ulong list = args[0];
>>>>> +    int i, number_of_option_vectors;
>>>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>>>> +    bool cpu_match = false;
>>>>> +    unsigned compat_cpu_level = 0, pvr_new;
>>>>> +
>>>>> +    /* Parse PVR list */
>>>>> +    for ( ; ; ) {
>>>>> +        uint32_t pvr, pvr_mask;
>>>>> +
>>>>> +        pvr_mask = ldl_phys(list);
>>>>> +        list += 4;
>>>>> +        pvr = ldl_phys(list);
>>>>> +        list += 4;
>>>>> +
>>>>> +        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
>>>>> +            cpu_match = true;
>>>>> +            pvr_new = cpu->env.spr[SPR_PVR];
>>>>> +        }
>>>>> +
>>>>> +        /* Is it a logical PVR? */
>>>>> +        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
>>>>> +            switch (pvr) {
>>>>> +            case CPU_POWERPC_LOGICAL_2_05:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
>>>>> +                    (compat_cpu_level < 2050)) {
>>>>> +                    compat_cpu_level = 2050;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            case CPU_POWERPC_LOGICAL_2_06:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>>> +                    (compat_cpu_level < 2060)) {
>>>>> +                    compat_cpu_level = 2060;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            case CPU_POWERPC_LOGICAL_2_06_PLUS:
>>>>> +                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
>>>>> +                    (compat_cpu_level < 2061)) {
>>>>> +                    compat_cpu_level = 2061;
>>>>> +                    pvr_new = pvr;
>>>>> +                }
>>>>> +                break;
>>>>> +            }
>>>>> +        }
>>>>> +
>>>>> +        if (~pvr_mask & pvr) {
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    /* Parsing finished. Making decision whether to reboot or not */
>>>>> +    if (cpu_match) {
>>>>> +        return H_SUCCESS;
>>>>> +    }
>>>>> +    if (pvr_new == cpu->env.spr[SPR_PVR]) {
>>>>> +        return H_SUCCESS;
>>>>> +    }
>>>>> +
>>>>> +    cpu->env.spr[SPR_PVR] = pvr_new;
>>>> 
>>> 
>>>> This would change the SPR value the guest sees. IIUC this is not what is
>>>> happening with this call.
>>> 
>>> I am always trying to avoid adding new variables but yes, this is not the
>>> case, I'll fix it.
>>> 
>>>> 
>>>>> +    spapr->pvr_new = pvr_new;
>>>>> +    qemu_system_reset_request();
>>>> 
>>> 
>>>> I think there should be a way to define the compat mode from the
>>>> beginning without the need to reboot the machine from itself. 
>>> 
>>> That is a different problem which I do not how to solve in a way to make
>>> everybody happy. Add logical CPUs to every CPU family (at least for
>>> POWER7/7+/8)?
>> 
> 
>> I'm not sure the CPU is really the right place to put this information.
>> After all, you are not changing the CPU itself, you're only changing a
>> few bits inside of that CPU that make it appear closer to the legacy CPU
>> plus a few device tree changes.
> 
>> So IMHO this whole thing should be orthogonal to -cpu.
> 
> Well, since we cannot change CPU class on the fly, yes, it should be a
> "compatibility" flags/properties/methods/whatever of the default CPU for
> the specific family.

Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?

> 
> 
>>>> That way
>>>> management tools can straight on create POWER6 compat machines without
>>>> jumping through reboot hoops.
>>> 
>>> One of the examples (came from Paul) is:
>>> the host runs on POWER8, the guest boots a kernel which is capable of
>>> POWER7 only + POWER7-compat. We do this reboot tweak and boot in
>>> POWER7-compat mode. Then the guest does "yum update" and gets POWER8 kernel
>>> installed so when it reboots, it will boot in normal POWER8 mode. Everybody
>>> is happy.
>>> 
>>> Having POWER7-compat mode set from the very beginning will break this
>>> behaviour.
>> 
>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
> 
> Bump when exactly? And it won't be a new compat, it will be a native CPU. I

If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?

> thought you are totally against reboots and you want libvirt (for example)
> to detect that this was ibm,client-architecture-support and reboot the
> guest with new CPU type (or machine option).

I'm not sure which way the best one forward is, but at first we need to get something working that feels natural for people coming from x86 as well as pHyp.

So I think we should enable both use cases. We should do the on-the-fly transitioning in QEMU with the chance of libvirt intercepting it. I don't think the intercepting part is a hard requirement today, but we should keep it in mind to know the full picture.

That way a client-architecture-support rtas call can either reboot the system and automatically do "the right thing" and/or it can tell libvirt via QMP that we changed the client-architecture-support which means libvirt now has the chance to reconfigure the default setting next time it spawns a VM.

Which gets us to the third bit. You need to be able to tell QEMU what the "default" client-architecture-support level is when you boot a VM to make sure that a POWER7 guest on a POWER8 system doesn't always reboot to reconfigure itself first thing when it boots up. As Ben mentioned, migration is obviously a strong point for this one too.

So what do we need?

  - machine flag to indicate compat level
  - machine reset evaluates compat level, adjusts VM accordingly
  - rtas call merely modifies machine flag and triggers a reset
  - optional: QMP notifier when the rtas call changed the machine flag


Alex
Anthony Liguori Sept. 4, 2013, 9:32 p.m. UTC | #8
On Wed, Sep 4, 2013 at 8:37 AM, Alexander Graf <agraf@suse.de> wrote:
>
>>
>>> So IMHO this whole thing should be orthogonal to -cpu.
>>
>> Well, since we cannot change CPU class on the fly, yes, it should be a
>> "compatibility" flags/properties/methods/whatever of the default CPU for
>> the specific family.
>
> Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?

AFAIK, this has nothing to do with CPUs.

>>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
>>
>> Bump when exactly? And it won't be a new compat, it will be a native CPU. I
>
> If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?

POWER8 is compatible with POWER7, right?  There's no magic bits that
need to be disabled AFAIK.

The effective version if exposed through device tree.  The reason a
reboot is needed is because the device tree needs to be updated (which
can't be done without a reboot).

The problem to solve is delaying device tree generation in order to
avoid the reboots, no?

(Maybe it's time to start thinking of non-PAPR interfaces that Linux
KVM guests can use to avoid a lot of this silliness...)

>> thought you are totally against reboots and you want libvirt (for example)
>> to detect that this was ibm,client-architecture-support and reboot the
>> guest with new CPU type (or machine option).
>
> I'm not sure which way the best one forward is, but at first we need to get something working that feels natural for people coming from x86 as well as pHyp.
>
> So I think we should enable both use cases. We should do the on-the-fly transitioning in QEMU with the chance of libvirt intercepting it. I don't think the intercepting part is a hard requirement today, but we should keep it in mind to know the full picture.
>
> That way a client-architecture-support rtas call can either reboot the system and automatically do "the right thing" and/or it can tell libvirt via QMP that we changed the client-architecture-support which means libvirt now has the chance to reconfigure the default setting next time it spawns a VM.
>
> Which gets us to the third bit. You need to be able to tell QEMU what the "default" client-architecture-support level is when you boot a VM to make sure that a POWER7 guest on a POWER8 system doesn't always reboot to reconfigure itself first thing when it boots up. As Ben mentioned, migration is obviously a strong point for this one too.
>
> So what do we need?
>
>   - machine flag to indicate compat level
>   - machine reset evaluates compat level, adjusts VM accordingly
>   - rtas call merely modifies machine flag and triggers a reset
>   - optional: QMP notifier when the rtas call changed the machine flag

Regards,

Anthony Liguori

>
>
> Alex
>
>
Paul Mackerras Sept. 5, 2013, 10:16 a.m. UTC | #9
On Wed, Sep 04, 2013 at 04:32:20PM -0500, Anthony Liguori wrote:
> On Wed, Sep 4, 2013 at 8:37 AM, Alexander Graf <agraf@suse.de> wrote:
> >
> >>
> >>> So IMHO this whole thing should be orthogonal to -cpu.
> >>
> >> Well, since we cannot change CPU class on the fly, yes, it should be a
> >> "compatibility" flags/properties/methods/whatever of the default CPU for
> >> the specific family.
> >
> > Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?
> 
> AFAIK, this has nothing to do with CPUs.

I'm not sure what you mean by that; it has to do with CPUs since it
means changing the CPUs' behaviour, at least for user-mode programs.

> >>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
> >>
> >> Bump when exactly? And it won't be a new compat, it will be a native CPU. I
> >
> > If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?

Answering Alex here -- if we want to preserve the option of migrating
to a POWER7 host in future, we would run the guest in POWER7 compat
mode even if the current host is a POWER8 and the guest knows about
POWER8.

> POWER8 is compatible with POWER7, right?  There's no magic bits that
> need to be disabled AFAIK.

POWER8 is a superset of POWER7, yes, but what we want to do when
running an old OS and applications that don't know about POWER8 is to
disable the new POWER8 features in usermode (e.g., transactional
memory, various other new instructions, etc.).  The reason is to make
it more likely that applications won't misbehave even if they do
silly things like trying instructions that aren't defined on POWER7
(or at least, they will behave the same as they would on a real
POWER7).

To this end, the processor has a Processor Compatibility Register
(PCR) which has bits which disables the new instructions and SPRs when
the processor is in user mode - actually 2 bits, one which disables
the features that were new for POWER8 compared to POWER7, and one
which disables the features that were new for POWER7 compared to
POWER6.

> The effective version if exposed through device tree.  The reason a
> reboot is needed is because the device tree needs to be updated (which
> can't be done without a reboot).
> 
> The problem to solve is delaying device tree generation in order to
> avoid the reboots, no?

No, we can't delay generating the device tree to that point.  SLOF
needs the device tree, and the kernel does look at the device tree
before calling the ibm,client-architecture-support method.

> (Maybe it's time to start thinking of non-PAPR interfaces that Linux
> KVM guests can use to avoid a lot of this silliness...)

That won't help us boot (e.g.) RHEL6 or SLES11 on a POWER8, which is
what this is ultimately about.

In any case, there are changes we might need to make that will
definitely need a reboot - changing the size of the hashed page table,
for instance.

Paul.
Alexander Graf Sept. 5, 2013, 10:19 a.m. UTC | #10
On 05.09.2013, at 12:16, Paul Mackerras wrote:

> On Wed, Sep 04, 2013 at 04:32:20PM -0500, Anthony Liguori wrote:
>> On Wed, Sep 4, 2013 at 8:37 AM, Alexander Graf <agraf@suse.de> wrote:
>>> 
>>>> 
>>>>> So IMHO this whole thing should be orthogonal to -cpu.
>>>> 
>>>> Well, since we cannot change CPU class on the fly, yes, it should be a
>>>> "compatibility" flags/properties/methods/whatever of the default CPU for
>>>> the specific family.
>>> 
>>> Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?
>> 
>> AFAIK, this has nothing to do with CPUs.
> 
> I'm not sure what you mean by that; it has to do with CPUs since it
> means changing the CPUs' behaviour, at least for user-mode programs.
> 
>>>>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
>>>> 
>>>> Bump when exactly? And it won't be a new compat, it will be a native CPU. I
>>> 
>>> If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?
> 
> Answering Alex here -- if we want to preserve the option of migrating
> to a POWER7 host in future, we would run the guest in POWER7 compat
> mode even if the current host is a POWER8 and the guest knows about
> POWER8.

Yes, so we boot the guest with compat mode set to POWER7, then the guest calls ibm,client-architecutre-support including POWER8 and then we can reconfigure the guest to be POWER8, right?


Alex
Paul Mackerras Sept. 5, 2013, 11:55 a.m. UTC | #11
On Thu, Sep 05, 2013 at 12:19:09PM +0200, Alexander Graf wrote:
> 
> On 05.09.2013, at 12:16, Paul Mackerras wrote:
> 
> > On Wed, Sep 04, 2013 at 04:32:20PM -0500, Anthony Liguori wrote:
> >> On Wed, Sep 4, 2013 at 8:37 AM, Alexander Graf <agraf@suse.de> wrote:
> >>> 
> >>>> 
> >>>>> So IMHO this whole thing should be orthogonal to -cpu.
> >>>> 
> >>>> Well, since we cannot change CPU class on the fly, yes, it should be a
> >>>> "compatibility" flags/properties/methods/whatever of the default CPU for
> >>>> the specific family.
> >>> 
> >>> Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?
> >> 
> >> AFAIK, this has nothing to do with CPUs.
> > 
> > I'm not sure what you mean by that; it has to do with CPUs since it
> > means changing the CPUs' behaviour, at least for user-mode programs.
> > 
> >>>>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
> >>>> 
> >>>> Bump when exactly? And it won't be a new compat, it will be a native CPU. I
> >>> 
> >>> If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?
> > 
> > Answering Alex here -- if we want to preserve the option of migrating
> > to a POWER7 host in future, we would run the guest in POWER7 compat
> > mode even if the current host is a POWER8 and the guest knows about
> > POWER8.
> 
> Yes, so we boot the guest with compat mode set to POWER7, then the guest calls ibm,client-architecutre-support including POWER8 and then we can reconfigure the guest to be POWER8, right?

Not if we want to be able to migrate to a real POWER7 later.  If we
tell the guest it's a POWER8, it will start using POWER8 features, and
then break when we migrate it to a POWER7 where those features don't
exist.  If we run the POWER8 in POWER7 compatibility mode (and more
importantly, the device tree says it's a 2.06 architecture processor),
it should only use POWER7 features and then work just fine when
migrated to a real POWER7.

Paul.
Alexander Graf Sept. 5, 2013, 1:04 p.m. UTC | #12
On 05.09.2013, at 13:55, Paul Mackerras wrote:

> On Thu, Sep 05, 2013 at 12:19:09PM +0200, Alexander Graf wrote:
>> 
>> On 05.09.2013, at 12:16, Paul Mackerras wrote:
>> 
>>> On Wed, Sep 04, 2013 at 04:32:20PM -0500, Anthony Liguori wrote:
>>>> On Wed, Sep 4, 2013 at 8:37 AM, Alexander Graf <agraf@suse.de> wrote:
>>>>> 
>>>>>> 
>>>>>>> So IMHO this whole thing should be orthogonal to -cpu.
>>>>>> 
>>>>>> Well, since we cannot change CPU class on the fly, yes, it should be a
>>>>>> "compatibility" flags/properties/methods/whatever of the default CPU for
>>>>>> the specific family.
>>>>> 
>>>>> Since it's machine global, it could just as well be a machine option, no? Or can you have multiple CPUs with different compat modes in a single system?
>>>> 
>>>> AFAIK, this has nothing to do with CPUs.
>>> 
>>> I'm not sure what you mean by that; it has to do with CPUs since it
>>> means changing the CPUs' behaviour, at least for user-mode programs.
>>> 
>>>>>>> Why? Just because you're on POWER7 as default doesn't mean you can't bump to a newer compat later on, no?
>>>>>> 
>>>>>> Bump when exactly? And it won't be a new compat, it will be a native CPU. I
>>>>> 
>>>>> If you configure your guest to boot in POWER7-compat mode on your POWER8 box and it then tells you through ibm,client-architecture-support that it can do POWER8, we can just remove all the compat bits and be happy, no?
>>> 
>>> Answering Alex here -- if we want to preserve the option of migrating
>>> to a POWER7 host in future, we would run the guest in POWER7 compat
>>> mode even if the current host is a POWER8 and the guest knows about
>>> POWER8.
>> 
>> Yes, so we boot the guest with compat mode set to POWER7, then the guest calls ibm,client-architecutre-support including POWER8 and then we can reconfigure the guest to be POWER8, right?
> 
> Not if we want to be able to migrate to a real POWER7 later.  If we
> tell the guest it's a POWER8, it will start using POWER8 features, and
> then break when we migrate it to a POWER7 where those features don't
> exist.  If we run the POWER8 in POWER7 compatibility mode (and more
> importantly, the device tree says it's a 2.06 architecture processor),
> it should only use POWER7 features and then work just fine when
> migrated to a real POWER7.

Ah, ok. So we want to be able to specify 2 things:

  - current compat level
  - max compat level

That way we can boot a POWER8 virtual machine in POWER7 compat mode straight on boot, but allow that machine to later reconfigure itself if it isn't configured to POWER7 as max compat level.

The current level is mostly a convenience thing to not have to reboot SLOF every time you boot up a guest. The max level is what gives you compatibility throughout a cluster of machines.

Or do I miss anything here?


Alex
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 13574bf..5adf53c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -238,6 +238,16 @@  static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
         if (ret < 0) {
             return ret;
         }
+
+        if (spapr->pvr_new) {
+            ret = fdt_setprop(fdt, offset, "cpu-version",
+                              &spapr->pvr_new, sizeof(spapr->pvr_new));
+            if (ret < 0) {
+                return ret;
+            }
+            /* Reset as the guest after reboot may give other PVR set */
+            spapr->pvr_new = 0;
+        }
     }
     return ret;
 }
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9f6e7b8..509de89 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,6 +3,7 @@ 
 #include "helper_regs.h"
 #include "hw/ppc/spapr.h"
 #include "mmu-hash64.h"
+#include "cpu-models.h"
 
 static target_ulong h_random(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                            target_ulong opcode, target_ulong *args)
@@ -792,6 +793,78 @@  out:
     return ret;
 }
 
+static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
+                                                  sPAPREnvironment *spapr,
+                                                  target_ulong opcode,
+                                                  target_ulong *args)
+{
+    target_ulong list = args[0];
+    int i, number_of_option_vectors;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+    bool cpu_match = false;
+    unsigned compat_cpu_level = 0, pvr_new;
+
+    /* Parse PVR list */
+    for ( ; ; ) {
+        uint32_t pvr, pvr_mask;
+
+        pvr_mask = ldl_phys(list);
+        list += 4;
+        pvr = ldl_phys(list);
+        list += 4;
+
+        if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
+            cpu_match = true;
+            pvr_new = cpu->env.spr[SPR_PVR];
+        }
+
+        /* Is it a logical PVR? */
+        if ((pvr & CPU_POWERPC_LOGICAL_MASK) == CPU_POWERPC_LOGICAL_MASK) {
+            switch (pvr) {
+            case CPU_POWERPC_LOGICAL_2_05:
+                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_05) &&
+                    (compat_cpu_level < 2050)) {
+                    compat_cpu_level = 2050;
+                    pvr_new = pvr;
+                }
+                break;
+            case CPU_POWERPC_LOGICAL_2_06:
+                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
+                    (compat_cpu_level < 2060)) {
+                    compat_cpu_level = 2060;
+                    pvr_new = pvr;
+                }
+                break;
+            case CPU_POWERPC_LOGICAL_2_06_PLUS:
+                if ((pcc->pcr & POWERPC_ISA_COMPAT_2_06) &&
+                    (compat_cpu_level < 2061)) {
+                    compat_cpu_level = 2061;
+                    pvr_new = pvr;
+                }
+                break;
+            }
+        }
+
+        if (~pvr_mask & pvr) {
+            break;
+        }
+    }
+
+    /* Parsing finished. Making decision whether to reboot or not */
+    if (cpu_match) {
+        return H_SUCCESS;
+    }
+    if (pvr_new == cpu->env.spr[SPR_PVR]) {
+        return H_SUCCESS;
+    }
+
+    cpu->env.spr[SPR_PVR] = pvr_new;
+    spapr->pvr_new = pvr_new;
+    qemu_system_reset_request();
+
+    return H_FUNCTION;
+}
+
 static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
 static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
 
@@ -879,6 +952,9 @@  static void hypercall_register_types(void)
     spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
 
     spapr_register_hypercall(H_SET_MODE, h_set_mode);
+
+    /* ibm,client-architecture-support support */
+    spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
 }
 
 type_init(hypercall_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 995ddcf..e2158ff 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -44,6 +44,9 @@  typedef struct sPAPREnvironment {
         uint64_t middle_iterations, middle_chunks, middle_valid, middle_invalid;
         uint64_t final_chunks, final_valid, final_invalid;
     } migstats;
+
+    uint32_t pvr_new;
+
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
@@ -304,7 +307,9 @@  typedef struct sPAPREnvironment {
 #define KVMPPC_HCALL_BASE       0xf000
 #define KVMPPC_H_RTAS           (KVMPPC_HCALL_BASE + 0x0)
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
-#define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
+/* Client Architecture support */
+#define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
 
 extern sPAPREnvironment *spapr;
 
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 49ba4a4..a4e8763 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -583,6 +583,13 @@  enum {
     CPU_POWERPC_RS64II             = 0x00340000,
     CPU_POWERPC_RS64III            = 0x00360000,
     CPU_POWERPC_RS64IV             = 0x00370000,
+
+    /* Logical CPUs */
+    CPU_POWERPC_LOGICAL_MASK       = 0x0F000000,
+    CPU_POWERPC_LOGICAL_2_05       = 0x0F000002,
+    CPU_POWERPC_LOGICAL_2_06       = 0x0F000003,
+    CPU_POWERPC_LOGICAL_2_06_PLUS  = 0x0F100003,
+
 #endif /* defined(TARGET_PPC64) */
     /* Original POWER */
     /* XXX: should be POWER (RIOS), RSC3308, RSC4608,
@@ -745,4 +752,10 @@  enum {
     POWERPC_SVR_8641D              = 0x80900121,
 };
 
+/* Processor Compatibility Register (PCR) */
+enum {
+    POWERPC_ISA_COMPAT_2_05 = 1ULL << 62,
+    POWERPC_ISA_COMPAT_2_06 = 1ULL << 61,
+};
+
 #endif
diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 0ae8b09..702ae96 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -55,6 +55,7 @@  typedef struct PowerPCCPUClass {
 
     uint32_t pvr;
     uint32_t pvr_mask;
+    uint64_t pcr;
     uint32_t svr;
     uint64_t insns_flags;
     uint64_t insns_flags2;
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index e8cc3c8..645f54d 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7249,6 +7249,7 @@  POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
     dc->desc = "POWER7";
     pcc->pvr = CPU_POWERPC_POWER7_BASE;
     pcc->pvr_mask = CPU_POWERPC_POWER7_MASK;
+    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
     pcc->init_proc = init_proc_POWER7;
     pcc->check_pow = check_pow_nocheck;
     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
@@ -7286,6 +7287,7 @@  POWERPC_FAMILY(POWER7P)(ObjectClass *oc, void *data)
     dc->desc = "POWER7+";
     pcc->pvr = CPU_POWERPC_POWER7P_BASE;
     pcc->pvr_mask = CPU_POWERPC_POWER7P_MASK;
+    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
     pcc->init_proc = init_proc_POWER7;
     pcc->check_pow = check_pow_nocheck;
     pcc->insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB |
@@ -7323,6 +7325,7 @@  POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
     dc->desc = "POWER8";
     pcc->pvr = CPU_POWERPC_POWER8_BASE;
     pcc->pvr_mask = CPU_POWERPC_POWER8_MASK;
+    pcc->pcr = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06;
     pcc->init_proc = init_proc_POWER7;
     pcc->check_pow = check_pow_nocheck;
     pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |