diff mbox

target-ppc: Add POWER8E_v2.1 CPU model.

Message ID 1436327021-14744-1-git-send-email-david@gibson.dropbear.id.au
State New
Headers show

Commit Message

David Gibson July 8, 2015, 3:43 a.m. UTC
From: Andrea Bolognani <abologna@redhat.com>

Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
from the kernel cputable.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 target-ppc/cpu-models.c | 4 +++-
 target-ppc/cpu-models.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Alex,

Not sure if this counts as a bugfix which can be merged now we're in
the hard freeze.  The lack of it does mean we can't work on one of our
dev machines which has this CPU.

Comments

Alexey Kardashevskiy July 8, 2015, 4:10 a.m. UTC | #1
On 07/08/2015 01:43 PM, David Gibson wrote:
> From: Andrea Bolognani <abologna@redhat.com>
>
> Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
> from the kernel cputable.
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   target-ppc/cpu-models.c | 4 +++-
>   target-ppc/cpu-models.h | 1 +
>   2 files changed, 4 insertions(+), 1 deletion(-)
>
> Alex,
>
> Not sure if this counts as a bugfix which can be merged now we're in
> the hard freeze.  The lack of it does mean we can't work on one of our
> dev machines which has this CPU.


Why is this a stopper? We stopped bothering with exact PVRs some time ago 
and -cpu POWER8 or -cpu host still work.


>
> diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> index 4d5ab4b..9d8769b 100644
> --- a/target-ppc/cpu-models.c
> +++ b/target-ppc/cpu-models.c
> @@ -1140,6 +1140,8 @@
>                   "POWER7+ v2.1")
>       POWERPC_DEF("POWER8E_v1.0",  CPU_POWERPC_POWER8E_v10,            POWER8,
>                   "POWER8E v1.0")
> +    POWERPC_DEF("POWER8E_v2.1",  CPU_POWERPC_POWER8E_v21,            POWER8,
> +                "POWER8E v2.1")
>       POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10,             POWER8,
>                   "POWER8 v1.0")
>       POWERPC_DEF("970_v2.2",      CPU_POWERPC_970_v22,                970,
> @@ -1389,7 +1391,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
>       { "POWER5gs", "POWER5+_v2.1" },
>       { "POWER7", "POWER7_v2.3" },
>       { "POWER7+", "POWER7+_v2.1" },
> -    { "POWER8E", "POWER8E_v1.0" },
> +    { "POWER8E", "POWER8E_v2.1" },
>       { "POWER8", "POWER8_v1.0" },
>       { "970", "970_v2.2" },
>       { "970fx", "970fx_v3.1" },
> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> index 9d80e72..add31c6 100644
> --- a/target-ppc/cpu-models.h
> +++ b/target-ppc/cpu-models.h
> @@ -558,6 +558,7 @@ enum {
>       CPU_POWERPC_POWER7P_v21        = 0x004A0201,
>       CPU_POWERPC_POWER8E_BASE       = 0x004B0000,
>       CPU_POWERPC_POWER8E_v10        = 0x004B0100,
> +    CPU_POWERPC_POWER8E_v21        = 0x004B0201,
>       CPU_POWERPC_POWER8_BASE        = 0x004D0000,
>       CPU_POWERPC_POWER8_v10         = 0x004D0100,
>       CPU_POWERPC_970_v22            = 0x00390202,
>
David Gibson July 8, 2015, 5:37 a.m. UTC | #2
On Wed, Jul 08, 2015 at 02:10:28PM +1000, Alexey Kardashevskiy wrote:
> On 07/08/2015 01:43 PM, David Gibson wrote:
> >From: Andrea Bolognani <abologna@redhat.com>
> >
> >Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
> >from the kernel cputable.
> >
> >Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> >Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >---
> >  target-ppc/cpu-models.c | 4 +++-
> >  target-ppc/cpu-models.h | 1 +
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> >Alex,
> >
> >Not sure if this counts as a bugfix which can be merged now we're in
> >the hard freeze.  The lack of it does mean we can't work on one of our
> >dev machines which has this CPU.
> 
> 
> Why is this a stopper? We stopped bothering with exact PVRs some time ago
> and -cpu POWER8 or -cpu host still work.

Andrea, can you clarify?

I think it's because libvirt likes to specify a specific CPU - and if
it gets the new PVR from the host, qemu won't understand it.
Alexey Kardashevskiy July 8, 2015, 6:40 a.m. UTC | #3
On 07/08/2015 03:37 PM, David Gibson wrote:
> On Wed, Jul 08, 2015 at 02:10:28PM +1000, Alexey Kardashevskiy wrote:
>> On 07/08/2015 01:43 PM, David Gibson wrote:
>>> From: Andrea Bolognani <abologna@redhat.com>
>>>
>>> Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
>> >from the kernel cputable.
>>>
>>> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>   target-ppc/cpu-models.c | 4 +++-
>>>   target-ppc/cpu-models.h | 1 +
>>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>> Alex,
>>>
>>> Not sure if this counts as a bugfix which can be merged now we're in
>>> the hard freeze.  The lack of it does mean we can't work on one of our
>>> dev machines which has this CPU.
>>
>>
>> Why is this a stopper? We stopped bothering with exact PVRs some time ago
>> and -cpu POWER8 or -cpu host still work.
>
> Andrea, can you clarify?
>
> I think it's because libvirt likes to specify a specific CPU - and if
> it gets the new PVR from the host, qemu won't understand it.


A specific CPU in this case is "POWER8", I added this specifically for 
libvirt (to allow migration between all versions of POWER8), it should not 
use versioned CPUs and it does not in powerkvm.

There is actually a patch to make this PVR masking/subclassing nicer, I 
replied to it with adding you in cc:, please have a look. Andreas Faerber 
had objections which I did not really grasp then.
David Gibson July 8, 2015, 6:45 a.m. UTC | #4
On Wed, Jul 08, 2015 at 04:40:27PM +1000, Alexey Kardashevskiy wrote:
> On 07/08/2015 03:37 PM, David Gibson wrote:
> >On Wed, Jul 08, 2015 at 02:10:28PM +1000, Alexey Kardashevskiy wrote:
> >>On 07/08/2015 01:43 PM, David Gibson wrote:
> >>>From: Andrea Bolognani <abologna@redhat.com>
> >>>
> >>>Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
> >>>from the kernel cputable.
> >>>
> >>>Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> >>>Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>---
> >>>  target-ppc/cpu-models.c | 4 +++-
> >>>  target-ppc/cpu-models.h | 1 +
> >>>  2 files changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>>Alex,
> >>>
> >>>Not sure if this counts as a bugfix which can be merged now we're in
> >>>the hard freeze.  The lack of it does mean we can't work on one of our
> >>>dev machines which has this CPU.
> >>
> >>
> >>Why is this a stopper? We stopped bothering with exact PVRs some time ago
> >>and -cpu POWER8 or -cpu host still work.
> >
> >Andrea, can you clarify?
> >
> >I think it's because libvirt likes to specify a specific CPU - and if
> >it gets the new PVR from the host, qemu won't understand it.
> 
> 
> A specific CPU in this case is "POWER8", I added this specifically for
> libvirt (to allow migration between all versions of POWER8), it should not
> use versioned CPUs and it does not in powerkvm.

Uh.. won't that make qemu attempt to set a specific PVR, though -
which will fail with recent KVM if it's not *exactly* the same as the
host PVR.

> 
> There is actually a patch to make this PVR masking/subclassing nicer, I
> replied to it with adding you in cc:, please have a look. Andreas Faerber
> had objections which I did not really grasp then.
> 
>
Alexey Kardashevskiy July 8, 2015, 7:02 a.m. UTC | #5
On 07/08/2015 04:45 PM, David Gibson wrote:
> On Wed, Jul 08, 2015 at 04:40:27PM +1000, Alexey Kardashevskiy wrote:
>> On 07/08/2015 03:37 PM, David Gibson wrote:
>>> On Wed, Jul 08, 2015 at 02:10:28PM +1000, Alexey Kardashevskiy wrote:
>>>> On 07/08/2015 01:43 PM, David Gibson wrote:
>>>>> From: Andrea Bolognani <abologna@redhat.com>
>>>>>
>>>>> Add a missing PVR value for the POWER8E v2.1 CPU.  Information taken
>>>> >from the kernel cputable.
>>>>>
>>>>> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>> ---
>>>>>   target-ppc/cpu-models.c | 4 +++-
>>>>>   target-ppc/cpu-models.h | 1 +
>>>>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> Alex,
>>>>>
>>>>> Not sure if this counts as a bugfix which can be merged now we're in
>>>>> the hard freeze.  The lack of it does mean we can't work on one of our
>>>>> dev machines which has this CPU.
>>>>
>>>>
>>>> Why is this a stopper? We stopped bothering with exact PVRs some time ago
>>>> and -cpu POWER8 or -cpu host still work.
>>>
>>> Andrea, can you clarify?
>>>
>>> I think it's because libvirt likes to specify a specific CPU - and if
>>> it gets the new PVR from the host, qemu won't understand it.
>>
>>
>> A specific CPU in this case is "POWER8", I added this specifically for
>> libvirt (to allow migration between all versions of POWER8), it should not
>> use versioned CPUs and it does not in powerkvm.
>
> Uh.. won't that make qemu attempt to set a specific PVR, though -
> which will fail with recent KVM if it's not *exactly* the same as the
> host PVR.

The "POWER8" CPU class is created dynamically (like the "host" CPU class) 
and has the actual host PVR so setting it to KVM cannot fail.

When TCG, the class is not registered and alias is used instead.



>> There is actually a patch to make this PVR masking/subclassing nicer, I
>> replied to it with adding you in cc:, please have a look. Andreas Faerber
>> had objections which I did not really grasp then.
>>
>>
>
Andrea Bolognani July 8, 2015, 4:35 p.m. UTC | #6
On Wed, 2015-07-08 at 17:02 +1000, Alexey Kardashevskiy wrote:
> 
> > > > I think it's because libvirt likes to specify a specific CPU -
> > > > and if
> > > > it gets the new PVR from the host, qemu won't understand it.
> > > 
> > > A specific CPU in this case is "POWER8", I added this
> > > specifically for
> > > libvirt (to allow migration between all versions of POWER8), it
> > > should not
> > > use versioned CPUs and it does not in powerkvm.

Does "all versions of POWER8" include things like POWER8E, POWER8NVL
and "POWER8 DD1", as one of the variations is known in the kernel
source? Can we safely migrate guests eg. from a POWER8 v1.0 host to
a POWER8E v2.1 host?

> > Uh.. won't that make qemu attempt to set a specific PVR, though -
> > which will fail with recent KVM if it's not *exactly* the same as
> > the
> > host PVR.
> 
> The "POWER8" CPU class is created dynamically (like the "host" CPU
> class)
> and has the actual host PVR so setting it to KVM cannot fail.
> 
> When TCG, the class is not registered and alias is used instead.

Can you please point out where the dynamical creation happens for the
POWER8 class? I haven't been able to locate it, but then again I'm
all but familiar with QEMU's source :)

From libvirt's point of view, it would be nice to be able to identify
as "POWER8" anything that looks like it, by matching the host's PVR
agains a numer of known PVRs (with relative bitmask). Ideally, if the
host can be identified as POWER8, that's the only CPU model libvirt
should advertise...

Cheers.
Alexey Kardashevskiy July 9, 2015, 2:35 a.m. UTC | #7
On 07/09/2015 02:35 AM, Andrea Bolognani wrote:
> On Wed, 2015-07-08 at 17:02 +1000, Alexey Kardashevskiy wrote:
>>
>>>>> I think it's because libvirt likes to specify a specific CPU -
>>>>> and if
>>>>> it gets the new PVR from the host, qemu won't understand it.
>>>>
>>>> A specific CPU in this case is "POWER8", I added this
>>>> specifically for
>>>> libvirt (to allow migration between all versions of POWER8), it
>>>> should not
>>>> use versioned CPUs and it does not in powerkvm.
>
> Does "all versions of POWER8" include things like POWER8E, POWER8NVL
> and "POWER8 DD1", as one of the variations is known in the kernel
> source? Can we safely migrate guests eg. from a POWER8 v1.0 host to
> a POWER8E v2.1 host?

Yes. afaik the only difference between POWER8 and POWER8E is how many cores 
are packed into an actual chip.


>>> Uh.. won't that make qemu attempt to set a specific PVR, though -
>>> which will fail with recent KVM if it's not *exactly* the same as
>>> the
>>> host PVR.
>>
>> The "POWER8" CPU class is created dynamically (like the "host" CPU
>> class)
>> and has the actual host PVR so setting it to KVM cannot fail.
>>
>> When TCG, the class is not registered and alias is used instead.
>
> Can you please point out where the dynamical creation happens for the
> POWER8 class? I haven't been able to locate it, but then again I'm
> all but familiar with QEMU's source :)


kvm_ppc_register_host_cpu_type(), last chunk:

/* Register generic family CPU class for a family */
pvr_pcc = ppc_cpu_get_family_class(pvr_pcc);
dc = DEVICE_CLASS(pvr_pcc);
type_info.parent = object_class_get_name(OBJECT_CLASS(pvr_pcc));
type_info.name = g_strdup_printf("%s-"TYPE_POWERPC_CPU, dc->desc);
type_register(&type_info);


>  From libvirt's point of view, it would be nice to be able to identify
> as "POWER8" anything that looks like it, by matching the host's PVR
> agains a numer of known PVRs (with relative bitmask). Ideally, if the
> host can be identified as POWER8, that's the only CPU model libvirt
> should advertise...

I have a very little idea about libvirt here. QEMU considers everything 
with 0x004dxxxx and 0x004bxxxx as POWER8 (ppc_pvr_match_power8() helper) 
and supports migration between these.

I am adding Shiva to the coversation, he might enlighen us how this is 
solved by powerkvm's libvirt.
Andrea Bolognani July 9, 2015, 10:03 a.m. UTC | #8
On Thu, 2015-07-09 at 12:35 +1000, Alexey Kardashevskiy wrote:
> 
> > Does "all versions of POWER8" include things like POWER8E,
> > POWER8NVL
> > and "POWER8 DD1", as one of the variations is known in the kernel
> > source? Can we safely migrate guests eg. from a POWER8 v1.0 host to
> > a POWER8E v2.1 host?
> 
> Yes. afaik the only difference between POWER8 and POWER8E is how many
> cores
> are packed into an actual chip.

If I'm reading the kernel source[1] correctly, there are actually
subtle
differences other than the number of cores:

  #define CPU_FTRS_POWER8 (/* Bunch of features here */)
  #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
  #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)

Whether or not these differences will cause issues, I have no idea :)

> >  From libvirt's point of view, it would be nice to be able to
> > identify
> > as "POWER8" anything that looks like it, by matching the host's PVR
> > agains a numer of known PVRs (with relative bitmask). Ideally, if
> > the
> > host can be identified as POWER8, that's the only CPU model libvirt
> > should advertise...
> 
> I have a very little idea about libvirt here. QEMU considers
> everything
> with 0x004dxxxx and 0x004bxxxx as POWER8 (ppc_pvr_match_power8()
> helper)
> and supports migration between these.

Looking again at the kernel source[2]:

  {   /* Power8E */
      .pvr_mask               = 0xffff0000,
      .pvr_value              = 0x004b0000,
      .cpu_name               = "POWER8E (raw)",
      .cpu_features           = CPU_FTRS_POWER8E,
      .cpu_user_features      = COMMON_USER_POWER8,
      .cpu_user_features2     = COMMON_USER2_POWER8,
      .mmu_features           = MMU_FTRS_POWER8,
      .icache_bsize           = 128,
      .dcache_bsize           = 128,
      .num_pmcs               = 6,
      .pmc_type               = PPC_PMC_IBM,
      .oprofile_cpu_type      = "ppc64/power8",
      .oprofile_type          = PPC_OPROFILE_INVALID,
      .cpu_setup              = __setup_cpu_power8,
      .cpu_restore            = __restore_cpu_power8,
      .flush_tlb              = __flush_tlb_power8,
      .machine_check_early    = __machine_check_early_realmode_p8,
      .platform               = "power8",
  },
  {   /* Power8NVL */
      .pvr_mask               = 0xffff0000,
      .pvr_value              = 0x004c0000,
      .cpu_name               = "POWER8NVL (raw)",
      .cpu_features           = CPU_FTRS_POWER8,
      /* Same as above */
  },
  {   /* Power8 DD1: Does not support doorbell IPIs */
      .pvr_mask               = 0xffffff00,
      .pvr_value              = 0x004d0100,
      .cpu_name               = "POWER8 (raw)",
      .cpu_features           = CPU_FTRS_POWER8_DD1,
      /* Same as above */
  },
  {   /* Power8 */
      .pvr_mask               = 0xffff0000,
      .pvr_value              = 0x004d0000,
      .cpu_name               = "POWER8 (raw)",
      .cpu_features           = CPU_FTRS_POWER8,
      /* Same as above */
  },

So the PVR matching, as done currently in QEMU, will include POWER8DD1
but exclude POWER8NVL, which according to to commit ddee09c0 and the
code
above is absolutely identical to POWER8.

libvirt currently considers the guest CPU model to be compatible with
the host CPU model if both have the same PVR, which is clearly far from
optimal.

If we can rely on the above CPU families, as identified by pvr_value
and pvr_mask, to behave exactly the same for our purposes[3], then we
can change libvirt to perform the same kind of PVR matching as QEMU and
report to the user that the guest CPU model POWER8 is compatible with
all of the above host CPU models.

> I am adding Shiva to the coversation, he might enlighen us how this
> is
> solved by powerkvm's libvirt.

Sure, the more the merrier :)

Cheers.


[1] arch/powerpc/include/asm/cputable.h
[2] arch/powerpc/kernel/cputable.c
[3] eg. a guest running on a POWER8E host can be migrated to a
    POWER8DD1 host, despite the fact that the former has
CPU_FTR_PMAO_BUG
    and the latter lacks CPU_FTR_DBELL.
Alexey Kardashevskiy July 10, 2015, 5:19 a.m. UTC | #9
On 07/09/2015 08:03 PM, Andrea Bolognani wrote:
> On Thu, 2015-07-09 at 12:35 +1000, Alexey Kardashevskiy wrote:
>>
>>> Does "all versions of POWER8" include things like POWER8E,
>>> POWER8NVL
>>> and "POWER8 DD1", as one of the variations is known in the kernel
>>> source? Can we safely migrate guests eg. from a POWER8 v1.0 host to
>>> a POWER8E v2.1 host?
>>
>> Yes. afaik the only difference between POWER8 and POWER8E is how many
>> cores
>> are packed into an actual chip.
>
> If I'm reading the kernel source[1] correctly, there are actually
> subtle
> differences other than the number of cores:
>
>    #define CPU_FTRS_POWER8 (/* Bunch of features here */)
>    #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)


POWER8E was the first family of POWER8. Some revisions of POWER8E have this 
bug. The bug is that if we migrate from a good POWER8 to POWER8 with this 
bug, perf counters will stop working as the source kernel did not detect 
broken CPU and did not enable a workaround. We do not really know how many 
of this chips are out there (not many I believe) and how many have this 
bug. And also I guess that more people will be annoyed by inability to 
migrate rather than by broken perf.

I'd leave migration enabled but it is worth mentioning somewhere in user's 
guide that if the user migrates a guest to POWER8E with the specific PVR, 
perf might break.


>    #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)

This bug appears in POWER8E and POWER8 1.0 ("DD1"). The bug is that when a 
CPU thread wakes up after nap because of doorbell message, the doorbell 
interrupt is not pending. Guests cannot do nap themselves (they use H_CEDE 
hypercall for this), when a thread wakes up, it wakes in the host context 
so there is nothing to handle in the guest, it is purely for KVM to workaround.




> Whether or not these differences will cause issues, I have no idea :)
>
>>>   From libvirt's point of view, it would be nice to be able to
>>> identify
>>> as "POWER8" anything that looks like it, by matching the host's PVR
>>> agains a numer of known PVRs (with relative bitmask). Ideally, if
>>> the
>>> host can be identified as POWER8, that's the only CPU model libvirt
>>> should advertise...
>>
>> I have a very little idea about libvirt here. QEMU considers
>> everything
>> with 0x004dxxxx and 0x004bxxxx as POWER8 (ppc_pvr_match_power8()
>> helper)
>> and supports migration between these.
>
> Looking again at the kernel source[2]:
>
>    {   /* Power8E */
>        .pvr_mask               = 0xffff0000,
>        .pvr_value              = 0x004b0000,
>        .cpu_name               = "POWER8E (raw)",
>        .cpu_features           = CPU_FTRS_POWER8E,
>        .cpu_user_features      = COMMON_USER_POWER8,
>        .cpu_user_features2     = COMMON_USER2_POWER8,
>        .mmu_features           = MMU_FTRS_POWER8,
>        .icache_bsize           = 128,
>        .dcache_bsize           = 128,
>        .num_pmcs               = 6,
>        .pmc_type               = PPC_PMC_IBM,
>        .oprofile_cpu_type      = "ppc64/power8",
>        .oprofile_type          = PPC_OPROFILE_INVALID,
>        .cpu_setup              = __setup_cpu_power8,
>        .cpu_restore            = __restore_cpu_power8,
>        .flush_tlb              = __flush_tlb_power8,
>        .machine_check_early    = __machine_check_early_realmode_p8,
>        .platform               = "power8",
>    },
>    {   /* Power8NVL */
>        .pvr_mask               = 0xffff0000,
>        .pvr_value              = 0x004c0000,
>        .cpu_name               = "POWER8NVL (raw)",
>        .cpu_features           = CPU_FTRS_POWER8,
>        /* Same as above */
>    },
>    {   /* Power8 DD1: Does not support doorbell IPIs */
>        .pvr_mask               = 0xffffff00,
>        .pvr_value              = 0x004d0100,
>        .cpu_name               = "POWER8 (raw)",
>        .cpu_features           = CPU_FTRS_POWER8_DD1,
>        /* Same as above */
>    },
>    {   /* Power8 */
>        .pvr_mask               = 0xffff0000,
>        .pvr_value              = 0x004d0000,
>        .cpu_name               = "POWER8 (raw)",
>        .cpu_features           = CPU_FTRS_POWER8,
>        /* Same as above */
>    },
>
> So the PVR matching, as done currently in QEMU, will include POWER8DD1
> but exclude POWER8NVL, which according to to commit ddee09c0 and the
> code
> above is absolutely identical to POWER8.


Yeah, we have to add 0x004c0000 into the POWER8 family as well, I'll 
doublecheck.


> libvirt currently considers the guest CPU model to be compatible with
> the host CPU model if both have the same PVR, which is clearly far from
> optimal.
>
> If we can rely on the above CPU families, as identified by pvr_value
> and pvr_mask, to behave exactly the same for our purposes[3], then we
> can change libvirt to perform the same kind of PVR matching as QEMU and
> report to the user that the guest CPU model POWER8 is compatible with
> all of the above host CPU models.
>
>> I am adding Shiva to the coversation, he might enlighen us how this
>> is
>> solved by powerkvm's libvirt.
>
> Sure, the more the merrier :)
>
> Cheers.
>
>
> [1] arch/powerpc/include/asm/cputable.h
> [2] arch/powerpc/kernel/cputable.c
> [3] eg. a guest running on a POWER8E host can be migrated to a
>      POWER8DD1 host, despite the fact that the former has
> CPU_FTR_PMAO_BUG
>      and the latter lacks CPU_FTR_DBELL.
>
Andrea Bolognani July 10, 2015, 9:19 a.m. UTC | #10
On Fri, 2015-07-10 at 15:19 +1000, Alexey Kardashevskiy wrote:
> 
> > If I'm reading the kernel source[1] correctly, there are actually
> > subtle
> > differences other than the number of cores:
> > 
> >    #define CPU_FTRS_POWER8 (/* Bunch of features here */)
> >    #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
> 
> POWER8E was the first family of POWER8. Some revisions of POWER8E
> have this
> bug. The bug is that if we migrate from a good POWER8 to POWER8 with
> this
> bug, perf counters will stop working as the source kernel did not
> detect
> broken CPU and did not enable a workaround. We do not really know how
> many
> of this chips are out there (not many I believe) and how many have
> this
> bug. And also I guess that more people will be annoyed by inability
> to
> migrate rather than by broken perf.
> 
> I'd leave migration enabled but it is worth mentioning somewhere in
> user's
> guide that if the user migrates a guest to POWER8E with the specific
> PVR,
> perf might break.

David said we might be able to just apply the kernel PMAO workaround
unconditionally, which would of course be great. Even if that turns out
not to be possible, I agree with you that allowing migration is more
important than not breaking performance monitoring, and documenting
this properly should be enough.

> >    #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
> 
> This bug appears in POWER8E and POWER8 1.0 ("DD1"). The bug is that
> when a
> CPU thread wakes up after nap because of doorbell message, the
> doorbell
> interrupt is not pending. Guests cannot do nap themselves (they use
> H_CEDE
> hypercall for this), when a thread wakes up, it wakes in the host
> context
> so there is nothing to handle in the guest, it is purely for KVM to
> workaround.

Great news!

> > So the PVR matching, as done currently in QEMU, will include
> > POWER8DD1
> > but exclude POWER8NVL, which according to to commit ddee09c0 and
> > the
> > code
> > above is absolutely identical to POWER8.
> 
> Yeah, we have to add 0x004c0000 into the POWER8 family as well, I'll
> doublecheck.

Okay. libvirt will soon advertise just POWER8 instead of specific
models, and will consider the POWER8 guest CPU model to be compatible
with any host having a POWER8* CPU, using the same PVR values and
masks as the kernel. Once QEMU is updated to include POWER8NVL, the
behavior will be consistent across the stack.

Just to be sure: the same applies to POWER7 as well, right? It
certainly looks so from both the kernel and QEMU code.

Thanks a lot for all the precious bits of information you've provided.

Cheers.
Alexey Kardashevskiy July 13, 2015, 3:11 a.m. UTC | #11
On 07/10/2015 07:19 PM, Andrea Bolognani wrote:
> On Fri, 2015-07-10 at 15:19 +1000, Alexey Kardashevskiy wrote:
>>
>>> If I'm reading the kernel source[1] correctly, there are actually
>>> subtle
>>> differences other than the number of cores:
>>>
>>>     #define CPU_FTRS_POWER8 (/* Bunch of features here */)
>>>     #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
>>
>> POWER8E was the first family of POWER8. Some revisions of POWER8E
>> have this
>> bug. The bug is that if we migrate from a good POWER8 to POWER8 with
>> this
>> bug, perf counters will stop working as the source kernel did not
>> detect
>> broken CPU and did not enable a workaround. We do not really know how
>> many
>> of this chips are out there (not many I believe) and how many have
>> this
>> bug. And also I guess that more people will be annoyed by inability
>> to
>> migrate rather than by broken perf.
>>
>> I'd leave migration enabled but it is worth mentioning somewhere in
>> user's
>> guide that if the user migrates a guest to POWER8E with the specific
>> PVR,
>> perf might break.
>
> David said we might be able to just apply the kernel PMAO workaround
> unconditionally, which would of course be great. Even if that turns out
> not to be possible, I agree with you that allowing migration is more
> important than not breaking performance monitoring, and documenting
> this properly should be enough.
>
>>>     #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
>>
>> This bug appears in POWER8E and POWER8 1.0 ("DD1"). The bug is that
>> when a
>> CPU thread wakes up after nap because of doorbell message, the
>> doorbell
>> interrupt is not pending. Guests cannot do nap themselves (they use
>> H_CEDE
>> hypercall for this), when a thread wakes up, it wakes in the host
>> context
>> so there is nothing to handle in the guest, it is purely for KVM to
>> workaround.
>
> Great news!
>
>>> So the PVR matching, as done currently in QEMU, will include
>>> POWER8DD1
>>> but exclude POWER8NVL, which according to to commit ddee09c0 and
>>> the
>>> code
>>> above is absolutely identical to POWER8.
>>
>> Yeah, we have to add 0x004c0000 into the POWER8 family as well, I'll
>> doublecheck.
>
> Okay. libvirt will soon advertise just POWER8 instead of specific
> models, and will consider the POWER8 guest CPU model to be compatible
> with any host having a POWER8* CPU, using the same PVR values and
> masks as the kernel. Once QEMU is updated to include POWER8NVL, the
> behavior will be consistent across the stack.


Correct.

> Just to be sure: the same applies to POWER7 as well, right? It
> certainly looks so from both the kernel and QEMU code.

Right.


> Thanks a lot for all the precious bits of information you've provided.

I've learnt a lot too :)
diff mbox

Patch

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 4d5ab4b..9d8769b 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1140,6 +1140,8 @@ 
                 "POWER7+ v2.1")
     POWERPC_DEF("POWER8E_v1.0",  CPU_POWERPC_POWER8E_v10,            POWER8,
                 "POWER8E v1.0")
+    POWERPC_DEF("POWER8E_v2.1",  CPU_POWERPC_POWER8E_v21,            POWER8,
+                "POWER8E v2.1")
     POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10,             POWER8,
                 "POWER8 v1.0")
     POWERPC_DEF("970_v2.2",      CPU_POWERPC_970_v22,                970,
@@ -1389,7 +1391,7 @@  PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "POWER5gs", "POWER5+_v2.1" },
     { "POWER7", "POWER7_v2.3" },
     { "POWER7+", "POWER7+_v2.1" },
-    { "POWER8E", "POWER8E_v1.0" },
+    { "POWER8E", "POWER8E_v2.1" },
     { "POWER8", "POWER8_v1.0" },
     { "970", "970_v2.2" },
     { "970fx", "970fx_v3.1" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 9d80e72..add31c6 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -558,6 +558,7 @@  enum {
     CPU_POWERPC_POWER7P_v21        = 0x004A0201,
     CPU_POWERPC_POWER8E_BASE       = 0x004B0000,
     CPU_POWERPC_POWER8E_v10        = 0x004B0100,
+    CPU_POWERPC_POWER8E_v21        = 0x004B0201,
     CPU_POWERPC_POWER8_BASE        = 0x004D0000,
     CPU_POWERPC_POWER8_v10         = 0x004D0100,
     CPU_POWERPC_970_v22            = 0x00390202,