[2/2] spapr: Adjust default VSMT value for better migration compatibility

Message ID 20180115072715.25921-3-david@gibson.dropbear.id.au
State New
Series Further VSMT fixes

Commit Message

David Gibson Jan. 15, 2018, 7:27 a.m. UTC
fa98fbfc "PPC: KVM: Support machine option to set VSMT mode" introduced the
"vsmt" parameter for the pseries machine type, which controls the spacing
of the vcpu ids of thread 0 for each virtual core.  This was done to bring
some consistency and stability to how that was done, while still allowing
backwards compatibility for migration and otherwise.

The default value we used for vsmt was set to the max of the host's
advertised default number of threads and the number of vthreads per vcore
in the guest.  This was done to continue running without extra parameters
on older KVM versions which don't allow the VSMT value to be changed.

Unfortunately, even that reduced leakage of host configuration into
guest-visible configuration still breaks things.  Specifically, a guest
with 4 (or fewer) vthreads/vcore will get a different vsmt value when
running on a POWER8 host (vsmt==8) than on a POWER9 host (vsmt==4).  That
means the vcpu ids don't line up, so you can't migrate between them, though
you should be able to.

Long term we really want to make vsmt == smp_threads for sufficiently
new machine types.  However, that means that qemu will then require a
sufficiently recent KVM (one which supports changing VSMT), and such
kernels are not yet deployed widely enough for us to rely on them.

In the meantime we need some default that will work as often as possible.
This patch changes that default to 8 in all circumstances.  This does
change guest visible behaviour (including for existing machine versions)
for many cases - just not the most common/important case.

What follows is a case-by-case justification for why this is still the
least worst option.  Note that any of the old behaviours can still be
reproduced after this patch; it just requires manual intervention by
setting the vsmt property on the command line.

KVM HV on POWER8 host:
   This is the overwhelmingly common case in production setups, and is
   unchanged by design.  POWER8 hosts will advertise a default VSMT mode
   of 8, and > 8 vthreads/vcore isn't permitted.

KVM HV on POWER7 host:
   Will break, but POWER7s allowing KVM were never released to the public.

KVM HV on POWER9 host:
   Not yet released to the public; breaking this now will reduce other
   breakage later.

KVM HV on PowerPC 970:
   Will theoretically break it, but it was barely supported to begin with
   and already required various user visible hacks to work.  Also so old
   that I just don't care.

TCG:
   This is the nastiest one; it means migration of TCG guests (without
   manual vsmt setting) will break.  Since TCG is rarely used in production
   I think this is worth it for the other benefits.  It does also remove
   one more barrier to TCG<->KVM migration which could be interesting for
   debugging applications.

KVM PR:
   As with TCG, this will break migration of existing configurations
   unless extra manual vsmt options are added.  As with TCG, it is rare in
   production so I think the benefits outweigh the breakage.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Laurent Vivier Jan. 15, 2018, 7:53 a.m. UTC | #1
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Greg Kurz Jan. 15, 2018, 9:48 a.m. UTC | #2
On Mon, 15 Jan 2018 18:27:15 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
> "vsmt" parameter for the pseries machine type, which controls the spacing
> of the vcpu ids of thread 0 for each virtual core.  This was done to bring
> some consistency and stability to how that was done, while still allowing
> backwards compatibility for migration and otherwise.
> 
> The default value we used for vsmt was set to the max of the host's
> advertised default number of threads and the number of vthreads per vcore
> in the guest.  This was done to continue running without extra parameters
> on older KVM versions which don't allow the VSMT value to be changed.
> 
> Unfortunately, even that smaller than before leakage of host configuration
> into guest visible configuration still breaks things.  Specifically a guest
> with 4 (or less) vthread/vcore will get a different vsmt value when
> running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host.  That means the
> vcpu ids don't line up so you can't migrate between them, though you should
> be able to.
> 
> Long term we really want to make vsmt == smp_threads for sufficiently
> new machine types.  However, that means that qemu will then require a
> sufficiently recent KVM (one which supports changing VSMT) - that's still
> not widely enough deployed to be really comfortable to do.
> 
> In the meantime we some default that will work as often as possible.

s/we some/we need some/ ?

> This patch changes that default to 8 in all circumstances.  This does
> change guest visible behaviour (including for existing machine versions)
> for many cases - just not the most common/important case.
> 
> Following is case by case justification for why this is still the least
> worst option.  Note that any of the old behaviours can still be duplicated
> after this patch, it's just that it requires manual intervention by
> setting the vsmt property on the command line.
> 

IIUC this unconditionally breaks existing setups that rely on static
Micro-Threading on a POWER8 host (eg, subcores-per-core=2 on the host
and smp_threads=4). I have no evidence this is a widely used setup,
but FWIW it is documented in some IBM RedBooks:

"Performance Optimization and Tuning Techniques for IBM Power Systems
 Processors Including IBM POWER8"

http://www.redbooks.ibm.com/abstracts/sg248171.html?Open

"IBM PowerKVM: Configuration and Use"

http://www.redbooks.ibm.com/abstracts/sg248231.html?Open

Maybe the new behaviour could be added for new machine types only ?

Anyway, in case you don't want extra complexity,

Reviewed-by: Greg Kurz <groug@kaod.org>

Jose Ricardo Ziviani Jan. 15, 2018, 10:38 a.m. UTC | #3
Great rationale.

Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
David Gibson Jan. 16, 2018, 4:42 a.m. UTC | #4
On Mon, Jan 15, 2018 at 10:48:47AM +0100, Greg Kurz wrote:
> On Mon, 15 Jan 2018 18:27:15 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
> > "vsmt" parameter for the pseries machine type, which controls the spacing
> > of the vcpu ids of thread 0 for each virtual core.  This was done to bring
> > some consistency and stability to how that was done, while still allowing
> > backwards compatibility for migration and otherwise.
> > 
> > The default value we used for vsmt was set to the max of the host's
> > advertised default number of threads and the number of vthreads per vcore
> > in the guest.  This was done to continue running without extra parameters
> > on older KVM versions which don't allow the VSMT value to be changed.
> > 
> > Unfortunately, even that smaller than before leakage of host configuration
> > into guest visible configuration still breaks things.  Specifically a guest
> > with 4 (or less) vthread/vcore will get a different vsmt value when
> > running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host.  That means the
> > vcpu ids don't line up so you can't migrate between them, though you should
> > be able to.
> > 
> > Long term we really want to make vsmt == smp_threads for sufficiently
> > new machine types.  However, that means that qemu will then require a
> > sufficiently recent KVM (one which supports changing VSMT) - that's still
> > not widely enough deployed to be really comfortable to do.
> > 
> > In the meantime we some default that will work as often as possible.
> 
> s/we some/we need some/ ?

Corrected.

> > This patch changes that default to 8 in all circumstances.  This does
> > change guest visible behaviour (including for existing machine versions)
> > for many cases - just not the most common/important case.
> > 
> > Following is case by case justification for why this is still the least
> > worst option.  Note that any of the old behaviours can still be duplicated
> > after this patch, it's just that it requires manual intervention by
> > setting the vsmt property on the command line.
> > 
> 
> IIUC this unconditionally breaks existing setups that rely on static
> Micro-Threading on a POWER8 host (eg, subcores-per-core=2 on the host
> and smp_threads=4). I have no evidence this is a widely used setup,
> but FWIW it is documented in some IBM RedBooks:

Well.. it will break migration between old and new qemu on the
microthreaded setup,  but fix it between new qemu on microthreaded
setup and new qemu on non-microthreaded setup (old qemu on
microthreaded to old qemu on non-microthreaded was already broken for
the same reasons as p8<->p9).  It's not really obvious to me which is
preferable.

> "Performance Optimization and Tuning Techniques for IBM Power Systems
>  Processors Including IBM POWER8"
> 
> http://www.redbooks.ibm.com/abstracts/sg248171.html?Open
> 
> "IBM PowerKVM: Configuration and Use"
> 
> http://www.redbooks.ibm.com/abstracts/sg248231.html?Open
>
> Maybe the new behaviour could be added for new machine types only ?

I'd really prefer not to.  It makes some existing cases work, but
breaks some other cases.  Given that the old behaviour is inherently
wrong, I'm more inclined to change it.
Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e35214bfc3..8e5ef7c9de 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2305,9 +2305,14 @@  static void spapr_set_vsmt_mode(sPAPRMachineState *spapr, Error **errp)
         }
         /* In this case, spapr->vsmt has been set by the command line */
     } else {
-        /* Choose a VSMT mode that may be higher than necessary but is
-         * likely to be compatible with hosts that don't have VSMT. */
-        spapr->vsmt = MAX(kvm_smt, smp_threads);
+        /*
+         * Default VSMT value is tricky, because we need it to be as
+         * consistent as possible (for migration), but this requires
+         * changing it for at least some existing cases.  We pick 8 as
+         * the value that we'd get with KVM on POWER8, the
+         * overwhelmingly common case in production systems.
+         */
+        spapr->vsmt = 8;
     }
 
     /* KVM: If necessary, set the SMT mode: */