diff mbox series

KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest

Message ID 20200911041607.198092-1-farosas@linux.ibm.com
State New
Headers show
Series KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest | expand

Checks

Context Check Description
snowpatch_ozlabs/needsstable success Patch has no Fixes tags
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
snowpatch_ozlabs/build-pmac32 warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-ppc64e warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-ppc64be warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-ppc64le warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (4b552a4cbf286ff9dcdab19153f3c1c7d1680fab)

Commit Message

Fabiano Rosas Sept. 11, 2020, 4:16 a.m. UTC
The current nested KVM code does not support HPT guests. This is
informed/enforced in some ways:

- Hosts < P9 will not be able to enable the nested HV feature;

- The nested hypervisor MMU capabilities will not contain
  KVM_CAP_PPC_MMU_HASH_V3;

- QEMU reflects the MMU capabilities in the
  'ibm,arch-vec-5-platform-support' device-tree property;

- The nested guest, at 'prom_parse_mmu_model' ignores the
  'disable_radix' kernel command line option if HPT is not supported;

- The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.

There is, however, still a way to start a HPT guest by using
max-compat-cpu=power8 at the QEMU machine options. This leads to the
guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
ioctl.

With the guest set to hash, the nested hypervisor goes through the
entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
crashes when it tries to execute an hypervisor-privileged (mtspr
HDEC) instruction at __kvmppc_vcore_entry:

root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...

<snip>
[  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
[  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
[  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
[  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
[  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
               GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
               GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
               GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
               GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
               GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
               GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
               GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
               GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
[  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
[  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
[  538.544173] Call Trace:
[  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
[  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
[  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
[  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
[  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
[  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
[  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
[  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
[  538.544787] Instruction dump:
[  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
[  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
[  538.544953] ---[ end trace 74423e2b948c2e0c ]---

This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
the nested hypervisor, causing QEMU to abort.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Greg Kurz Sept. 11, 2020, 7:45 a.m. UTC | #1
On Fri, 11 Sep 2020 01:16:07 -0300
Fabiano Rosas <farosas@linux.ibm.com> wrote:

> The current nested KVM code does not support HPT guests. This is
> informed/enforced in some ways:
> 
> - Hosts < P9 will not be able to enable the nested HV feature;
> 
> - The nested hypervisor MMU capabilities will not contain
>   KVM_CAP_PPC_MMU_HASH_V3;
> 
> - QEMU reflects the MMU capabilities in the
>   'ibm,arch-vec-5-platform-support' device-tree property;
> 
> - The nested guest, at 'prom_parse_mmu_model' ignores the
>   'disable_radix' kernel command line option if HPT is not supported;
> 
> - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> 
> There is, however, still a way to start a HPT guest by using
> max-compat-cpu=power8 at the QEMU machine options. This leads to the
> guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> ioctl.
> 
> With the guest set to hash, the nested hypervisor goes through the
> entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> crashes when it tries to execute an hypervisor-privileged (mtspr
> HDEC) instruction at __kvmppc_vcore_entry:
> 
> root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> 
> <snip>
> [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
>                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
>                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
>                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
>                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
>                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
>                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
>                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
>                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> [  538.544173] Call Trace:
> [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> [  538.544787] Instruction dump:
> [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> 
> This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> the nested hypervisor, causing QEMU to abort.
> 
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> ---

LGTM

Reviewed-by: Greg Kurz <groug@kaod.org>

>  arch/powerpc/kvm/book3s_hv.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 4ba06a2a306c..764b6239ef72 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
>  	case KVM_PPC_ALLOCATE_HTAB: {
>  		u32 htab_order;
>  
> +		/* If we're a nested hypervisor, we currently only support radix */
> +		if (kvmhv_on_pseries()) {
> +			r = -EOPNOTSUPP;
> +			break;
> +		}
> +
>  		r = -EFAULT;
>  		if (get_user(htab_order, (u32 __user *)argp))
>  			break;
Greg Kurz Sept. 11, 2020, 8:01 a.m. UTC | #2
On Fri, 11 Sep 2020 09:45:36 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Fri, 11 Sep 2020 01:16:07 -0300
> Fabiano Rosas <farosas@linux.ibm.com> wrote:
> 
> > The current nested KVM code does not support HPT guests. This is
> > informed/enforced in some ways:
> > 
> > - Hosts < P9 will not be able to enable the nested HV feature;
> > 
> > - The nested hypervisor MMU capabilities will not contain
> >   KVM_CAP_PPC_MMU_HASH_V3;
> > 
> > - QEMU reflects the MMU capabilities in the
> >   'ibm,arch-vec-5-platform-support' device-tree property;
> > 
> > - The nested guest, at 'prom_parse_mmu_model' ignores the
> >   'disable_radix' kernel command line option if HPT is not supported;
> > 
> > - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> > 
> > There is, however, still a way to start a HPT guest by using
> > max-compat-cpu=power8 at the QEMU machine options. This leads to the
> > guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> > ioctl.
> > 
> > With the guest set to hash, the nested hypervisor goes through the
> > entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> > crashes when it tries to execute an hypervisor-privileged (mtspr
> > HDEC) instruction at __kvmppc_vcore_entry:
> > 
> > root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> > 
> > <snip>
> > [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> > [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> > [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> > [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> > [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
> >                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
> >                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
> >                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
> >                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
> >                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
> >                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
> >                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
> >                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> > [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> > [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> > [  538.544173] Call Trace:
> > [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> > [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> > [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> > [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> > [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> > [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> > [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> > [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> > [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> > [  538.544787] Instruction dump:
> > [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> > [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> > [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> > 
> > This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> > the nested hypervisor, causing QEMU to abort.
> > 
> > Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> > Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> > ---
> 
> LGTM
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> >  arch/powerpc/kvm/book3s_hv.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 4ba06a2a306c..764b6239ef72 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> >  	case KVM_PPC_ALLOCATE_HTAB: {
> >  		u32 htab_order;
> >  
> > +		/* If we're a nested hypervisor, we currently only support radix */
> > +		if (kvmhv_on_pseries()) {
> > +			r = -EOPNOTSUPP;

According to POSIX [1]:

[ENOTSUP]
Not supported. The implementation does not support the requested feature or value.

[EOPNOTSUPP]
Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP].

Even if these two happen to be equal in linux, it seems that ENOTSUP, which
doesn't refer to sockets, is more appropriate.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

> > +			break;
> > +		}
> > +
> >  		r = -EFAULT;
> >  		if (get_user(htab_order, (u32 __user *)argp))
> >  			break;
>
Michal Suchánek Sept. 11, 2020, 8:08 a.m. UTC | #3
On Fri, Sep 11, 2020 at 10:01:33AM +0200, Greg Kurz wrote:
> On Fri, 11 Sep 2020 09:45:36 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Fri, 11 Sep 2020 01:16:07 -0300
> > Fabiano Rosas <farosas@linux.ibm.com> wrote:
> > 
> > > The current nested KVM code does not support HPT guests. This is
> > > informed/enforced in some ways:
> > > 
> > > - Hosts < P9 will not be able to enable the nested HV feature;
> > > 
> > > - The nested hypervisor MMU capabilities will not contain
> > >   KVM_CAP_PPC_MMU_HASH_V3;
> > > 
> > > - QEMU reflects the MMU capabilities in the
> > >   'ibm,arch-vec-5-platform-support' device-tree property;
> > > 
> > > - The nested guest, at 'prom_parse_mmu_model' ignores the
> > >   'disable_radix' kernel command line option if HPT is not supported;
> > > 
> > > - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> > > 
> > > There is, however, still a way to start a HPT guest by using
> > > max-compat-cpu=power8 at the QEMU machine options. This leads to the
> > > guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> > > ioctl.
> > > 
> > > With the guest set to hash, the nested hypervisor goes through the
> > > entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> > > crashes when it tries to execute an hypervisor-privileged (mtspr
> > > HDEC) instruction at __kvmppc_vcore_entry:
> > > 
> > > root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> > > 
> > > <snip>
> > > [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> > > [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> > > [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> > > [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> > > [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
> > >                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
> > >                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
> > >                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
> > >                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
> > >                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
> > >                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
> > >                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
> > >                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> > > [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> > > [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> > > [  538.544173] Call Trace:
> > > [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> > > [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> > > [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> > > [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> > > [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> > > [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> > > [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> > > [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> > > [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> > > [  538.544787] Instruction dump:
> > > [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> > > [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> > > [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> > > 
> > > This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> > > the nested hypervisor, causing QEMU to abort.
> > > 
> > > Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> > > Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> > > ---
> > 
> > LGTM
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > 
> > >  arch/powerpc/kvm/book3s_hv.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index 4ba06a2a306c..764b6239ef72 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> > >  	case KVM_PPC_ALLOCATE_HTAB: {
> > >  		u32 htab_order;
> > >  
> > > +		/* If we're a nested hypervisor, we currently only support radix */
> > > +		if (kvmhv_on_pseries()) {
> > > +			r = -EOPNOTSUPP;
> 
> According to POSIX [1]:
> 
> [ENOTSUP]
> Not supported. The implementation does not support the requested feature or value.
> 
> [EOPNOTSUPP]
> Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP].
> 
> Even if these two happen to be equal in linux, it seems that ENOTSUP, which
> doesn't refer to sockets, is more appropriate.
> 
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

Which in Linux kernel maps to a completely bogus value not recognized by
userspace.

So when you mean ENOTSUP use EOPNOTSUPP in Linux.

Thanks

Michal
Greg Kurz Sept. 11, 2020, 8:17 a.m. UTC | #4
On Fri, 11 Sep 2020 10:08:10 +0200
Michal Suchánek <msuchanek@suse.de> wrote:

> On Fri, Sep 11, 2020 at 10:01:33AM +0200, Greg Kurz wrote:
> > On Fri, 11 Sep 2020 09:45:36 +0200
> > Greg Kurz <groug@kaod.org> wrote:
> > 
> > > On Fri, 11 Sep 2020 01:16:07 -0300
> > > Fabiano Rosas <farosas@linux.ibm.com> wrote:
> > > 
> > > > The current nested KVM code does not support HPT guests. This is
> > > > informed/enforced in some ways:
> > > > 
> > > > - Hosts < P9 will not be able to enable the nested HV feature;
> > > > 
> > > > - The nested hypervisor MMU capabilities will not contain
> > > >   KVM_CAP_PPC_MMU_HASH_V3;
> > > > 
> > > > - QEMU reflects the MMU capabilities in the
> > > >   'ibm,arch-vec-5-platform-support' device-tree property;
> > > > 
> > > > - The nested guest, at 'prom_parse_mmu_model' ignores the
> > > >   'disable_radix' kernel command line option if HPT is not supported;
> > > > 
> > > > - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> > > > 
> > > > There is, however, still a way to start a HPT guest by using
> > > > max-compat-cpu=power8 at the QEMU machine options. This leads to the
> > > > guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> > > > ioctl.
> > > > 
> > > > With the guest set to hash, the nested hypervisor goes through the
> > > > entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> > > > crashes when it tries to execute an hypervisor-privileged (mtspr
> > > > HDEC) instruction at __kvmppc_vcore_entry:
> > > > 
> > > > root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> > > > 
> > > > <snip>
> > > > [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> > > > [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> > > > [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> > > > [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> > > > [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
> > > >                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
> > > >                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
> > > >                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
> > > >                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
> > > >                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
> > > >                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
> > > >                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
> > > >                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> > > > [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> > > > [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> > > > [  538.544173] Call Trace:
> > > > [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> > > > [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> > > > [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> > > > [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> > > > [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> > > > [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> > > > [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> > > > [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> > > > [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> > > > [  538.544787] Instruction dump:
> > > > [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> > > > [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> > > > [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> > > > 
> > > > This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> > > > the nested hypervisor, causing QEMU to abort.
> > > > 
> > > > Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> > > > Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> > > > ---
> > > 
> > > LGTM
> > > 
> > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > > 
> > > >  arch/powerpc/kvm/book3s_hv.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > > index 4ba06a2a306c..764b6239ef72 100644
> > > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > > @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> > > >  	case KVM_PPC_ALLOCATE_HTAB: {
> > > >  		u32 htab_order;
> > > >  
> > > > +		/* If we're a nested hypervisor, we currently only support radix */
> > > > +		if (kvmhv_on_pseries()) {
> > > > +			r = -EOPNOTSUPP;
> > 
> > According to POSIX [1]:
> > 
> > [ENOTSUP]
> > Not supported. The implementation does not support the requested feature or value.
> > 
> > [EOPNOTSUPP]
> > Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP].
> > 
> > Even if these two happen to be equal in linux, it seems that ENOTSUP, which
> > doesn't refer to sockets, is more appropriate.
> > 
> > [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
> 
> Which in Linux kernel maps to a completely bogus value not recognized by
> userspace.
> 

There's no such thing as ENOTSUP in Linux actually :)

Linux has ENOTSUPP which indeed doesn't map to anything usable by
userspace... very unfortunate naming :-(

> So when you mean ENOTSUP use EOPNOTSUPP in Linux.
> 

Right. So this patch is good as it is.

Cheers,

--
Greg

> Thanks
> 
> Michal
David Gibson Sept. 14, 2020, 6:12 a.m. UTC | #5
On Fri, Sep 11, 2020 at 01:16:07AM -0300, Fabiano Rosas wrote:
> The current nested KVM code does not support HPT guests. This is
> informed/enforced in some ways:
> 
> - Hosts < P9 will not be able to enable the nested HV feature;
> 
> - The nested hypervisor MMU capabilities will not contain
>   KVM_CAP_PPC_MMU_HASH_V3;
> 
> - QEMU reflects the MMU capabilities in the
>   'ibm,arch-vec-5-platform-support' device-tree property;
> 
> - The nested guest, at 'prom_parse_mmu_model' ignores the
>   'disable_radix' kernel command line option if HPT is not supported;
> 
> - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> 
> There is, however, still a way to start a HPT guest by using
> max-compat-cpu=power8 at the QEMU machine options. This leads to the
> guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> ioctl.
> 
> With the guest set to hash, the nested hypervisor goes through the
> entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> crashes when it tries to execute an hypervisor-privileged (mtspr
> HDEC) instruction at __kvmppc_vcore_entry:
> 
> root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> 
> <snip>
> [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
>                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
>                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
>                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
>                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
>                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
>                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
>                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
>                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> [  538.544173] Call Trace:
> [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> [  538.544787] Instruction dump:
> [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> 
> This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> the nested hypervisor, causing QEMU to abort.
> 
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_hv.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 4ba06a2a306c..764b6239ef72 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -5250,6 +5250,12 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
>  	case KVM_PPC_ALLOCATE_HTAB: {
>  		u32 htab_order;
>  
> +		/* If we're a nested hypervisor, we currently only support radix */
> +		if (kvmhv_on_pseries()) {
> +			r = -EOPNOTSUPP;
> +			break;
> +		}
> +
>  		r = -EFAULT;
>  		if (get_user(htab_order, (u32 __user *)argp))
>  			break;
Paul Mackerras Sept. 23, 2020, 12:09 a.m. UTC | #6
On Fri, Sep 11, 2020 at 01:16:07AM -0300, Fabiano Rosas wrote:
> The current nested KVM code does not support HPT guests. This is
> informed/enforced in some ways:
> 
> - Hosts < P9 will not be able to enable the nested HV feature;
> 
> - The nested hypervisor MMU capabilities will not contain
>   KVM_CAP_PPC_MMU_HASH_V3;
> 
> - QEMU reflects the MMU capabilities in the
>   'ibm,arch-vec-5-platform-support' device-tree property;
> 
> - The nested guest, at 'prom_parse_mmu_model' ignores the
>   'disable_radix' kernel command line option if HPT is not supported;
> 
> - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.
> 
> There is, however, still a way to start a HPT guest by using
> max-compat-cpu=power8 at the QEMU machine options. This leads to the
> guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
> ioctl.
> 
> With the guest set to hash, the nested hypervisor goes through the
> entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
> crashes when it tries to execute an hypervisor-privileged (mtspr
> HDEC) instruction at __kvmppc_vcore_entry:
> 
> root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...
> 
> <snip>
> [  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
> [  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
> [  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
> [  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
> [  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
>                GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
>                GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
>                GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
>                GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
>                GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
>                GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
>                GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
>                GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
> [  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
> [  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
> [  538.544173] Call Trace:
> [  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
> [  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
> [  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
> [  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
> [  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
> [  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
> [  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
> [  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
> [  538.544787] Instruction dump:
> [  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
> [  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
> [  538.544953] ---[ end trace 74423e2b948c2e0c ]---
> 
> This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
> the nested hypervisor, causing QEMU to abort.
> 
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

Thanks, applied.

Paul.
diff mbox series

Patch

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4ba06a2a306c..764b6239ef72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5250,6 +5250,12 @@  static long kvm_arch_vm_ioctl_hv(struct file *filp,
 	case KVM_PPC_ALLOCATE_HTAB: {
 		u32 htab_order;
 
+		/* If we're a nested hypervisor, we currently only support radix */
+		if (kvmhv_on_pseries()) {
+			r = -EOPNOTSUPP;
+			break;
+		}
+
 		r = -EFAULT;
 		if (get_user(htab_order, (u32 __user *)argp))
 			break;