KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guests

Message ID 20180519055638.GA24787@fergus.ozlabs.ibm.com
State Accepted
Series
  • KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guests

Commit Message

Paul Mackerras May 19, 2018, 5:56 a.m.
This relaxes the restriction on using PR KVM on POWER9.  The existing
code does work inside a guest partition running in HPT mode, because
hypercalls such as H_ENTER use the old HPTE format, not the new
format used by POWER9, and so no change to PR KVM's HPT manipulation
code is required.  PR KVM will still refuse to run if the kernel is
using radix translation or if it is running bare-metal.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_pr.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Greg Kurz May 23, 2018, 5:04 p.m. | #1
On Sat, 19 May 2018 15:56:38 +1000
Paul Mackerras <paulus@ozlabs.org> wrote:

> This relaxes the restriction on using PR KVM on POWER9.  The existing
> code does work inside a guest partition running in HPT mode, because
> hypercalls such as H_ENTER use the old HPTE format, not the new
> format used by POWER9, and so no change to PR KVM's HPT manipulation
> code is required.  PR KVM will still refuse to run if the kernel is
> using radix translation or if it is running bare-metal.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---

Paul,

I have built a 4.16.0 kernel + this patch and booted the L1 guest
with "disable_radix=on". I could then successfully boot a L2 guest,
using the same kernel for simplicity. Both guests using identical
fedora28 images. So it seems to be working at first sight.


But, if I boot the L2 guest with the default fedora28 kernel, ie
4.16.9-300.fc28.ppc64le, the L2 guest hangs.

OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 4.16.9-300.fc28.ppc64le (mockbuild@buildvm-ppc64le-05.ppc.fedoraproject.org) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Thu May 17 04:31:32 UTC 2018
Detected machine type: 0000000000000101
command line: BOOT_IMAGE=/boot/vmlinuz-4.16.9-300.fc28.ppc64le root=UUID=22128c5c-30b1-4e0a-ac16-95853df31131 ro rhgb console=hvc0 early_printk LANG=en_US.UTF-8
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support... done
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000004e70000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000100000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000100000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000004e80000 -> 0x0000000004e80aaf
Device tree struct  0x0000000004e90000 -> 0x0000000004ea0000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000002000000 ...

(qemu) p $pc
0xc000000000026aa0
(qemu) p $lr
0xc000000000119ff4

# addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000026aa0
/usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/./arch/powerpc/include/asm/time.h:115

# addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000119ff4
/usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/kernel/panic.c:300

ie, the final mdelay(PANIC_TIMER_STEP) in panic().

Not sure how to debug this further, any suggestion is welcome :)

Cheers,

--
Greg

>  arch/powerpc/kvm/book3s_pr.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 67061d3..3d0251e 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -1735,9 +1735,16 @@ static void kvmppc_core_destroy_vm_pr(struct kvm *kvm)
>  static int kvmppc_core_check_processor_compat_pr(void)
>  {
>  	/*
> -	 * Disable KVM for Power9 untill the required bits merged.
> +	 * PR KVM can work on POWER9 inside a guest partition
> +	 * running in HPT mode.  It can't work if we are using
> +	 * radix translation (because radix provides no way for
> +	 * a process to have unique translations in quadrant 3)
> +	 * or in a bare-metal HPT-mode host (because POWER9
> +	 * uses a modified HPTE format which the PR KVM code
> +	 * has not been adapted to use).
>  	 */
> -	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> +	    (radix_enabled() || cpu_has_feature(CPU_FTR_HVMODE)))
>  		return -EIO;
>  	return 0;
>  }

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras May 23, 2018, 11:12 p.m. | #2
On Wed, May 23, 2018 at 07:04:21PM +0200, Greg Kurz wrote:
> On Sat, 19 May 2018 15:56:38 +1000
> Paul Mackerras <paulus@ozlabs.org> wrote:
> 
> > This relaxes the restriction on using PR KVM on POWER9.  The existing
> > code does work inside a guest partition running in HPT mode, because
> > hypercalls such as H_ENTER use the old HPTE format, not the new
> > format used by POWER9, and so no change to PR KVM's HPT manipulation
> > code is required.  PR KVM will still refuse to run if the kernel is
> > using radix translation or if it is running bare-metal.
> > 
> > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> > ---
> 
> Paul,
> 
> I have built a 4.16.0 kernel + this patch and booted the L1 guest
> with "disable_radix=on". I could then successfully boot a L2 guest,
> using the same kernel for simplicity. Both guests using identical
> fedora28 images. So it seems to be working at first sight.
> 
> 
> But, if I boot the L2 guest with the default fedora28 kernel, ie
> 4.16.9-300.fc28.ppc64le, the L2 guest hangs.
> 
> OF stdout device is: /vdevice/vty@71000000
> Preparing to boot Linux version 4.16.9-300.fc28.ppc64le (mockbuild@buildvm-ppc64le-05.ppc.fedoraproject.org) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Thu May 17 04:31:32 UTC 2018
> Detected machine type: 0000000000000101
> command line: BOOT_IMAGE=/boot/vmlinuz-4.16.9-300.fc28.ppc64le root=UUID=22128c5c-30b1-4e0a-ac16-95853df31131 ro rhgb console=hvc0 early_printk LANG=en_US.UTF-8
> Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
> Calling ibm,client-architecture-support... done
> memory layout at init:
>   memory_limit : 0000000000000000 (16 MB aligned)
>   alloc_bottom : 0000000004e70000
>   alloc_top    : 0000000030000000
>   alloc_top_hi : 0000000100000000
>   rmo_top      : 0000000030000000
>   ram_top      : 0000000100000000
> instantiating rtas at 0x000000002fff0000... done
> prom_hold_cpus: skipped
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000004e80000 -> 0x0000000004e80aaf
> Device tree struct  0x0000000004e90000 -> 0x0000000004ea0000
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0000000002000000 ...
> 
> (qemu) p $pc
> 0xc000000000026aa0
> (qemu) p $lr
> 0xc000000000119ff4
> 
> # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000026aa0
> /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/./arch/powerpc/include/asm/time.h:115
> 
> # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000119ff4
> /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/kernel/panic.c:300
> 
> ie, the final mdelay(PANIC_TIMER_STEP) in panic().
> 
> Not sure how to debug this further, any suggestion is welcome :)

I suggest you find the address of log_buf from System.map, read that
via the qemu command line (log_buf is a pointer), then dump the memory
it points to, so you can see the panic message.

Another thing to try would be to do the same test on a POWER8.

Paul.
Greg Kurz May 25, 2018, 10:48 a.m. | #3
On Thu, 24 May 2018 09:12:09 +1000
Paul Mackerras <paulus@ozlabs.org> wrote:

> On Wed, May 23, 2018 at 07:04:21PM +0200, Greg Kurz wrote:
> > On Sat, 19 May 2018 15:56:38 +1000
> > Paul Mackerras <paulus@ozlabs.org> wrote:
> >   
> > > This relaxes the restriction on using PR KVM on POWER9.  The existing
> > > code does work inside a guest partition running in HPT mode, because
> > > hypercalls such as H_ENTER use the old HPTE format, not the new
> > > format used by POWER9, and so no change to PR KVM's HPT manipulation
> > > code is required.  PR KVM will still refuse to run if the kernel is
> > > using radix translation or if it is running bare-metal.
> > > 
> > > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> > > ---  
> > 
> > Paul,
> > 
> > I have built a 4.16.0 kernel + this patch and booted the L1 guest
> > with "disable_radix=on". I could then successfully boot a L2 guest,
> > using the same kernel for simplicity. Both guests using identical
> > fedora28 images. So it seems to be working at first sight.
> > 
> > 
> > But, if I boot the L2 guest with the default fedora28 kernel, ie
> > 4.16.9-300.fc28.ppc64le, the L2 guest hangs.
> > 
> > OF stdout device is: /vdevice/vty@71000000
> > Preparing to boot Linux version 4.16.9-300.fc28.ppc64le (mockbuild@buildvm-ppc64le-05.ppc.fedoraproject.org) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Thu May 17 04:31:32 UTC 2018
> > Detected machine type: 0000000000000101
> > command line: BOOT_IMAGE=/boot/vmlinuz-4.16.9-300.fc28.ppc64le root=UUID=22128c5c-30b1-4e0a-ac16-95853df31131 ro rhgb console=hvc0 early_printk LANG=en_US.UTF-8
> > Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
> > Calling ibm,client-architecture-support... done
> > memory layout at init:
> >   memory_limit : 0000000000000000 (16 MB aligned)
> >   alloc_bottom : 0000000004e70000
> >   alloc_top    : 0000000030000000
> >   alloc_top_hi : 0000000100000000
> >   rmo_top      : 0000000030000000
> >   ram_top      : 0000000100000000
> > instantiating rtas at 0x000000002fff0000... done
> > prom_hold_cpus: skipped
> > copying OF device tree...
> > Building dt strings...
> > Building dt structure...
> > Device tree strings 0x0000000004e80000 -> 0x0000000004e80aaf
> > Device tree struct  0x0000000004e90000 -> 0x0000000004ea0000
> > Quiescing Open Firmware ...
> > Booting Linux via __start() @ 0x0000000002000000 ...
> > 
> > (qemu) p $pc
> > 0xc000000000026aa0
> > (qemu) p $lr
> > 0xc000000000119ff4
> > 
> > # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000026aa0
> > /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/./arch/powerpc/include/asm/time.h:115
> > 
> > # addr2line -e /usr/lib/debug/lib/modules/4.16.9-300.fc28.ppc64le/vmlinux 0xc000000000119ff4
> > /usr/src/debug/kernel-4.16.fc28/linux-4.16.9-300.fc28.ppc64le/kernel/panic.c:300
> > 
> > ie, the final mdelay(PANIC_TIMER_STEP) in panic().
> > 
> > Not sure how to debug this further, any suggestion is welcome :)  
> 
> I suggest you find the address of log_buf from System.map, read that
> via the qemu command line (log_buf is a pointer), then dump the memory
> it points to, so you can see the panic message.
> 

Hi Paul,

Thanks for your suggestion.

I could reproduce the problem if I boot the L2 guest with an upstream
kernel (commit d7b66b4ab034). I've tried to dump the log_buf but things
didn't go well:

$ grep 'd log_buf' System.map 
c000000001304f08 d log_buf_len
c000000001304f10 d log_buf

(qemu) x 0xc000000001304f08
c000000001304f08: Cannot access memory

Since 4.16.0 works, I could bisect down to:

commit dbfcf3cb9c681aa0c5d0bb46068f98d5b1823dd3
Author: Paul Mackerras <paulus@ozlabs.org>
Date:   Thu Feb 16 16:03:39 2017 +1100

    powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9

The hcall is handled by QEMU, which then calls the KVM_PPC_CONFIGURE_V3_MMU
ioctl, which fails since PR KVM doesn't implement it, and H_REGISTER_PROC_TBL
fails with H_PARAMETER. The panic hence comes from...

static int pseries_lpar_register_process_table(unsigned long base,
			unsigned long page_size, unsigned long table_size)
{
	.
	.
	.
	for (;;) {
		rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
					page_size, table_size);
		if (!H_IS_LONG_BUSY(rc))
			break;
		mdelay(get_longbusy_msecs(rc));
	}
	if (rc != H_SUCCESS) {
		pr_err("Failed to register process table (rc=%ld)\n", rc);
		BUG();
		^^^
		here.

The changelog of commit dbfcf3cb9c68 reads:

" If the hypervisor is able to support both radix and HPT guests, it would
  be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL
  call"

But in our case, the hypervisor is QEMU/PR KVM in a L1 guest booted with radix
disabled. It is hence not "entitled to defer allocation of the HPT", and QEMU
allocates one during initial machine reset.

If I patch QEMU to make H_REGISTER_PROC_TBL a nop when KVM_CAP_PPC_MMU_RADIX
returns 0, then the L2 kernel boots like a charm.

So I'm wondering if the guest should even call H_REGISTER_PROC_TBL in this
case, since there's nothing to do?

Also, peeking into PAPR, I see that H_REGISTER_PROC_TBL is mandatory only "If
the platform supports the In-Memory Table Translation Option", which isn't
the case here. This is supposed to be advertised through the "hcall-imtt"
function set in the OF property "ibm,hypertas-functions" in the /rtas node.

I guess a correct behavior would be for QEMU to advertise "hcall-imtt"
when it supports both radix and hash, and the kernel should only call
H_REGISTER_PROC_TBL if it is available.

Of course, neither QEMU, nor the kernel seem to care about "hcall-imtt" today...
so I guess the easier way is to fix H_REGISTER_PROC_TBL in QEMU.

> Another thing to try would be to do the same test on a POWER8.
> 

No surprise, it continues to work on a POWER8, since:

               /*
                * On POWER9, we need to do a H_REGISTER_PROC_TBL hcall
                * to inform the hypervisor that we wish to use the HPT.
                */
               if (cpu_has_feature(CPU_FTR_ARCH_300))
                       register_process_table(0, 0, 0);

> Paul.

Cheers,

--
Greg
Paul Mackerras May 29, 2018, 1:59 a.m. | #4
On Fri, May 25, 2018 at 12:48:31PM +0200, Greg Kurz wrote:

> I could reproduce the problem if I boot the L2 guest with an upstream
> kernel (commit d7b66b4ab034). I've tried to dump the log_buf but things
> didn't go well:
> 
> $ grep 'd log_buf' System.map 
> c000000001304f08 d log_buf_len
> c000000001304f10 d log_buf
> 
> (qemu) x 0xc000000001304f08
> c000000001304f08: Cannot access memory

I would use "xp 0x1304f08".

> Since 4.16.0 works, I could bisect down to:
> 
> commit dbfcf3cb9c681aa0c5d0bb46068f98d5b1823dd3
> Author: Paul Mackerras <paulus@ozlabs.org>
> Date:   Thu Feb 16 16:03:39 2017 +1100
> 
>     powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9
> 
> The hcall is handled by QEMU, which then calls the KVM_PPC_CONFIGURE_V3_MMU
> ioctl, which fails since PR KVM doesn't implement it, and H_REGISTER_PROC_TBL
> fails with H_PARAMETER. The panic hence comes from...

Hmmm.  Maybe the kernel should check the ibm,architecture-vec-5
property in /chosen and only register the process table if it
indicates the hypervisor can support either hash or radix.

Paul.
Greg Kurz May 29, 2018, 11:55 a.m. | #5
On Tue, 29 May 2018 11:59:44 +1000
Paul Mackerras <paulus@ozlabs.org> wrote:

> On Fri, May 25, 2018 at 12:48:31PM +0200, Greg Kurz wrote:
> 
> > I could reproduce the problem if I boot the L2 guest with an upstream
> > kernel (commit d7b66b4ab034). I've tried to dump the log_buf but things
> > didn't go well:
> > 
> > $ grep 'd log_buf' System.map 
> > c000000001304f08 d log_buf_len
> > c000000001304f10 d log_buf
> > 
> > (qemu) x 0xc000000001304f08
> > c000000001304f08: Cannot access memory  
> 
> I would use "xp 0x1304f08".
> 

Dumb me, virtual addresses don't work this early... :-\

And the panic message confirms that we're hitting the BUG()
in pseries_lpar_register_process_table().

Failed to register process table (rc=-4)
------------[ cut here ]------------
kernel BUG at arch/powerpc/platforms/pseries/lpar.c:750!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=1024 NUMA 
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-kvm-ppc-next-gku+ #2
NIP:  c0000000000c73c4 LR: c0000000000c73c0 CTR: c000000000193bb8
REGS: c000000001473bc0 TRAP: 0700   Not tainted  (4.17.0-kvm-ppc-next-gku+)
MSR:  a000000000022003 <SF,FP,RI,LE>  CR: 28042884  XER: 20040000
CFAR: 0000000000000000 SOFTE: 1 
GPR00: c0000000000c73c0 c000000001473e40 c000000001476700 0000000000000028 
GPR04: 0000000000000001 0000000000000000 c00000000163d794 c000000001636700 
GPR08: 0000000000000000 c000000000fe0c28 0000000000000000 0000000000000000 
GPR12: 0000000000002000 c0000000017c0000 000000003dc5dd10 0000000002d9f7d8 
GPR16: fffffffffffffffd 000000003dc5dd10 000000003e457c00 0000000000000014 
GPR20: 0000000002db83a8 0000000002000000 000000002fff0000 fffffffffffffffd 
GPR24: 000000003dc5dd58 c000000000000000 c00000000161fc78 c000000000bb61d8 
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
NIP [c0000000000c73c4] pseries_lpar_register_process_table+0xf4/0x100
LR [c0000000000c73c0] pseries_lpar_register_process_table+0xf0/0x100
Call Trace:
[c000000001473e40] [c0000000000c73c0] pseries_lpar_register_process_table+0xf0/0x100 (unreliable)
[c000000001473ed0] [0000000000ef5b64] 0xef5b64
[c000000001473f60] [0000000000eeb504] 0xeeb504
[c000000001473f90] [000000000000b348] 0xb348
Instruction dump:
e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 
3c62ff94 3863e6c0 480cbe15 60000000 <0fe00000> 60000000 60000000 3c4c013b 
random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---


> > Since 4.16.0 works, I could bisect down to:
> > 
> > commit dbfcf3cb9c681aa0c5d0bb46068f98d5b1823dd3
> > Author: Paul Mackerras <paulus@ozlabs.org>
> > Date:   Thu Feb 16 16:03:39 2017 +1100
> > 
> >     powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9
> > 
> > The hcall is handled by QEMU, which then calls the KVM_PPC_CONFIGURE_V3_MMU
> > ioctl, which fails since PR KVM doesn't implement it, and H_REGISTER_PROC_TBL
> > fails with H_PARAMETER. The panic hence comes from...  
> 
> Hmmm.  Maybe the kernel should check the ibm,architecture-vec-5
> property in /chosen and only register the process table if it
> indicates the hypervisor can support either hash or radix.
> 

Indeed the kernel could do that, but it is a bit unfortunate anyway for
H_REGISTER_PROC_TBL(0, 0, 0, 0) to fail with H_PARAMETER, depending on
the hypervisor being PR KVM or HV KVM... As an alternative, I've sent
a patch for QEMU to avoid calling KVM_PPC_CONFIGURE_V3_MMU when the HPT
is owned by QEMU in userspace.

https://patchwork.ozlabs.org/patch/920500/

> Paul.

Cheers,

--
Greg
Greg Kurz May 31, 2018, 6:02 a.m. | #6
On Sat, 19 May 2018 15:56:38 +1000
Paul Mackerras <paulus@ozlabs.org> wrote:

> This relaxes the restriction on using PR KVM on POWER9.  The existing
> code does work inside a guest partition running in HPT mode, because
> hypercalls such as H_ENTER use the old HPTE format, not the new
> format used by POWER9, and so no change to PR KVM's HPT manipulation
> code is required.  PR KVM will still refuse to run if the kernel is
> using radix translation or if it is running bare-metal.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---

With your other patch applied (https://patchwork.ozlabs.org/patch/916766/).

Tested-by: Greg Kurz <groug@kaod.org>

>  arch/powerpc/kvm/book3s_pr.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 67061d3..3d0251e 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -1735,9 +1735,16 @@ static void kvmppc_core_destroy_vm_pr(struct kvm *kvm)
>  static int kvmppc_core_check_processor_compat_pr(void)
>  {
>  	/*
> -	 * Disable KVM for Power9 untill the required bits merged.
> +	 * PR KVM can work on POWER9 inside a guest partition
> +	 * running in HPT mode.  It can't work if we are using
> +	 * radix translation (because radix provides no way for
> +	 * a process to have unique translations in quadrant 3)
> +	 * or in a bare-metal HPT-mode host (because POWER9
> +	 * uses a modified HPTE format which the PR KVM code
> +	 * has not been adapted to use).
>  	 */
> -	if (cpu_has_feature(CPU_FTR_ARCH_300))
> +	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
> +	    (radix_enabled() || cpu_has_feature(CPU_FTR_HVMODE)))
>  		return -EIO;
>  	return 0;
>  }


Patch

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 67061d3..3d0251e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1735,9 +1735,16 @@  static void kvmppc_core_destroy_vm_pr(struct kvm *kvm)
 static int kvmppc_core_check_processor_compat_pr(void)
 {
 	/*
-	 * Disable KVM for Power9 untill the required bits merged.
+	 * PR KVM can work on POWER9 inside a guest partition
+	 * running in HPT mode.  It can't work if we are using
+	 * radix translation (because radix provides no way for
+	 * a process to have unique translations in quadrant 3)
+	 * or in a bare-metal HPT-mode host (because POWER9
+	 * uses a modified HPTE format which the PR KVM code
+	 * has not been adapted to use).
 	 */
-	if (cpu_has_feature(CPU_FTR_ARCH_300))
+	if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+	    (radix_enabled() || cpu_has_feature(CPU_FTR_HVMODE)))
 		return -EIO;
 	return 0;
 }