powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

Message ID 20190208143319.11980-1-mark.cave-ayland@ilande.co.uk
State Accepted
Commit fe1ef6bcdb4fca33434256a802a3ed6aacf0bd2f
Series
  • powerpc: fix 32-bit KVM-PR lockup and panic with MacOS guest

Checks

Context                        Check    Description
snowpatch_ozlabs/checkpatch    fail     total: 1 errors, 1 warnings, 2 checks, 8 lines checked
snowpatch_ozlabs/build-pmac32  success  build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64e  success  build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64be success  build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64le success  build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/apply_patch   success  next/apply_patch Successfully applied

Commit Message

Mark Cave-Ayland Feb. 8, 2019, 2:33 p.m.
Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"
unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
guest to lockup and panic the kernel.

Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
32-bit KVM-PR once again without issue.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christophe Leroy Feb. 8, 2019, 2:45 p.m. | #1
On 08/02/2019 at 15:33, Mark Cave-Ayland wrote:
> Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"

Expected format for the above is:

Commit 123456789abc ("text")

> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
> guest to lockup and panic the kernel.
> 
> Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
> 32-bit KVM-PR once again without issue.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

Should include a Fixes: and a Cc to stable ?

Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org

Christophe

> ---
>   arch/powerpc/kernel/process.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index ce393df243aa..71bad4b6f80d 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -176,7 +176,7 @@ static void __giveup_fpu(struct task_struct *tsk)
>   
>   	save_fpu(tsk);
>   	msr = tsk->thread.regs->msr;
> -	msr &= ~MSR_FP;
> +	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
>   #ifdef CONFIG_VSX
>   	if (cpu_has_feature(CPU_FTR_VSX))
>   		msr &= ~MSR_VSX;
>
Mark Cave-Ayland Feb. 8, 2019, 2:51 p.m. | #2
On 08/02/2019 14:45, Christophe Leroy wrote:

> On 08/02/2019 at 15:33, Mark Cave-Ayland wrote:
>> Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"
> 
> Expected format for the above is:
> 
> Commit 123456789abc ("text")

Hi Christophe,

Apologies - I'm fairly new at submitting kernel patches, but I can re-send it in the
correct format later if required.

>> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
>> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
>> guest to lockup and panic the kernel.
>>
>> Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
>> 32-bit KVM-PR once again without issue.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> 
> Should include a Fixes: and a Cc to stable ?
> 
> Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
> Cc: stable@vger.kernel.org

Indeed, but there are still some questions to be asked here:

1) Why were these bits removed from the original bitmask in the first place without
it being documented in the commit message?

2) Is this the right fix? I'm told that MacOS guests already run without this patch
on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
bug elsewhere in the 32-bit powerpc code.


If you think that these points don't matter, then I'm happy to resubmit the patch
as-is based upon your comments above.


ATB,

Mark.
Benjamin Herrenschmidt Feb. 11, 2019, 12:30 a.m. | #3
On Fri, 2019-02-08 at 14:51 +0000, Mark Cave-Ayland wrote:
> 
> Indeed, but there are still some questions to be asked here:
> 
> 1) Why were these bits removed from the original bitmask in the first place without
> it being documented in the commit message?
> 
> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
> bug elsewhere in the 32-bit powerpc code.
> 
> 
> If you think that these points don't matter, then I'm happy to resubmit the patch
> as-is based upon your comments above.

We should write a test case to verify that FE0/FE1 are properly
preserved/context-switched etc... I bet if we accidentally wiped them,
we wouldn't notice 99.9% of the time.

Cheers,
Ben.
Mark Cave-Ayland Feb. 11, 2019, 9:39 p.m. | #4
On 11/02/2019 00:30, Benjamin Herrenschmidt wrote:

> On Fri, 2019-02-08 at 14:51 +0000, Mark Cave-Ayland wrote:
>>
>> Indeed, but there are still some questions to be asked here:
>>
>> 1) Why were these bits removed from the original bitmask in the first place without
>> it being documented in the commit message?
>>
>> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
>> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
>> bug elsewhere in the 32-bit powerpc code.
>>
>>
>> If you think that these points don't matter, then I'm happy to resubmit the patch
>> as-is based upon your comments above.
> 
> We should write a test case to verify that FE0/FE1 are properly
> preserved/context-switched etc... I bet if we accidentally wiped them,
> we wouldn't notice 99.9% of the time.

Right, I guess it's more likely to cause an issue in the KVM PR case because the guest
can alter the flags in a way that doesn't go through the normal process switch mechanism.

The original patchset at
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg98326.html does include
some tests in the first few patches, but AFAICT they are concerned with the contents
of the FP registers rather than the related MSRs.

Who is the right person to ask about fixing issues related to context switching with
KVM PR? I did add the original author's email address to my first few emails but have
had no response back :/


ATB,

Mark.
Michael Ellerman Feb. 19, 2019, 4:20 a.m. | #5
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
> On 08/02/2019 14:45, Christophe Leroy wrote:
>
>> On 08/02/2019 at 15:33, Mark Cave-Ayland wrote:
>>> Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"
>> 
>> Expected format for the above is:
>> 
>> Commit 123456789abc ("text")
>
> Hi Christophe,
>
> Apologies - I'm fairly new at submitting kernel patches, but I can re-send it in the
> correct format later if required.
>
>>> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
>>> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
>>> guest to lockup and panic the kernel.

Which kernel is panicking? The guest or the host?

>>> Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
>>> 32-bit KVM-PR once again without issue.
>>>
>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> 
>> Should include a Fixes: and a Cc to stable ?
>> 
>> Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
>> Cc: stable@vger.kernel.org
>
> Indeed, but there are still some questions to be asked here:
>
> 1) Why were these bits removed from the original bitmask in the first place without
> it being documented in the commit message?

It was almost certainly an accident.

> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
> bug elsewhere in the 32-bit powerpc code.

That's slightly worrying. It's hard to say without more detail on why
the guest is crashing.

I think your patch looks OK based just on the fact that it restores the
previous behaviour, so I'll pick it up and pass it through my usual
testing. If nothing breaks I'll merge it.

cheers
Michael Ellerman Feb. 19, 2019, 4:55 a.m. | #6
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
> On 11/02/2019 00:30, Benjamin Herrenschmidt wrote:
>
>> On Fri, 2019-02-08 at 14:51 +0000, Mark Cave-Ayland wrote:
>>>
>>> Indeed, but there are still some questions to be asked here:
>>>
>>> 1) Why were these bits removed from the original bitmask in the first place without
>>> it being documented in the commit message?
>>>
>>> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
>>> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
>>> bug elsewhere in the 32-bit powerpc code.
>>>
>>>
>>> If you think that these points don't matter, then I'm happy to resubmit the patch
>>> as-is based upon your comments above.
>> 
>> We should write a test case to verify that FE0/FE1 are properly
>> preserved/context-switched etc... I bet if we accidentally wiped them,
>> we wouldn't notice 99.9% of the time.
>
> Right, I guess it's more likely to cause an issue in the KVM PR case because the guest
> can alter the flags in a way that doesn't go through the normal process switch mechanism.
>
> The original patchset at
> https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg98326.html does include
> some tests in the first few patches, but AFAICT they are concerned with the contents
> of the FP registers rather than the related MSRs.

fpu_preempt.c should be able to be adapted to also check the MSR bits.

> Who is the right person to ask about fixing issues related to context switching with
> KVM PR?

KVM PR doesn't really have a maintainer TBH. Feel like volunteering? :)

> I did add the original author's email address to my first few emails but have
> had no response back :/

Cyril who wrote the original FPU patch has moved on to other things.

cheers
Mark Cave-Ayland Feb. 19, 2019, 7:55 a.m. | #7
On 19/02/2019 04:20, Michael Ellerman wrote:

Hi Michael,

> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>> On 08/02/2019 14:45, Christophe Leroy wrote:
>>
>>> On 08/02/2019 at 15:33, Mark Cave-Ayland wrote:
>>>> Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up"
>>>
>>> Expected format for the above is:
>>>
>>> Commit 123456789abc ("text")
>>
>> Hi Christophe,
>>
>> Apologies - I'm fairly new at submitting kernel patches, but I can re-send it in the
>> correct format later if required.
>>
>>>> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
>>>> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
>>>> guest to lockup and panic the kernel.
> 
> Which kernel is panicking? The guest or the host?

It's the host kernel. As long as you occasionally tap a few keys to keep screen
blanking disabled, you can see the panic on the physical console.

I've uploaded a photo I took during the bisection containing the panic when booting
MacOS X 10.2 under qemu-system-ppc to
https://www.ilande.co.uk/tmp/qemu/macmini-kvm.jpg in case you find it useful.

Given that it's really easy to recreate, let me know if you want me to do a git
pull/rebuild and/or if you need any debugging information as it's easy for me to
reproduce.

>>>> Reinstate these bits to the MSR bitmask to enable MacOS guests to run under
>>>> 32-bit KVM-PR once again without issue.
>>>>
>>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>>
>>> Should include a Fixes: and a Cc to stable ?
>>>
>>> Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
>>> Cc: stable@vger.kernel.org
>>
>> Indeed, but there are still some questions to be asked here:
>>
>> 1) Why were these bits removed from the original bitmask in the first place without
>> it being documented in the commit message?
> 
> It was almost certainly an accident.

Heh, okay :)

>> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
>> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
>> bug elsewhere in the 32-bit powerpc code.
> 
> That's slightly worrying. It's hard to say without more detail on why
> the guest is crashing.
> 
> I think your patch looks OK based just on the fact that it restores the
> previous behaviour, so I'll pick it up and pass it through my usual
> testing. If nothing breaks I'll merge it.

That would be great! Does it need a CC to stable too? It would be great if this would
get picked up in the next set of Debian ports kernels, for example.


ATB,

Mark.
Mark Cave-Ayland Feb. 19, 2019, 8:15 a.m. | #8
On 19/02/2019 04:55, Michael Ellerman wrote:

> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>> On 11/02/2019 00:30, Benjamin Herrenschmidt wrote:
>>
>>> On Fri, 2019-02-08 at 14:51 +0000, Mark Cave-Ayland wrote:
>>>>
>>>> Indeed, but there are still some questions to be asked here:
>>>>
>>>> 1) Why were these bits removed from the original bitmask in the first place without
>>>> it being documented in the commit message?
>>>>
>>>> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
>>>> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
>>>> bug elsewhere in the 32-bit powerpc code.
>>>>
>>>>
>>>> If you think that these points don't matter, then I'm happy to resubmit the patch
>>>> as-is based upon your comments above.
>>>
>>> We should write a test case to verify that FE0/FE1 are properly
>>> preserved/context-switched etc... I bet if we accidentally wiped them,
>>> we wouldn't notice 99.9% of the time.
>>
>> Right, I guess it's more likely to cause an issue in the KVM PR case because the guest
>> can alter the flags in a way that doesn't go through the normal process switch mechanism.
>>
>> The original patchset at
>> https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg98326.html does include
>> some tests in the first few patches, but AFAICT they are concerned with the contents
>> of the FP registers rather than the related MSRs.
> 
> fpu_preempt.c should be able to be adapted to also check the MSR bits.
> 
>> Who is the right person to ask about fixing issues related to context switching with
>> KVM PR?
> 
> KVM PR doesn't really have a maintainer TBH. Feel like volunteering? :)

Well, I only have a 32-bit Mac Mini here, which I'm using to help flush out bugs in
QEMU's emulation, so I can keep an occasional eye on the 32-bit side of things; but as
it's a hobby project, time is quite limited.

As and when time allows, I'd be interested to figure out what MacOS 9 does that causes KVM
PR to bail, and whether it's possible to run KVM PR on an SMP kernel, but I'd certainly
need some help from the very knowledgeable people on these lists.

>> I did add the original author's email address to my first few emails but have
>> had no response back :/
> 
> Cyril who wrote the original FPU patch has moved on to other things.

Ah okay then.


ATB,

Mark.
Michael Ellerman Feb. 20, 2019, 12:41 p.m. | #9
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
> On 19/02/2019 04:20, Michael Ellerman wrote:
>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>>>> unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to
>>>>> update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS
>>>>> guest to lockup and panic the kernel.
>> 
>> Which kernel is panicking? The guest or the host?
>
> It's the host kernel. As long as you occasionally tap a few keys to keep screen
> blanking disabled, you can see the panic on the physical console.

Ah crap, I assumed you meant the guest kernel.

> I've uploaded a photo I took during the bisection containing the panic when booting
> MacOS X 10.2 under qemu-system-ppc to
> https://www.ilande.co.uk/tmp/qemu/macmini-kvm.jpg in case you find it useful.

OK. That's a host crash, but only because init (systemd) died. The reason
it died, though, is that we didn't clear FE0/1 properly, so it's still a
kernel bug.

> Given that it's really easy to recreate, let me know if you want me to do a git
> pull/rebuild and/or if you need any debugging information as it's easy for me to
> reproduce.

I think that's OK. It's reasonably clear what's going on.


>>> 2) Is this the right fix? I'm told that MacOS guests already run without this patch
>>> on a G5 under 64-bit KVM-PR which may suggest that this is a workaround for another
>>> bug elsewhere in the 32-bit powerpc code.
>> 
>> That's slightly worrying. It's hard to say without more detail on why
>> the guest is crashing.
>> 
>> I think your patch looks OK based just on the fact that it restores the
>> previous behaviour, so I'll pick it up and pass it through my usual
>> testing. If nothing breaks I'll merge it.
>
> That would be great! Does it need a CC to stable too? It would be great if this would
> get picked up in the next set of Debian ports kernels, for example.

I'll add Cc stable.

cheers

Patch

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..71bad4b6f80d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -176,7 +176,7 @@ static void __giveup_fpu(struct task_struct *tsk)
 
 	save_fpu(tsk);
 	msr = tsk->thread.regs->msr;
-	msr &= ~MSR_FP;
+	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
 #ifdef CONFIG_VSX
 	if (cpu_has_feature(CPU_FTR_VSX))
 		msr &= ~MSR_VSX;