diff mbox series

[v2,for,6.2?] gicv3: fix ICH_MISR's LRENP computation

Message ID 20211207094427.3473-1-damien.hedde@greensocs.com
State New
Headers show
Series [v2,for,6.2?] gicv3: fix ICH_MISR's LRENP computation | expand

Commit Message

Damien Hedde Dec. 7, 2021, 9:44 a.m. UTC
According to the "Arm Generic Interrupt Controller Architecture
Specification GIC architecture version 3 and 4" (version G: page 345
for aarch64 or 509 for aarch32):
LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
ICH_HCR.EOIcount is non-zero.

When only LRENPIE was set (and EOI count was zero), the LRENP bit was
wrongly set and MISR value was wrong.

As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
the maintenance interrupt was constantly fired. It happens since patch
9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
which fixed another bug about maintenance interrupt (most significant
bits of misr, including this one, were ignored in the interrupt trigger).

Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
---
The gic doc is available here:
https://developer.arm.com/documentation/ihi0069/g

v2: identical resend because subject screw-up (sorry)

Thanks,
Damien
---
 hw/intc/arm_gicv3_cpuif.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Philippe Mathieu-Daudé Dec. 7, 2021, 12:45 p.m. UTC | #1
On 12/7/21 10:44, Damien Hedde wrote:
> According to the "Arm Generic Interrupt Controller Architecture
> Specification GIC architecture version 3 and 4" (version G: page 345
> for aarch64 or 509 for aarch32):
> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
> ICH_HCR.EOIcount is non-zero.
> 
> When only LRENPIE was set (and EOI count was zero), the LRENP bit was
> wrongly set and MISR value was wrong.
> 
> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
> the maintenance interrupt was constantly fired. It happens since patch
> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
> which fixed another bug about maintenance interrupt (most significant
> bits of misr, including this one, were ignored in the interrupt trigger).
> 
> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")

This commit predates 6.1 release, so technically this is not
a regression for 6.2.

> Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
> ---
> The gic doc is available here:
> https://developer.arm.com/documentation/ihi0069/g
> 
> v2: identical resend because subject screw-up (sorry)
> 
> Thanks,
> Damien
> ---
>  hw/intc/arm_gicv3_cpuif.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
> index 7fba931450..85fc369e55 100644
> --- a/hw/intc/arm_gicv3_cpuif.c
> +++ b/hw/intc/arm_gicv3_cpuif.c
> @@ -351,7 +351,8 @@ static uint32_t maintenance_interrupt_state(GICv3CPUState *cs)
>      /* Scan list registers and fill in the U, NP and EOI bits */
>      eoi_maintenance_interrupt_state(cs, &value);
>  
> -    if (cs->ich_hcr_el2 & (ICH_HCR_EL2_LRENPIE | ICH_HCR_EL2_EOICOUNT_MASK)) {
> +    if ((cs->ich_hcr_el2 & ICH_HCR_EL2_LRENPIE) &&
> +        (cs->ich_hcr_el2 & ICH_HCR_EL2_EOICOUNT_MASK)) {
>          value |= ICH_MISR_EL2_LRENP;
>      }
>  
>
Damien Hedde Dec. 7, 2021, 1:05 p.m. UTC | #2
On 12/7/21 13:45, Philippe Mathieu-Daudé wrote:
> On 12/7/21 10:44, Damien Hedde wrote:
>> According to the "Arm Generic Interrupt Controller Architecture
>> Specification GIC architecture version 3 and 4" (version G: page 345
>> for aarch64 or 509 for aarch32):
>> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
>> ICH_HCR.EOIcount is non-zero.
>>
>> When only LRENPIE was set (and EOI count was zero), the LRENP bit was
>> wrongly set and MISR value was wrong.
>>
>> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
>> the maintenance interrupt was constantly fired. It happens since patch
>> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
>> which fixed another bug about maintenance interrupt (most significant
>> bits of misr, including this one, were ignored in the interrupt trigger).
>>
>> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")
> 
> This commit predates 6.1 release, so technically this is not
> a regression for 6.2.

Do you mean "Fixes:" is meant only for regression or simply that this 
patch should not go for 6.2 ?

9cee1efe92 was introduced after 6.1, and changed the interrupt behavior. 
Thought I'm not sure if we can consider this as a fix for 9cee1efe92: it 
only makes the previous error more visible.

Damien
Peter Maydell Dec. 7, 2021, 1:32 p.m. UTC | #3
On Tue, 7 Dec 2021 at 13:05, Damien Hedde <damien.hedde@greensocs.com> wrote:
> On 12/7/21 13:45, Philippe Mathieu-Daudé wrote:
> > On 12/7/21 10:44, Damien Hedde wrote:
> >> According to the "Arm Generic Interrupt Controller Architecture
> >> Specification GIC architecture version 3 and 4" (version G: page 345
> >> for aarch64 or 509 for aarch32):
> >> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
> >> ICH_HCR.EOIcount is non-zero.
> >>
> >> When only LRENPIE was set (and EOI count was zero), the LRENP bit was
> >> wrongly set and MISR value was wrong.
> >>
> >> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
> >> the maintenance interrupt was constantly fired. It happens since patch
> >> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
> >> which fixed another bug about maintenance interrupt (most significant
> >> bits of misr, including this one, were ignored in the interrupt trigger).
> >>
> >> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")
> >
> > This commit predates 6.1 release, so technically this is not
> > a regression for 6.2.
>
> Do you mean "Fixes:" is meant only for regression or simply that this
> patch should not go for 6.2 ?

Fixes: is fine in all situations where the commit is fixing
a bug that was introduced in the commit hash it mentions.

Separately, given where we are in the release cycle, a patch has
to hit a very high bar to go into 6.2: at least "this breaks
a real world use case that worked fine in 6.1", and probably also
"a use case that we expect a fair number of users to be using".

-- PMM
Peter Maydell Dec. 7, 2021, 2:21 p.m. UTC | #4
On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com> wrote:
>
> According to the "Arm Generic Interrupt Controller Architecture
> Specification GIC architecture version 3 and 4" (version G: page 345
> for aarch64 or 509 for aarch32):
> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
> ICH_HCR.EOIcount is non-zero.
>
> When only LRENPIE was set (and EOI count was zero), the LRENP bit was
> wrongly set and MISR value was wrong.
>
> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
> the maintenance interrupt was constantly fired. It happens since patch
> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
> which fixed another bug about maintenance interrupt (most significant
> bits of misr, including this one, were ignored in the interrupt trigger).
>
> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")
> Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
> ---
> The gic doc is available here:
> https://developer.arm.com/documentation/ihi0069/g
>
> v2: identical resend because subject screw-up (sorry)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I won't try to put this into 6.2 unless you have a common guest
that runs into this bug.

thanks
-- PMM
Brian Cain Dec. 7, 2021, 3:18 p.m. UTC | #5
> -----Original Message-----
> From: Qemu-devel <qemu-devel-bounces+bcain=quicinc.com@nongnu.org>
> On Behalf Of Peter Maydell
...
> On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com>
> wrote:
> >
> > According to the "Arm Generic Interrupt Controller Architecture
> > Specification GIC architecture version 3 and 4" (version G: page 345
> > for aarch64 or 509 for aarch32):
> > LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
> > ICH_HCR.EOIcount is non-zero.
> >
> > When only LRENPIE was set (and EOI count was zero), the LRENP bit was
> > wrongly set and MISR value was wrong.
> >
> > As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
> > the maintenance interrupt was constantly fired. It happens since patch
> > 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
> > which fixed another bug about maintenance interrupt (most significant
> > bits of misr, including this one, were ignored in the interrupt trigger).
> >
> > Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system
> registers")
> > Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
> > ---
> > The gic doc is available here:
> > https://developer.arm.com/documentation/ihi0069/g
> >
> > v2: identical resend because subject screw-up (sorry)
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> I won't try to put this into 6.2 unless you have a common guest
> that runs into this bug.

Peter,

I know that Qualcomm encounters this issue with its hypervisor (https://github.com/quic/gunyah-hypervisor).  Apologies for not being familiar -- "common guest" means multiple guest systems/OSs that encounter the issue?  Does that mean that it would not suffice to demonstrate the issue for the one known case?

-Brian
Damien Hedde Dec. 7, 2021, 3:22 p.m. UTC | #6
On 12/7/21 15:21, Peter Maydell wrote:
> On Tue, 7 Dec 2021 at 09:44, Damien Hedde <damien.hedde@greensocs.com> wrote:
>>
>> According to the "Arm Generic Interrupt Controller Architecture
>> Specification GIC architecture version 3 and 4" (version G: page 345
>> for aarch64 or 509 for aarch32):
>> LRENP bit of ICH_MISR is set when ICH_HCR.LRENPIE==1 and
>> ICH_HCR.EOIcount is non-zero.
>>
>> When only LRENPIE was set (and EOI count was zero), the LRENP bit was
>> wrongly set and MISR value was wrong.
>>
>> As an additional consequence, if an hypervisor set ICH_HCR.LRENPIE,
>> the maintenance interrupt was constantly fired. It happens since patch
>> 9cee1efe92 ("hw/intc: Set GIC maintenance interrupt level to only 0 or 1")
>> which fixed another bug about maintenance interrupt (most significant
>> bits of misr, including this one, were ignored in the interrupt trigger).
>>
>> Fixes: 83f036fe3d ("hw/intc/arm_gicv3: Add accessors for ICH_ system registers")
>> Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
>> ---
>> The gic doc is available here:
>> https://developer.arm.com/documentation/ihi0069/g
>>
>> v2: identical resend because subject screw-up (sorry)
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> I won't try to put this into 6.2 unless you have a common guest
> that runs into this bug.
> 
> thanks
> -- PMM
> 

I don't know if this fit into "common guest" but my use case is:

 > ./build/qemu-system-aarch64 \
 >     -machine virt,virtualization=on,gic-version=3,highmem=off  \
 >     -cpu max -m size=4G -smp cpus=8 -nographic  \
 >     -kernel hypvm.elf   \
 >     -device loader,file=Image,addr=0x41080000  \
 >     -device loader,file=virt_512M.dtb,addr=0x44200000

where Image is a buildroot compiled kernel and hypvm.elf is an 
hypervisor from qualcomm (https://github.com/quic/gunyah-hypervisor).

It boots fine on v6.0 or v6.1 but hangs on master.

It's the same hypervisor Brian is talking about.

Thanks,
Damien
Peter Maydell Dec. 7, 2021, 3:24 p.m. UTC | #7
On Tue, 7 Dec 2021 at 15:18, Brian Cain <bcain@quicinc.com> wrote:
> Peter Maydell wrote:
> > I won't try to put this into 6.2 unless you have a common guest
> > that runs into this bug.

> I know that Qualcomm encounters this issue with its hypervisor (https://github.com/quic/gunyah-hypervisor).  Apologies for not being familiar -- "common guest" means multiple guest systems/OSs that encounter the issue?  Does that mean that it would not suffice to demonstrate the issue for the one known case?

It means "if you see this with a Linux, BSD etc guest that's
more important than if you see this with some oddball thing
nobody else is using and whose users aren't as likely to be
using release versions of QEMU rather than mainline".

The bug is a bug in any case and we'll fix it, it's just a
question of whether it meets the bar to go into 6.2, which is
hopefully going to have its final RC tagged today. If this
patch had arrived a week ago then the bar would have been
lower and it would definitely have gone in. As it is I have
to weigh up the chances of this change causing a regression
for eg KVM running on emulated QEMU.

thanks
-- PMM
Brian Cain Dec. 7, 2021, 3:26 p.m. UTC | #8
> -----Original Message-----
> From: Peter Maydell <peter.maydell@linaro.org>
...
> On Tue, 7 Dec 2021 at 15:18, Brian Cain <bcain@quicinc.com> wrote:
> > Peter Maydell wrote:
> > > I won't try to put this into 6.2 unless you have a common guest
> > > that runs into this bug.
> 
> > I know that Qualcomm encounters this issue with its hypervisor
> (https://github.com/quic/gunyah-hypervisor).  Apologies for not being familiar
> -- "common guest" means multiple guest systems/OSs that encounter the
> issue?  Does that mean that it would not suffice to demonstrate the issue for
> the one known case?
> 
> It means "if you see this with a Linux, BSD etc guest that's
> more important than if you see this with some oddball thing
> nobody else is using and whose users aren't as likely to be
> using release versions of QEMU rather than mainline".

Ok, gotcha, thanks for the clarification :)

> The bug is a bug in any case and we'll fix it, it's just a
> question of whether it meets the bar to go into 6.2, which is
> hopefully going to have its final RC tagged today. If this
> patch had arrived a week ago then the bar would have been
> lower and it would definitely have gone in. As it is I have
> to weigh up the chances of this change causing a regression
> for eg KVM running on emulated QEMU.

I understand, and it sounds like the right call.

Thanks!
-Brian
Peter Maydell Dec. 7, 2021, 3:45 p.m. UTC | #9
On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> The bug is a bug in any case and we'll fix it, it's just a
> question of whether it meets the bar to go into 6.2, which is
> hopefully going to have its final RC tagged today. If this
> patch had arrived a week ago then the bar would have been
> lower and it would definitely have gone in. As it is I have
> to weigh up the chances of this change causing a regression
> for eg KVM running on emulated QEMU.

Looking at the KVM source it doesn't ever set the LRENPIE
bit (it doesn't even have a #define for it), which both
explains why we didn't notice this bug before and also
means we can be pretty certain we're not going to cause a
regression for KVM at least if we fix it...

-- PMM
Damien Hedde Dec. 7, 2021, 3:49 p.m. UTC | #10
On 12/7/21 16:45, Peter Maydell wrote:
> On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote:
>> The bug is a bug in any case and we'll fix it, it's just a
>> question of whether it meets the bar to go into 6.2, which is
>> hopefully going to have its final RC tagged today. If this
>> patch had arrived a week ago then the bar would have been
>> lower and it would definitely have gone in. As it is I have
>> to weigh up the chances of this change causing a regression
>> for eg KVM running on emulated QEMU.
> 
> Looking at the KVM source it doesn't ever set the LRENPIE
> bit (it doesn't even have a #define for it), which both
> explains why we didn't notice this bug before and also
> means we can be pretty certain we're not going to cause a
> regression for KVM at least if we fix it...
> 
> -- PMM
> 

We are perfectly fine with this not going into 6.2.

--
Damien
Peter Maydell Dec. 7, 2021, 8:24 p.m. UTC | #11
On Tue, 7 Dec 2021 at 15:49, Damien Hedde <damien.hedde@greensocs.com> wrote:
>
>
>
> On 12/7/21 16:45, Peter Maydell wrote:
> > On Tue, 7 Dec 2021 at 15:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> The bug is a bug in any case and we'll fix it, it's just a
> >> question of whether it meets the bar to go into 6.2, which is
> >> hopefully going to have its final RC tagged today. If this
> >> patch had arrived a week ago then the bar would have been
> >> lower and it would definitely have gone in. As it is I have
> >> to weigh up the chances of this change causing a regression
> >> for eg KVM running on emulated QEMU.
> >
> > Looking at the KVM source it doesn't ever set the LRENPIE
> > bit (it doesn't even have a #define for it), which both
> > explains why we didn't notice this bug before and also
> > means we can be pretty certain we're not going to cause a
> > regression for KVM at least if we fix it...

> We are perfectly fine with this not going into 6.2.

I thought about it a bit more, and realized that we could
end up giving KVM spurious maintenance interrupts even though
it doesn't set the LRENPIE bit, because the incorrect OR
meant we'd send a maint irq whenever the EOIcount was nonzero.
So we've put this fix in for 6.2.

Thanks for the patch and the discussion.

-- PMM
diff mbox series

Patch

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 7fba931450..85fc369e55 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -351,7 +351,8 @@  static uint32_t maintenance_interrupt_state(GICv3CPUState *cs)
     /* Scan list registers and fill in the U, NP and EOI bits */
     eoi_maintenance_interrupt_state(cs, &value);
 
-    if (cs->ich_hcr_el2 & (ICH_HCR_EL2_LRENPIE | ICH_HCR_EL2_EOICOUNT_MASK)) {
+    if ((cs->ich_hcr_el2 & ICH_HCR_EL2_LRENPIE) &&
+        (cs->ich_hcr_el2 & ICH_HCR_EL2_EOICOUNT_MASK)) {
         value |= ICH_MISR_EL2_LRENP;
     }