Message ID | 20211124202005.989935-1-peter.maydell@linaro.org |
---|---|
State | New |
Series | [v2] hw/intc/arm_gicv3: Update cached state after LPI state changes |
Peter Maydell <peter.maydell@linaro.org> writes:

> The logic of gicv3_redist_update() is as follows:
>  * it must be called in any code path that changes the state of
>    (only) redistributor interrupts
>  * if it finds a redistributor interrupt that is (now) higher
>    priority than the previous highest-priority pending interrupt,
>    then this must be the new highest-priority pending interrupt
>  * if it does *not* find a better redistributor interrupt, then:
>     - if the previous state was "no interrupts pending" then
>       the new state is still "no interrupts pending"
>     - if the previous best interrupt was not a redistributor
>       interrupt then that remains the best interrupt
>     - if the previous best interrupt *was* a redistributor interrupt,
>       then the new best interrupt must be some non-redistributor
>       interrupt, but we don't know which so must do a full scan
>
> In commit 17fb5e36aabd4b2c125 we effectively added the LPI interrupts
> as a kind of "redistributor interrupt" for this purpose, by adding
> cs->hpplpi to the set of things that gicv3_redist_update() considers
> before it gives up and decides to do a full scan of distributor
> interrupts. However we didn't quite get this right:
>  * the condition check for "was the previous best interrupt a
>    redistributor interrupt" must be updated to include LPIs
>    in what it considers to be redistributor interrupts
>  * every code path which updates the LPI state which
>    gicv3_redist_update() checks must also call gicv3_redist_update():
>    this is cs->hpplpi and the GICR_CTLR ENABLE_LPIS bit
>
> This commit fixes this by:
>  * correcting the test on cs->hppi.irq in gicv3_redist_update()
>  * making gicv3_redist_update_lpi() always call gicv3_redist_update()
>  * introducing a new gicv3_redist_update_lpi_only() for the one
>    callsite (the post-load hook) which must not call
>    gicv3_redist_update()
>  * making gicv3_redist_lpi_pending() always call gicv3_redist_update(),
>    either directly or via gicv3_redist_update_lpi()
>  * removing a couple of now-unnecessary calls to gicv3_redist_update()
>    from some callers of those two functions
>  * calling gicv3_redist_update() when the GICR_CTLR ENABLE_LPIS
>    bit is cleared
>
> (This means that the not-file-local gicv3_redist_* LPI related
> functions now all take care of the updates of internally cached
> GICv3 information, in the same way the older functions
> gicv3_redist_set_irq() and gicv3_redist_send_sgi() do.)
>
> The visible effect of this bug was that when the guest acknowledged
> an LPI by reading ICC_IAR1_EL1, we marked it as not pending in the
> LPI data structure but still left it in cs->hppi so we would offer it
> to the guest again. In particular for setups using an emulated GICv3
> and ITS and using devices which use LPIs (ie PCI devices) a Linux
> guest would complain "irq 54: nobody cared" and then hang. (The hang
> was intermittent, presumably depending on the timing between
> different interrupts arriving and being completed.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Tested-by: Alex Bennée <alex.bennee@linaro.org>

Interestingly this also triggers an extra IRQ in v4 of my
kvm-unit-test ITS patches. However it works with v3 which was more
limited in the excising of the test:

v3:

--8<---------------cut here---------------start------------->8---
modified arm/gic.c
@@ -732,21 +732,17 @@ static void test_its_trigger(void)
 			"dev2/eventid=20 does not trigger any LPI");
 
 	/*
-	 * re-enable the LPI but willingly do not call invall
-	 * so the change in config is not taken into account.
-	 * The LPI should not hit
+	 * re-enable the LPI. While "A change to the LPI configuration
+	 * is not guaranteed to be visible until an appropriate
+	 * invalidation operation has completed" hardware that doesn't
+	 * implement caches may have delivered the event at any point
+	 * after the enabling. Check the LPI has hit by the time the
+	 * invall is done.
 	 */
 	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
 	stats_reset();
 	cpumask_clear(&mask);
 	its_send_int(dev2, 20);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, -1, -1),
-			"dev2/eventid=20 still does not trigger any LPI");
-
-	/* Now call the invall and check the LPI hits */
-	stats_reset();
-	cpumask_clear(&mask);
 	cpumask_set_cpu(3, &mask);
 	its_send_invall(col3);
 	wait_for_interrupts(&mask);
--8<---------------cut here---------------end--------------->8---

v4:

--8<---------------cut here---------------start------------->8---
modified arm/gic.c
@@ -732,34 +732,22 @@ static void test_its_trigger(void)
 			"dev2/eventid=20 does not trigger any LPI");
 
 	/*
-	 * re-enable the LPI but willingly do not call invall
-	 * so the change in config is not taken into account.
-	 * The LPI should not hit
+	 * re-enable the LPI. While "A change to the LPI configuration
+	 * is not guaranteed to be visible until an appropriate
+	 * invalidation operation has completed" hardware that doesn't
+	 * implement caches may have delivered the event at any point
+	 * after the enabling. Check the LPI has hit by the time the
+	 * invall is done.
 	 */
-	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
-	stats_reset();
-	cpumask_clear(&mask);
-	its_send_int(dev2, 20);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, -1, -1),
-			"dev2/eventid=20 still does not trigger any LPI");
-
-	/* Now call the invall and check the LPI hits */
 	stats_reset();
-	cpumask_clear(&mask);
-	cpumask_set_cpu(3, &mask);
+	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
 	its_send_invall(col3);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, 0, 8195),
-			"dev2/eventid=20 pending LPI is received");
-
-	stats_reset();
 	cpumask_clear(&mask);
 	cpumask_set_cpu(3, &mask);
 	its_send_int(dev2, 20);
 	wait_for_interrupts(&mask);
 	report(check_acked(&mask, 0, 8195),
-			"dev2/eventid=20 now triggers an LPI");
+			"dev2/eventid=20 triggers an LPI");
 
 	report_prefix_pop();
--8<---------------cut here---------------end--------------->8---

I think my v3 was correct and the v4 is too aggressive as I was
chasing a regression in the QEMU code.

> ---
> I think this is now a proper fix for the problem. Testing
> definitely welcomed... The commit message makes it sound like a bit
> of a "several things in one patch" change, but it isn't really IMHO:
> I just erred on the side of being very verbose in the description...
> ---
>  hw/intc/gicv3_internal.h   | 17 +++++++++++++++++
>  hw/intc/arm_gicv3.c        |  6 ++++--
>  hw/intc/arm_gicv3_redist.c | 14 ++++++++++----
>  3 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
> index a0369dace7b..70f34ee4955 100644
> --- a/hw/intc/gicv3_internal.h
> +++ b/hw/intc/gicv3_internal.h
> @@ -463,7 +463,24 @@ void gicv3_dist_set_irq(GICv3State *s, int irq, int level);
>  void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level);
>  void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level);
>  void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level);
> +/**
> + * gicv3_redist_update_lpi:
> + * @cs: GICv3CPUState
> + *
> + * Scan the LPI pending table and recalculate the highest priority
> + * pending LPI and also the overall highest priority pending interrupt.
> + */
>  void gicv3_redist_update_lpi(GICv3CPUState *cs);
> +/**
> + * gicv3_redist_update_lpi_only:
> + * @cs: GICv3CPUState
> + *
> + * Scan the LPI pending table and recalculate cs->hpplpi only,
> + * without calling gicv3_redist_update() to recalculate the overall
> + * highest priority pending interrupt. This should be called after
> + * an incoming migration has loaded new state.
> + */
> +void gicv3_redist_update_lpi_only(GICv3CPUState *cs);

good commenting ;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index a0369dace7b..70f34ee4955 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -463,7 +463,24 @@ void gicv3_dist_set_irq(GICv3State *s, int irq, int level);
 void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level);
 void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level);
 void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level);
+/**
+ * gicv3_redist_update_lpi:
+ * @cs: GICv3CPUState
+ *
+ * Scan the LPI pending table and recalculate the highest priority
+ * pending LPI and also the overall highest priority pending interrupt.
+ */
 void gicv3_redist_update_lpi(GICv3CPUState *cs);
+/**
+ * gicv3_redist_update_lpi_only:
+ * @cs: GICv3CPUState
+ *
+ * Scan the LPI pending table and recalculate cs->hpplpi only,
+ * without calling gicv3_redist_update() to recalculate the overall
+ * highest priority pending interrupt. This should be called after
+ * an incoming migration has loaded new state.
+ */
+void gicv3_redist_update_lpi_only(GICv3CPUState *cs);
 void gicv3_redist_send_sgi(GICv3CPUState *cs, int grp, int irq, bool ns);
 void gicv3_init_cpuif(GICv3State *s);
 
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index c6282984b1e..9f5f815db9b 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -186,7 +186,9 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
      * interrupt has reduced in priority and any other interrupt could
      * now be the new best one).
      */
-    if (!seenbetter && cs->hppi.prio != 0xff && cs->hppi.irq < GIC_INTERNAL) {
+    if (!seenbetter && cs->hppi.prio != 0xff &&
+        (cs->hppi.irq < GIC_INTERNAL ||
+         cs->hppi.irq >= GICV3_LPI_INTID_START)) {
         gicv3_full_update_noirqset(cs->gic);
     }
 }
@@ -354,7 +356,7 @@ static void arm_gicv3_post_load(GICv3State *s)
      * pending interrupt, but don't set IRQ or FIQ lines.
      */
     for (i = 0; i < s->num_cpu; i++) {
-        gicv3_redist_update_lpi(&s->cpu[i]);
+        gicv3_redist_update_lpi_only(&s->cpu[i]);
     }
     gicv3_full_update_noirqset(s);
     /* Repopulate the cache of GICv3CPUState pointers for target CPUs */
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index 424e7e28a86..c8ff3eca085 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -256,9 +256,10 @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr offset,
                 cs->gicr_ctlr |= GICR_CTLR_ENABLE_LPIS;
                 /* Check for any pending interr in pending table */
                 gicv3_redist_update_lpi(cs);
-                gicv3_redist_update(cs);
             } else {
                 cs->gicr_ctlr &= ~GICR_CTLR_ENABLE_LPIS;
+                /* cs->hppi might have been an LPI; recalculate */
+                gicv3_redist_update(cs);
             }
         }
         return MEMTX_OK;
@@ -571,7 +572,7 @@ static void gicv3_redist_check_lpi_priority(GICv3CPUState *cs, int irq)
     }
 }
 
-void gicv3_redist_update_lpi(GICv3CPUState *cs)
+void gicv3_redist_update_lpi_only(GICv3CPUState *cs)
 {
     /*
      * This function scans the LPI pending table and for each pending
@@ -614,6 +615,12 @@ void gicv3_redist_update_lpi(GICv3CPUState *cs)
     }
 }
 
+void gicv3_redist_update_lpi(GICv3CPUState *cs)
+{
+    gicv3_redist_update_lpi_only(cs);
+    gicv3_redist_update(cs);
+}
+
 void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level)
 {
     /*
@@ -651,6 +658,7 @@ void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level)
      */
     if (level) {
         gicv3_redist_check_lpi_priority(cs, irq);
+        gicv3_redist_update(cs);
     } else {
         if (irq == cs->hpplpi.irq) {
             gicv3_redist_update_lpi(cs);
@@ -673,8 +681,6 @@ void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level)
 
     /* set/clear the pending bit for this irq */
     gicv3_redist_lpi_pending(cs, irq, level);
-
-    gicv3_redist_update(cs);
 }
 
 void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level)
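
The split between gicv3_redist_update_lpi_only() and gicv3_redist_update_lpi() in the diff above can be illustrated with a toy model. The following is only a sketch: every toy_* name, the ToyRedist structure and the four-entry pending table are hypothetical and are not QEMU code, and the toy "overall" update only considers LPIs, whereas the real gicv3_redist_update() also looks at SGIs and PPIs and may fall back to a full scan. Unlike the real gicv3_redist_lpi_pending(), the toy version always rescans instead of taking the cheaper priority-check path.

```c
#include <stdbool.h>

/* All names below are hypothetical; this is a toy model, not QEMU code. */
#define TOY_NUM_LPIS  4
#define TOY_LPI_BASE  8192   /* first LPI INTID (assumed, as in GICv3) */
#define TOY_PRIO_IDLE 0xff   /* "no interrupt pending" priority */

typedef struct {
    int irq;
    int prio;
} ToyPending;

typedef struct {
    bool pending[TOY_NUM_LPIS];
    int prio[TOY_NUM_LPIS];
    ToyPending hpplpi;   /* best pending LPI */
    ToyPending hppi;     /* cached overall best pending interrupt */
} ToyRedist;

static void toy_init(ToyRedist *r)
{
    for (int i = 0; i < TOY_NUM_LPIS; i++) {
        r->pending[i] = false;
        r->prio[i] = TOY_PRIO_IDLE;
    }
    r->hpplpi = (ToyPending){ .irq = 0, .prio = TOY_PRIO_IDLE };
    r->hppi = r->hpplpi;
}

/* Rescan the pending table; touches hpplpi only (lower value wins). */
static void toy_update_lpi_only(ToyRedist *r)
{
    ToyPending best = { .irq = 0, .prio = TOY_PRIO_IDLE };

    for (int i = 0; i < TOY_NUM_LPIS; i++) {
        if (r->pending[i] && r->prio[i] < best.prio) {
            best.irq = TOY_LPI_BASE + i;
            best.prio = r->prio[i];
        }
    }
    r->hpplpi = best;
}

/* Refresh the overall cache; in this toy, LPIs are the only source. */
static void toy_update(ToyRedist *r)
{
    r->hppi = r->hpplpi;
}

/* Public entry point: always leaves the overall cache consistent. */
static void toy_update_lpi(ToyRedist *r)
{
    toy_update_lpi_only(r);
    toy_update(r);
}

/* Set or clear a pending bit, then refresh both caches. */
static void toy_lpi_pending(ToyRedist *r, int irq, bool level)
{
    r->pending[irq - TOY_LPI_BASE] = level;
    toy_update_lpi(r);
}
```

In this model, clearing the pending bit of the LPI currently held in hppi returns hppi to the idle priority. If toy_lpi_pending() called only toy_update_lpi_only(), hppi would keep pointing at the acknowledged LPI, which is the kind of stale cached state the patch is addressing.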
The logic of gicv3_redist_update() is as follows:
 * it must be called in any code path that changes the state of
   (only) redistributor interrupts
 * if it finds a redistributor interrupt that is (now) higher
   priority than the previous highest-priority pending interrupt,
   then this must be the new highest-priority pending interrupt
 * if it does *not* find a better redistributor interrupt, then:
    - if the previous state was "no interrupts pending" then
      the new state is still "no interrupts pending"
    - if the previous best interrupt was not a redistributor
      interrupt then that remains the best interrupt
    - if the previous best interrupt *was* a redistributor interrupt,
      then the new best interrupt must be some non-redistributor
      interrupt, but we don't know which so must do a full scan

In commit 17fb5e36aabd4b2c125 we effectively added the LPI interrupts
as a kind of "redistributor interrupt" for this purpose, by adding
cs->hpplpi to the set of things that gicv3_redist_update() considers
before it gives up and decides to do a full scan of distributor
interrupts. However we didn't quite get this right:
 * the condition check for "was the previous best interrupt a
   redistributor interrupt" must be updated to include LPIs
   in what it considers to be redistributor interrupts
 * every code path which updates the LPI state which
   gicv3_redist_update() checks must also call gicv3_redist_update():
   this is cs->hpplpi and the GICR_CTLR ENABLE_LPIS bit

This commit fixes this by:
 * correcting the test on cs->hppi.irq in gicv3_redist_update()
 * making gicv3_redist_update_lpi() always call gicv3_redist_update()
 * introducing a new gicv3_redist_update_lpi_only() for the one
   callsite (the post-load hook) which must not call
   gicv3_redist_update()
 * making gicv3_redist_lpi_pending() always call gicv3_redist_update(),
   either directly or via gicv3_redist_update_lpi()
 * removing a couple of now-unnecessary calls to gicv3_redist_update()
   from some callers of those two functions
 * calling gicv3_redist_update() when the GICR_CTLR ENABLE_LPIS
   bit is cleared

(This means that the not-file-local gicv3_redist_* LPI related
functions now all take care of the updates of internally cached
GICv3 information, in the same way the older functions
gicv3_redist_set_irq() and gicv3_redist_send_sgi() do.)

The visible effect of this bug was that when the guest acknowledged
an LPI by reading ICC_IAR1_EL1, we marked it as not pending in the
LPI data structure but still left it in cs->hppi so we would offer it
to the guest again. In particular for setups using an emulated GICv3
and ITS and using devices which use LPIs (ie PCI devices) a Linux
guest would complain "irq 54: nobody cared" and then hang. (The hang
was intermittent, presumably depending on the timing between
different interrupts arriving and being completed.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I think this is now a proper fix for the problem. Testing
definitely welcomed... The commit message makes it sound like a bit
of a "several things in one patch" change, but it isn't really IMHO:
I just erred on the side of being very verbose in the description...
---
 hw/intc/gicv3_internal.h   | 17 +++++++++++++++++
 hw/intc/arm_gicv3.c        |  6 ++++--
 hw/intc/arm_gicv3_redist.c | 14 ++++++++++----
 3 files changed, 31 insertions(+), 6 deletions(-)
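
The corrected "was the previous best interrupt a redistributor interrupt" test described in the commit message can be sketched in isolation. This is a standalone illustration, not the QEMU function: the helper names are hypothetical, and the constant values (32 for GIC_INTERNAL, 8192 for GICV3_LPI_INTID_START) are assumed from the GICv3 INTID layout rather than taken from this patch.

```c
#include <stdbool.h>

/* Assumed values: INTIDs 0..31 are SGIs/PPIs, LPIs start at 8192. */
#define GIC_INTERNAL          32
#define GICV3_LPI_INTID_START 8192
#define PRIO_IDLE             0xff   /* no interrupt pending */

/* Hypothetical helper: pre-patch condition, SGIs/PPIs only. */
static bool old_needs_full_scan(bool seenbetter, int hppi_prio, int hppi_irq)
{
    return !seenbetter && hppi_prio != PRIO_IDLE &&
           hppi_irq < GIC_INTERNAL;
}

/* Hypothetical helper: post-patch condition, LPIs count as well. */
static bool new_needs_full_scan(bool seenbetter, int hppi_prio, int hppi_irq)
{
    return !seenbetter && hppi_prio != PRIO_IDLE &&
           (hppi_irq < GIC_INTERNAL ||
            hppi_irq >= GICV3_LPI_INTID_START);
}
```

With a previous best interrupt of LPI 8195 and no better redistributor interrupt found, the old condition evaluates to false, so the cached entry would be kept, while the new condition evaluates to true and forces the full rescan. An SPI such as INTID 100 triggers a rescan under neither condition, matching the "not a redistributor interrupt" case in the list above.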