mbox series

[RFC,qemu,legoater/aspeed-3.1,0/4] Handle short timer periods

Message ID 20190111035638.19725-1-andrew@aj.id.au
Headers show
Series Handle short timer periods | expand

Message

Andrew Jeffery Jan. 11, 2019, 3:56 a.m. UTC
Hello,

We've had an issue for a while, since the introduction of 4451d3f59f2a
("clocksource/drivers/fttmr010: Fix set_next_event handler") in Linux, where
our qemu instances have a "sticky" behaviour. The VM becomes unresponsive at
unexpected times for unpredicable but often long periods of time.

This series, along with the associated patch to Linux's fttmr010 driver[0],
aims to resolve it.

[0] http://patchwork.ozlabs.org/patch/1023363/

The series an RFC series because it doesn't fix all the cases, just
demonstrates a potential way forward. The approach is to provide back-pressure
- to test if the reload value is set to a value that's smaller than some
experimentally determined value (in this case, 20us), and if so, configure a
period of at least 20us. The MAX() of the requested and minimum values is then
set in the reload register rather than the requested value, which can then be
read back by Linux. The fttmr010 driver, upon observing the greater value,
starts pushing back on the clock event subsystem, which then iteratively
determines a minimum acceptible event period before proceeding with the boot
process.

The implementation does not take care of the match register cases, but at the
moment they are unused by Linux on Aspeed SoCs. The match register case is not
taken care of because I'm not sure if this implementation is what we want to
use going forward, or whether we want to do something that's completely hidden
in the qemu model. However taking advantage of Linux's existing support for
determining minimum timer periods seemed like easy solution.

The stickiness turns out to be a consequence of Linux configuring a very short
timer period (1us, mostly due to the timerio_rng driver), which causes qemu to
spend a large chunk of its timeslice handling the host timer interrupt rather
than executing guest code.

Please critique!

Andrew

Andrew Jeffery (4):
  timer: aspeed: Fire interrupt on failure to meet deadline
  timer: aspeed: Status register contains reload for stopped timer
  timer: aspeed: Fix match calculations
  timer: aspeed: Provide back-pressure information for short periods

 hw/misc/aspeed_scu.c            |  6 ++++++
 hw/timer/aspeed_timer.c         | 37 ++++++++++++++++++++++++++-------
 include/hw/timer/aspeed_timer.h |  1 +
 3 files changed, 36 insertions(+), 8 deletions(-)

Comments

Cédric Le Goater Jan. 11, 2019, 10:14 a.m. UTC | #1
Hello Andrew,

On 1/11/19 4:56 AM, Andrew Jeffery wrote:
> Hello,
> 
> We've had an issue for a while, since the introduction of 4451d3f59f2a
> ("clocksource/drivers/fttmr010: Fix set_next_event handler") in Linux, where
> our qemu instances have a "sticky" behaviour. The VM becomes unresponsive at
> unexpected times for unpredicable but often long periods of time.
> 
> This series, along with the associated patch to Linux's fttmr010 driver[0],
> aims to resolve it.
> 
> [0] http://patchwork.ozlabs.org/patch/1023363/

I gave the whole a try and on a x86 host, QEMU reaches :

[    8.282333] systemd[1]: Set hostname to <romulus>.

on a ppc64el :

[   11.497910] systemd[1]: Set hostname to <romulus>.

which is much better than before.

As the QEMU patchset does not seem to impact support for older images, 
I have updated the aspeed-3.1 branch and also created a new branch for 
dev : aspeed-4.0.

> The series an RFC series because it doesn't fix all the cases, just
> demonstrates a potential way forward. The approach is to provide back-pressure
> - to test if the reload value is set to a value that's smaller than some
> experimentally determined value (in this case, 20us), and if so, configure a
> period of at least 20us. The MAX() of the requested and minimum values is then
> set in the reload register rather than the requested value, which can then be
> read back by Linux. The fttmr010 driver, upon observing the greater value,
> starts pushing back on the clock event subsystem, which then iteratively
> determines a minimum acceptible event period before proceeding with the boot
> process.
> 
> The implementation does not take care of the match register cases, but at the
> moment they are unused by Linux on Aspeed SoCs. The match register case is not
> taken care of because I'm not sure if this implementation is what we want to
> use going forward, or whether we want to do something that's completely hidden
> in the qemu model. However taking advantage of Linux's existing support for
> determining minimum timer periods seemed like easy solution.
> 
> The stickiness turns out to be a consequence of Linux configuring a very short
> timer period (1us, mostly due to the timerio_rng driver), which causes qemu to
> spend a large chunk of its timeslice handling the host timer interrupt rather
> than executing guest code.
> 
> Please critique!

We are adding another QEMU HW tweak in Linux. We need feedback from the
Linux maintainers I would say.

Thanks,

C. 
 
> 
> Andrew
> 
> Andrew Jeffery (4):
>   timer: aspeed: Fire interrupt on failure to meet deadline
>   timer: aspeed: Status register contains reload for stopped timer
>   timer: aspeed: Fix match calculations
>   timer: aspeed: Provide back-pressure information for short periods
> 
>  hw/misc/aspeed_scu.c            |  6 ++++++
>  hw/timer/aspeed_timer.c         | 37 ++++++++++++++++++++++++++-------
>  include/hw/timer/aspeed_timer.h |  1 +
>  3 files changed, 36 insertions(+), 8 deletions(-)
>
Andrew Jeffery Jan. 14, 2019, 1:43 a.m. UTC | #2
On Fri, 11 Jan 2019, at 20:44, Cédric Le Goater wrote:
> Hello Andrew,
> 
> On 1/11/19 4:56 AM, Andrew Jeffery wrote:
> > Hello,
> > 
> > We've had an issue for a while, since the introduction of 4451d3f59f2a
> > ("clocksource/drivers/fttmr010: Fix set_next_event handler") in Linux, where
> > our qemu instances have a "sticky" behaviour. The VM becomes unresponsive at
> > unexpected times for unpredicable but often long periods of time.
> > 
> > This series, along with the associated patch to Linux's fttmr010 driver[0],
> > aims to resolve it.
> > 
> > [0] http://patchwork.ozlabs.org/patch/1023363/
> 
> I gave the whole a try and on a x86 host, QEMU reaches :
> 
> [    8.282333] systemd[1]: Set hostname to <romulus>.
> 
> on a ppc64el :
> 
> [   11.497910] systemd[1]: Set hostname to <romulus>.
> 
> which is much better than before.
> 
> As the QEMU patchset does not seem to impact support for older images, 
> I have updated the aspeed-3.1 branch and also created a new branch for 
> dev : aspeed-4.0.
> 
> > The series an RFC series because it doesn't fix all the cases, just
> > demonstrates a potential way forward. The approach is to provide back-pressure
> > - to test if the reload value is set to a value that's smaller than some
> > experimentally determined value (in this case, 20us), and if so, configure a
> > period of at least 20us. The MAX() of the requested and minimum values is then
> > set in the reload register rather than the requested value, which can then be
> > read back by Linux. The fttmr010 driver, upon observing the greater value,
> > starts pushing back on the clock event subsystem, which then iteratively
> > determines a minimum acceptible event period before proceeding with the boot
> > process.
> > 
> > The implementation does not take care of the match register cases, but at the
> > moment they are unused by Linux on Aspeed SoCs. The match register case is not
> > taken care of because I'm not sure if this implementation is what we want to
> > use going forward, or whether we want to do something that's completely hidden
> > in the qemu model. However taking advantage of Linux's existing support for
> > determining minimum timer periods seemed like easy solution.

Do you mind helping me hash out the full behaviour?

The key requirement is that any modelled timer period not be shorter than e.g. the
20us I picked. So there are a number cases:

MIN_PERIOD = ticks(20us)

1. RELOAD < MIN_PERIOD
2. RELOAD >= (2 * MIN_PERIOD), MATCH{1,2} < MIN_PERIOD
3. RELOAD >= (2 * MIN_PERIOD), (RELOAD - (RELOAD - MATCH{1,2})) < MIN_PERIOD
4. MIN_PERIOD <= RELOAD < (2 * MIN_PERIOD), MATCH{1,2} < RELOAD
5. RELOAD > (2 * MIN_PERIOD), MIN_PERIOD < MATCH1 <= (RELOAD - MIN_PERIOD), 0 <= (MATCH1 - MATCH2) < MIN_PERIOD
6. RELOAD > (2 * MIN_PERIOD), 0 <= (MATCH2 - MATCH1) < MIN_PERIOD, MIN_PERIOD < MATCH2 <= (RELOAD - MIN_PERIOD)
7. RELOAD < (3 * MIN_PERIOD), 0 < MATCH1 < RELOAD, 0 < MATCH2 < RELOAD

The provided patches only solve case 1 (with the assumption that the match
registers contain an irrelevant value).

For the rest, I think we need to implement the following:

Each time RELOAD, MATCH1, or MATCH2 are modified, check the constraints
above, and expand all values as necessary to accommodate the minimum period
constraint of 20us. With the constraint of "not-before" on a timer event,
conceptually the algorithm looks like the following for the most general case of
two valid match register values:

* Define registers RELOAD, MATCH1, MATCH2

* Sort the values of RELOAD, MATCH1 and MATCH2 such that we have values
R > X > Y > 0 where R maps to RELOAD, and X and Y to MATCH1 and MATCH2 as
appropriate

* Set dY = MAX(MIN_PERIOD - (Y - 0), 0);
* Set Y = Y + dY
* Set X = X + dY
* Set R = R + dY
* Set dX = MAX(MIN_PERIOD - (X - Y), 0);
* Set X = X + dX
* Set R = R + dX
* Set dR = MAX(MIN_PERIOD - (R - X), 0);
* Set R = R + dR

* Assign R, X and Y back to RELOAD, MATCH1 and MATCH2 as appropriate

With this we require R <= (UINT32_MAX - (2 * (MIN_PERIOD - 1))) to
allow for any adjustments.

The downfall of this approach is it only works for a stopped timer. I can't see
how we could expand time on a running timer to account for a short match period
*and* keep consistency with the monotonic decrease of the status register.

Unless we use shadow registers or track things like ticks-per-tick? Currently the
way the timer is implemented is it derives the ticks and time periods directly from
the configuration of the model. If we implement a set of shadow registers that
contain the expanded times above and we translate between the configured values
and the dilated time it might be possible?

That's a half-baked thought, so if it doesn't make sense please ask questions and
I'll try to untangle the idea from the wool in my head.

It seems this needs some deep consideration :/

Andrew