
[PATCHv7,01/11] clockevents: Prefer CPU local devices over global devices

Message ID 1370291642-13259-2-git-send-email-sboyd@codeaurora.org
State New

Commit Message

Stephen Boyd June 3, 2013, 8:33 p.m. UTC
On an SMP system with only one global clockevent and a dummy
clockevent per CPU, we run into problems. We want the dummy
clockevents to be registered as the per-CPU tick devices, but
we can only achieve that if we register the dummy clockevents
before the global clockevent or if we artificially inflate the
rating of the dummy clockevents above the rating of the global
clockevent. Failing to do so leads to boot hangs: the dummy
timers become the tick devices on every CPU except the one that
accepted the global clockevent as its tick device, and there is
no broadcast timer to poke the dummy devices.

If we're registering multiple clockevents and one clockevent is
global while the other is local to a particular CPU, we should
choose the local clockevent regardless of the rating of the
device. This way, if the local clockevent is a dummy, it will
take on tick device duty as long as there isn't a higher rated
tick device, and any global clockevent will be bumped out into
broadcast mode, fixing the problem described above.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 kernel/time/tick-common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Daniel Lezcano June 6, 2013, 3:12 p.m. UTC | #1
On 06/03/2013 10:33 PM, Stephen Boyd wrote:
> On an SMP system with only one global clockevent and a dummy
> clockevent per CPU we run into problems. We want the dummy
> clockevents to be registered as the per CPU tick devices, but
> we can only achieve that if we register the dummy clockevents
> before the global clockevent or if we artificially inflate the
> rating of the dummy clockevents to be higher than the rating
> of the global clockevent. Failure to do so leads to boot
> hangs when the dummy timers are registered on all other CPUs
> besides the CPU that accepted the global clockevent as its tick
> device and there is no broadcast timer to poke the dummy
> devices.
> 
> If we're registering multiple clockevents and one clockevent is
> global and the other is local to a particular CPU we should
> choose to use the local clockevent regardless of the rating of
> the device. This way, if the clockevent is a dummy it will take
> the tick device duty as long as there isn't a higher rated tick
> device and any global clockevent will be bumped out into
> broadcast mode, fixing the problem described above.

The connection between the changelog, the patch and the comment
is not clear. Could you clarify a bit?

Thanks
  -- Daniel



> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>,
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>  kernel/time/tick-common.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 5d3fb10..3da62de 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -254,9 +254,10 @@ static int tick_check_new_device(struct clock_event_device *newdev)
>  		    !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
>  			goto out_bc;
>  		/*
> -		 * Check the rating
> +		 * Check the rating, but prefer CPU local devices
>  		 */
> -		if (curdev->rating >= newdev->rating)
> +		if (curdev->rating >= newdev->rating &&
> +		    cpumask_equal(curdev->cpumask, newdev->cpumask))
>  			goto out_bc;
>  	}
>  
>
Stephen Boyd June 6, 2013, 6:04 p.m. UTC | #2
On 06/06, Daniel Lezcano wrote:
> The connection between the changelog, the patch and the comment
> is not clear. Could you clarify a bit?
> 

There is one tick device per CPU and one broadcast device. The
broadcast device can only be a global clockevent, whereas the
per-CPU tick device can be either a global or a per-CPU
clockevent. The code tries hard to keep per-CPU clockevents in
the tick device slots, but it has an ordering/rating requirement
that doesn't work when there are only dummy per-CPU devices and
one global device.

Perhaps an example will help. Let's say you only have one global
clockevent, such as the sp804, and you have SMP enabled. To
support SMP we have to register dummy clockevents on each CPU so
that the sp804 can go into broadcast mode. If we don't do this,
only the CPU that registered the sp804 will get interrupts, while
the other CPUs will be left with no tick device and thus no
scheduling. To fix this, we either register dummy clockevents on
all the CPUs _before_ we register the sp804, forcing the sp804
into the broadcast slot, or we give the dummy clockevents a
higher rating than the sp804 so that, when we register them after
the sp804, the sp804 is bumped out to broadcast duty.

If the dummy devices are registered before the sp804, we can give
the dummies a low rating and the sp804 will still go into the
broadcast slot due to this code:

	/*
	 * If we have a cpu local device already, do not replace it
	 * by a non cpu local device
	 */
	if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
		goto out_bc;

If we register the sp804 before the dummies, we're also fine as
long as the dummies are rated higher than the sp804. Playing
games with the dummy rating is not very nice, so this patch fixes
it by allowing the per-CPU device to replace the global device no
matter what the rating of the global device is.

This fixes the sp804 case when the dummy is rated lower than the
sp804, and it removes any ordering requirement from the
registration of clockevents. It also completes the logic above,
where we prefer CPU local devices over non-CPU local devices.
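Both registration orders can be replayed with a small, self-contained C model. Everything here is an assumption made for illustration: the names are hypothetical, only rating, cpumask and "dummy" are modelled (the one-shot and IRQ-affinity checks in tick_check_new_device() are ignored), the global device gets an illustrative rating of 300, and a device displaced from a tick slot is offered straight to the broadcast slot rather than re-registered. prefer_old() follows the pre-patch rule (a global device never replaces a CPU local one, otherwise a strictly higher rating wins); prefer_new() adds the "different cpumask wins" clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical clockevent model: mask bit n => can serve CPU n. */
struct ce {
	int rating;
	uint32_t cpumask;
	bool dummy;
};

/* One tick slot per CPU plus a single broadcast slot. */
struct slots {
	const struct ce *tick[MODEL_NR_CPUS];
	const struct ce *broadcast;
};

typedef bool (*prefer_fn)(const struct ce *cur, const struct ce *newdev,
			  int cpu);

static bool is_local(const struct ce *dev, int cpu)
{
	return dev->cpumask == (UINT32_C(1) << cpu);
}

/* Pre-patch rule: never replace a CPU local device with a global one;
 * otherwise only a strictly higher rating wins. */
static bool prefer_old(const struct ce *cur, const struct ce *newdev, int cpu)
{
	if (!cur)
		return true;
	if (is_local(cur, cpu) && !is_local(newdev, cpu))
		return false;
	return newdev->rating > cur->rating;
}

/* Patched rule: as above, but a different cpumask also wins. */
static bool prefer_new(const struct ce *cur, const struct ce *newdev, int cpu)
{
	if (!cur)
		return true;
	if (is_local(cur, cpu) && !is_local(newdev, cpu))
		return false;
	return newdev->rating > cur->rating || cur->cpumask != newdev->cpumask;
}

/* Broadcast slot: dummies are rejected, a higher rating wins. */
static void offer_broadcast(struct slots *s, const struct ce *dev)
{
	if (dev->dummy)
		return;
	if (s->broadcast && s->broadcast->rating >= dev->rating)
		return;
	s->broadcast = dev;
}

/* Register @dev from @cpu; a rejected or displaced device is offered
 * to the broadcast slot (a simplification of the kernel's flow). */
static void register_ce(struct slots *s, const struct ce *dev, int cpu,
			prefer_fn prefers)
{
	const struct ce *cur = s->tick[cpu];

	if (!(dev->cpumask & (UINT32_C(1) << cpu)) || !prefers(cur, dev, cpu)) {
		offer_broadcast(s, dev);
		return;
	}
	s->tick[cpu] = dev;
	if (cur)
		offer_broadcast(s, cur);
}

/* A CPU ticks if it has a real device, or a dummy backed by broadcast. */
static bool all_cpus_tick(const struct slots *s)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct ce *dev = s->tick[cpu];

		if (!dev || (dev->dummy && !s->broadcast))
			return false;
	}
	return true;
}

/* One global device (registered from CPU 0) plus a dummy per CPU. */
static bool boots(bool global_first, int dummy_rating, prefer_fn prefers)
{
	const struct ce global = { 300, (UINT32_C(1) << MODEL_NR_CPUS) - 1, false };
	struct ce dummies[MODEL_NR_CPUS];
	struct slots s = { 0 };

	if (global_first)
		register_ce(&s, &global, 0, prefers);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		dummies[cpu] = (struct ce){ dummy_rating, UINT32_C(1) << cpu, true };
		register_ce(&s, &dummies[cpu], cpu, prefers);
	}
	if (!global_first)
		register_ce(&s, &global, 0, prefers);
	return all_cpus_tick(&s);
}
```

Under these assumptions, boots(true, 100, prefer_old) is false: CPU 0 keeps the global device, CPUs 1-3 sit on dummies and the broadcast slot stays empty. boots(false, 100, prefer_old) and boots(true, 400, prefer_old) succeed, matching the two workarounds described above, while prefer_new() succeeds in either order with a low dummy rating.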
Daniel Lezcano June 6, 2013, 10:30 p.m. UTC | #3
On 06/06/2013 08:04 PM, Stephen Boyd wrote:
> This fixes the sp804 case when the dummy is rated lower than
> sp804 and it removes any ordering requirement from the
> registration of clockevents. It also completes the logic above
> where we prefer cpu local devices over non cpu local devices.

Thanks for the detailed explanation.

Has Thomas reacted to this patch?
Stephen Boyd June 6, 2013, 10:38 p.m. UTC | #4
On 06/07, Daniel Lezcano wrote:
> Thanks for the detailed explanation.
> 
> Has Thomas reacted to this patch?
> 

So far there has been no response from Thomas.
Stephen Boyd June 12, 2013, 9:44 p.m. UTC | #5
On 06/06, Stephen Boyd wrote:
> So far there has been no response from Thomas.
> 

Will you ack this patch anyway? Or do we need Thomas to review
this patch? It seems that this patch series has stalled again.
Daniel Lezcano June 13, 2013, 9:33 a.m. UTC | #6
On 06/12/2013 11:44 PM, Stephen Boyd wrote:
> Will you ack this patch anyway? Or do we need Thomas to review
> this patch? It seems that this patch series has stalled again.

I would prefer that Thomas have a look at it and ack it. I changed
Cc to To for Thomas.

Thanks
  -- Daniel
Thomas Gleixner June 13, 2013, 1:15 p.m. UTC | #7
On Thu, 13 Jun 2013, Daniel Lezcano wrote:
> I prefer Thomas to have a look at it and ack it. I changed Cc to To for
> Thomas.

The patch does not apply on tip timers/core. The code was
reworked a month ago. Please work against tip timers/core; that's
where this stuff ends up.

Thanks,

	tglx
Soren Brinkmann June 13, 2013, 8:16 p.m. UTC | #8
On Thu, Jun 13, 2013 at 11:39:50AM -0700, Stephen Boyd wrote:
> On 06/13, Thomas Gleixner wrote:
> > On Thu, 13 Jun 2013, Daniel Lezcano wrote:
> > > I prefer Thomas to have a look at it and ack it. I changed Cc to To for
> > > Thomas.
> > 
> > The patch does not apply on tip timers/core. The code has been
> > reworked a month ago. Please work against tip timers/core. That's
> > where this stuff ends up.
> > 
> 
> Ah, I thought your patch series had stalled. Here is a refreshed
> patch. Every other patch in this series applies cleanly to tip
> timers/core so I don't want to resend them again unless
> absolutely necessary.
> 
> -----8<-----
> Subject: [PATCH v8] clockevents: Prefer CPU local devices over global devices
> 
> On an SMP system with only one global clockevent and a dummy
> clockevent per CPU we run into problems. We want the dummy
> clockevents to be registered as the per CPU tick devices, but
> we can only achieve that if we register the dummy clockevents
> before the global clockevent or if we artificially inflate the
> rating of the dummy clockevents to be higher than the rating
> of the global clockevent. Failure to do so leads to boot
> hangs when the dummy timers are registered on all other CPUs
> besides the CPU that accepted the global clockevent as its tick
> device and there is no broadcast timer to poke the dummy
> devices.
> 
> If we're registering multiple clockevents and one clockevent is
> global and the other is local to a particular CPU we should
> choose to use the local clockevent regardless of the rating of
> the device. This way, if the clockevent is a dummy it will take
> the tick device duty as long as there isn't a higher rated tick
> device and any global clockevent will be bumped out into
> broadcast mode, fixing the problem described above.
> 
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>

I retested my case on tip/timers/core with the same results.

	Sören
Mark Rutland June 18, 2013, 10:22 a.m. UTC | #9
On Thu, Jun 13, 2013 at 07:39:50PM +0100, Stephen Boyd wrote:
> -----8<-----
> Subject: [PATCH v8] clockevents: Prefer CPU local devices over global devices
> 
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

I've just tested this atop tip/timers/core on a tc2, using only the
sp804. As before, without the patch the boot hangs, and with the
patch I'm able to reach userspace and do useful things.

Tested-by: Mark Rutland <mark.rutland@arm.com>

Thanks for working on this, Stephen.

Mark.

> ---
>  kernel/time/tick-common.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 5edfb48..edd45f6 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -243,8 +243,13 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
>  			return false;
>  	}
>  
> -	/* Use the higher rated one */
> -	return !curdev || newdev->rating > curdev->rating;
> +	/*
> +	 * Use the higher rated one, but prefer a CPU local device with a lower
> +	 * rating than a non-CPU local device
> +	 */
> +	return !curdev ||
> +		newdev->rating > curdev->rating ||
> +	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
>  }
>  
>  /*
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
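The refreshed v8 hunk encodes the same rule as the v7 hunk once curdev is non-NULL: v7 keeps the current device (goto out_bc) when curdev->rating >= newdev->rating && cpumask_equal(...), and v8 prefers the new device when newdev->rating > curdev->rating || !cpumask_equal(...), which is the negation of the v7 condition by De Morgan's law. The sketch below checks this exhaustively over a few small ratings; the function names are hypothetical, a bool stands in for the cpumask_equal() result, and the one-shot checks that precede both expressions are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* v7 hunk: true means "keep curdev" (goto out_bc). */
static bool v7_keeps_current(int cur_rating, int new_rating, bool same_mask)
{
	return cur_rating >= new_rating && same_mask;
}

/* v8 tick_check_preferred() tail for a non-NULL curdev:
 * true means "switch to newdev". */
static bool v8_prefers_new(int cur_rating, int new_rating, bool same_mask)
{
	return new_rating > cur_rating || !same_mask;
}

/* True if, for every input tried, exactly one of the two forms says
 * "keep" while the other says "switch", i.e. they make the same call. */
static bool v7_v8_agree(void)
{
	for (int cur = 0; cur <= 3; cur++) {
		for (int nw = 0; nw <= 3; nw++) {
			for (int same = 0; same <= 1; same++) {
				if (v7_keeps_current(cur, nw, same) ==
				    v8_prefers_new(cur, nw, same))
					return false;
			}
		}
	}
	return true;
}
```

With illustrative ratings, a global device rated 300 is replaced by a CPU local device rated 100 (masks differ), but not by a device rated 100 with the same mask.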
Stephen Boyd June 19, 2013, 4:30 p.m. UTC | #10
Thomas,

On 06/18, Mark Rutland wrote:
> I've just tested this atop of tip/timers/core on a tc2, using only the
> sp804. As previously, without the patch boot hangs, and with the patch
> I'm able to reach userspace and do useful things.
> 
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> 
> Thanks for working on this, Stephen.

Can you pick up the first two patches in this series please? And
preferably make a stable branch that can be pulled into arm-soc?
Then I can send the rest through the arm-soc tree.
Stephen Boyd June 21, 2013, 5:07 p.m. UTC | #11
On 06/19, Stephen Boyd wrote:
> 
> Can you pick up the first two patches in this series please? And
> preferably make a stable branch that can be pulled into arm-soc?
> Then I can send the rest through the arm-soc tree.
> 

ping?
Stephen Boyd June 24, 2013, 8:07 p.m. UTC | #12
On 06/21/13 10:07, Stephen Boyd wrote:
> On 06/19, Stephen Boyd wrote:
>> Can you pick up the first two patches in this series please? And
>> preferably make a stable branch that can be pulled into arm-soc?
>> Then I can send the rest through the arm-soc tree.
>>
> ping?
>

Thomas, please apply these first two patches.

Patch

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 5d3fb10..3da62de 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -254,9 +254,10 @@ static int tick_check_new_device(struct clock_event_device *newdev)
 		    !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
 			goto out_bc;
 		/*
-		 * Check the rating
+		 * Check the rating, but prefer CPU local devices
 		 */
-		if (curdev->rating >= newdev->rating)
+		if (curdev->rating >= newdev->rating &&
+		    cpumask_equal(curdev->cpumask, newdev->cpumask))
 			goto out_bc;
 	}