
[v2] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal

Message ID 20200812012005.1919255-1-nathanl@linux.ibm.com (mailing list archive)
State Superseded
Series [v2] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (f04b169e953c4db1a3a3c1d23eea09c726f01ee5)
snowpatch_ozlabs/build-ppc64le warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-ppc64be warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-ppc64e warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/build-pmac32 warning Upstream build failed, couldn't test patch
snowpatch_ozlabs/checkpatch warning total: 0 errors, 2 warnings, 0 checks, 31 lines checked
snowpatch_ozlabs/needsstable warning Please consider tagging this patch for stable!

Commit Message

Nathan Lynch Aug. 12, 2020, 1:20 a.m. UTC
The drmem lmb list can have hundreds of thousands of entries, and
unfortunately lookups take the form of linear searches. As long as
this is the case, traversals have the potential to monopolize the CPU
and provoke lockup reports, workqueue stalls, and the like unless
they explicitly yield.

Rather than placing cond_resched() calls within various
for_each_drmem_lmb() loop blocks in the code, put it in the iteration
expression of the loop macro itself so users can't omit it.

Call cond_resched() on every 20th element. Each iteration of the loop
in DLPAR code paths can involve around ten RTAS calls which can each
take up to 250us, so this ensures the check is performed at worst
every few milliseconds.

Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/drmem.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Changes since v1:
* Add bounds assertions in drmem_lmb_next().
* Call cond_resched() in the iterator on only every 20th element
  instead of on every iteration, to reduce overhead in tight loops.
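
To illustrate, a typical caller (hypothetical; 'drc_index' is a
stand-in parameter, and this loop is not part of the patch) picks up
the rescheduling behavior without any change to its body:

	struct drmem_lmb *lmb;

	/* A linear search over the drmem array. With this patch,
	 * drmem_lmb_next() calls cond_resched() every 20th element,
	 * so even a traversal of hundreds of thousands of entries
	 * periodically yields the CPU.
	 */
	for_each_drmem_lmb(lmb) {
		if (lmb->drc_index == drc_index)
			return lmb;
	}
	return NULL;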

Comments

Christophe Leroy Aug. 12, 2020, 5:19 a.m. UTC | #1
On 12/08/2020 at 03:20, Nathan Lynch wrote:
> The drmem lmb list can have hundreds of thousands of entries, and
> unfortunately lookups take the form of linear searches. As long as
> this is the case, traversals have the potential to monopolize the CPU
> and provoke lockup reports, workqueue stalls, and the like unless
> they explicitly yield.
> 
> Rather than placing cond_resched() calls within various
> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
> expression of the loop macro itself so users can't omit it.
> 
> Call cond_resched() on every 20th element. Each iteration of the loop
> in DLPAR code paths can involve around ten RTAS calls which can each
> take up to 250us, so this ensures the check is performed at worst
> every few milliseconds.
> 
> Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/drmem.h | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> Changes since v1:
> * Add bounds assertions in drmem_lmb_next().
> * Call cond_resched() in the iterator on only every 20th element
>    instead of on every iteration, to reduce overhead in tight loops.
> 
> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
> index 17ccc6474ab6..583277e30dd2 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -8,6 +8,9 @@
>   #ifndef _ASM_POWERPC_LMB_H
>   #define _ASM_POWERPC_LMB_H
>   
> +#include <linux/bug.h>
> +#include <linux/sched.h>
> +
>   struct drmem_lmb {
>   	u64     base_addr;
>   	u32     drc_index;
> @@ -26,8 +29,21 @@ struct drmem_lmb_info {
>   
>   extern struct drmem_lmb_info *drmem_info;
>   
> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
> +{
> +	const unsigned int resched_interval = 20;
> +
> +	BUG_ON(lmb < drmem_info->lmbs);
> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);

BUG_ON() should be avoided unless absolutely necessary.
Wouldn't WARN_ON() together with an early return be enough?

> +
> +	if ((lmb - drmem_info->lmbs) % resched_interval == 0)
> +		cond_resched();

Do you need something that precise? Can't you use 16 or 32 and use a
bitwise AND instead of a modulo?

And what guarantees that lmb is always an element of an array based at
drmem_info->lmbs?

What about:

static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb, 
struct drmem_lmb *start)
{
	const unsigned int resched_interval = 16;

	if ((++lmb - start) & resched_interval == 0)
		cond_resched();

	return lmb;
}

#define for_each_drmem_lmb_in_range(lmb, start, end)		\
	for ((lmb) = (start); (lmb) < (end); lmb = drmem_lmb_next(lmb, start))


> +
> +	return ++lmb;
> +}
> +
>   #define for_each_drmem_lmb_in_range(lmb, start, end)		\
> -	for ((lmb) = (start); (lmb) < (end); (lmb)++)
> +	for ((lmb) = (start); (lmb) < (end); lmb = drmem_lmb_next(lmb))
>   
>   #define for_each_drmem_lmb(lmb)					\
>   	for_each_drmem_lmb_in_range((lmb),			\
> 

Christophe
Nathan Lynch Aug. 12, 2020, 1:46 p.m. UTC | #2
Hi Christophe,

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
>> +{
>> +	const unsigned int resched_interval = 20;
>> +
>> +	BUG_ON(lmb < drmem_info->lmbs);
>> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
>
> BUG_ON() should be avoided unless absolutely necessary.
> Wouldn't WARN_ON() together with an early return be enough?

Not sure what a sensible early-return behavior would be. If the iterator
doesn't advance the pointer, the result will be a hang.

BUG_ON() on a bounds-check failure is appropriate here; many users of this
API will corrupt memory otherwise.

>> +
>> +	if ((lmb - drmem_info->lmbs) % resched_interval == 0)
>> +		cond_resched();
>
> Do you need something that precise? Can't you use 16 or 32 and use a
> bitwise AND instead of a modulo?

Eh, if you're executing this code you've already lost with respect to
performance considerations at this level; see the discussion on v1. I'll
use 16 since I'm going to reroll the patch anyway.

> And what guarantees that lmb is always an element of an array based at
> drmem_info->lmbs?

Well, that's its only intended use right now. There should not be any
other arrays of drmem_lmb objects, and I hope we don't gain any.


> What about:
>
> static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb, 
> struct drmem_lmb *start)
> {
> 	const unsigned int resched_interval = 16;
>
> 	if ((++lmb - start) & resched_interval == 0)
                           ^^^
Did you mean '%' here? The bitwise AND doesn't do what I want.

Otherwise, making drmem_lmb_next() more general by adding a 'start'
argument could ease refactoring to come, so I'll do that.
Christophe Leroy Aug. 12, 2020, 2:26 p.m. UTC | #3
On 12/08/2020 at 15:46, Nathan Lynch wrote:
> Hi Christophe,
> 
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
>>> +{
>>> +	const unsigned int resched_interval = 20;
>>> +
>>> +	BUG_ON(lmb < drmem_info->lmbs);
>>> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
>>
>> BUG_ON() should be avoided unless absolutely necessary.
>> Wouldn't WARN_ON() together with an early return be enough?
> 
> Not sure what a sensible early-return behavior would be. If the iterator
> doesn't advance the pointer, the result will be a hang.

I was thinking of returning lmb++ without the cond_resched() logic.
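
Something like this sketch (hypothetical; note it still has to return
the advanced pointer, otherwise the loop would never terminate):

	if (WARN_ON(lmb < drmem_info->lmbs ||
		    lmb >= drmem_info->lmbs + drmem_info->n_lmbs))
		return lmb + 1;	/* skip only the rescheduling logic */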

> 
> BUG_ON() on a bounds-check failure is appropriate here; many users of this
> API will corrupt memory otherwise.

It looks really strange to me to do bounds checks in an iterator like
this. They should be done before entering the loop.

> 
>>> +
>>> +	if ((lmb - drmem_info->lmbs) % resched_interval == 0)
>>> +		cond_resched();
>>
>> Do you need something that precise? Can't you use 16 or 32 and use a
>> bitwise AND instead of a modulo?
> 
> Eh, if you're executing this code you've already lost with respect to
> performance considerations at this level; see the discussion on v1. I'll
> use 16 since I'm going to reroll the patch anyway.
> 
>> And what guarantees that lmb is always an element of an array based at
>> drmem_info->lmbs?
> 
> Well, that's its only intended use right now. There should not be any
> other arrays of drmem_lmb objects, and I hope we don't gain any.
> 
> 
>> What about:
>>
>> static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb,
>> struct drmem_lmb *start)
>> {
>> 	const unsigned int resched_interval = 16;
>>
>> 	if ((++lmb - start) & resched_interval == 0)
>                             ^^^
> Did you mean '%' here? The bitwise AND doesn't do what I want.

I meant '& (resched_interval - 1)' indeed. But yes, we can leave the %;
GCC will change a % 16 into an & 15.
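
Spelled out, the mask form needs explicit grouping because '==' binds
more tightly than '&':

	/* Equivalent to (++lmb - start) % 16 == 0 for a power-of-two
	 * interval. Without the outer parentheses, the expression would
	 * parse as (++lmb - start) & ((resched_interval - 1) == 0).
	 */
	if (((++lmb - start) & (resched_interval - 1)) == 0)
		cond_resched();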

> 
> Otherwise, making drmem_lmb_next() more general by adding a 'start'
> argument could ease refactoring to come, so I'll do that.

Yes, the main idea is to avoid accessing a global variable in such a
helper.
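
Putting the pieces together, the reroll could look something like this
(a sketch only; the final version may differ):

	static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb,
						       const struct drmem_lmb *start)
	{
		/* The caller passes the array base, so the helper no
		 * longer touches a global. The modulo by a power of two
		 * compiles down to a mask.
		 */
		if ((++lmb - start) % 16 == 0)
			cond_resched();

		return lmb;
	}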

> 

Christophe
Tyrel Datwyler Aug. 12, 2020, 4:22 p.m. UTC | #4
On 8/11/20 6:20 PM, Nathan Lynch wrote:
> The drmem lmb list can have hundreds of thousands of entries, and
> unfortunately lookups take the form of linear searches. As long as
> this is the case, traversals have the potential to monopolize the CPU
> and provoke lockup reports, workqueue stalls, and the like unless
> they explicitly yield.
> 
> Rather than placing cond_resched() calls within various
> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
> expression of the loop macro itself so users can't omit it.
> 
> Call cond_resched() on every 20th element. Each iteration of the loop
> in DLPAR code paths can involve around ten RTAS calls which can each
> take up to 250us, so this ensures the check is performed at worst
> every few milliseconds.
> 
> Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/drmem.h | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> Changes since v1:
> * Add bounds assertions in drmem_lmb_next().
> * Call cond_resched() in the iterator on only every 20th element
>   instead of on every iteration, to reduce overhead in tight loops.
> 
> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
> index 17ccc6474ab6..583277e30dd2 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -8,6 +8,9 @@
>  #ifndef _ASM_POWERPC_LMB_H
>  #define _ASM_POWERPC_LMB_H
>  
> +#include <linux/bug.h>
> +#include <linux/sched.h>
> +
>  struct drmem_lmb {
>  	u64     base_addr;
>  	u32     drc_index;
> @@ -26,8 +29,21 @@ struct drmem_lmb_info {
>  
>  extern struct drmem_lmb_info *drmem_info;
>  
> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
> +{
> +	const unsigned int resched_interval = 20;
> +
> +	BUG_ON(lmb < drmem_info->lmbs);
> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);

I think BUG_ON is a pretty big no-no these days unless there is no other option.

-Tyrel

> +
> +	if ((lmb - drmem_info->lmbs) % resched_interval == 0)
> +		cond_resched();
> +
> +	return ++lmb;
> +}
> +
>  #define for_each_drmem_lmb_in_range(lmb, start, end)		\
> -	for ((lmb) = (start); (lmb) < (end); (lmb)++)
> +	for ((lmb) = (start); (lmb) < (end); lmb = drmem_lmb_next(lmb))
>  
>  #define for_each_drmem_lmb(lmb)					\
>  	for_each_drmem_lmb_in_range((lmb),			\
>
Michael Ellerman Aug. 13, 2020, 12:28 a.m. UTC | #5
Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> On 8/11/20 6:20 PM, Nathan Lynch wrote:
>> The drmem lmb list can have hundreds of thousands of entries, and
>> unfortunately lookups take the form of linear searches. As long as
>> this is the case, traversals have the potential to monopolize the CPU
>> and provoke lockup reports, workqueue stalls, and the like unless
>> they explicitly yield.
>> 
>> Rather than placing cond_resched() calls within various
>> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
>> expression of the loop macro itself so users can't omit it.
>> 
>> Call cond_resched() on every 20th element. Each iteration of the loop
>> in DLPAR code paths can involve around ten RTAS calls which can each
>> take up to 250us, so this ensures the check is performed at worst
>> every few milliseconds.
>> 
>> Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/drmem.h | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>> 
>> Changes since v1:
>> * Add bounds assertions in drmem_lmb_next().
>> * Call cond_resched() in the iterator on only every 20th element
>>   instead of on every iteration, to reduce overhead in tight loops.
>> 
>> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
>> index 17ccc6474ab6..583277e30dd2 100644
>> --- a/arch/powerpc/include/asm/drmem.h
>> +++ b/arch/powerpc/include/asm/drmem.h
>> @@ -8,6 +8,9 @@
>>  #ifndef _ASM_POWERPC_LMB_H
>>  #define _ASM_POWERPC_LMB_H
>>  
>> +#include <linux/bug.h>
>> +#include <linux/sched.h>
>> +
>>  struct drmem_lmb {
>>  	u64     base_addr;
>>  	u32     drc_index;
>> @@ -26,8 +29,21 @@ struct drmem_lmb_info {
>>  
>>  extern struct drmem_lmb_info *drmem_info;
>>  
>> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
>> +{
>> +	const unsigned int resched_interval = 20;
>> +
>> +	BUG_ON(lmb < drmem_info->lmbs);
>> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
>
> I think BUG_ON is a pretty big no-no these days unless there is no other option.

It's complicated, but yes we would like to avoid adding them if we can.

In a case like this there is no other option, *if* the check has to be
in drmem_lmb_next().

But I don't think we really need to check that there.

If for some reason this was called with an *lmb pointing outside of the
lmbs array it would confuse the cond_resched() logic, but it's not worth
crashing the box for that IMHO.

cheers
Nathan Lynch Aug. 13, 2020, 1:07 a.m. UTC | #6
Michael Ellerman <mpe@ellerman.id.au> writes:
> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> On 8/11/20 6:20 PM, Nathan Lynch wrote:
>>>  
>>> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
>>> +{
>>> +	const unsigned int resched_interval = 20;
>>> +
>>> +	BUG_ON(lmb < drmem_info->lmbs);
>>> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
>>
>> I think BUG_ON is a pretty big no-no these days unless there is no other option.
>
> It's complicated, but yes we would like to avoid adding them if we can.
>
> In a case like this there is no other option, *if* the check has to be
> in drmem_lmb_next().
>
> But I don't think we really need to check that there.
>
> If for some reason this was called with an *lmb pointing outside of the
> lmbs array it would confuse the cond_resched() logic, but it's not worth
> crashing the box for that IMHO.

The BUG_ONs are pretty much orthogonal to the cond_resched().

It's not apparent from the context of the change, but some users of the
for_each_drmem_lmb* macros modify elements of the drmem_info->lmbs
array. If the lmb passed to drmem_lmb_next() violates the bounds check
(say, if the callsite inappropriately modifies it within the loop), such
users are guaranteed to corrupt other objects in memory. This was my
thinking in adding the BUG_ONs, and I don't see another place to do
it.
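
As a concrete illustration (hypothetical; 'stride' is an arbitrary
stand-in), a loop body that manually advances the cursor can step past
the end of the array, after which any write through it lands in
whatever object follows:

	for_each_drmem_lmb(lmb) {
		if (lmb->flags & DRCONF_MEM_ASSIGNED)
			lmb += stride;	/* bogus manual advance */
		lmb->flags |= DRCONF_MEM_RESERVED;	/* possible corruption */
	}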
Michael Ellerman Aug. 17, 2020, 3:45 a.m. UTC | #7
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>>> On 8/11/20 6:20 PM, Nathan Lynch wrote:
>>>>  
>>>> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
>>>> +{
>>>> +	const unsigned int resched_interval = 20;
>>>> +
>>>> +	BUG_ON(lmb < drmem_info->lmbs);
>>>> +	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
>>>
>>> I think BUG_ON is a pretty big no-no these days unless there is no other option.
>>
>> It's complicated, but yes we would like to avoid adding them if we can.
>>
>> In a case like this there is no other option, *if* the check has to be
>> in drmem_lmb_next().
>>
>> But I don't think we really need to check that there.
>>
>> If for some reason this was called with an *lmb pointing outside of the
>> lmbs array it would confuse the cond_resched() logic, but it's not worth
>> crashing the box for that IMHO.
>
> The BUG_ONs are pretty much orthogonal to the cond_resched().

Yes that was kind of my point. We don't need them to implement the
cond_resched() logic.

> It's not apparent from the context of the change, but some users of the
> for_each_drmem_lmb* macros modify elements of the drmem_info->lmbs
> array. If the lmb passed to drmem_lmb_next() violates the bounds check
> (say, if the callsite inappropriately modifies it within the loop), such
> users are guaranteed to corrupt other objects in memory. This was my
> thinking in adding the BUG_ONs, and I don't see another place to do
> it.

If it's really something we're worried about, I think we'd be better off
putting checks for that in the code that's doing those modifications.

That way you have enough context to do something more nuanced than a
BUG(), ie. you can print a useful message and fail whatever operation is
in progress.

If we did that, then we could also add those BUG_ON()s as a safety net.

What you really want is a way for the for_each_xxx() construct to
express that there was an error traversing the list, but there isn't
really a nice way to do that in C.
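
The closest sketch I can come up with threads an error flag through the
loop condition (hypothetical; drmem_lmb_next_checked() would set *err on
a failed bounds check), and it shows the awkwardness: callers must still
remember to test err after the loop:

	#define for_each_drmem_lmb_checked(lmb, err)			\
		for ((lmb) = &drmem_info->lmbs[0], (err) = 0;		\
		     !(err) &&						\
		     (lmb) < &drmem_info->lmbs[drmem_info->n_lmbs];	\
		     (lmb) = drmem_lmb_next_checked((lmb), &(err)))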

cheers

Patch

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 17ccc6474ab6..583277e30dd2 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -8,6 +8,9 @@ 
 #ifndef _ASM_POWERPC_LMB_H
 #define _ASM_POWERPC_LMB_H
 
+#include <linux/bug.h>
+#include <linux/sched.h>
+
 struct drmem_lmb {
 	u64     base_addr;
 	u32     drc_index;
@@ -26,8 +29,21 @@  struct drmem_lmb_info {
 
 extern struct drmem_lmb_info *drmem_info;
 
+static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
+{
+	const unsigned int resched_interval = 20;
+
+	BUG_ON(lmb < drmem_info->lmbs);
+	BUG_ON(lmb >= drmem_info->lmbs + drmem_info->n_lmbs);
+
+	if ((lmb - drmem_info->lmbs) % resched_interval == 0)
+		cond_resched();
+
+	return ++lmb;
+}
+
 #define for_each_drmem_lmb_in_range(lmb, start, end)		\
-	for ((lmb) = (start); (lmb) < (end); (lmb)++)
+	for ((lmb) = (start); (lmb) < (end); lmb = drmem_lmb_next(lmb))
 
 #define for_each_drmem_lmb(lmb)					\
 	for_each_drmem_lmb_in_range((lmb),			\