diff mbox series

powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()

Message ID 20210921031213.2029824-1-nathanl@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted() | expand
Related show

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.

Commit Message

Nathan Lynch Sept. 21, 2021, 3:12 a.m. UTC
vcpu_is_preempted() can be used outside of preempt-disabled critical
sections, yielding warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
caller is rwsem_spin_on_owner+0x1cc/0x2d0
CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
Call Trace:
[c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
[c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
[c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
[c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
[c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
[c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
[c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
[c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
[c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250

The result of vcpu_is_preempted() is always subject to invalidation by
events inside and outside of Linux; it's just a best guess at a point in
time. Use raw_smp_processor_id() to avoid such warnings.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
---
 arch/powerpc/include/asm/paravirt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Michael Ellerman Sept. 22, 2021, 6:32 a.m. UTC | #1
Nathan Lynch <nathanl@linux.ibm.com> writes:
> vcpu_is_preempted() can be used outside of preempt-disabled critical
> sections, yielding warnings such as:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
> caller is rwsem_spin_on_owner+0x1cc/0x2d0
> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
> Call Trace:
> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
>
> The result of vcpu_is_preempted() is always subject to invalidation by
> events inside and outside of Linux; it's just a best guess at a point in
> time. Use raw_smp_processor_id() to avoid such warnings.
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
> ---
>  arch/powerpc/include/asm/paravirt.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
> index bcb7b5f917be..e429aca566de 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu)
>  
>  #ifdef CONFIG_PPC_SPLPAR
>  	if (!is_kvm_guest()) {
> -		int first_cpu = cpu_first_thread_sibling(smp_processor_id());
> +		int first_cpu;
> +
> +		/*
> +		 * This is only a guess at best, and this function may be
> +		 * called with preemption enabled. Using raw_smp_processor_id()
> +		 * does not damage accuracy.
> +		 */
> +		first_cpu = cpu_first_thread_sibling(raw_smp_processor_id());

This change seems good, except I think the comment needs to be a lot
more explicit about what it's doing and why.

A casual reader is going to be confused about vcpu preemption vs
"preemption", which are basically unrelated yet use the same word.

It's not clear how raw_smp_processor_id() is related to (Linux)
preemption, unless you know that smp_processor_id() is the alternative
and it contains a preemption check.

And "this is only a guess" is not clear on what *this* is, you're
referring to the result of the whole function, but that's not obvious.

>  		/*
>  		 * Preemption can only happen at core granularity. This CPU
                   ^^^^^^^^^^
                   Means something different to "preemption" above.

I know you didn't write that comment, and maybe we need to rewrite some
of those existing comments to make it clear they're not talking about
Linux preemption.

cheers
Srikar Dronamraju Sept. 22, 2021, 7:57 a.m. UTC | #2
* Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:

> vcpu_is_preempted() can be used outside of preempt-disabled critical
> sections, yielding warnings such as:
> 
> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
> caller is rwsem_spin_on_owner+0x1cc/0x2d0
> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
> Call Trace:
> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
> 
> The result of vcpu_is_preempted() is always subject to invalidation by
> events inside and outside of Linux; it's just a best guess at a point in
> time. Use raw_smp_processor_id() to avoid such warnings.

Typically smp_processor_id() and raw_smp_processor_id() except for the
CONFIG_DEBUG_PREEMPT. In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
is actually debug_smp_processor_id(), which does all the checks.

I believe these checks in debug_smp_processor_id() are only valid for x86
case (aka cases were they have __smp_processor_id() defined.)
i.e x86 has a different implementation of _smp_processor_id() for stable and
unstable

> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
> ---
>  arch/powerpc/include/asm/paravirt.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
> index bcb7b5f917be..e429aca566de 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu)
> 
>  #ifdef CONFIG_PPC_SPLPAR
>  	if (!is_kvm_guest()) {
> -		int first_cpu = cpu_first_thread_sibling(smp_processor_id());
> +		int first_cpu;
> +
> +		/*
> +		 * This is only a guess at best, and this function may be
> +		 * called with preemption enabled. Using raw_smp_processor_id()
> +		 * does not damage accuracy.
> +		 */
> +		first_cpu = cpu_first_thread_sibling(raw_smp_processor_id());
> 
>  		/*
>  		 * Preemption can only happen at core granularity. This CPU
> -- 
> 2.31.1
> 

How about something like the below?

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 510519e8a1eb..8c669e8ceb73 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -256,12 +256,14 @@ static inline int get_boot_cpu_id(void)
  */
 #ifndef __smp_processor_id
 #define __smp_processor_id(x) raw_smp_processor_id(x)
-#endif
-
+#else
 #ifdef CONFIG_DEBUG_PREEMPT
   extern unsigned int debug_smp_processor_id(void);
 # define smp_processor_id() debug_smp_processor_id()
-#else
+#endif
+#endif
+
+#ifndef smp_processor_id
 # define smp_processor_id() __smp_processor_id()
 #endif
Nathan Lynch Sept. 22, 2021, 3:28 p.m. UTC | #3
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> --- a/arch/powerpc/include/asm/paravirt.h
>> +++ b/arch/powerpc/include/asm/paravirt.h
>> @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu)
>>  
>>  #ifdef CONFIG_PPC_SPLPAR
>>  	if (!is_kvm_guest()) {
>> -		int first_cpu = cpu_first_thread_sibling(smp_processor_id());
>> +		int first_cpu;
>> +
>> +		/*
>> +		 * This is only a guess at best, and this function may be
>> +		 * called with preemption enabled. Using raw_smp_processor_id()
>> +		 * does not damage accuracy.
>> +		 */
>> +		first_cpu = cpu_first_thread_sibling(raw_smp_processor_id());
>
> This change seems good, except I think the comment needs to be a lot
> more explicit about what it's doing and why.
>
> A casual reader is going to be confused about vcpu preemption vs
> "preemption", which are basically unrelated yet use the same word.
>
> It's not clear how raw_smp_processor_id() is related to (Linux)
> preemption, unless you know that smp_processor_id() is the alternative
> and it contains a preemption check.
>
> And "this is only a guess" is not clear on what *this* is, you're
> referring to the result of the whole function, but that's not obvious.

You're right.

>
>>  		/*
>>  		 * Preemption can only happen at core granularity. This CPU
>                    ^^^^^^^^^^
>                    Means something different to "preemption" above.
>
> I know you didn't write that comment, and maybe we need to rewrite some
> of those existing comments to make it clear they're not talking about
> Linux preemption.

Thanks, agreed on all points. I'll rework the existing comments and any
new ones to clearly distinguish between the two senses of preemption
here.
Nathan Lynch Sept. 22, 2021, 4:01 p.m. UTC | #4
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:
>
>> vcpu_is_preempted() can be used outside of preempt-disabled critical
>> sections, yielding warnings such as:
>> 
>> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
>> caller is rwsem_spin_on_owner+0x1cc/0x2d0
>> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
>> Call Trace:
>> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
>> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
>> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
>> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
>> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
>> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
>> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
>> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
>> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
>> 
>> The result of vcpu_is_preempted() is always subject to invalidation by
>> events inside and outside of Linux; it's just a best guess at a point in
>> time. Use raw_smp_processor_id() to avoid such warnings.
>
> Typically smp_processor_id() and raw_smp_processor_id() except for the
> CONFIG_DEBUG_PREEMPT.

Sorry, I don't follow...

> In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
> is actually debug_smp_processor_id(), which does all the checks.

Yes, OK.

> I believe these checks in debug_smp_processor_id() are only valid for x86
> case (aka cases were they have __smp_processor_id() defined.)

Hmm, I am under the impression that the checks in
debug_smp_processor_id() are valid regardless of whether the arch
overrides __smp_processor_id().

I think the stack trace here correctly identifies an incorrect use of
smp_processor_id(), and the call site needs to be changed. Do you
disagree?
Srikar Dronamraju Sept. 22, 2021, 4:33 p.m. UTC | #5
* Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:
> >
> >> vcpu_is_preempted() can be used outside of preempt-disabled critical
> >> sections, yielding warnings such as:
> >> 
> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0
> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
> >> Call Trace:
> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
> >> 
> >> The result of vcpu_is_preempted() is always subject to invalidation by
> >> events inside and outside of Linux; it's just a best guess at a point in
> >> time. Use raw_smp_processor_id() to avoid such warnings.
> >
> > Typically smp_processor_id() and raw_smp_processor_id() except for the
> > CONFIG_DEBUG_PREEMPT.
> 
> Sorry, I don't follow...

I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as
raw_processor_id().

> 
> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
> > is actually debug_smp_processor_id(), which does all the checks.
> 
> Yes, OK.
> 
> > I believe these checks in debug_smp_processor_id() are only valid for x86
> > case (aka cases were they have __smp_processor_id() defined.)
> 
> Hmm, I am under the impression that the checks in
> debug_smp_processor_id() are valid regardless of whether the arch
> overrides __smp_processor_id().

From include/linux/smp.h

/*
 * Allow the architecture to differentiate between a stable and unstable read.
 * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
 * regular asm read for the stable.
 */
#ifndef __smp_processor_id
#define __smp_processor_id(x) raw_smp_processor_id(x)
#endif

As far as I see, only x86 has a definition of __smp_processor_id.
So for archs like Powerpc, __smp_processor_id(), is always
defined as raw_smp_processor_id(). Right?

I would think debug_smp_processor_id() would be useful if __smp_processor_id()
is different from raw_smp_processor_id(). Do note debug_smp_processor_id() 
calls raw_smp_processor_id().

Or can I understand how debug_smp_processor_id() is useful if
__smp_processor_id() is defined as raw_smp_processor_id()?

> I think the stack trace here correctly identifies an incorrect use of
> smp_processor_id(), and the call site needs to be changed. Do you
> disagree?

Yes the stack_trace shows that debug_smp_processor_id(). However what I want 
to understand is why should we even call debug_smp_processor_id(), when our
__smp_processor_id() is defined as raw_smp_processor_id().
Nathan Lynch Sept. 22, 2021, 7:29 p.m. UTC | #6
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:
>
>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:
>> >
>> >> vcpu_is_preempted() can be used outside of preempt-disabled critical
>> >> sections, yielding warnings such as:
>> >> 
>> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
>> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0
>> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
>> >> Call Trace:
>> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
>> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
>> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
>> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
>> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
>> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
>> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
>> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
>> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
>> >> 
>> >> The result of vcpu_is_preempted() is always subject to invalidation by
>> >> events inside and outside of Linux; it's just a best guess at a point in
>> >> time. Use raw_smp_processor_id() to avoid such warnings.
>> >
>> > Typically smp_processor_id() and raw_smp_processor_id() except for the
>> > CONFIG_DEBUG_PREEMPT.
>> 
>> Sorry, I don't follow...
>
> I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as
> raw_processor_id().
>
>> 
>> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
>> > is actually debug_smp_processor_id(), which does all the checks.
>> 
>> Yes, OK.
>> 
>> > I believe these checks in debug_smp_processor_id() are only valid for x86
>> > case (aka cases were they have __smp_processor_id() defined.)
>> 
>> Hmm, I am under the impression that the checks in
>> debug_smp_processor_id() are valid regardless of whether the arch
>> overrides __smp_processor_id().
>
> From include/linux/smp.h
>
> /*
>  * Allow the architecture to differentiate between a stable and unstable read.
>  * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
>  * regular asm read for the stable.
>  */
> #ifndef __smp_processor_id
> #define __smp_processor_id(x) raw_smp_processor_id(x)
> #endif
>
> As far as I see, only x86 has a definition of __smp_processor_id.
> So for archs like Powerpc, __smp_processor_id(), is always
> defined as raw_smp_processor_id(). Right?

Sure, yes.

> I would think debug_smp_processor_id() would be useful if __smp_processor_id()
> is different from raw_smp_processor_id(). Do note debug_smp_processor_id() 
> calls raw_smp_processor_id().

I do not think the utility of debug_smp_processor_id() is related to
whether the arch defines __smp_processor_id().

> Or can I understand how debug_smp_processor_id() is useful if
> __smp_processor_id() is defined as raw_smp_processor_id()?

So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id()
expands to __smp_processor_id() which expands to raw_smp_processor_id(),
avoiding the preempt safety checks. This is working as intended.

For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands
to the out of line call to debug_smp_processor_id(), which calls
raw_smp_processor_id() and performs the checks, warning if called in an
inappropriate context, as seen here. Also working as intended.

AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and
not really related to the debug facility. Please see 9ed7d75b2f09d
("x86/percpu: Relax smp_processor_id()").

>> I think the stack trace here correctly identifies an incorrect use of
>> smp_processor_id(), and the call site needs to be changed. Do you
>> disagree?
>
> Yes the stack_trace shows that debug_smp_processor_id(). However what
> I want to understand is why should we even call
> debug_smp_processor_id(), when our __smp_processor_id() is defined as
> raw_smp_processor_id().

smp_processor_id() should always expand to debug_smp_processor_id() when
DEBUG_PREEMPT=y, regardless of whether the arch overrides
__smp_processor_id(). That is how I understand the intent of the code as
written.
Michael Ellerman Sept. 23, 2021, 7:29 a.m. UTC | #7
Nathan Lynch <nathanl@linux.ibm.com> writes:
> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>
>> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:
>>
>>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>>> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:
>>> >
>>> >> vcpu_is_preempted() can be used outside of preempt-disabled critical
>>> >> sections, yielding warnings such as:
>>> >> 
>>> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
>>> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0
>>> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
>>> >> Call Trace:
>>> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
>>> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
>>> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
>>> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
>>> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
>>> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
>>> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
>>> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
>>> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
>>> >> 
>>> >> The result of vcpu_is_preempted() is always subject to invalidation by
>>> >> events inside and outside of Linux; it's just a best guess at a point in
>>> >> time. Use raw_smp_processor_id() to avoid such warnings.
>>> >
>>> > Typically smp_processor_id() and raw_smp_processor_id() except for the
>>> > CONFIG_DEBUG_PREEMPT.
>>> 
>>> Sorry, I don't follow...
>>
>> I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as
>> raw_processor_id().
>>
>>> 
>>> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
>>> > is actually debug_smp_processor_id(), which does all the checks.
>>> 
>>> Yes, OK.
>>> 
>>> > I believe these checks in debug_smp_processor_id() are only valid for x86
>>> > case (aka cases were they have __smp_processor_id() defined.)
>>> 
>>> Hmm, I am under the impression that the checks in
>>> debug_smp_processor_id() are valid regardless of whether the arch
>>> overrides __smp_processor_id().
>>
>> From include/linux/smp.h
>>
>> /*
>>  * Allow the architecture to differentiate between a stable and unstable read.
>>  * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
>>  * regular asm read for the stable.
>>  */
>> #ifndef __smp_processor_id
>> #define __smp_processor_id(x) raw_smp_processor_id(x)
>> #endif
>>
>> As far as I see, only x86 has a definition of __smp_processor_id.
>> So for archs like Powerpc, __smp_processor_id(), is always
>> defined as raw_smp_processor_id(). Right?
>
> Sure, yes.
>
>> I would think debug_smp_processor_id() would be useful if __smp_processor_id()
>> is different from raw_smp_processor_id(). Do note debug_smp_processor_id() 
>> calls raw_smp_processor_id().

Agree.

> I do not think the utility of debug_smp_processor_id() is related to
> whether the arch defines __smp_processor_id().
>
>> Or can I understand how debug_smp_processor_id() is useful if
>> __smp_processor_id() is defined as raw_smp_processor_id()?

debug_smp_processor_id() is useful on powerpc, as well as other arches,
because it checks that we're in a context where the processor id won't
change out from under us.

eg. something like this is unsafe:

  int counts[NR_CPUS];
  int tmp, cpu;
  
  cpu = smp_processor_id();
  tmp = counts[cpu];
  				<- preempted here and migrated to another CPU
  counts[cpu] = tmp + 1;


> So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id()
> expands to __smp_processor_id() which expands to raw_smp_processor_id(),
> avoiding the preempt safety checks. This is working as intended.
>
> For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands
> to the out of line call to debug_smp_processor_id(), which calls
> raw_smp_processor_id() and performs the checks, warning if called in an
> inappropriate context, as seen here. Also working as intended.
>
> AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and
> not really related to the debug facility. Please see 9ed7d75b2f09d
> ("x86/percpu: Relax smp_processor_id()").

Yeah good find.

cheers
Srikar Dronamraju Sept. 23, 2021, 6:02 p.m. UTC | #8
* Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]:

> Nathan Lynch <nathanl@linux.ibm.com> writes:
> > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> >
> >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:
> >>
> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> >>> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]:
> >>> >
> >>> >> vcpu_is_preempted() can be used outside of preempt-disabled critical
> >>> >> sections, yielding warnings such as:
> >>> >> 
> >>> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
> >>> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0
> >>> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
> >>> >> Call Trace:
> >>> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
> >>> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
> >>> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
> >>> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
> >>> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
> >>> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
> >>> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
> >>> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
> >>> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
> >>> >> 
> >>> >> The result of vcpu_is_preempted() is always subject to invalidation by
> >>> >> events inside and outside of Linux; it's just a best guess at a point in
> >>> >> time. Use raw_smp_processor_id() to avoid such warnings.
> >>> >
> >>> > Typically smp_processor_id() and raw_smp_processor_id() except for the
> >>> > CONFIG_DEBUG_PREEMPT.
> >>> 
> >>> Sorry, I don't follow...
> >>
> >> I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as
> >> raw_processor_id().
> >>
> >>> 
> >>> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id()
> >>> > is actually debug_smp_processor_id(), which does all the checks.
> >>> 
> >>> Yes, OK.
> >>> 
> >>> > I believe these checks in debug_smp_processor_id() are only valid for x86
> >>> > case (aka cases were they have __smp_processor_id() defined.)
> >>> 
> >>> Hmm, I am under the impression that the checks in
> >>> debug_smp_processor_id() are valid regardless of whether the arch
> >>> overrides __smp_processor_id().
> >>
> >> From include/linux/smp.h
> >>
> >> /*
> >>  * Allow the architecture to differentiate between a stable and unstable read.
> >>  * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
> >>  * regular asm read for the stable.
> >>  */
> >> #ifndef __smp_processor_id
> >> #define __smp_processor_id(x) raw_smp_processor_id(x)
> >> #endif
> >>
> >> As far as I see, only x86 has a definition of __smp_processor_id.
> >> So for archs like Powerpc, __smp_processor_id(), is always
> >> defined as raw_smp_processor_id(). Right?
> >
> > Sure, yes.
> >
> >> I would think debug_smp_processor_id() would be useful if __smp_processor_id()
> >> is different from raw_smp_processor_id(). Do note debug_smp_processor_id() 
> >> calls raw_smp_processor_id().
> 
> Agree.
> 
> > I do not think the utility of debug_smp_processor_id() is related to
> > whether the arch defines __smp_processor_id().
> >
> >> Or can I understand how debug_smp_processor_id() is useful if
> >> __smp_processor_id() is defined as raw_smp_processor_id()?
> 
> debug_smp_processor_id() is useful on powerpc, as well as other arches,
> because it checks that we're in a context where the processor id won't
> change out from under us.
> 
> eg. something like this is unsafe:
> 
>   int counts[NR_CPUS];
>   int tmp, cpu;
>   
>   cpu = smp_processor_id();
>   tmp = counts[cpu];
>   				<- preempted here and migrated to another CPU
>   counts[cpu] = tmp + 1;
> 

If lets say the above call was replaced by raw_smp_processor_id(), how would
it avoid the preemption / migration to another CPU?

Replacing it with raw_smp_processor_id() may avoid, the debug splat but the
underlying issue would still remain as is. No?

> 
> > So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id()
> > expands to __smp_processor_id() which expands to raw_smp_processor_id(),
> > avoiding the preempt safety checks. This is working as intended.
> >
> > For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands
> > to the out of line call to debug_smp_processor_id(), which calls
> > raw_smp_processor_id() and performs the checks, warning if called in an
> > inappropriate context, as seen here. Also working as intended.
> >
> > AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and
> > not really related to the debug facility. Please see 9ed7d75b2f09d
> > ("x86/percpu: Relax smp_processor_id()").
> 
> Yeah good find.
> 
> cheers
Michael Ellerman Sept. 24, 2021, 3:07 a.m. UTC | #9
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> * Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]:
>
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> >
>> >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:
>> >>
>> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
...
>> >> Or can I understand how debug_smp_processor_id() is useful if
>> >> __smp_processor_id() is defined as raw_smp_processor_id()?
>> 
>> debug_smp_processor_id() is useful on powerpc, as well as other arches,
>> because it checks that we're in a context where the processor id won't
>> change out from under us.
>> 
>> eg. something like this is unsafe:
>> 
>>   int counts[NR_CPUS];
>>   int tmp, cpu;
>>   
>>   cpu = smp_processor_id();
>>   tmp = counts[cpu];
>>   				<- preempted here and migrated to another CPU
>>   counts[cpu] = tmp + 1;
>> 
>
> If lets say the above call was replaced by raw_smp_processor_id(), how would
> it avoid the preemption / migration to another CPU?
>
> Replacing it with raw_smp_processor_id() may avoid, the debug splat but the
> underlying issue would still remain as is. No?

Correct.

Using raw_smp_processor_id() is generally the wrong answer. For this
example the correct fix is to disable preemption around the code, eg:

   int counts[NR_CPUS];
   int tmp, cpu;
   
   preempt_disable()

   cpu = smp_processor_id();
   tmp = counts[cpu];
   counts[cpu] = tmp + 1;

   preempt_enable();


For the original issue I think it is OK to use raw_smp_processor_id(),
because we're already doing a racy check of whether another CPU has been
preempted by the hypervisor.

        if (!is_kvm_guest()) {
                int first_cpu = cpu_first_thread_sibling(smp_processor_id());

                if (cpu_first_thread_sibling(cpu) == first_cpu)
                        return false;
        }

We could disable preemption around that, eg:

        if (!is_kvm_guest()) {
                int first_cpu;
                bool is_sibling;

                preempt_disable();
                first_cpu = cpu_first_thread_sibling(smp_processor_id());
                is_sibling = (cpu_first_thread_sibling(cpu) == first_cpu)
                preempt_enable();

                // Can be preempted here

                if (is_sibling)
                    return false;
        }

But before we return we could be preempted, and then is_sibling is no
longer necessarily correct. So that doesn't really gain us anything.

The only way to make that result stable is to disable preemption in the
caller, but most callers don't want to AFAICS, because they know they're
doing a racy check to begin with.

cheers
Nathan Lynch Sept. 25, 2021, 12:10 a.m. UTC | #10
Michael Ellerman <mpe@ellerman.id.au> writes:
> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> * Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]:
>>
>>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>>> >
>>> >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]:
>>> >>
>>> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> ...
>>> >> Or can I understand how debug_smp_processor_id() is useful if
>>> >> __smp_processor_id() is defined as raw_smp_processor_id()?
>>> 
>>> debug_smp_processor_id() is useful on powerpc, as well as other arches,
>>> because it checks that we're in a context where the processor id won't
>>> change out from under us.
>>> 
>>> eg. something like this is unsafe:
>>> 
>>>   int counts[NR_CPUS];
>>>   int tmp, cpu;
>>>   
>>>   cpu = smp_processor_id();
>>>   tmp = counts[cpu];
>>>   				<- preempted here and migrated to another CPU
>>>   counts[cpu] = tmp + 1;
>>> 
>>
>> If lets say the above call was replaced by raw_smp_processor_id(), how would
>> it avoid the preemption / migration to another CPU?
>>
>> Replacing it with raw_smp_processor_id() may avoid, the debug splat but the
>> underlying issue would still remain as is. No?
>
> Correct.
>
> Using raw_smp_processor_id() is generally the wrong answer. For this
> example the correct fix is to disable preemption around the code, eg:
>
>    int counts[NR_CPUS];
>    int tmp, cpu;
>    
>    preempt_disable()
>
>    cpu = smp_processor_id();
>    tmp = counts[cpu];
>    counts[cpu] = tmp + 1;
>
>    preempt_enable();
>
>
> For the original issue I think it is OK to use raw_smp_processor_id(),
> because we're already doing a racy check of whether another CPU has been
> preempted by the hypervisor.
>
>         if (!is_kvm_guest()) {
>                 int first_cpu = cpu_first_thread_sibling(smp_processor_id());
>
>                 if (cpu_first_thread_sibling(cpu) == first_cpu)
>                         return false;
>         }
>
> We could disable preemption around that, eg:
>
>         if (!is_kvm_guest()) {
>                 int first_cpu;
>                 bool is_sibling;
>
>                 preempt_disable();
>                 first_cpu = cpu_first_thread_sibling(smp_processor_id());
>                 is_sibling = (cpu_first_thread_sibling(cpu) == first_cpu)
>                 preempt_enable();
>
>                 // Can be preempted here
>
>                 if (is_sibling)
>                     return false;
>         }
>
> But before we return we could be preempted, and then is_sibling is no
> longer necessarily correct. So that doesn't really gain us anything.
>
> The only way to make that result stable is to disable preemption in the
> caller, but most callers don't want to AFAICS, because they know they're
> doing a racy check to begin with.

I'll add that one way I think about this is that when I choose
smp_processor_id(), I am making a claim about the context in which it is
used, and when I use raw_smp_processor_id() I am making a different
claim.

smp_processor_id() => I am sampling the CPU index in a critical section
where the result equals the CPU that executes the entire critical
section, and I am relying on that property for the section's
correctness. This claim is verified by debug_smp_processor_id() when
DEBUG_PREEMPT=y.

raw_smp_processor_id() => I am sampling the CPU index and using the
result in a way that is safe even if it differs from the CPU(s) on which
the surrounding code actually executes.

This framing doesn't cover all situations, but I've found it to be
generally useful.
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index bcb7b5f917be..e429aca566de 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -97,7 +97,14 @@  static inline bool vcpu_is_preempted(int cpu)
 
 #ifdef CONFIG_PPC_SPLPAR
 	if (!is_kvm_guest()) {
-		int first_cpu = cpu_first_thread_sibling(smp_processor_id());
+		int first_cpu;
+
+		/*
+		 * This is only a guess at best, and this function may be
+		 * called with preemption enabled. Using raw_smp_processor_id()
+		 * does not damage accuracy.
+		 */
+		first_cpu = cpu_first_thread_sibling(raw_smp_processor_id());
 
 		/*
 		 * Preemption can only happen at core granularity. This CPU