Message ID | 20210921031213.2029824-1-nathanl@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted() | expand |
Related | show |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
Nathan Lynch <nathanl@linux.ibm.com> writes: > vcpu_is_preempted() can be used outside of preempt-disabled critical > sections, yielding warnings such as: > > BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 > caller is rwsem_spin_on_owner+0x1cc/0x2d0 > CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 > Call Trace: > [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) > [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 > [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 > [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 > [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 > [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 > [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 > [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 > [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 > > The result of vcpu_is_preempted() is always subject to invalidation by > events inside and outside of Linux; it's just a best guess at a point in > time. Use raw_smp_processor_id() to avoid such warnings. > > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> > Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") > --- > arch/powerpc/include/asm/paravirt.h | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h > index bcb7b5f917be..e429aca566de 100644 > --- a/arch/powerpc/include/asm/paravirt.h > +++ b/arch/powerpc/include/asm/paravirt.h > @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu) > > #ifdef CONFIG_PPC_SPLPAR > if (!is_kvm_guest()) { > - int first_cpu = cpu_first_thread_sibling(smp_processor_id()); > + int first_cpu; > + > + /* > + * This is only a guess at best, and this function may be > + * called with preemption enabled. Using raw_smp_processor_id() > + * does not damage accuracy. > + */ > + first_cpu = cpu_first_thread_sibling(raw_smp_processor_id()); This change seems good, except I think the comment needs to be a lot more explicit about what it's doing and why. A casual reader is going to be confused about vcpu preemption vs "preemption", which are basically unrelated yet use the same word. It's not clear how raw_smp_processor_id() is related to (Linux) preemption, unless you know that smp_processor_id() is the alternative and it contains a preemption check. And "this is only a guess" is not clear on what *this* is, you're referring to the result of the whole function, but that's not obvious. > /* > * Preemption can only happen at core granularity. This CPU ^^^^^^^^^^ Means something different to "preemption" above. I know you didn't write that comment, and maybe we need to rewrite some of those existing comments to make it clear they're not talking about Linux preemption. cheers
* Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: > vcpu_is_preempted() can be used outside of preempt-disabled critical > sections, yielding warnings such as: > > BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 > caller is rwsem_spin_on_owner+0x1cc/0x2d0 > CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 > Call Trace: > [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) > [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 > [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 > [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 > [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 > [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 > [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 > [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 > [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 > > The result of vcpu_is_preempted() is always subject to invalidation by > events inside and outside of Linux; it's just a best guess at a point in > time. Use raw_smp_processor_id() to avoid such warnings. Typically smp_processor_id() and raw_smp_processor_id() except for the CONFIG_DEBUG_PREEMPT. In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() is actually debug_smp_processor_id(), which does all the checks. I believe these checks in debug_smp_processor_id() are only valid for x86 case (aka cases were they have __smp_processor_id() defined.) i.e x86 has a different implementation of _smp_processor_id() for stable and unstable > > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> > Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") > --- > arch/powerpc/include/asm/paravirt.h | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h > index bcb7b5f917be..e429aca566de 100644 > --- a/arch/powerpc/include/asm/paravirt.h > +++ b/arch/powerpc/include/asm/paravirt.h > @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu) > > #ifdef CONFIG_PPC_SPLPAR > if (!is_kvm_guest()) { > - int first_cpu = cpu_first_thread_sibling(smp_processor_id()); > + int first_cpu; > + > + /* > + * This is only a guess at best, and this function may be > + * called with preemption enabled. Using raw_smp_processor_id() > + * does not damage accuracy. > + */ > + first_cpu = cpu_first_thread_sibling(raw_smp_processor_id()); > > /* > * Preemption can only happen at core granularity. This CPU > -- > 2.31.1 > How about something like the below? diff --git a/include/linux/smp.h b/include/linux/smp.h index 510519e8a1eb..8c669e8ceb73 100644 --- a/include/linux/smp.h +++ b/include/linux/smp.h @@ -256,12 +256,14 @@ static inline int get_boot_cpu_id(void) */ #ifndef __smp_processor_id #define __smp_processor_id(x) raw_smp_processor_id(x) -#endif - +#else #ifdef CONFIG_DEBUG_PREEMPT extern unsigned int debug_smp_processor_id(void); # define smp_processor_id() debug_smp_processor_id() -#else +#endif +#endif + +#ifndef smp_processor_id # define smp_processor_id() __smp_processor_id() #endif
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch <nathanl@linux.ibm.com> writes: >> --- a/arch/powerpc/include/asm/paravirt.h >> +++ b/arch/powerpc/include/asm/paravirt.h >> @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu) >> >> #ifdef CONFIG_PPC_SPLPAR >> if (!is_kvm_guest()) { >> - int first_cpu = cpu_first_thread_sibling(smp_processor_id()); >> + int first_cpu; >> + >> + /* >> + * This is only a guess at best, and this function may be >> + * called with preemption enabled. Using raw_smp_processor_id() >> + * does not damage accuracy. >> + */ >> + first_cpu = cpu_first_thread_sibling(raw_smp_processor_id()); > > This change seems good, except I think the comment needs to be a lot > more explicit about what it's doing and why. > > A casual reader is going to be confused about vcpu preemption vs > "preemption", which are basically unrelated yet use the same word. > > It's not clear how raw_smp_processor_id() is related to (Linux) > preemption, unless you know that smp_processor_id() is the alternative > and it contains a preemption check. > > And "this is only a guess" is not clear on what *this* is, you're > referring to the result of the whole function, but that's not obvious. You're right. > >> /* >> * Preemption can only happen at core granularity. This CPU > ^^^^^^^^^^ > Means something different to "preemption" above. > > I know you didn't write that comment, and maybe we need to rewrite some > of those existing comments to make it clear they're not talking about > Linux preemption. Thanks, agreed on all points. I'll rework the existing comments and any new ones to clearly distinguish between the two senses of preemption here.
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: > >> vcpu_is_preempted() can be used outside of preempt-disabled critical >> sections, yielding warnings such as: >> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 >> caller is rwsem_spin_on_owner+0x1cc/0x2d0 >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 >> Call Trace: >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 >> >> The result of vcpu_is_preempted() is always subject to invalidation by >> events inside and outside of Linux; it's just a best guess at a point in >> time. Use raw_smp_processor_id() to avoid such warnings. > > Typically smp_processor_id() and raw_smp_processor_id() except for the > CONFIG_DEBUG_PREEMPT. Sorry, I don't follow... > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() > is actually debug_smp_processor_id(), which does all the checks. Yes, OK. > I believe these checks in debug_smp_processor_id() are only valid for x86 > case (aka cases were they have __smp_processor_id() defined.) Hmm, I am under the impression that the checks in debug_smp_processor_id() are valid regardless of whether the arch overrides __smp_processor_id(). I think the stack trace here correctly identifies an incorrect use of smp_processor_id(), and the call site needs to be changed. Do you disagree?
* Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: > > > >> vcpu_is_preempted() can be used outside of preempt-disabled critical > >> sections, yielding warnings such as: > >> > >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 > >> caller is rwsem_spin_on_owner+0x1cc/0x2d0 > >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 > >> Call Trace: > >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) > >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 > >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 > >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 > >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 > >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 > >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 > >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 > >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 > >> > >> The result of vcpu_is_preempted() is always subject to invalidation by > >> events inside and outside of Linux; it's just a best guess at a point in > >> time. Use raw_smp_processor_id() to avoid such warnings. > > > > Typically smp_processor_id() and raw_smp_processor_id() except for the > > CONFIG_DEBUG_PREEMPT. > > Sorry, I don't follow... I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as raw_processor_id(). > > > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() > > is actually debug_smp_processor_id(), which does all the checks. > > Yes, OK. > > > I believe these checks in debug_smp_processor_id() are only valid for x86 > > case (aka cases were they have __smp_processor_id() defined.) > > Hmm, I am under the impression that the checks in > debug_smp_processor_id() are valid regardless of whether the arch > overrides __smp_processor_id(). From include/linux/smp.h /* * Allow the architecture to differentiate between a stable and unstable read. * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a * regular asm read for the stable. */ #ifndef __smp_processor_id #define __smp_processor_id(x) raw_smp_processor_id(x) #endif As far as I see, only x86 has a definition of __smp_processor_id. So for archs like Powerpc, __smp_processor_id(), is always defined as raw_smp_processor_id(). Right? I would think debug_smp_processor_id() would be useful if __smp_processor_id() is different from raw_smp_processor_id(). Do note debug_smp_processor_id() calls raw_smp_processor_id(). Or can I understand how debug_smp_processor_id() is useful if __smp_processor_id() is defined as raw_smp_processor_id()? > I think the stack trace here correctly identifies an incorrect use of > smp_processor_id(), and the call site needs to be changed. Do you > disagree? Yes the stack_trace shows that debug_smp_processor_id(). However what I want to understand is why should we even call debug_smp_processor_id(), when our __smp_processor_id() is defined as raw_smp_processor_id().
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: > >> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: >> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: >> > >> >> vcpu_is_preempted() can be used outside of preempt-disabled critical >> >> sections, yielding warnings such as: >> >> >> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 >> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0 >> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 >> >> Call Trace: >> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) >> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 >> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 >> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 >> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 >> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 >> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 >> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 >> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 >> >> >> >> The result of vcpu_is_preempted() is always subject to invalidation by >> >> events inside and outside of Linux; it's just a best guess at a point in >> >> time. Use raw_smp_processor_id() to avoid such warnings. >> > >> > Typically smp_processor_id() and raw_smp_processor_id() except for the >> > CONFIG_DEBUG_PREEMPT. >> >> Sorry, I don't follow... > > I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as > raw_processor_id(). > >> >> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() >> > is actually debug_smp_processor_id(), which does all the checks. >> >> Yes, OK. >> >> > I believe these checks in debug_smp_processor_id() are only valid for x86 >> > case (aka cases were they have __smp_processor_id() defined.) >> >> Hmm, I am under the impression that the checks in >> debug_smp_processor_id() are valid regardless of whether the arch >> overrides __smp_processor_id(). > > From include/linux/smp.h > > /* > * Allow the architecture to differentiate between a stable and unstable read. > * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a > * regular asm read for the stable. > */ > #ifndef __smp_processor_id > #define __smp_processor_id(x) raw_smp_processor_id(x) > #endif > > As far as I see, only x86 has a definition of __smp_processor_id. > So for archs like Powerpc, __smp_processor_id(), is always > defined as raw_smp_processor_id(). Right? Sure, yes. > I would think debug_smp_processor_id() would be useful if __smp_processor_id() > is different from raw_smp_processor_id(). Do note debug_smp_processor_id() > calls raw_smp_processor_id(). I do not think the utility of debug_smp_processor_id() is related to whether the arch defines __smp_processor_id(). > Or can I understand how debug_smp_processor_id() is useful if > __smp_processor_id() is defined as raw_smp_processor_id()? So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id() expands to __smp_processor_id() which expands to raw_smp_processor_id(), avoiding the preempt safety checks. This is working as intended. For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands to the out of line call to debug_smp_processor_id(), which calls raw_smp_processor_id() and performs the checks, warning if called in an inappropriate context, as seen here. Also working as intended. AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and not really related to the debug facility. Please see 9ed7d75b2f09d ("x86/percpu: Relax smp_processor_id()"). >> I think the stack trace here correctly identifies an incorrect use of >> smp_processor_id(), and the call site needs to be changed. Do you >> disagree? > > Yes the stack_trace shows that debug_smp_processor_id(). However what > I want to understand is why should we even call > debug_smp_processor_id(), when our __smp_processor_id() is defined as > raw_smp_processor_id(). smp_processor_id() should always expand to debug_smp_processor_id() when DEBUG_PREEMPT=y, regardless of whether the arch overrides __smp_processor_id(). That is how I understand the intent of the code as written.
Nathan Lynch <nathanl@linux.ibm.com> writes: > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: >> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: >>> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: >>> > >>> >> vcpu_is_preempted() can be used outside of preempt-disabled critical >>> >> sections, yielding warnings such as: >>> >> >>> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 >>> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0 >>> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 >>> >> Call Trace: >>> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) >>> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 >>> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 >>> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 >>> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 >>> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 >>> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 >>> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 >>> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 >>> >> >>> >> The result of vcpu_is_preempted() is always subject to invalidation by >>> >> events inside and outside of Linux; it's just a best guess at a point in >>> >> time. Use raw_smp_processor_id() to avoid such warnings. >>> > >>> > Typically smp_processor_id() and raw_smp_processor_id() except for the >>> > CONFIG_DEBUG_PREEMPT. >>> >>> Sorry, I don't follow... >> >> I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as >> raw_processor_id(). >> >>> >>> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() >>> > is actually debug_smp_processor_id(), which does all the checks. >>> >>> Yes, OK. >>> >>> > I believe these checks in debug_smp_processor_id() are only valid for x86 >>> > case (aka cases were they have __smp_processor_id() defined.) >>> >>> Hmm, I am under the impression that the checks in >>> debug_smp_processor_id() are valid regardless of whether the arch >>> overrides __smp_processor_id(). >> >> From include/linux/smp.h >> >> /* >> * Allow the architecture to differentiate between a stable and unstable read. >> * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a >> * regular asm read for the stable. >> */ >> #ifndef __smp_processor_id >> #define __smp_processor_id(x) raw_smp_processor_id(x) >> #endif >> >> As far as I see, only x86 has a definition of __smp_processor_id. >> So for archs like Powerpc, __smp_processor_id(), is always >> defined as raw_smp_processor_id(). Right? > > Sure, yes. > >> I would think debug_smp_processor_id() would be useful if __smp_processor_id() >> is different from raw_smp_processor_id(). Do note debug_smp_processor_id() >> calls raw_smp_processor_id(). Agree. > I do not think the utility of debug_smp_processor_id() is related to > whether the arch defines __smp_processor_id(). > >> Or can I understand how debug_smp_processor_id() is useful if >> __smp_processor_id() is defined as raw_smp_processor_id()? debug_smp_processor_id() is useful on powerpc, as well as other arches, because it checks that we're in a context where the processor id won't change out from under us. eg. something like this is unsafe: int counts[NR_CPUS]; int tmp, cpu; cpu = smp_processor_id(); tmp = counts[cpu]; <- preempted here and migrated to another CPU counts[cpu] = tmp + 1; > So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id() > expands to __smp_processor_id() which expands to raw_smp_processor_id(), > avoiding the preempt safety checks. This is working as intended. > > For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands > to the out of line call to debug_smp_processor_id(), which calls > raw_smp_processor_id() and performs the checks, warning if called in an > inappropriate context, as seen here. Also working as intended. > > AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and > not really related to the debug facility. Please see 9ed7d75b2f09d > ("x86/percpu: Relax smp_processor_id()"). Yeah good find. cheers
* Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]: > Nathan Lynch <nathanl@linux.ibm.com> writes: > > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > > > >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: > >> > >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > >>> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-20 22:12:13]: > >>> > > >>> >> vcpu_is_preempted() can be used outside of preempt-disabled critical > >>> >> sections, yielding warnings such as: > >>> >> > >>> >> BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 > >>> >> caller is rwsem_spin_on_owner+0x1cc/0x2d0 > >>> >> CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 > >>> >> Call Trace: > >>> >> [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) > >>> >> [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 > >>> >> [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 > >>> >> [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 > >>> >> [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 > >>> >> [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 > >>> >> [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 > >>> >> [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 > >>> >> [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 > >>> >> > >>> >> The result of vcpu_is_preempted() is always subject to invalidation by > >>> >> events inside and outside of Linux; it's just a best guess at a point in > >>> >> time. Use raw_smp_processor_id() to avoid such warnings. > >>> > > >>> > Typically smp_processor_id() and raw_smp_processor_id() except for the > >>> > CONFIG_DEBUG_PREEMPT. > >>> > >>> Sorry, I don't follow... > >> > >> I meant, Unless CONFIG_DEBUG_PREEMPT, smp_processor_id() is defined as > >> raw_processor_id(). > >> > >>> > >>> > In the CONFIG_DEBUG_PREEMPT case, smp_processor_id() > >>> > is actually debug_smp_processor_id(), which does all the checks. > >>> > >>> Yes, OK. > >>> > >>> > I believe these checks in debug_smp_processor_id() are only valid for x86 > >>> > case (aka cases were they have __smp_processor_id() defined.) > >>> > >>> Hmm, I am under the impression that the checks in > >>> debug_smp_processor_id() are valid regardless of whether the arch > >>> overrides __smp_processor_id(). > >> > >> From include/linux/smp.h > >> > >> /* > >> * Allow the architecture to differentiate between a stable and unstable read. > >> * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a > >> * regular asm read for the stable. > >> */ > >> #ifndef __smp_processor_id > >> #define __smp_processor_id(x) raw_smp_processor_id(x) > >> #endif > >> > >> As far as I see, only x86 has a definition of __smp_processor_id. > >> So for archs like Powerpc, __smp_processor_id(), is always > >> defined as raw_smp_processor_id(). Right? > > > > Sure, yes. > > > >> I would think debug_smp_processor_id() would be useful if __smp_processor_id() > >> is different from raw_smp_processor_id(). Do note debug_smp_processor_id() > >> calls raw_smp_processor_id(). > > Agree. > > > I do not think the utility of debug_smp_processor_id() is related to > > whether the arch defines __smp_processor_id(). > > > >> Or can I understand how debug_smp_processor_id() is useful if > >> __smp_processor_id() is defined as raw_smp_processor_id()? > > debug_smp_processor_id() is useful on powerpc, as well as other arches, > because it checks that we're in a context where the processor id won't > change out from under us. > > eg. something like this is unsafe: > > int counts[NR_CPUS]; > int tmp, cpu; > > cpu = smp_processor_id(); > tmp = counts[cpu]; > <- preempted here and migrated to another CPU > counts[cpu] = tmp + 1; > If lets say the above call was replaced by raw_smp_processor_id(), how would it avoid the preemption / migration to another CPU? Replacing it with raw_smp_processor_id() may avoid, the debug splat but the underlying issue would still remain as is. No? > > > So, for powerpc with DEBUG_PREEMPT unset, a call to smp_procesor_id() > > expands to __smp_processor_id() which expands to raw_smp_processor_id(), > > avoiding the preempt safety checks. This is working as intended. > > > > For powerpc with DEBUG_PREEMPT=y, a call to smp_processor_id() expands > > to the out of line call to debug_smp_processor_id(), which calls > > raw_smp_processor_id() and performs the checks, warning if called in an > > inappropriate context, as seen here. Also working as intended. > > > > AFAICT __smp_processor_id() is a performance/codegen-oriented hook, and > > not really related to the debug facility. Please see 9ed7d75b2f09d > > ("x86/percpu: Relax smp_processor_id()"). > > Yeah good find. > > cheers
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > * Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]: > >> Nathan Lynch <nathanl@linux.ibm.com> writes: >> > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: >> > >> >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: >> >> >> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: ... >> >> Or can I understand how debug_smp_processor_id() is useful if >> >> __smp_processor_id() is defined as raw_smp_processor_id()? >> >> debug_smp_processor_id() is useful on powerpc, as well as other arches, >> because it checks that we're in a context where the processor id won't >> change out from under us. >> >> eg. something like this is unsafe: >> >> int counts[NR_CPUS]; >> int tmp, cpu; >> >> cpu = smp_processor_id(); >> tmp = counts[cpu]; >> <- preempted here and migrated to another CPU >> counts[cpu] = tmp + 1; >> > > If lets say the above call was replaced by raw_smp_processor_id(), how would > it avoid the preemption / migration to another CPU? > > Replacing it with raw_smp_processor_id() may avoid, the debug splat but the > underlying issue would still remain as is. No? Correct. Using raw_smp_processor_id() is generally the wrong answer. For this example the correct fix is to disable preemption around the code, eg: int counts[NR_CPUS]; int tmp, cpu; preempt_disable() cpu = smp_processor_id(); tmp = counts[cpu]; counts[cpu] = tmp + 1; preempt_enable(); For the original issue I think it is OK to use raw_smp_processor_id(), because we're already doing a racy check of whether another CPU has been preempted by the hypervisor. if (!is_kvm_guest()) { int first_cpu = cpu_first_thread_sibling(smp_processor_id()); if (cpu_first_thread_sibling(cpu) == first_cpu) return false; } We could disable preemption around that, eg: if (!is_kvm_guest()) { int first_cpu; bool is_sibling; preempt_disable(); first_cpu = cpu_first_thread_sibling(smp_processor_id()); is_sibling = (cpu_first_thread_sibling(cpu) == first_cpu) preempt_enable(); // Can be preempted here if (is_sibling) return false; } But before we return we could be preempted, and then is_sibling is no longer necessarily correct. So that doesn't really gain us anything. The only way to make that result stable is to disable preemption in the caller, but most callers don't want to AFAICS, because they know they're doing a racy check to begin with. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: >> * Michael Ellerman <mpe@ellerman.id.au> [2021-09-23 17:29:32]: >> >>> Nathan Lynch <nathanl@linux.ibm.com> writes: >>> > Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: >>> > >>> >> * Nathan Lynch <nathanl@linux.ibm.com> [2021-09-22 11:01:12]: >>> >> >>> >>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes: > ... >>> >> Or can I understand how debug_smp_processor_id() is useful if >>> >> __smp_processor_id() is defined as raw_smp_processor_id()? >>> >>> debug_smp_processor_id() is useful on powerpc, as well as other arches, >>> because it checks that we're in a context where the processor id won't >>> change out from under us. >>> >>> eg. something like this is unsafe: >>> >>> int counts[NR_CPUS]; >>> int tmp, cpu; >>> >>> cpu = smp_processor_id(); >>> tmp = counts[cpu]; >>> <- preempted here and migrated to another CPU >>> counts[cpu] = tmp + 1; >>> >> >> If lets say the above call was replaced by raw_smp_processor_id(), how would >> it avoid the preemption / migration to another CPU? >> >> Replacing it with raw_smp_processor_id() may avoid, the debug splat but the >> underlying issue would still remain as is. No? > > Correct. > > Using raw_smp_processor_id() is generally the wrong answer. For this > example the correct fix is to disable preemption around the code, eg: > > int counts[NR_CPUS]; > int tmp, cpu; > > preempt_disable() > > cpu = smp_processor_id(); > tmp = counts[cpu]; > counts[cpu] = tmp + 1; > > preempt_enable(); > > > For the original issue I think it is OK to use raw_smp_processor_id(), > because we're already doing a racy check of whether another CPU has been > preempted by the hypervisor. > > if (!is_kvm_guest()) { > int first_cpu = cpu_first_thread_sibling(smp_processor_id()); > > if (cpu_first_thread_sibling(cpu) == first_cpu) > return false; > } > > We could disable preemption around that, eg: > > if (!is_kvm_guest()) { > int first_cpu; > bool is_sibling; > > preempt_disable(); > first_cpu = cpu_first_thread_sibling(smp_processor_id()); > is_sibling = (cpu_first_thread_sibling(cpu) == first_cpu) > preempt_enable(); > > // Can be preempted here > > if (is_sibling) > return false; > } > > But before we return we could be preempted, and then is_sibling is no > longer necessarily correct. So that doesn't really gain us anything. > > The only way to make that result stable is to disable preemption in the > caller, but most callers don't want to AFAICS, because they know they're > doing a racy check to begin with. I'll add that one way I think about this is that when I choose smp_processor_id(), I am making a claim about the context in which it is used, and when I use raw_smp_processor_id() I am making a different claim. smp_processor_id() => I am sampling the CPU index in a critical section where the result equals the CPU that executes the entire critical section, and I am relying on that property for the section's correctness. This claim is verified by debug_smp_processor_id() when DEBUG_PREEMPT=y. raw_smp_processor_id() => I am sampling the CPU index and using the result in a way that is safe even if it differs from the CPU(s) on which the surrounding code actually executes. This framing doesn't cover all situations, but I've found it to be generally useful.
diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h index bcb7b5f917be..e429aca566de 100644 --- a/arch/powerpc/include/asm/paravirt.h +++ b/arch/powerpc/include/asm/paravirt.h @@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu) #ifdef CONFIG_PPC_SPLPAR if (!is_kvm_guest()) { - int first_cpu = cpu_first_thread_sibling(smp_processor_id()); + int first_cpu; + + /* + * This is only a guess at best, and this function may be + * called with preemption enabled. Using raw_smp_processor_id() + * does not damage accuracy. + */ + first_cpu = cpu_first_thread_sibling(raw_smp_processor_id()); /* * Preemption can only happen at core granularity. This CPU
vcpu_is_preempted() can be used outside of preempt-disabled critical sections, yielding warnings such as: BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 caller is rwsem_spin_on_owner+0x1cc/0x2d0 CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 Call Trace: [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 The result of vcpu_is_preempted() is always subject to invalidation by events inside and outside of Linux; it's just a best guess at a point in time. Use raw_smp_processor_id() to avoid such warnings. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") --- arch/powerpc/include/asm/paravirt.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)