
powerpc/lib/xor_vmx: Relax frame size for clang

Message ID 20190621085822.1527-1-malat@debian.org
State New
Series powerpc/lib/xor_vmx: Relax frame size for clang

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (e610a466d16a086e321f0bd421e2fc75cff28605)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked

Commit Message

Mathieu Malaterre June 21, 2019, 8:58 a.m. UTC
When building with clang-8 the frame size limit is hit:

  ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]

Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
frame size for clang") and relax the requirement for clang until a
proper fix is implemented upstream in clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/563
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
---
 arch/powerpc/lib/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Christophe Leroy Sept. 7, 2022, 5:21 p.m. UTC | #1
On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
> When building with clang-8 the frame size limit is hit:
> 
>    ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
> 
> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
> frame size for clang") and relax the requirement for clang until a
> proper fix is implemented upstream in clang.

With Clang 14 I get the following errors, but only with KASAN selected.

   CC      arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
      ^
arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,
      ^


Is this patch still relevant?

Or should the frame size limit be relaxed when KASAN is selected? After
all, the stack size is multiplied by 2 when we have KASAN, so maybe the
warning limit should be increased as well?

Thanks
Christophe

> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/563
> Cc: Joel Stanley <joel@jms.id.au>
> Signed-off-by: Mathieu Malaterre <malat@debian.org>
> ---
>   arch/powerpc/lib/Makefile | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index c55f9c27bf79..b3f7d64caaf0 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -58,5 +58,9 @@ obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
>   
>   obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o xor_vmx_glue.o
>   CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
> +ifdef CONFIG_CC_IS_CLANG
> +# See https://github.com/ClangBuiltLinux/linux/issues/563
> +CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
> +endif
>   
>   obj-$(CONFIG_PPC64) += $(obj64-y)
Michael Ellerman Sept. 8, 2022, 12:27 a.m. UTC | #2
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>> When building with clang-8 the frame size limit is hit:
>> 
>>    ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>> 
>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>> frame size for clang") and relax the requirement for clang until a
>> proper fix is implemented upstream in clang.
>
> With Clang 14 I get the following errors, but only with KASAN selected.
>
>    CC      arch/powerpc/lib/xor_vmx.o
> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
> void __xor_altivec_4(unsigned long bytes,
>       ^
> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
> void __xor_altivec_5(unsigned long bytes,
>       ^

That's a 32-bit build?

> Is this patch still relevant?

The clang issue was closed because a different change fixed the issue:

  https://github.com/ClangBuiltLinux/linux/issues/563

> Or should the frame size limit be relaxed when KASAN is selected? After
> all, the stack size is multiplied by 2 when we have KASAN, so maybe the
> warning limit should be increased as well?

Yeah that would make some sense.

On 64-bit the largest frame in that file is 1424, which is below the
default 2048 byte limit.

So maybe just increase it for 32-bit && KASAN.

What would be nice is if the FRAME_WARN value could be calculated as a
percentage of the THREAD_SHIFT, but that's not easily doable with the
way things are structured in Kconfig.
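Outside Kconfig, the calculation itself is simple. A minimal sketch, assuming the limit is a fixed fraction of the stack size 1 << THREAD_SHIFT and that KASAN doubles the stack as described earlier in the thread; the function name, the 1/8 divisor and the example THREAD_SHIFT values are illustrative assumptions, not existing kernel symbols or settings:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: derive a -Wframe-larger-than limit as a fraction
 * of the per-thread stack size. KASAN is modelled as doubling the stack
 * (one extra bit of shift); the divisor is an arbitrary policy choice.
 */
unsigned long frame_warn_from_thread_shift(unsigned int thread_shift,
					   bool kasan,
					   unsigned int divisor)
{
	unsigned long stack_bytes = 1UL << (thread_shift + (kasan ? 1 : 0));

	return stack_bytes / divisor;
}
```

With an assumed shift of 13 (8 KiB stacks) and a divisor of 8 this gives 1024 bytes, doubling to 2048 with KASAN; a shift of 14 without KASAN also gives 2048.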

cheers
Christophe Leroy Sept. 8, 2022, 6 a.m. UTC | #3
On 08/09/2022 at 02:27, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>>> When building with clang-8 the frame size limit is hit:
>>>
>>>     ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>>>
>>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>>> frame size for clang") and relax the requirement for clang until a
>>> proper fix is implemented upstream in clang.
>>
>> With Clang 14 I get the following errors, but only with KASAN selected.
>>
>>     CC      arch/powerpc/lib/xor_vmx.o
>> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_4(unsigned long bytes,
>>        ^
>> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_5(unsigned long bytes,
>>        ^
> 
> That's a 32-bit build?

Yes, pmac32_defconfig

> 
>> Is this patch still relevant?
> 
> The clang issue was closed because a different change fixed the issue:
> 
>    https://github.com/ClangBuiltLinux/linux/issues/563
> 
>> Or should the frame size limit be relaxed when KASAN is selected? After
>> all, the stack size is multiplied by 2 when we have KASAN, so maybe the
>> warning limit should be increased as well?
> 
> Yeah that would make some sense.
> 
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
> 
> So maybe just increase it for 32-bit && KASAN.
> 
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
> 

Looking at it more deeply, I see strange things.

What is that frame size? I thought it was the number of bytes r1 is
decremented by at the beginning of the function, but it seems not, at
least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.

I set CONFIG_FRAME_WARN to 8, and with GCC and without KASAN I get no
warning, although I have:

00000000 <__xor_altivec_2>:
    0:	94 21 ff f0 	stwu    r1,-16(r1)
00000078 <__xor_altivec_3>:
   78:	94 21 ff f0 	stwu    r1,-16(r1)
0000010c <__xor_altivec_4>:
  10c:	94 21 ff f0 	stwu    r1,-16(r1)
000001c4 <__xor_altivec_5>:
  1c4:	94 21 ff e0 	stwu    r1,-32(r1)
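As an illustration of reading these prologues, the decrement can be recovered directly from the instruction word: the objdump bytes "94 21 ff f0" form the big-endian word 0x9421fff0, i.e. stwu (primary opcode 37) with RS = RA = r1 and a signed 16-bit displacement of -16. A minimal C sketch, assuming only the standard PowerPC D-form layout; the function name is hypothetical and not part of any kernel or binutils API:

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC_OPCODE_STWU	37u	/* primary opcode of stwu (D-form) */
#define PPC_REG_SP	1u	/* r1 is the stack pointer */

/*
 * If insn is "stwu r1,D(r1)", store -D (the number of bytes the prologue
 * lowers r1 by) in *frame_bytes and return true; otherwise return false.
 * D-form layout: opcode[0:5] RS[6:10] RA[11:15] D[16:31] (IBM bit order).
 */
bool stwu_r1_frame_size(uint32_t insn, int32_t *frame_bytes)
{
	uint32_t opcode = insn >> 26;
	uint32_t rs = (insn >> 21) & 0x1f;
	uint32_t ra = (insn >> 16) & 0x1f;
	int32_t d = (int32_t)(insn & 0xffff);

	if (d & 0x8000)		/* sign-extend the 16-bit displacement */
		d -= 0x10000;

	if (opcode != PPC_OPCODE_STWU || rs != PPC_REG_SP || ra != PPC_REG_SP)
		return false;

	*frame_bytes = -d;
	return true;
}
```

For example, 0x9421fe60 from the GCC + KASAN listing decodes to 416 bytes and 0x9421fae0 from the clang + KASAN listing to 1312 bytes, matching the disassembly.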

With GCC and inline KASAN I get:

arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_2':
arch/powerpc/lib/xor_vmx.c:69:1: warning: the frame size of 96 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_3':
arch/powerpc/lib/xor_vmx.c:93:1: warning: the frame size of 128 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_4':
arch/powerpc/lib/xor_vmx.c:122:1: warning: the frame size of 80 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_5':
arch/powerpc/lib/xor_vmx.c:156:1: warning: the frame size of 128 bytes is larger than 8 bytes [-Wframe-larger-than=]

00000000 <__xor_altivec_2>:
        0:	94 21 ff 30 	stwu    r1,-208(r1)
00000458 <__xor_altivec_3>:
      458:	94 21 ff 00 	stwu    r1,-256(r1)
00000b94 <__xor_altivec_4>:
      b94:	94 21 fe b0 	stwu    r1,-336(r1)
000015b8 <__xor_altivec_5>:
     15b8:	94 21 fe 60 	stwu    r1,-416(r1)

With CLANG and without KASAN I get:

   CC      arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (160) exceeds limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
        0:	94 21 ff 70 	stwu    r1,-144(r1)
00000528 <__xor_altivec_3>:
      528:	94 21 ff 70 	stwu    r1,-144(r1)
00000c4c <__xor_altivec_4>:
      c4c:	94 21 ff 60 	stwu    r1,-160(r1)
000015a4 <__xor_altivec_5>:
     15a4:	94 21 ff 70 	stwu    r1,-144(r1)

With CLANG and with inline KASAN I get:

arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (512) exceeds limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (768) exceeds limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (1040) exceeds limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (1312) exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
        8:	94 21 fe 00 	stwu    r1,-512(r1)
00000a24 <__xor_altivec_3>:
      a2c:	94 21 fd 00 	stwu    r1,-768(r1)
000019a4 <__xor_altivec_4>:
     19ac:	94 21 fb f0 	stwu    r1,-1040(r1)
00002f20 <__xor_altivec_5>:
     2f28:	94 21 fa e0 	stwu    r1,-1312(r1)


So it seems that GCC and CLANG don't warn on the same thing; is that
expected? GCC subtracts 112 bytes, which is the minimum frame size on
ppc64, but here I'm building a ppc32 kernel, where the minimum frame size is 16.

And CLANG is still using stack a lot more than GCC.

Christophe
Segher Boessenkool Sept. 8, 2022, 1:48 p.m. UTC | #4
On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
> Looking at it more deeply, I see strange things.

I'll have to see full generated machine code to be able to see strange
things, there isn't enough information at all here yet.  Sorry.

Use private mail if it is too big or uninteresting for the list :-)

> What is that frame size? I thought it was the number of bytes r1 is
> decremented by at the beginning of the function, but it seems not, at
> least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.

That is the vars size + the fixed size + the size of the parameter
save area + the size of the regs save area, rounded up to a multiple
of 16.  Fixed size is 8 on 32-bit PowerPC ELF.  Frame size used by GCC
here is just the vars size.
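Taking that description literally, the amount r1 is lowered by can be written as a small function. This is a sketch only: the 8-byte fixed part comes from the text above (on 32-bit SVR4 it holds the back chain and the LR save word), while the area sizes passed in are inputs the caller has to know; the names are hypothetical:

```c
#include <stddef.h>

#define PPC32_FIXED_FRAME	8u	/* fixed part on 32-bit PowerPC ELF */
#define PPC32_STACK_ALIGN	16u	/* frames are rounded to 16 bytes */

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up_pow2(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Total frame size as described: local variables + fixed area +
 * parameter save area + register save area, rounded up to 16 bytes.
 * GCC's -Wframe-larger-than figure corresponds only to 'vars'.
 */
size_t ppc32_frame_size(size_t vars, size_t param_save, size_t reg_save)
{
	return round_up_pow2(vars + PPC32_FIXED_FRAME + param_save + reg_save,
			     PPC32_STACK_ALIGN);
}
```

With no locals and nothing to save it yields 16 bytes, the ppc32 minimum frame mentioned earlier; since the locals GCC reports are only one term of the sum, the stwu displacement can exceed the reported figure.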

> So it seems that GCC and CLANG don't warn on the same thing; is that
> expected? GCC subtracts 112 bytes, which is the minimum frame size on
> ppc64, but here I'm building a ppc32 kernel, where the minimum frame size is 16.

I need to see the generated code to make sense of what is happening
here.  It sounds like it is doing varargs calls or similar expensive
stack juggling.  Or just saving a boatload of registers on the stack.

> And CLANG is still using stack a lot more than GCC.

Good to hear!  Well, good for GCC, anyway ;-)


Segher
Arnd Bergmann Sept. 8, 2022, 3:07 p.m. UTC | #5
On Thu, Sep 8, 2022, at 2:27 AM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>
> Yeah that would make some sense.
>
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
>
> So maybe just increase it for 32-bit && KASAN.
>
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
>

Increasing the warning limit slightly for 32-bit with
CONFIG_KASAN_STACK makes sense, but there are a lot of
related concerns:

- I was hoping to still stay under 1280 bytes for the warning
  limit, so that even with KASAN_STACK enabled, we are able to
  catch warnings in functions that use a stupid amount of
  local variables, without getting too many false positives.

- if the XOR code has its frame size explode like this, it's
  probably an indication of the compiler doing something wrong,
  not the kernel code. The result is likely that the "optimized"
  XOR implementation is slower than the default version as a
  result, and the kernel will pick the other one at boot time.
  This needs to be confirmed of course, but an easier workaround
  for this instance might be to just disable the xor_vmx module
  when KASAN_STACK is set.

- The warning limit on 32-bit is actually 2048 bytes when
  GCC_PLUGIN_LATENT_ENTROPY is set. I think this is a mistake
  and we should lower /that/ limit instead, but a side-effect
  here is that an allmodconfig kernel build with gcc will fail
  to warn about bugs that exist both with gcc and clang, while
  clang complains about it.
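To see what the candidate limits in this thread would do, the clang + inline KASAN frame sizes from Christophe's pmac32_defconfig build can be checked against each one. The helper and table names below are hypothetical; the byte counts are the ones quoted in the thread, and the comparison mirrors -Wframe-larger-than, which fires only when a frame is strictly larger than the limit:

```c
#include <stddef.h>

struct frame_report {
	const char *func;
	unsigned int bytes;	/* frame size reported by clang */
};

/* clang, inline KASAN, pmac32_defconfig: sizes quoted in this thread. */
const struct frame_report clang_kasan_ppc32[] = {
	{ "__xor_altivec_2", 512 },
	{ "__xor_altivec_3", 768 },
	{ "__xor_altivec_4", 1040 },
	{ "__xor_altivec_5", 1312 },
};

const size_t clang_kasan_ppc32_len =
	sizeof(clang_kasan_ppc32) / sizeof(clang_kasan_ppc32[0]);

/* Number of functions whose frame exceeds 'limit', i.e. would still warn. */
size_t count_over_limit(const struct frame_report *r, size_t n,
			unsigned int limit)
{
	size_t over = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].bytes > limit)
			over++;
	return over;
}
```

With these numbers, a 1024-byte limit leaves two functions over it, the 1280 bytes mentioned above still leaves __xor_altivec_5 (1312 bytes) over it, and 2048 leaves none.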

      Arnd
Segher Boessenkool Sept. 8, 2022, 10:40 p.m. UTC | #6
Hi!

On Thu, Sep 08, 2022 at 05:07:24PM +0200, Arnd Bergmann wrote:
> - if the XOR code has its frame size explode like this, it's
>   probably an indication of the compiler doing something wrong,
>   not the kernel code.

On the contrary, it is most likely an indication that the kernel code
wants something unreasonable.  Like, having 20 variables live at the
same time, but still wanting nicely scheduled machine code generated.

But I suspect GCC unrolled the loops here, even?  Best way to prevent
that here is to put an option in the Makefile, for these files.  We
don't want any of this unrolled after all?  Or, alternatively, remove
all the manual unrolling from this code and let GCC do its thing, without
painting it into a corner.
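For contrast with hand-unrolled code, a routine that leaves unrolling to the compiler can be as plain as the loop below. This is a portable scalar sketch, not the kernel's Altivec implementation; xor_2_simple is a hypothetical name, and it assumes bytes is a multiple of sizeof(unsigned long) and that the two buffers don't overlap:

```c
#include <stddef.h>

/*
 * Hypothetical 2-source XOR with no manual unrolling: one element per
 * iteration, leaving any unrolling or vectorisation to the compiler.
 * 'restrict' promises the buffers do not overlap.
 */
void xor_2_simple(unsigned long bytes, unsigned long *restrict p1,
		  const unsigned long *restrict p2)
{
	size_t n = bytes / sizeof(unsigned long);

	for (size_t i = 0; i < n; i++)
		p1[i] ^= p2[i];
}
```

Whether the compiler then unrolls or vectorises the loop is left to its own heuristics and command-line options, which is the trade-off described above.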

>   The result is likely that the "optimized"
>   XOR implementation is slower than the default version as a
>   result, and the kernel will pick the other one at boot time.

Yes.  So it's self-healing even, of a sort :-)


Segher
Christophe Leroy Sept. 9, 2022, 5:01 a.m. UTC | #7
On 08/09/2022 at 15:48, Segher Boessenkool wrote:
> On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
>> Looking at it more deeply, I see strange things.
> 
> I'll have to see full generated machine code to be able to see strange
> things, there isn't enough information at all here yet.  Sorry.

Well, what I call strange is the fact that with GCC the number of bytes
reported by -Wframe-larger-than doesn't match the offset used by the
stwu at the start of the function, while it does with clang.

> 
> Use private mail if it is too big or uninteresting for the list :-)
> 
>> What is that frame size? I thought it was the number of bytes r1 is
>> decremented by at the beginning of the function, but it seems not, at
>> least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.
> 
> That is the vars size + the fixed size + the size of the parameter
> save area + the size of the regs save area, rounded up to a multiple
> of 16.  Fixed size is 8 on 32-bit PowerPC ELF.  Frame size used by GCC
> here is just the vars size.

OK, so does that mean the stack utilisation is underestimated when using
GCC? Or is it clang that overestimates it?

> 
>> So it seems that GCC and CLANG don't warn on the same thing; is that
>> expected? GCC subtracts 112 bytes, which is the minimum frame size on
>> ppc64, but here I'm building a ppc32 kernel, where the minimum frame size is 16.
> 
> I need to see the generated code to make sense of what is happening
> here.  It sounds like it is doing varargs calls or similar expensive
> stack juggling.  Or just saving a boatload of registers on the stack.
> 

OK, I'll send it to you. But once again, I don't mind what the code
really looks like; I'm just worried that GCC doesn't report the entire
stack usage.


Christophe

Patch

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index c55f9c27bf79..b3f7d64caaf0 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -58,5 +58,9 @@  obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
 obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o xor_vmx_glue.o
 CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+ifdef CONFIG_CC_IS_CLANG
+# See https://github.com/ClangBuiltLinux/linux/issues/563
+CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
+endif
 
 obj-$(CONFIG_PPC64) += $(obj64-y)