[v4,3/3] powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
diff mbox series

Message ID 20191014025101.18567-4-natechancellor@gmail.com
State New
Headers show
Series
  • LLVM/Clang fixes for pseries_defconfig
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (600802af9049be799465b24d14162918545634bf)

Commit Message

Nathan Chancellor Oct. 14, 2019, 2:51 a.m. UTC
r374662 gives LLVM the ability to convert certain loops into a reference
to bcmp as an optimization; this breaks prom_init_check.sh:

  CALL    arch/powerpc/kernel/prom_init_check.sh
Error: External symbol 'bcmp' referenced from prom_init.c
make[2]: *** [arch/powerpc/kernel/Makefile:196: prom_init_check] Error 1

bcmp is defined in lib/string.c as a wrapper for memcmp so this could be
added to the whitelist. However, commit 450e7dd4001f ("powerpc/prom_init:
don't use string functions from lib/") copied memcmp as prom_memcmp to
avoid KASAN instrumentation so having bcmp be resolved to regular memcmp
would break that assumption. Furthermore, because the compiler is the
one that inserted bcmp, we cannot provide something like prom_bcmp.

To prevent LLVM from being clever with optimizations like this, use
-ffreestanding to tell LLVM we are not hosted so it is not free to make
transformations like this.

Link: https://github.com/ClangBuiltLinux/linux/issues/647
Link: https://github.com/llvm/llvm-project/commit/76cdcf25b883751d83402baea6316772aa73865c
Reviewed-by: Nick Desaulneris <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---

v1 -> v3:

* New patch in the series

v3 -> v4:

* Rebase on v5.4-rc3.

* Add Nick's reviewed-by tag.

* Update the LLVM commit reference to the latest applied version (r374662)
  as it was originally committed as r370454, reverted in r370788, and
  reapplied as r374662.

 arch/powerpc/kernel/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Segher Boessenkool Oct. 14, 2019, 9:35 a.m. UTC | #1
On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote:
> r374662 gives LLVM the ability to convert certain loops into a reference
> to bcmp as an optimization; this breaks prom_init_check.sh:

When/why does LLVM think this is okay?  This function has been removed
from POSIX over a decade ago (and before that it always was marked as
legacy).


Segher
Nick Desaulniers Oct. 14, 2019, 3:56 p.m. UTC | #2
On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote:
> > r374662 gives LLVM the ability to convert certain loops into a reference
> > to bcmp as an optimization; this breaks prom_init_check.sh:
>
> When/why does LLVM think this is okay?  This function has been removed
> from POSIX over a decade ago (and before that it always was marked as
> legacy).

Segher, do you have links for any of the above? If so, that would be
helpful to me. I'm arguing against certain transforms that assume that
one library function is faster than another, when such claims are
based on measurements from one stdlib implementation. (There's others
in the pipeline I'm not too thrilled about, too).

The rationale for why it was added was that memcmp takes a measurable
amount of time in Google's fleet, and most calls to memcmp don't care
about the position of the mismatch; bcmp is lower overhead (or at
least for our libc implementation, not sure about others).
Segher Boessenkool Oct. 14, 2019, 7:11 p.m. UTC | #3
On Mon, Oct 14, 2019 at 08:56:12AM -0700, Nick Desaulniers wrote:
> On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote:
> > > r374662 gives LLVM the ability to convert certain loops into a reference
> > > to bcmp as an optimization; this breaks prom_init_check.sh:
> >
> > When/why does LLVM think this is okay?  This function has been removed
> > from POSIX over a decade ago (and before that it always was marked as
> > legacy).
> 
> Segher, do you have links for any of the above? If so, that would be
> helpful to me.

Sure!

https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html

Older versions are harder to find online, unfortunately.  But there is

https://kernel.org/pub/linux/docs/man-pages/man-pages-posix/

in which man3p/bcmp.3p says:

FUTURE DIRECTIONS
       This function may be withdrawn in a future version.

Finally, the Linux man pages say (man bcmp):

CONFORMING TO
       4.3BSD.   This  function   is   deprecated   (marked   as   LEGACY   in
       POSIX.1-2001): use memcmp(3) in new programs.  POSIX.1-2008 removes the
       specification of bcmp().


> I'm arguing against certain transforms that assume that
> one library function is faster than another, when such claims are
> based on measurements from one stdlib implementation.

Wow.  The difference between memcmp and bcmp is trivial (just the return
value is different, and that costs hardly anything to add).  And memcmp
is guaranteed to exist since C89/C90 at least.

> The rationale for why it was added was that memcmp takes a measurable
> amount of time in Google's fleet, and most calls to memcmp don't care
> about the position of the mismatch; bcmp is lower overhead (or at
> least for our libc implementation, not sure about others).

You just have to do the read of the last words you compare as big-endian,
and then you can just subtract the two words, convert that to "int" (which
is very inconvenient to do, but hardly expensive), and there you go.

Or on x86 use the bswap insn, or something like it.

Or, if you use GCC, it has __builtin_memcmp but also __builtin_memcmp_eq,
and those are automatically used, too.


Segher
Nathan Chancellor Oct. 18, 2019, 7 p.m. UTC | #4
On Mon, Oct 14, 2019 at 02:11:41PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 14, 2019 at 08:56:12AM -0700, Nick Desaulniers wrote:
> > On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > >
> > > On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote:
> > > > r374662 gives LLVM the ability to convert certain loops into a reference
> > > > to bcmp as an optimization; this breaks prom_init_check.sh:
> > >
> > > When/why does LLVM think this is okay?  This function has been removed
> > > from POSIX over a decade ago (and before that it always was marked as
> > > legacy).
> > 
> > Segher, do you have links for any of the above? If so, that would be
> > helpful to me.
> 
> Sure!
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html
> 
> Older versions are harder to find online, unfortunately.  But there is
> 
> https://kernel.org/pub/linux/docs/man-pages/man-pages-posix/
> 
> in which man3p/bcmp.3p says:
> 
> FUTURE DIRECTIONS
>        This function may be withdrawn in a future version.
> 
> Finally, the Linux man pages say (man bcmp):
> 
> CONFORMING TO
>        4.3BSD.   This  function   is   deprecated   (marked   as   LEGACY   in
>        POSIX.1-2001): use memcmp(3) in new programs.  POSIX.1-2008 removes the
>        specification of bcmp().
> 
> 
> > I'm arguing against certain transforms that assume that
> > one library function is faster than another, when such claims are
> > based on measurements from one stdlib implementation.
> 
> Wow.  The difference between memcmp and bcmp is trivial (just the return
> value is different, and that costs hardly anything to add).  And memcmp
> is guaranteed to exist since C89/C90 at least.
> 
> > The rationale for why it was added was that memcmp takes a measurable
> > amount of time in Google's fleet, and most calls to memcmp don't care
> > about the position of the mismatch; bcmp is lower overhead (or at
> > least for our libc implementation, not sure about others).
> 
> You just have to do the read of the last words you compare as big-endian,
> and then you can just subtract the two words, convert that to "int" (which
> is very inconvenient to do, but hardly expensive), and there you go.
> 
> Or on x86 use the bswap insn, or something like it.
> 
> Or, if you use GCC, it has __builtin_memcmp but also __builtin_memcmp_eq,
> and those are automatically used, too.
> 
> 
> Segher

Just as an FYI, there was some more discussion around the availablity
and use of bcmp in this LLVM bug which spawned
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp").

https://bugs.llvm.org/show_bug.cgi?id=41035#c13

I believe this is the proper solution but I am fine with whatever works,
I just want our CI to be green without any out of tree patches again...

Cheers,
Nathan
Segher Boessenkool Oct. 18, 2019, 8:02 p.m. UTC | #5
On Fri, Oct 18, 2019 at 12:00:22PM -0700, Nathan Chancellor wrote:
> Just as an FYI, there was some more discussion around the availablity
> and use of bcmp in this LLVM bug which spawned
> commit 5f074f3e192f ("lib/string.c: implement a basic bcmp").
> 
> https://bugs.llvm.org/show_bug.cgi?id=41035#c13
> 
> I believe this is the proper solution but I am fine with whatever works,
> I just want our CI to be green without any out of tree patches again...

I think the proper solution is for the kernel to *do* use -ffreestanding,
and then somehow tell the kernel that memcpy etc. are the standard
functions.  A freestanding GCC already requires memcpy, memmove, memset,
memcmp, and sometimes abort to exist and do the standard thing; why cannot
programs then also rely on it to be the standard functions.

What exact functions are the reason the kernel does not use -ffreestanding?
Is it just memcpy?  Is more wanted?


Segher
Nathan Chancellor Oct. 22, 2019, 5:15 a.m. UTC | #6
On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 18, 2019 at 12:00:22PM -0700, Nathan Chancellor wrote:
> > Just as an FYI, there was some more discussion around the availablity
> > and use of bcmp in this LLVM bug which spawned
> > commit 5f074f3e192f ("lib/string.c: implement a basic bcmp").
> > 
> > https://bugs.llvm.org/show_bug.cgi?id=41035#c13
> > 
> > I believe this is the proper solution but I am fine with whatever works,
> > I just want our CI to be green without any out of tree patches again...
> 
> I think the proper solution is for the kernel to *do* use -ffreestanding,
> and then somehow tell the kernel that memcpy etc. are the standard
> functions.  A freestanding GCC already requires memcpy, memmove, memset,
> memcmp, and sometimes abort to exist and do the standard thing; why cannot
> programs then also rely on it to be the standard functions.
> 
> What exact functions are the reason the kernel does not use -ffreestanding?
> Is it just memcpy?  Is more wanted?
> 
> 
> Segher

I think Linus summarized it pretty well here:

https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/

Cheers,
Nathan
Segher Boessenkool Oct. 22, 2019, 8:57 a.m. UTC | #7
On Mon, Oct 21, 2019 at 10:15:29PM -0700, Nathan Chancellor wrote:
> On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote:
> > I think the proper solution is for the kernel to *do* use -ffreestanding,
> > and then somehow tell the kernel that memcpy etc. are the standard
> > functions.  A freestanding GCC already requires memcpy, memmove, memset,
> > memcmp, and sometimes abort to exist and do the standard thing; why cannot
> > programs then also rely on it to be the standard functions.
> > 
> > What exact functions are the reason the kernel does not use -ffreestanding?
> > Is it just memcpy?  Is more wanted?
> 
> I think Linus summarized it pretty well here:
> 
> https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/

GCC recognises __builtin_memcpy (or any other __builtin) just fine even
with -ffreestanding.

So the kernel wants a warning (or error) whenever a call to one of these
library functions is generated by the compiler without the user asking
for it directly (via a __builtin)?  And that is all that is needed for
the kernel to use -ffreestanding?

That shouldn't be hard.  Anything missing here?


Segher
Nathan Chancellor Oct. 30, 2019, 4:12 a.m. UTC | #8
On Tue, Oct 22, 2019 at 03:57:09AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 21, 2019 at 10:15:29PM -0700, Nathan Chancellor wrote:
> > On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote:
> > > I think the proper solution is for the kernel to *do* use -ffreestanding,
> > > and then somehow tell the kernel that memcpy etc. are the standard
> > > functions.  A freestanding GCC already requires memcpy, memmove, memset,
> > > memcmp, and sometimes abort to exist and do the standard thing; why cannot
> > > programs then also rely on it to be the standard functions.
> > > 
> > > What exact functions are the reason the kernel does not use -ffreestanding?
> > > Is it just memcpy?  Is more wanted?
> > 
> > I think Linus summarized it pretty well here:
> > 
> > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/
> 
> GCC recognises __builtin_memcpy (or any other __builtin) just fine even
> with -ffreestanding.
> 
> So the kernel wants a warning (or error) whenever a call to one of these
> library functions is generated by the compiler without the user asking
> for it directly (via a __builtin)?  And that is all that is needed for
> the kernel to use -ffreestanding?
> 
> That shouldn't be hard.  Anything missing here?
> 
> 
> Segher

Yes, I suppose that would be good enough.

I don't know if there are any other optimizations that are missed out on
by using -ffreestanding. It would probably be worth asking other kernel
developers on a separate thread (or the one I linked above).

Would be nice to get this shored up soon since our PowerPC builds have
been broken since the beginning of August :/

Cheers,
Nathan

Patch
diff mbox series

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index f1f362146135..7f0ee465dfb6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -21,7 +21,7 @@  CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
 
-CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector)
+CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector) -ffreestanding
 
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace early boot code