Message ID | 20191014025101.18567-4-natechancellor@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | LLVM/Clang fixes for pseries_defconfig | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (600802af9049be799465b24d14162918545634bf) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote: > r374662 gives LLVM the ability to convert certain loops into a reference > to bcmp as an optimization; this breaks prom_init_check.sh: When/why does LLVM think this is okay? This function has been removed from POSIX over a decade ago (and before that it always was marked as legacy). Segher
On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote: > > r374662 gives LLVM the ability to convert certain loops into a reference > > to bcmp as an optimization; this breaks prom_init_check.sh: > > When/why does LLVM think this is okay? This function has been removed > from POSIX over a decade ago (and before that it always was marked as > legacy). Segher, do you have links for any of the above? If so, that would be helpful to me. I'm arguing against certain transforms that assume that one library function is faster than another, when such claims are based on measurements from one stdlib implementation. (There's others in the pipeline I'm not too thrilled about, too). The rationale for why it was added was that memcmp takes a measurable amount of time in Google's fleet, and most calls to memcmp don't care about the position of the mismatch; bcmp is lower overhead (or at least for our libc implementation, not sure about others).
On Mon, Oct 14, 2019 at 08:56:12AM -0700, Nick Desaulniers wrote: > On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote: > > > r374662 gives LLVM the ability to convert certain loops into a reference > > > to bcmp as an optimization; this breaks prom_init_check.sh: > > > > When/why does LLVM think this is okay? This function has been removed > > from POSIX over a decade ago (and before that it always was marked as > > legacy). > > Segher, do you have links for any of the above? If so, that would be > helpful to me. Sure! https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html Older versions are harder to find online, unfortunately. But there is https://kernel.org/pub/linux/docs/man-pages/man-pages-posix/ in which man3p/bcmp.3p says: FUTURE DIRECTIONS This function may be withdrawn in a future version. Finally, the Linux man pages say (man bcmp): CONFORMING TO 4.3BSD. This function is deprecated (marked as LEGACY in POSIX.1-2001): use memcmp(3) in new programs. POSIX.1-2008 removes the specification of bcmp(). > I'm arguing against certain transforms that assume that > one library function is faster than another, when such claims are > based on measurements from one stdlib implementation. Wow. The difference between memcmp and bcmp is trivial (just the return value is different, and that costs hardly anything to add). And memcmp is guaranteed to exist since C89/C90 at least. > The rationale for why it was added was that memcmp takes a measurable > amount of time in Google's fleet, and most calls to memcmp don't care > about the position of the mismatch; bcmp is lower overhead (or at > least for our libc implementation, not sure about others). You just have to do the read of the last words you compare as big-endian, and then you can just subtract the two words, convert that to "int" (which is very inconvenient to do, but hardly expensive), and there you go. Or on x86 use the bswap insn, or something like it. Or, if you use GCC, it has __builtin_memcmp but also __builtin_memcmp_eq, and those are automatically used, too. Segher
On Mon, Oct 14, 2019 at 02:11:41PM -0500, Segher Boessenkool wrote: > On Mon, Oct 14, 2019 at 08:56:12AM -0700, Nick Desaulniers wrote: > > On Mon, Oct 14, 2019 at 2:35 AM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > > > > > > On Sun, Oct 13, 2019 at 07:51:01PM -0700, Nathan Chancellor wrote: > > > > r374662 gives LLVM the ability to convert certain loops into a reference > > > > to bcmp as an optimization; this breaks prom_init_check.sh: > > > > > > When/why does LLVM think this is okay? This function has been removed > > > from POSIX over a decade ago (and before that it always was marked as > > > legacy). > > > > Segher, do you have links for any of the above? If so, that would be > > helpful to me. > > Sure! > > https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap03.html > > Older versions are harder to find online, unfortunately. But there is > > https://kernel.org/pub/linux/docs/man-pages/man-pages-posix/ > > in which man3p/bcmp.3p says: > > FUTURE DIRECTIONS > This function may be withdrawn in a future version. > > Finally, the Linux man pages say (man bcmp): > > CONFORMING TO > 4.3BSD. This function is deprecated (marked as LEGACY in > POSIX.1-2001): use memcmp(3) in new programs. POSIX.1-2008 removes the > specification of bcmp(). > > > > I'm arguing against certain transforms that assume that > > one library function is faster than another, when such claims are > > based on measurements from one stdlib implementation. > > Wow. The difference between memcmp and bcmp is trivial (just the return > value is different, and that costs hardly anything to add). And memcmp > is guaranteed to exist since C89/C90 at least. > > > The rationale for why it was added was that memcmp takes a measurable > > amount of time in Google's fleet, and most calls to memcmp don't care > > about the position of the mismatch; bcmp is lower overhead (or at > > least for our libc implementation, not sure about others). > > You just have to do the read of the last words you compare as big-endian, > and then you can just subtract the two words, convert that to "int" (which > is very inconvenient to do, but hardly expensive), and there you go. > > Or on x86 use the bswap insn, or something like it. > > Or, if you use GCC, it has __builtin_memcmp but also __builtin_memcmp_eq, > and those are automatically used, too. > > > Segher Just as an FYI, there was some more discussion around the availablity and use of bcmp in this LLVM bug which spawned commit 5f074f3e192f ("lib/string.c: implement a basic bcmp"). https://bugs.llvm.org/show_bug.cgi?id=41035#c13 I believe this is the proper solution but I am fine with whatever works, I just want our CI to be green without any out of tree patches again... Cheers, Nathan
On Fri, Oct 18, 2019 at 12:00:22PM -0700, Nathan Chancellor wrote: > Just as an FYI, there was some more discussion around the availablity > and use of bcmp in this LLVM bug which spawned > commit 5f074f3e192f ("lib/string.c: implement a basic bcmp"). > > https://bugs.llvm.org/show_bug.cgi?id=41035#c13 > > I believe this is the proper solution but I am fine with whatever works, > I just want our CI to be green without any out of tree patches again... I think the proper solution is for the kernel to *do* use -ffreestanding, and then somehow tell the kernel that memcpy etc. are the standard functions. A freestanding GCC already requires memcpy, memmove, memset, memcmp, and sometimes abort to exist and do the standard thing; why cannot programs then also rely on it to be the standard functions. What exact functions are the reason the kernel does not use -ffreestanding? Is it just memcpy? Is more wanted? Segher
On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote: > On Fri, Oct 18, 2019 at 12:00:22PM -0700, Nathan Chancellor wrote: > > Just as an FYI, there was some more discussion around the availablity > > and use of bcmp in this LLVM bug which spawned > > commit 5f074f3e192f ("lib/string.c: implement a basic bcmp"). > > > > https://bugs.llvm.org/show_bug.cgi?id=41035#c13 > > > > I believe this is the proper solution but I am fine with whatever works, > > I just want our CI to be green without any out of tree patches again... > > I think the proper solution is for the kernel to *do* use -ffreestanding, > and then somehow tell the kernel that memcpy etc. are the standard > functions. A freestanding GCC already requires memcpy, memmove, memset, > memcmp, and sometimes abort to exist and do the standard thing; why cannot > programs then also rely on it to be the standard functions. > > What exact functions are the reason the kernel does not use -ffreestanding? > Is it just memcpy? Is more wanted? > > > Segher I think Linus summarized it pretty well here: https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/ Cheers, Nathan
On Mon, Oct 21, 2019 at 10:15:29PM -0700, Nathan Chancellor wrote: > On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote: > > I think the proper solution is for the kernel to *do* use -ffreestanding, > > and then somehow tell the kernel that memcpy etc. are the standard > > functions. A freestanding GCC already requires memcpy, memmove, memset, > > memcmp, and sometimes abort to exist and do the standard thing; why cannot > > programs then also rely on it to be the standard functions. > > > > What exact functions are the reason the kernel does not use -ffreestanding? > > Is it just memcpy? Is more wanted? > > I think Linus summarized it pretty well here: > > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/ GCC recognises __builtin_memcpy (or any other __builtin) just fine even with -ffreestanding. So the kernel wants a warning (or error) whenever a call to one of these library functions is generated by the compiler without the user asking for it directly (via a __builtin)? And that is all that is needed for the kernel to use -ffreestanding? That shouldn't be hard. Anything missing here? Segher
On Tue, Oct 22, 2019 at 03:57:09AM -0500, Segher Boessenkool wrote: > On Mon, Oct 21, 2019 at 10:15:29PM -0700, Nathan Chancellor wrote: > > On Fri, Oct 18, 2019 at 03:02:10PM -0500, Segher Boessenkool wrote: > > > I think the proper solution is for the kernel to *do* use -ffreestanding, > > > and then somehow tell the kernel that memcpy etc. are the standard > > > functions. A freestanding GCC already requires memcpy, memmove, memset, > > > memcmp, and sometimes abort to exist and do the standard thing; why cannot > > > programs then also rely on it to be the standard functions. > > > > > > What exact functions are the reason the kernel does not use -ffreestanding? > > > Is it just memcpy? Is more wanted? > > > > I think Linus summarized it pretty well here: > > > > https://lore.kernel.org/lkml/CAHk-=wi-epJZfBHDbKKDZ64us7WkF=LpUfhvYBmZSteO8Q0RAg@mail.gmail.com/ > > GCC recognises __builtin_memcpy (or any other __builtin) just fine even > with -ffreestanding. > > So the kernel wants a warning (or error) whenever a call to one of these > library functions is generated by the compiler without the user asking > for it directly (via a __builtin)? And that is all that is needed for > the kernel to use -ffreestanding? > > That shouldn't be hard. Anything missing here? > > > Segher Yes, I suppose that would be good enough. I don't know if there are any other optimizations that are missed out on by using -ffreestanding. It would probably be worth asking other kernel developers on a separate thread (or the one I linked above). Would be nice to get this shored up soon since our PowerPC builds have been broken since the beginning of August :/ Cheers, Nathan
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index f1f362146135..7f0ee465dfb6 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -21,7 +21,7 @@ CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) -CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector) +CFLAGS_prom_init.o += $(call cc-option, -fno-stack-protector) -ffreestanding ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code