diff mbox series

lld status with powerpc64

Message ID 20211105072312.wbqrhgyjvsypkj52@google.com
State New
Headers show
Series lld status with powerpc64 | expand

Commit Message

Fangrui Song Nov. 5, 2021, 7:23 a.m. UTC
On 2021-10-27, Tulio Magno Quites Machado Filho wrote:
>Tulio Magno Quites Machado Filho via Libc-alpha <libc-alpha@sourceware.org> writes:
>
>> Fangrui Song via Libc-alpha <libc-alpha@sourceware.org> writes:
>>
>>> On 2021-10-26, Adhemerval Zanella wrote:
>>>>  #2  0x00007ffff7d7ae90 in __libc_start_main_impl ()
>>>>     from
>>>>  /home/azanella/glibc/build/powerpc64le-linux-gnu-power9-lld/libc.so.6
>>>>  #3  0x0000000000000000 in ?? ()
>>>>  (gdb) disas
>>>>  Dump of assembler code for function __gep_setup___vmx__sigjmp_save:
>>>>  => 0x00007ffff7f2a980 <+0>:     .long 0x613ffe6
>>>>     0x00007ffff7f2a984 <+4>:     li      r12,-1280
>>
>> This is a pla, but this GDB isn't able to disassemble it.  This instruction
>> shouldn't be used unless when configuring glibc using --with-cpu=power10.
>
>I can reproduce this issue even when configuring glibc with
>--with-cpu=power9 --disable-multi-arch, which means the build should not have
>any Power10 instructions.
>
>-- 
>Tulio Magno

__gep_setup___vmx__sigjmp_save means a r12 setup stub in ld.lld and is
used with R_PPC64_REL24_NOTOC for a non-PLT branch.

Clang only emits `bl foo@notoc` with -mcpu=power10.
However, sysdeps/powerpc/powerpc64/configure.ac enables USE_PPC64_NOTOC
when the assembler (gas) supports it.


Adhemerval noticed that ld.lld has a behavior difference from GNU ld:
ld.lld defaults to PC-relative paddi stub while GNU ld doesn't (like
--power10-stubs=no).
Is USE_PPC64_NOTOC supposed to be used when targeting POWER9 and below?
If yes, the ld.lld default should be fixed.


In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
USE_PPC64_NOTOC is undefined.

% gcc --version
gcc (Debian 8.3.0-6) 8.3.0
...
% ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.31.1
...

I only see 8 more FAILs with ld.lld than ld.bfd


% diff -u 0 1


I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
(something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
But perhaps Adhemerval can look a look at it.


For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
__tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
Let me send a configure patch to disable it...



Hey, so lld linked glibc powerpc64 is in a pretty good status!

Comments

Fangrui Song Nov. 5, 2021, 7:45 a.m. UTC | #1
On 2021-11-05, Fangrui Song wrote:
>
>On 2021-10-27, Tulio Magno Quites Machado Filho wrote:
>>Tulio Magno Quites Machado Filho via Libc-alpha <libc-alpha@sourceware.org> writes:
>>
>>>Fangrui Song via Libc-alpha <libc-alpha@sourceware.org> writes:
>>>
>>>>On 2021-10-26, Adhemerval Zanella wrote:
>>>>> #2  0x00007ffff7d7ae90 in __libc_start_main_impl ()
>>>>>    from
>>>>> /home/azanella/glibc/build/powerpc64le-linux-gnu-power9-lld/libc.so.6
>>>>> #3  0x0000000000000000 in ?? ()
>>>>> (gdb) disas
>>>>> Dump of assembler code for function __gep_setup___vmx__sigjmp_save:
>>>>> => 0x00007ffff7f2a980 <+0>:     .long 0x613ffe6
>>>>>    0x00007ffff7f2a984 <+4>:     li      r12,-1280
>>>
>>>This is a pla, but this GDB isn't able to disassemble it.  This instruction
>>>shouldn't be used unless when configuring glibc using --with-cpu=power10.
>>
>>I can reproduce this issue even when configuring glibc with
>>--with-cpu=power9 --disable-multi-arch, which means the build should not have
>>any Power10 instructions.
>>
>>-- 
>>Tulio Magno
>
>__gep_setup___vmx__sigjmp_save means a r12 setup stub in ld.lld and is
>used with R_PPC64_REL24_NOTOC for a non-PLT branch.
>
>Clang only emits `bl foo@notoc` with -mcpu=power10.
>However, sysdeps/powerpc/powerpc64/configure.ac enables USE_PPC64_NOTOC
>when the assembler (gas) supports it.
>
>
>Adhemerval noticed that ld.lld has a behavior difference from GNU ld:
>ld.lld defaults to PC-relative paddi stub while GNU ld doesn't (like
>--power10-stubs=no).
>Is USE_PPC64_NOTOC supposed to be used when targeting POWER9 and below?
>If yes, the ld.lld default should be fixed.
>
>
>In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
>USE_PPC64_NOTOC is undefined.
>
>% gcc --version
>gcc (Debian 8.3.0-6) 8.3.0
>...
>% ld.bfd --version
>GNU ld (GNU Binutils for Debian) 2.31.1
>...
>
>I only see 8 more FAILs with ld.lld than ld.bfd
>
>
>% diff -u 0 1
>--- 0   2021-11-05 00:11:43.218731302 -0700
>+++ 1   2021-11-05 00:11:37.659286448 -0700
>@@ -9,6 +9,14 @@
> FAIL: debug/tst-lfschk6
> FAIL: dlfcn/bug-atexit3
> FAIL: elf/check-abi-libc
>+FAIL: elf/ifuncmain1pic
>+FAIL: elf/ifuncmain1pie
>+FAIL: elf/ifuncmain1vis
>+FAIL: elf/ifuncmain1vispic
>+FAIL: elf/ifuncmain1vispie
>+FAIL: elf/ifuncmain3
>+FAIL: elf/ifuncmain6pie
>+FAIL: elf/tst-tlsopt-powerpc
> FAIL: nptl/tst-cancel24
> FAIL: nptl/tst-minstack-throw
> FAIL: nptl/tst-once5
>
>
>I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
>regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
>(something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
>But perhaps Adhemerval can look a look at it.
>
>
>For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
>__tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
>Let me send a configure patch to disable it...

Actually I do not know how to disable tst-tlsopt-powerpc properly.

Perhaps add sysdeps/powerpc/configure.ac and move the
sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
to sysdeps/powerpc/configure.ac?
sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).

The patch requires some non-trivial configure.ac change, so I'd hope
that an expert can do it :)

>
>
>Hey, so lld linked glibc powerpc64 is in a pretty good status!
Adhemerval Zanella Netto Nov. 5, 2021, 1:58 p.m. UTC | #2
On 05/11/2021 04:23, Fangrui Song wrote:
> 
> On 2021-10-27, Tulio Magno Quites Machado Filho wrote:
>> Tulio Magno Quites Machado Filho via Libc-alpha <libc-alpha@sourceware.org> writes:
>>
>>> Fangrui Song via Libc-alpha <libc-alpha@sourceware.org> writes:
>>>
>>>> On 2021-10-26, Adhemerval Zanella wrote:
>>>>>  #2  0x00007ffff7d7ae90 in __libc_start_main_impl ()
>>>>>     from
>>>>>  /home/azanella/glibc/build/powerpc64le-linux-gnu-power9-lld/libc.so.6
>>>>>  #3  0x0000000000000000 in ?? ()
>>>>>  (gdb) disas
>>>>>  Dump of assembler code for function __gep_setup___vmx__sigjmp_save:
>>>>>  => 0x00007ffff7f2a980 <+0>:     .long 0x613ffe6
>>>>>     0x00007ffff7f2a984 <+4>:     li      r12,-1280
>>>
>>> This is a pla, but this GDB isn't able to disassemble it.  This instruction
>>> shouldn't be used unless when configuring glibc using --with-cpu=power10.
>>
>> I can reproduce this issue even when configuring glibc with
>> --with-cpu=power9 --disable-multi-arch, which means the build should not have
>> any Power10 instructions.
>>
>> -- 
>> Tulio Magno
> 
> __gep_setup___vmx__sigjmp_save means a r12 setup stub in ld.lld and is
> used with R_PPC64_REL24_NOTOC for a non-PLT branch.
> 
> Clang only emits `bl foo@notoc` with -mcpu=power10.
> However, sysdeps/powerpc/powerpc64/configure.ac enables USE_PPC64_NOTOC
> when the assembler (gas) supports it.
> 
> 
> Adhemerval noticed that ld.lld has a behavior difference from GNU ld:
> ld.lld defaults to PC-relative paddi stub while GNU ld doesn't (like
> --power10-stubs=no).
> Is USE_PPC64_NOTOC supposed to be used when targeting POWER9 and below?
> If yes, the ld.lld default should be fixed.

I figure out the issue and both bfd, gold, and lld align themselves on the
power10-stubs options handling.  Using --power10-stubs without an arg is 
equivalent to --power10-stubs=yes, but not specifying --power10-stubs at
all should be equivalent to --power10-stubs=auto.  It somewhat confusing,
but I think it is to allow linker and compiler to be independently
regarding power10 stub generation.

The issue is bfd enables power10 relocs generation on stubs iff it sees
the new pc-relative relocations (for instance R_PPC64_D34), otherwise 
it generates default stubs (ppc64_elf_check_relocs:4700).

So we have two options here:

  1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
     aims to support such behavior).

  2. Define USE_PPC64_NOTOC iff linker supports such optimization.
     It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
     link a simple binary without any pcrel and check if the stub has 
     power10 instruction.

  3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
     and on the syscall definition to call the __syscall_error.

I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC 
manually on config.h when configuring with lld I can build glibc.

> 
> 
> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
> USE_PPC64_NOTOC is undefined.
> 
> % gcc --version
> gcc (Debian 8.3.0-6) 8.3.0
> ...
> % ld.bfd --version
> GNU ld (GNU Binutils for Debian) 2.31.1
> ...
> 
> I only see 8 more FAILs with ld.lld than ld.bfd

I am still having trouble to *finish* make check with lld release 13
my environment (some tests stuck on infinite loop).

> 
> 
> % diff -u 0 1
> --- 0   2021-11-05 00:11:43.218731302 -0700
> +++ 1   2021-11-05 00:11:37.659286448 -0700
> @@ -9,6 +9,14 @@
>  FAIL: debug/tst-lfschk6
>  FAIL: dlfcn/bug-atexit3
>  FAIL: elf/check-abi-libc
> +FAIL: elf/ifuncmain1pic
> +FAIL: elf/ifuncmain1pie
> +FAIL: elf/ifuncmain1vis
> +FAIL: elf/ifuncmain1vispic
> +FAIL: elf/ifuncmain1vispie
> +FAIL: elf/ifuncmain3
> +FAIL: elf/ifuncmain6pie
> +FAIL: elf/tst-tlsopt-powerpc
>  FAIL: nptl/tst-cancel24
>  FAIL: nptl/tst-minstack-throw
>  FAIL: nptl/tst-once5
> 
> 
> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
> But perhaps Adhemerval can look a look at it.

I still need to understand why armhf also fails with the ifunc using
protected symbols, although I am not if they are related to IRELATIVE
ordering.

> 
> 
> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
> Let me send a configure patch to disable it...
> 
> Actually I do not know how to disable tst-tlsopt-powerpc properly.
> 
> Perhaps add sysdeps/powerpc/configure.ac and move the
> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
> to sysdeps/powerpc/configure.ac?
> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
> 
> The patch requires some non-trivial configure.ac change, so I'd hope
> that an expert can do it 

Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
by the linker?


> 
> 
> Hey, so lld linked glibc powerpc64 is in a pretty good status!

I am still not sure about it, I did could run some tests but I am still
struggling to get a make check to finish.
Adhemerval Zanella Netto Nov. 5, 2021, 7:32 p.m. UTC | #3
On 05/11/2021 10:58, Adhemerval Zanella wrote:

> So we have two options here:
> 
>   1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
>      aims to support such behavior).
> 
>   2. Define USE_PPC64_NOTOC iff linker supports such optimization.
>      It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
>      link a simple binary without any pcrel and check if the stub has 
>      power10 instruction.
> 
>   3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
>      and on the syscall definition to call the __syscall_error.
> 
> I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC 
> manually on config.h when configuring with lld I can build glibc.
> 
>>
>>
>> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
>> USE_PPC64_NOTOC is undefined.
>>
>> % gcc --version
>> gcc (Debian 8.3.0-6) 8.3.0
>> ...
>> % ld.bfd --version
>> GNU ld (GNU Binutils for Debian) 2.31.1
>> ...
>>
>> I only see 8 more FAILs with ld.lld than ld.bfd
> 
> I am still having trouble to *finish* make check with lld release 13
> my environment (some tests stuck on infinite loop).
> 
>>
>>
>> % diff -u 0 1
>> --- 0   2021-11-05 00:11:43.218731302 -0700
>> +++ 1   2021-11-05 00:11:37.659286448 -0700
>> @@ -9,6 +9,14 @@
>>  FAIL: debug/tst-lfschk6
>>  FAIL: dlfcn/bug-atexit3
>>  FAIL: elf/check-abi-libc
>> +FAIL: elf/ifuncmain1pic
>> +FAIL: elf/ifuncmain1pie
>> +FAIL: elf/ifuncmain1vis
>> +FAIL: elf/ifuncmain1vispic
>> +FAIL: elf/ifuncmain1vispie
>> +FAIL: elf/ifuncmain3
>> +FAIL: elf/ifuncmain6pie
>> +FAIL: elf/tst-tlsopt-powerpc
>>  FAIL: nptl/tst-cancel24
>>  FAIL: nptl/tst-minstack-throw
>>  FAIL: nptl/tst-once5
>>
>>
>> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
>> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
>> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
>> But perhaps Adhemerval can look a look at it.
> 
> I still need to understand why armhf also fails with the ifunc using
> protected symbols, although I am not if they are related to IRELATIVE
> ordering.
> 
>>
>>
>> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
>> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
>> Let me send a configure patch to disable it...
>>
>> Actually I do not know how to disable tst-tlsopt-powerpc properly.
>>
>> Perhaps add sysdeps/powerpc/configure.ac and move the
>> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
>> to sysdeps/powerpc/configure.ac?
>> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
>>
>> The patch requires some non-trivial configure.ac change, so I'd hope
>> that an expert can do it 
> 
> Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
> by the linker?
> 
> 
>>
>>
>> Hey, so lld linked glibc powerpc64 is in a pretty good status!
> 
> I am still not sure about it, I did could run some tests but I am still
> struggling to get a make check to finish.
> 

So it turned out the another issues was my gcc toolchain did not have
initfini array set which lets to compiler generate .ctors instead of
.init_array and lld does not support .ctors (it just ignores instead of
converting to .init_array entries as bfd).

With NOTOC fix and a proper toolchain I am seeing the same regression
you are seeing:

FAIL: elf/ifuncmain1pic
FAIL: elf/ifuncmain1pie
FAIL: elf/ifuncmain1vis
FAIL: elf/ifuncmain1vispic
FAIL: elf/ifuncmain1vispie
FAIL: elf/ifuncmain3
FAIL: elf/ifuncmain6pie
FAIL: elf/tst-tlsopt-powerpc

I think it would be good use --enable-initfini-array on build-many-glibc.py
until we set gcc-12 as default.
H.J. Lu Nov. 5, 2021, 7:38 p.m. UTC | #4
On Fri, Nov 5, 2021 at 12:33 PM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
>
> On 05/11/2021 10:58, Adhemerval Zanella wrote:
>
> > So we have two options here:
> >
> >   1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
> >      aims to support such behavior).
> >
> >   2. Define USE_PPC64_NOTOC iff linker supports such optimization.
> >      It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
> >      link a simple binary without any pcrel and check if the stub has
> >      power10 instruction.
> >
> >   3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
> >      and on the syscall definition to call the __syscall_error.
> >
> > I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC
> > manually on config.h when configuring with lld I can build glibc.
> >
> >>
> >>
> >> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
> >> USE_PPC64_NOTOC is undefined.
> >>
> >> % gcc --version
> >> gcc (Debian 8.3.0-6) 8.3.0
> >> ...
> >> % ld.bfd --version
> >> GNU ld (GNU Binutils for Debian) 2.31.1
> >> ...
> >>
> >> I only see 8 more FAILs with ld.lld than ld.bfd
> >
> > I am still having trouble to *finish* make check with lld release 13
> > my environment (some tests stuck on infinite loop).
> >
> >>
> >>
> >> % diff -u 0 1
> >> --- 0   2021-11-05 00:11:43.218731302 -0700
> >> +++ 1   2021-11-05 00:11:37.659286448 -0700
> >> @@ -9,6 +9,14 @@
> >>  FAIL: debug/tst-lfschk6
> >>  FAIL: dlfcn/bug-atexit3
> >>  FAIL: elf/check-abi-libc
> >> +FAIL: elf/ifuncmain1pic
> >> +FAIL: elf/ifuncmain1pie
> >> +FAIL: elf/ifuncmain1vis
> >> +FAIL: elf/ifuncmain1vispic
> >> +FAIL: elf/ifuncmain1vispie
> >> +FAIL: elf/ifuncmain3
> >> +FAIL: elf/ifuncmain6pie
> >> +FAIL: elf/tst-tlsopt-powerpc
> >>  FAIL: nptl/tst-cancel24
> >>  FAIL: nptl/tst-minstack-throw
> >>  FAIL: nptl/tst-once5
> >>
> >>
> >> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
> >> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
> >> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
> >> But perhaps Adhemerval can look a look at it.
> >
> > I still need to understand why armhf also fails with the ifunc using
> > protected symbols, although I am not if they are related to IRELATIVE
> > ordering.
> >
> >>
> >>
> >> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
> >> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
> >> Let me send a configure patch to disable it...
> >>
> >> Actually I do not know how to disable tst-tlsopt-powerpc properly.
> >>
> >> Perhaps add sysdeps/powerpc/configure.ac and move the
> >> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
> >> to sysdeps/powerpc/configure.ac?
> >> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
> >>
> >> The patch requires some non-trivial configure.ac change, so I'd hope
> >> that an expert can do it
> >
> > Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
> > by the linker?
> >
> >
> >>
> >>
> >> Hey, so lld linked glibc powerpc64 is in a pretty good status!
> >
> > I am still not sure about it, I did could run some tests but I am still
> > struggling to get a make check to finish.
> >
>
> So it turned out the another issues was my gcc toolchain did not have
> initfini array set which lets to compiler generate .ctors instead of
> .init_array and lld does not support .ctors (it just ignores instead of
> converting to .init_array entries as bfd).
>
> With NOTOC fix and a proper toolchain I am seeing the same regression
> you are seeing:
>
> FAIL: elf/ifuncmain1pic
> FAIL: elf/ifuncmain1pie
> FAIL: elf/ifuncmain1vis
> FAIL: elf/ifuncmain1vispic
> FAIL: elf/ifuncmain1vispie
> FAIL: elf/ifuncmain3
> FAIL: elf/ifuncmain6pie
> FAIL: elf/tst-tlsopt-powerpc
>
> I think it would be good use --enable-initfini-array on build-many-glibc.py
> until we set gcc-12 as default.

https://sourceware.org/pipermail/libc-alpha/2021-June/127413.html
H.J. Lu Nov. 5, 2021, 7:40 p.m. UTC | #5
On Fri, Nov 5, 2021 at 12:38 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Nov 5, 2021 at 12:33 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> >
> >
> > On 05/11/2021 10:58, Adhemerval Zanella wrote:
> >
> > > So we have two options here:
> > >
> > >   1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
> > >      aims to support such behavior).
> > >
> > >   2. Define USE_PPC64_NOTOC iff linker supports such optimization.
> > >      It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
> > >      link a simple binary without any pcrel and check if the stub has
> > >      power10 instruction.
> > >
> > >   3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
> > >      and on the syscall definition to call the __syscall_error.
> > >
> > > I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC
> > > manually on config.h when configuring with lld I can build glibc.
> > >
> > >>
> > >>
> > >> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
> > >> USE_PPC64_NOTOC is undefined.
> > >>
> > >> % gcc --version
> > >> gcc (Debian 8.3.0-6) 8.3.0
> > >> ...
> > >> % ld.bfd --version
> > >> GNU ld (GNU Binutils for Debian) 2.31.1
> > >> ...
> > >>
> > >> I only see 8 more FAILs with ld.lld than ld.bfd
> > >
> > > I am still having trouble to *finish* make check with lld release 13
> > > my environment (some tests stuck on infinite loop).
> > >
> > >>
> > >>
> > >> % diff -u 0 1
> > >> --- 0   2021-11-05 00:11:43.218731302 -0700
> > >> +++ 1   2021-11-05 00:11:37.659286448 -0700
> > >> @@ -9,6 +9,14 @@
> > >>  FAIL: debug/tst-lfschk6
> > >>  FAIL: dlfcn/bug-atexit3
> > >>  FAIL: elf/check-abi-libc
> > >> +FAIL: elf/ifuncmain1pic
> > >> +FAIL: elf/ifuncmain1pie
> > >> +FAIL: elf/ifuncmain1vis
> > >> +FAIL: elf/ifuncmain1vispic
> > >> +FAIL: elf/ifuncmain1vispie
> > >> +FAIL: elf/ifuncmain3
> > >> +FAIL: elf/ifuncmain6pie
> > >> +FAIL: elf/tst-tlsopt-powerpc
> > >>  FAIL: nptl/tst-cancel24
> > >>  FAIL: nptl/tst-minstack-throw
> > >>  FAIL: nptl/tst-once5
> > >>
> > >>
> > >> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
> > >> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
> > >> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
> > >> But perhaps Adhemerval can look a look at it.
> > >
> > > I still need to understand why armhf also fails with the ifunc using
> > > protected symbols, although I am not if they are related to IRELATIVE
> > > ordering.
> > >
> > >>
> > >>
> > >> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
> > >> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
> > >> Let me send a configure patch to disable it...
> > >>
> > >> Actually I do not know how to disable tst-tlsopt-powerpc properly.
> > >>
> > >> Perhaps add sysdeps/powerpc/configure.ac and move the
> > >> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
> > >> to sysdeps/powerpc/configure.ac?
> > >> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
> > >>
> > >> The patch requires some non-trivial configure.ac change, so I'd hope
> > >> that an expert can do it
> > >
> > > Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
> > > by the linker?
> > >
> > >
> > >>
> > >>
> > >> Hey, so lld linked glibc powerpc64 is in a pretty good status!
> > >
> > > I am still not sure about it, I did could run some tests but I am still
> > > struggling to get a make check to finish.
> > >
> >
> > So it turned out the another issues was my gcc toolchain did not have
> > initfini array set which lets to compiler generate .ctors instead of
> > .init_array and lld does not support .ctors (it just ignores instead of
> > converting to .init_array entries as bfd).
> >
> > With NOTOC fix and a proper toolchain I am seeing the same regression
> > you are seeing:
> >
> > FAIL: elf/ifuncmain1pic
> > FAIL: elf/ifuncmain1pie
> > FAIL: elf/ifuncmain1vis
> > FAIL: elf/ifuncmain1vispic
> > FAIL: elf/ifuncmain1vispie
> > FAIL: elf/ifuncmain3
> > FAIL: elf/ifuncmain6pie
> > FAIL: elf/tst-tlsopt-powerpc
> >
> > I think it would be good use --enable-initfini-array on build-many-glibc.py
> > until we set gcc-12 as default.
>
> https://sourceware.org/pipermail/libc-alpha/2021-June/127413.html
>
> --
> H.J.

https://patchwork.sourceware.org/project/glibc/patch/20210609122916.3884385-1-hjl.tools@gmail.com/
Fangrui Song Nov. 5, 2021, 7:50 p.m. UTC | #6
On Fri, Nov 5, 2021 at 12:39 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Nov 5, 2021 at 12:33 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> >
> >
> > On 05/11/2021 10:58, Adhemerval Zanella wrote:
> >
> > > So we have two options here:
> > >
> > >   1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
> > >      aims to support such behavior).
> > >
> > >   2. Define USE_PPC64_NOTOC iff linker supports such optimization.
> > >      It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
> > >      link a simple binary without any pcrel and check if the stub has
> > >      power10 instruction.
> > >
> > >   3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
> > >      and on the syscall definition to call the __syscall_error.
> > >
> > > I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC
> > > manually on config.h when configuring with lld I can build glibc.
> > >
> > >>
> > >>
> > >> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
> > >> USE_PPC64_NOTOC is undefined.
> > >>
> > >> % gcc --version
> > >> gcc (Debian 8.3.0-6) 8.3.0
> > >> ...
> > >> % ld.bfd --version
> > >> GNU ld (GNU Binutils for Debian) 2.31.1
> > >> ...
> > >>
> > >> I only see 8 more FAILs with ld.lld than ld.bfd
> > >
> > > I am still having trouble to *finish* make check with lld release 13
> > > my environment (some tests stuck on infinite loop).
> > >
> > >>
> > >>
> > >> % diff -u 0 1
> > >> --- 0   2021-11-05 00:11:43.218731302 -0700
> > >> +++ 1   2021-11-05 00:11:37.659286448 -0700
> > >> @@ -9,6 +9,14 @@
> > >>  FAIL: debug/tst-lfschk6
> > >>  FAIL: dlfcn/bug-atexit3
> > >>  FAIL: elf/check-abi-libc
> > >> +FAIL: elf/ifuncmain1pic
> > >> +FAIL: elf/ifuncmain1pie
> > >> +FAIL: elf/ifuncmain1vis
> > >> +FAIL: elf/ifuncmain1vispic
> > >> +FAIL: elf/ifuncmain1vispie
> > >> +FAIL: elf/ifuncmain3
> > >> +FAIL: elf/ifuncmain6pie
> > >> +FAIL: elf/tst-tlsopt-powerpc
> > >>  FAIL: nptl/tst-cancel24
> > >>  FAIL: nptl/tst-minstack-throw
> > >>  FAIL: nptl/tst-once5
> > >>
> > >>
> > >> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
> > >> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
> > >> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
> > >> But perhaps Adhemerval can look a look at it.
> > >
> > > I still need to understand why armhf also fails with the ifunc using
> > > protected symbols, although I am not if they are related to IRELATIVE
> > > ordering.
> > >
> > >>
> > >>
> > >> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
> > >> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
> > >> Let me send a configure patch to disable it...
> > >>
> > >> Actually I do not know how to disable tst-tlsopt-powerpc properly.
> > >>
> > >> Perhaps add sysdeps/powerpc/configure.ac and move the
> > >> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
> > >> to sysdeps/powerpc/configure.ac?
> > >> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
> > >>
> > >> The patch requires some non-trivial configure.ac change, so I'd hope
> > >> that an expert can do it
> > >
> > > Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
> > > by the linker?
> > >
> > >
> > >>
> > >>
> > >> Hey, so lld linked glibc powerpc64 is in a pretty good status!
> > >
> > > I am still not sure about it, I did could run some tests but I am still
> > > struggling to get a make check to finish.
> > >
> >
> > So it turned out the another issues was my gcc toolchain did not have
> > initfini array set which lets to compiler generate .ctors instead of
> > .init_array and lld does not support .ctors (it just ignores instead of
> > converting to .init_array entries as bfd).
> >
> > With NOTOC fix and a proper toolchain I am seeing the same regression
> > you are seeing:
> >
> > FAIL: elf/ifuncmain1pic
> > FAIL: elf/ifuncmain1pie
> > FAIL: elf/ifuncmain1vis
> > FAIL: elf/ifuncmain1vispic
> > FAIL: elf/ifuncmain1vispie
> > FAIL: elf/ifuncmain3
> > FAIL: elf/ifuncmain6pie
> > FAIL: elf/tst-tlsopt-powerpc
> >
> > I think it would be good use --enable-initfini-array on build-many-glibc.py
> > until we set gcc-12 as default.
>
> https://sourceware.org/pipermail/libc-alpha/2021-June/127413.html

I wasn't subscribed to libc-alpha, so it's not easy for me to reply
there. The patch looks good to me :)

Reviewed-by: Fangrui Song <maskray@google.com>

GCC side may use more clean-up
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100896#c4
develop--- via Libc-alpha Nov. 7, 2021, 2:24 p.m. UTC | #7
Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.

Thanks!
Bill

On 11/5/21 8:58 AM, Adhemerval Zanella via Libc-alpha wrote:
>
> On 05/11/2021 04:23, Fangrui Song wrote:
>> On 2021-10-27, Tulio Magno Quites Machado Filho wrote:
>>> Tulio Magno Quites Machado Filho via Libc-alpha <libc-alpha@sourceware.org> writes:
>>>
>>>> Fangrui Song via Libc-alpha <libc-alpha@sourceware.org> writes:
>>>>
>>>>> On 2021-10-26, Adhemerval Zanella wrote:
>>>>>>  #2  0x00007ffff7d7ae90 in __libc_start_main_impl ()
>>>>>>     from
>>>>>>  /home/azanella/glibc/build/powerpc64le-linux-gnu-power9-lld/libc.so.6
>>>>>>  #3  0x0000000000000000 in ?? ()
>>>>>>  (gdb) disas
>>>>>>  Dump of assembler code for function __gep_setup___vmx__sigjmp_save:
>>>>>>  => 0x00007ffff7f2a980 <+0>:     .long 0x613ffe6
>>>>>>     0x00007ffff7f2a984 <+4>:     li      r12,-1280
>>>> This is a pla, but this GDB isn't able to disassemble it.  This instruction
>>>> shouldn't be used unless when configuring glibc using --with-cpu=power10.
>>> I can reproduce this issue even when configuring glibc with
>>> --with-cpu=power9 --disable-multi-arch, which means the build should not have
>>> any Power10 instructions.
>>>
>>> -- 
>>> Tulio Magno
>> __gep_setup___vmx__sigjmp_save means a r12 setup stub in ld.lld and is
>> used with R_PPC64_REL24_NOTOC for a non-PLT branch.
>>
>> Clang only emits `bl foo@notoc` with -mcpu=power10.
>> However, sysdeps/powerpc/powerpc64/configure.ac enables USE_PPC64_NOTOC
>> when the assembler (gas) supports it.
>>
>>
>> Adhemerval noticed that ld.lld has a behavior difference from GNU ld:
>> ld.lld defaults to PC-relative paddi stub while GNU ld doesn't (like
>> --power10-stubs=no).
>> Is USE_PPC64_NOTOC supposed to be used when targeting POWER9 and below?
>> If yes, the ld.lld default should be fixed.
> I figure out the issue and both bfd, gold, and lld align themselves on the
> power10-stubs options handling.  Using --power10-stubs without an arg is 
> equivalent to --power10-stubs=yes, but not specifying --power10-stubs at
> all should be equivalent to --power10-stubs=auto.  It somewhat confusing,
> but I think it is to allow linker and compiler to be independently
> regarding power10 stub generation.
>
> The issue is bfd enables power10 relocs generation on stubs iff it sees
> the new pc-relative relocations (for instance R_PPC64_D34), otherwise 
> it generates default stubs (ppc64_elf_check_relocs:4700).
>
> So we have two options here:
>
>   1. Do not define USE_PPC64_NOTOC if with-lld (not optimal if ldd
>      aims to support such behavior).
>
>   2. Define USE_PPC64_NOTOC iff linker supports such optimization.
>      It means to emit a NOTOC relocation (R_PPC64_REL24_NOTOC),
>      link a simple binary without any pcrel and check if the stub has 
>      power10 instruction.
>
>   3. Remove the USE_PPC64_NOTOC usage.  It is used on setjmp routines
>      and on the syscall definition to call the __syscall_error.
>
> I am aiming to implement 2. since at least by disabling USE_PPC64_NOTOC 
> manually on config.h when configuring with lld I can build glibc.
>
>>
>> In my testing environment (POWER9), ld.bfd doesn't support @notoc, so
>> USE_PPC64_NOTOC is undefined.
>>
>> % gcc --version
>> gcc (Debian 8.3.0-6) 8.3.0
>> ...
>> % ld.bfd --version
>> GNU ld (GNU Binutils for Debian) 2.31.1
>> ...
>>
>> I only see 8 more FAILs with ld.lld than ld.bfd
> I am still having trouble to *finish* make check with lld release 13
> my environment (some tests stuck on infinite loop).
>
>>
>> % diff -u 0 1
>> --- 0   2021-11-05 00:11:43.218731302 -0700
>> +++ 1   2021-11-05 00:11:37.659286448 -0700
>> @@ -9,6 +9,14 @@
>>  FAIL: debug/tst-lfschk6
>>  FAIL: dlfcn/bug-atexit3
>>  FAIL: elf/check-abi-libc
>> +FAIL: elf/ifuncmain1pic
>> +FAIL: elf/ifuncmain1pie
>> +FAIL: elf/ifuncmain1vis
>> +FAIL: elf/ifuncmain1vispic
>> +FAIL: elf/ifuncmain1vispie
>> +FAIL: elf/ifuncmain3
>> +FAIL: elf/ifuncmain6pie
>> +FAIL: elf/tst-tlsopt-powerpc
>>  FAIL: nptl/tst-cancel24
>>  FAIL: nptl/tst-minstack-throw
>>  FAIL: nptl/tst-once5
>>
>>
>> I suspect ifuncmain1* is again related to the order of R_*_IRELATIVE with
>> regard to R_*_JUMP_SLOT referencing a STT_GNU_IFUNC symbol.
>> (something like https://maskray.me/blog/2021-01-18-gnu-indirect-function#relocation-resolving-order)
>> But perhaps Adhemerval can look a look at it.
> I still need to understand why armhf also fails with the ifunc using
> protected symbols, although I am not if they are related to IRELATIVE
> ordering.
>
>>
>> For elf/tst-tlsopt-powerpc, it is simply because ld.lld doesn't implement the GNU ld powerpc64's
>> __tls_get_addr_opt (pseudo-TLSDESC): https://maskray.me/blog/2021-02-14-all-about-thread-local-storage#powerpc-__tls_get_addr_opt
>> Let me send a configure patch to disable it...
>>
>> Actually I do not know how to disable tst-tlsopt-powerpc properly.
>>
>> Perhaps add sysdeps/powerpc/configure.ac and move the
>> sysdeps/unix/sysv/linux/powerpc/configure.ac --no-tls-get-addr-optimize
>> to sysdeps/powerpc/configure.ac?
>> sysdeps/powerpc/preconfigure.ac exists (I don't know how it is used).
>>
>> The patch requires some non-trivial configure.ac change, so I'd hope
>> that an expert can do it 
> Maybe just disable the test if __tls_get_addr_opt (pseudo-TLSDESC) is not supported
> by the linker?
>
>
>>
>> Hey, so lld linked glibc powerpc64 is in a pretty good status!
> I am still not sure about it, I did could run some tests but I am still
> struggling to get a make check to finish.
Adhemerval Zanella Netto Nov. 8, 2021, 11:37 a.m. UTC | #8
On 07/11/2021 11:24, Bill Schmidt wrote:
> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.

For this specific issue I just sent a patch to fix it on glibc side [1].
However I think it would be good if lld also implements the ld.bfd
optimization to fallback to older stub generation if no pcrel relocation
is found (although it is debatable that @notoc should implicit generate
older ISA code depending of the resulting objects being linked against).

[1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/
develop--- via Libc-alpha Nov. 8, 2021, 1:26 p.m. UTC | #9
On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>
> On 07/11/2021 11:24, Bill Schmidt wrote:
>> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.
> For this specific issue I just sent a patch to fix it on glibc side [1].
> However I think it would be good if lld also implements the ld.bfd
> optimization to fallback to older stub generation if no pcrel relocation
> is found (although it is debatable that @notoc should implicit generate
> older ISA code depending of the resulting objects being linked against).
>
> [1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/

Hi Adhemerval,

Current course and speed is that @notoc will imply pcrel stubs on P10 and later,
but will not do so on P9 and earlier.  The relocation currently associated with
@notoc actually came with ELFv2 on P8 and, although no compilers ever generated 
@notoc, assembly routines using it prior to P10 are a valid case and should not be
punished.  This can be handled by generating a different reloc for @notoc in the
two cases.

If this solution holds up, then changes to glibc should be unnecessary.

Thanks,
Bill
Adhemerval Zanella Netto Nov. 8, 2021, 1:54 p.m. UTC | #10
On 08/11/2021 10:26, Bill Schmidt wrote:
> On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>>
>> On 07/11/2021 11:24, Bill Schmidt wrote:
>>> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.
>> For this specific issue I just sent a patch to fix it on glibc side [1].
>> However I think it would be good if lld also implements the ld.bfd
>> optimization to fallback to older stub generation if no pcrel relocation
>> is found (although it is debatable that @notoc should implicit generate
>> older ISA code depending of the resulting objects being linked against).
>>
>> [1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/
> 
> Hi Adhemerval,
> 
> Current course and speed is that @notoc will imply pcrel stubs on P10 and later,
> but will not do so on P9 and earlier.  The relocation currently associated with
> @notoc actually came with ELFv2 on P8 and, although no compilers ever generated 
> @notoc, assembly routines using it prior to P10 are a valid case and should not be
> punished.  This can be handled by generating a different reloc for @notoc in the
> two cases.
> 
> If this solution holds up, then changes to glibc should be unnecessary.

The main problem is this imposes an extra burden for the linker, where it 
need to implement the ld.bfd optimization to not generate the power10 stubs 
if no pcrel is found.  And it seems that lld does not yet support this
yes and I guess it has not been an issue because @notoc in assembly
routines should be rare. 

It also means that stubs generation are subject to a combination of
relocation on different objects (@notoc on assembly does not necessary
generate power10 stub with default linker option).  I think it should
be ok, although I see this as really confusing since it took some time
to figure out what ld.lfd was doing.
develop--- via Libc-alpha Nov. 8, 2021, 1:59 p.m. UTC | #11
On 11/8/21 7:54 AM, Adhemerval Zanella wrote:
>
> On 08/11/2021 10:26, Bill Schmidt wrote:
>> On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>>> On 07/11/2021 11:24, Bill Schmidt wrote:
>>>> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.
>>> For this specific issue I just sent a patch to fix it on glibc side [1].
>>> However I think it would be good if lld also implements the ld.bfd
>>> optimization to fallback to older stub generation if no pcrel relocation
>>> is found (although it is debatable that @notoc should implicit generate
>>> older ISA code depending of the resulting objects being linked against).
>>>
>>> [1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/
>> Hi Adhemerval,
>>
>> Current course and speed is that @notoc will imply pcrel stubs on P10 and later,
>> but will not do so on P9 and earlier.  The relocation currently associated with
>> @notoc actually came with ELFv2 on P8 and, although no compilers ever generated 
>> @notoc, assembly routines using it prior to P10 are a valid case and should not be
>> punished.  This can be handled by generating a different reloc for @notoc in the
>> two cases.
>>
>> If this solution holds up, then changes to glibc should be unnecessary.
> The main problem is this imposes an extra burden for the linker, where it 
> need to implement the ld.bfd optimization to not generate the power10 stubs 
> if no pcrel is found.  And it seems that lld does not yet support this
> yes and I guess it has not been an issue because @notoc in assembly
> routines should be rare. 
>
> It also means that stubs generation are subject to a combination of
> relocation on different objects (@notoc on assembly does not necessary
> generate power10 stub with default linker option).  I think it should
> be ok, although I see this as really confusing since it took some time
> to figure out what ld.lfd was doing.

My only point is that the linker teams from both groups have been huddling about
this, so I wanted you to be aware the issue goes a little deeper than it appears
on the surface.  How glibc deals with this is up to glibc, of course. :)
Adhemerval Zanella Netto Nov. 8, 2021, 2:11 p.m. UTC | #12
On 08/11/2021 10:59, Bill Schmidt wrote:
> On 11/8/21 7:54 AM, Adhemerval Zanella wrote:
>>
>> On 08/11/2021 10:26, Bill Schmidt wrote:
>>> On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>>>> On 07/11/2021 11:24, Bill Schmidt wrote:
>>>>> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.
>>>> For this specific issue I just sent a patch to fix it on glibc side [1].
>>>> However I think it would be good if lld also implements the ld.bfd
>>>> optimization to fallback to older stub generation if no pcrel relocation
>>>> is found (although it is debatable that @notoc should implicit generate
>>>> older ISA code depending of the resulting objects being linked against).
>>>>
>>>> [1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/
>>> Hi Adhemerval,
>>>
>>> Current course and speed is that @notoc will imply pcrel stubs on P10 and later,
>>> but will not do so on P9 and earlier.  The relocation currently associated with
>>> @notoc actually came with ELFv2 on P8 and, although no compilers ever generated 
>>> @notoc, assembly routines using it prior to P10 are a valid case and should not be
>>> punished.  This can be handled by generating a different reloc for @notoc in the
>>> two cases.
>>>
>>> If this solution holds up, then changes to glibc should be unnecessary.
>> The main problem is this imposes an extra burden for the linker, where it 
>> need to implement the ld.bfd optimization to not generate the power10 stubs 
>> if no pcrel is found.  And it seems that lld does not yet support this
>> yes and I guess it has not been an issue because @notoc in assembly
>> routines should be rare. 
>>
>> It also means that stubs generation are subject to a combination of
>> relocation on different objects (@notoc on assembly does not necessary
>> generate power10 stub with default linker option).  I think it should
>> be ok, although I see this as really confusing since it took some time
>> to figure out what ld.lfd was doing.
> 
> My only point is that the linker teams from both groups have been huddling about
> this, so I wanted you to be aware the issue goes a little deeper than it appears
> on the surface.  How glibc deals with this is up to glibc, of course. :)
> 

Well, my patch had to disable notoc exactly because lld version does not support
such optimization (so default builds always use power10 instruction).  It means
that lld powerpc64le support is either incomplete or @notoc handling need to be
clarified.
develop--- via Libc-alpha Nov. 8, 2021, 2:12 p.m. UTC | #13
On 11/8/21 8:11 AM, Adhemerval Zanella wrote:
>
> On 08/11/2021 10:59, Bill Schmidt wrote:
>> On 11/8/21 7:54 AM, Adhemerval Zanella wrote:
>>> On 08/11/2021 10:26, Bill Schmidt wrote:
>>>> On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>>>>> On 07/11/2021 11:24, Bill Schmidt wrote:
>>>>>> Please coordinate with Alan Modra and Nemanja Ivanovic on this topic.  There are ongoing discussions about bfd and lld linker support around @notoc that will be resolved soon, and Tulio is on vacation, so I don't want the community to make steps they'll have to undo later, or for people to engage in duplicate work.
>>>>> For this specific issue I just sent a patch to fix it on glibc side [1].
>>>>> However I think it would be good if lld also implements the ld.bfd
>>>>> optimization to fallback to older stub generation if no pcrel relocation
>>>>> is found (although it is debatable that @notoc should implicit generate
>>>>> older ISA code depending of the resulting objects being linked against).
>>>>>
>>>>> [1] https://patchwork.sourceware.org/project/glibc/patch/20211108113316.8867-1-adhemerval.zanella@linaro.org/
>>>> Hi Adhemerval,
>>>>
>>>> Current course and speed is that @notoc will imply pcrel stubs on P10 and later,
>>>> but will not do so on P9 and earlier.  The relocation currently associated with
>>>> @notoc actually came with ELFv2 on P8 and, although no compilers ever generated 
>>>> @notoc, assembly routines using it prior to P10 are a valid case and should not be
>>>> punished.  This can be handled by generating a different reloc for @notoc in the
>>>> two cases.
>>>>
>>>> If this solution holds up, then changes to glibc should be unnecessary.
>>> The main problem is this imposes an extra burden for the linker, where it 
>>> need to implement the ld.bfd optimization to not generate the power10 stubs 
>>> if no pcrel is found.  And it seems that lld does not yet support this
>>> yes and I guess it has not been an issue because @notoc in assembly
>>> routines should be rare. 
>>>
>>> It also means that stubs generation are subject to a combination of
>>> relocation on different objects (@notoc on assembly does not necessary
>>> generate power10 stub with default linker option).  I think it should
>>> be ok, although I see this as really confusing since it took some time
>>> to figure out what ld.lfd was doing.
>> My only point is that the linker teams from both groups have been huddling about
>> this, so I wanted you to be aware the issue goes a little deeper than it appears
>> on the surface.  How glibc deals with this is up to glibc, of course. :)
>>
> Well, my patch had to disable notoc exactly because lld version does not support
> such optimization (so default builds always use power10 instruction).  It means
> that lld powerpc64le support is either incomplete or @notoc handling need to be
> clarified.
Precisely.
Fangrui Song Nov. 8, 2021, 10:38 p.m. UTC | #14
I agree with Adhemerval's point that "The main problem is this imposes
an extra burden for the linker" is unfortunate. (Magic behavior of
--power10-stubs=auto).

On 2021-11-08, Nemanja Ivanovic wrote:
>Just to chime in here...
> 
>Adhemerval,
>just a couple of notes about what you are referring to as an "optimization".
>The reason that LLD emits P10 PC-Relative prefixed instructions in stubs is
>because of an implicit assumption that the *_NOTOC relocation is restricted for
>P10 PC-Relative builds as it was never emitted by any compiler prior. However,
>I don't think it is accurate to refer to no P10 stubs with *_NOTOC as an
>"optimization". The two linkers have slightly different logic for deciding when
>it is OK to use P10 stubs. GLIBC is developed using the GNU toolchain so
>naturally relies on behaviour of GNU tools. In this case, the GNU linkers do
>not emit P10 stubs so GLIBC decided to use the @notoc. However, in a slightly
>different situation, the GNU linkers would generate P10 stubs as well. I
>imagine if GLIBC development added code that causes that to happen, that code
>would never be upstreamed as it causes broken builds.
> 
>Ultimately, what is needed is a clear indication in each relocation that using
>P10 stubs is OK for that call. Relying on presence of P10 relocations elsewhere
>in the compilation unit is problematic for at least two reasons (less efficient
>for the linker to look over all relocations twice, can lead to incorrect
>choices). This is why the proposal is to have different relocations for P10 and
>pre-P10 code. My preference would be to make this clear in the textual
>relocations as well (rather than the assembler emitting two different
>relocations for @notoc depending on -mpower10 vs -m<something_older>).

Thanks for investigating the long-term solution.

Nowadays among popular architectures, all of AArch64/RISC-V/x86 have
open places discussing ABI and toolchain implementations. It'd be nice
if ppc64 had a similar place so that we could detect problems earlier.
If more eyes had stared at the @notoc usage, we probably could have
avoided some friction in glibc and ld.lld...

---

> Current course and speed is that @notoc will imply pcrel stubs on P10
> and later, but will not do so on P9 and earlier.  The relocation
> currently associated with @notoc actually came with ELFv2 on P8 and,
> although no compilers ever generated @notoc, assembly routines using it
> prior to P10 are a valid case and should not be punished.  This can be
> handled by generating a different reloc for @notoc in the two cases.

In the context of glibc's @notoc assembly usage, I don't see why
suppresing @notoc for pre-POWER10 builds  would be a punishment:)

The branch targets are non-preemptible symbols.
In the absence of a long branch, @notoc to toc code needs a thunk (in
both GNU ld and ld.lld) while toc to toc doesn't.

In the case of a long branch, @notoc with PC-relative instructions is
slightly more efficient because it uses PC-relative instead of TOC's loading an address from .branch_lt
but the very minor improvement doesn't seem to justify the complexity :)

I say "minor" because I have checked the glibc build on a POWER9 machine
(GCC 8.3.0, binutils 2.31.1) and an x86-64 using cross compilatioin (GCC
10.2.1, binutils 2.37).
On the POWER9 machine, I don't find any thunk for __syscall_error and a
few other symbols using the glibc NOTOC macro.
On x86-64, I saw a long branch call site but I am not sure it can ever
be hot :)

For
https://sourceware.org/pipermail/libc-alpha/2021-November/132767.html ,
I'd like to support it. It will address the immediate problem linking
glibc with LLD 13.0.0 since glibc 2.35 is not too faraway (scheduled on
2022-02-01).

But the patch seems to complicate things.
I think checking whether with the supplied CFLAGS, CC generates @notoc
is better than checking assembler support.

>Nemanja Ivanovic
>LLVM PPC Backend Development
>IBM Toronto Lab
>Email: nemanjai@ca.ibm.com
>Phone: 905-413-3388
> 
>
>    ----- Original message -----
>    From: "Bill Schmidt" <wschmidt@linux.ibm.com>
>    To: "Adhemerval Zanella" <adhemerval.zanella@linaro.org>, "Fangrui Song"
>    <maskray@google.com>, "Tulio Magno Quites Machado Filho"
>    <tuliom@linux.ibm.com>, "Alan Modra" <amodra@gmail.com>, "Nemanja Ivanovic"
>    <nemanjai@ca.ibm.com>
>    Cc: libc-alpha@sourceware.org
>    Subject: Re: lld status with powerpc64
>    Date: Mon, Nov 8, 2021 9:12 AM
>     
>    On 11/8/21 8:11 AM, Adhemerval Zanella wrote:
>    >
>    > On 08/11/2021 10:59, Bill Schmidt wrote:
>    >> On 11/8/21 7:54 AM, Adhemerval Zanella wrote:
>    >>> On 08/11/2021 10:26, Bill Schmidt wrote:
>    >>>> On 11/8/21 5:37 AM, Adhemerval Zanella wrote:
>    >>>>> On 07/11/2021 11:24, Bill Schmidt wrote:
>    >>>>>> Please coordinate with Alan Modra and Nemanja Ivanovic on this
>    topic.  There are ongoing discussions about bfd and lld linker support
>    around @notoc that will be resolved soon, and Tulio is on vacation, so I
>    don't want the community to make steps they'll have to undo later, or for
>    people to engage in duplicate work.
>    >>>>> For this specific issue I just sent a patch to fix it on glibc side
>    [1].
>    >>>>> However I think it would be good if lld also implements the ld.bfd
>    >>>>> optimization to fallback to older stub generation if no pcrel
>    relocation
>    >>>>> is found (although it is debatable that @notoc should implicit
>    generate
>    >>>>> older ISA code depending of the resulting objects being linked
>    against).
>    >>>>>
>    >>>>> [1] https://patchwork.sourceware.org/project/glibc/patch/
>    20211108113316.8867-1-adhemerval.zanella@linaro.org/
>    >>>> Hi Adhemerval,
>    >>>>
>    >>>> Current course and speed is that @notoc will imply pcrel stubs on P10
>    and later,
>    >>>> but will not do so on P9 and earlier.  The relocation currently
>    associated with
>    >>>> @notoc actually came with ELFv2 on P8 and, although no compilers ever
>    generated
>    >>>> @notoc, assembly routines using it prior to P10 are a valid case and
>    should not be
>    >>>> punished.  This can be handled by generating a different reloc for
>    @notoc in the
>    >>>> two cases.
>    >>>>
>    >>>> If this solution holds up, then changes to glibc should be
>    unnecessary.
>    >>> The main problem is this imposes an extra burden for the linker, where
>    it
>    >>> need to implement the ld.bfd optimization to not generate the power10
>    stubs
>    >>> if no pcrel is found.  And it seems that lld does not yet support this
>    >>> yes and I guess it has not been an issue because @notoc in assembly
>    >>> routines should be rare.
>    >>>
>    >>> It also means that stubs generation are subject to a combination of
>    >>> relocation on different objects (@notoc on assembly does not necessary
>    >>> generate power10 stub with default linker option).  I think it should
>    >>> be ok, although I see this as really confusing since it took some time
>    >>> to figure out what ld.lfd was doing.
>    >> My only point is that the linker teams from both groups have been
>    huddling about
>    >> this, so I wanted you to be aware the issue goes a little deeper than it
>    appears
>    >> on the surface.  How glibc deals with this is up to glibc, of course. :)
>    >>
>    > Well, my patch had to disable notoc exactly because lld version does not
>    support
>    > such optimization (so default builds always use power10 instruction).  It
>    means
>    > that lld powerpc64le support is either incomplete or @notoc handling need
>    to be
>    > clarified.
>    Precisely.
>
> 
>
Adhemerval Zanella Netto Nov. 9, 2021, 12:20 p.m. UTC | #15
On 08/11/2021 19:38, Fangrui Song wrote:
> I agree with Adhemerval's point that "The main problem is this imposes
> an extra burden for the linker" is unfortunate. (Magic behavior of
> --power10-stubs=auto).

That the main problem in fact, where ld.bfd and lld have different semantics
for --power10-stubs=auto and it is the default option. 

> 
> On 2021-11-08, Nemanja Ivanovic wrote:
>> Just to chime in here...
>>
>> Adhemerval,
>> just a couple of notes about what you are referring to as an "optimization".
>> The reason that LLD emits P10 PC-Relative prefixed instructions in stubs is
>> because of an implicit assumption that the *_NOTOC relocation is restricted for
>> P10 PC-Relative builds as it was never emitted by any compiler prior. However,
>> I don't think it is accurate to refer to no P10 stubs with *_NOTOC as an
>> "optimization". The two linkers have slightly different logic for deciding when
>> it is OK to use P10 stubs. GLIBC is developed using the GNU toolchain so
>> naturally relies on behaviour of GNU tools. In this case, the GNU linkers do
>> not emit P10 stubs so GLIBC decided to use the @notoc. However, in a slightly
>> different situation, the GNU linkers would generate P10 stubs as well. I
>> imagine if GLIBC development added code that causes that to happen, that code
>> would never be upstreamed as it causes broken builds.

I agree that glibc is tied to GNU toolchain (we aim to use 'GNU C' along with
all its extension as the base language) and I also agree that such optimization
might be costly for lld.

But from an user perspective I would expect that if ldd is trying to mimic 
ld.bfd behavior regarding power10 stub creation, the semantic should be at least 
the same or error if something is not supported. The issue is exactly that subtle
issues that arise when something is either underspecified or not properly documented.

>>
>> Ultimately, what is needed is a clear indication in each relocation that using
>> P10 stubs is OK for that call. Relying on presence of P10 relocations elsewhere
>> in the compilation unit is problematic for at least two reasons (less efficient
>> for the linker to look over all relocations twice, can lead to incorrect
>> choices). This is why the proposal is to have different relocations for P10 and
>> pre-P10 code. My preference would be to make this clear in the textual
>> relocations as well (rather than the assembler emitting two different
>> relocations for @notoc depending on -mpower10 vs -m<something_older>).
> 
> Thanks for investigating the long-term solution.
> 
> Nowadays among popular architectures, all of AArch64/RISC-V/x86 have
> open places discussing ABI and toolchain implementations. It'd be nice
> if ppc64 had a similar place so that we could detect problems earlier.
> If more eyes had stared at the @notoc usage, we probably could have
> avoided some friction in glibc and ld.lld...
> 
> ---
> 
>> Current course and speed is that @notoc will imply pcrel stubs on P10
>> and later, but will not do so on P9 and earlier.  The relocation
>> currently associated with @notoc actually came with ELFv2 on P8 and,
>> although no compilers ever generated @notoc, assembly routines using it
>> prior to P10 are a valid case and should not be punished.  This can be
>> handled by generating a different reloc for @notoc in the two cases.
> 
> In the context of glibc's @notoc assembly usage, I don't see why
> suppresing @notoc for pre-POWER10 builds  would be a punishment:)
> 
> The branch targets are non-preemptible symbols.
> In the absence of a long branch, @notoc to toc code needs a thunk (in
> both GNU ld and ld.lld) while toc to toc doesn't.
> 
> In the case of a long branch, @notoc with PC-relative instructions is
> slightly more efficient because it uses PC-relative instead of TOC's loading an address from .branch_lt
> but the very minor improvement doesn't seem to justify the complexity :)
> 
> I say "minor" because I have checked the glibc build on a POWER9 machine
> (GCC 8.3.0, binutils 2.31.1) and an x86-64 using cross compilatioin (GCC
> 10.2.1, binutils 2.37).
> On the POWER9 machine, I don't find any thunk for __syscall_error and a
> few other symbols using the glibc NOTOC macro.
> On x86-64, I saw a long branch call site but I am not sure it can ever
> be hot :)

I won't indeed, since @notoc only is helpful when power8 objects are being
created.

> 
> For
> https://sourceware.org/pipermail/libc-alpha/2021-November/132767.html ,
> I'd like to support it. It will address the immediate problem linking
> glibc with LLD 13.0.0 since glibc 2.35 is not too faraway (scheduled on
> 2022-02-01).
> 
> But the patch seems to complicate things.
> I think checking whether with the supplied CFLAGS, CC generates @notoc
> is better than checking assembler support.

Well, that is exactly of the patch does:

  libc_cv_ppc64_notoc=no
  if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -o conftest.s conftest.c]) \
     && AC_TRY_COMMAND([grep -q -E 'bar@notoc' conftest.s])
  then
    libc_cv_ppc64_notoc=yes
  fi

I think the title is misleading since it tests only compiler, not linker.
diff mbox series

Patch

--- 0   2021-11-05 00:11:43.218731302 -0700
+++ 1   2021-11-05 00:11:37.659286448 -0700
@@ -9,6 +9,14 @@ 
  FAIL: debug/tst-lfschk6
  FAIL: dlfcn/bug-atexit3
  FAIL: elf/check-abi-libc
+FAIL: elf/ifuncmain1pic
+FAIL: elf/ifuncmain1pie
+FAIL: elf/ifuncmain1vis
+FAIL: elf/ifuncmain1vispic
+FAIL: elf/ifuncmain1vispie
+FAIL: elf/ifuncmain3
+FAIL: elf/ifuncmain6pie
+FAIL: elf/tst-tlsopt-powerpc
  FAIL: nptl/tst-cancel24
  FAIL: nptl/tst-minstack-throw
  FAIL: nptl/tst-once5