Message ID | 20230509085835.1143661-1-ardb@kernel.org |
---|---|
State | New |
Series | i386: Honour -mdirect-extern-access when calling __fentry__ |
On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The small and medium PIC code models generate profiling calls that
> always load the address of __fentry__() via the GOT, even if
> -mdirect-extern-access is in effect.
>
> This deviates from the behavior with respect to other external
> references, and results in a longer opcode that relies on linker
> relaxation to eliminate the GOT load. In this particular case, the
> transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)'
> with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the
> NOP is a 1 byte NOP that preserves the 6 byte length of the sequence.
>
> This is problematic for the Linux kernel, which generally relies on
> -mdirect-extern-access and hidden visibility to eliminate GOT based
> symbol references in code generated with -fpie/-fpic, without having to
> depend on linker relaxation.
>
> The Linux kernel relies on code patching to replace these opcodes with
> NOPs at runtime, and this is complicated code that we'd prefer not to
> complicate even more by adding support for patching both 5 and 6 byte
> sequences as well as parsing the instruction stream to decide which
> variant of CALL+NOP we are dealing with.
>
> So let's honour -mdirect-extern-access, and only load the address of
> __fentry__ via the GOT if direct references to external symbols are not
> permitted.
>
> Note that the GOT reference in question is in fact a data reference: we
> explicitly load the address of __fentry__ from the GOT, which amounts to
> eager binding, rather than emitting a PLT call that could bind eagerly,
> lazily or directly at link time.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (x86_function_profiler): Take
>         ix86_direct_extern_access into account when generating calls
>         to __fentry__()

HJ, is the patch OK with you?

Uros.

>
> Cc: H.J. Lu <hjl.tools@gmail.com>
> Cc: Jakub Jelinek <jakub@redhat.com>
> Cc: Richard Biener <rguenther@suse.de>
> Cc: Uros Bizjak <ubizjak@gmail.com>
> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
> ---
>  gcc/config/i386/i386.cc | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b1d08ecdb3d44729..69b183abb4318b0a 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21836,8 +21836,12 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
>            break;
>          case CM_SMALL_PIC:
>          case CM_MEDIUM_PIC:
> -          fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
> -          break;
> +          if (!ix86_direct_extern_access)
> +            {
> +              fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
> +              break;
> +            }
> +          /* fall through */
>          default:
>            x86_print_call_or_nop (file, mcount_name);
>            break;
> --
> 2.39.2
>
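To illustrate the "5 and 6 byte sequences" point in the description, here is a minimal sketch of what telling the variants apart would involve. It is not kernel or GCC code: `classify_fentry_site` and the `SITE_*` names are hypothetical, and it assumes the usual x86-64 encodings (`E8 rel32` for a direct call, `FF 15 disp32` for the RIP-relative indirect call) and that the 1 byte NOP mentioned above is `0x90`. A linker may be configured to pad with a different byte, which this sketch does not cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the call-site shapes named in the description. */
enum fentry_site {
    SITE_UNKNOWN,
    SITE_DIRECT_5,       /* E8 rel32: direct call, 5 bytes */
    SITE_GOT_INDIRECT_6, /* FF 15 disp32: call *sym@GOTPCREL(%rip), 6 bytes */
    SITE_CALL_NOP_6,     /* E8 rel32 90: relaxed 'CALL; NOP', 6 bytes */
    SITE_NOP_CALL_6      /* 90 E8 rel32: relaxed 'NOP; CALL', 6 bytes */
};

/* Classify a call site of known length by looking at its opcode bytes.
   The caller must already know whether the site is 5 or 6 bytes long;
   the bytes alone cannot distinguish a 5 byte call from 'CALL; NOP'. */
static enum fentry_site
classify_fentry_site(const uint8_t *p, size_t len)
{
    assert(p != NULL);

    if (len == 5 && p[0] == 0xE8)
        return SITE_DIRECT_5;

    if (len == 6) {
        if (p[0] == 0xFF && p[1] == 0x15)
            return SITE_GOT_INDIRECT_6;
        if (p[0] == 0xE8 && p[5] == 0x90)
            return SITE_CALL_NOP_6;
        if (p[0] == 0x90 && p[1] == 0xE8)
            return SITE_NOP_CALL_6;
    }

    return SITE_UNKNOWN;
}
```

With the direct form emitted up front, only the first case would need handling, which is the simplification the patch description is after.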
On Wed, May 10, 2023 at 2:17 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (x86_function_profiler): Take
> >         ix86_direct_extern_access into account when generating calls
> >         to __fentry__()
>
> HJ, is the patch OK with you?

LGTM.

Thanks.
On Thu, May 11, 2023 at 12:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, May 10, 2023 at 2:17 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > HJ, is the patch OK with you?
>
> LGTM.

OK then.

Thanks,
Uros.
On Thu, 11 May 2023 at 08:08, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, May 11, 2023 at 12:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, May 10, 2023 at 2:17 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > HJ, is the patch OK with you?
> >
> > LGTM.
>
> OK then.
>

Thanks all. Is anything expected of me at this point?
On Fri, May 12, 2023 at 4:07 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > > > > Note that the GOT reference in question is in fact a data reference: we
> > > > > explicitly load the address of __fentry__ from the GOT, which amounts to
> > > > > eager binding, rather than emitting a PLT call that could bind eagerly,
> > > > > lazily or directly at link time.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/i386/i386.cc (x86_function_profiler): Take
> > > > >         ix86_direct_extern_access into account when generating calls
> > > > >         to __fentry__()
> > > >
> > > > HJ, is the patch OK with you?
> > >
> > > LGTM.
> >
> > OK then.
> >
>
> Thanks all. Is anything expected of me at this point?

Do you have write access to the repository? If not I can commit the
patch for you, but you should state this [1] in your patch submission.

[1] https://gcc.gnu.org/contribute.html

Uros.
On Fri, 12 May 2023 at 19:05, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, May 12, 2023 at 4:07 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > > > > Note that the GOT reference in question is in fact a data reference: we
> > > > > > explicitly load the address of __fentry__ from the GOT, which amounts to
> > > > > > eager binding, rather than emitting a PLT call that could bind eagerly,
> > > > > > lazily or directly at link time.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * config/i386/i386.cc (x86_function_profiler): Take
> > > > > >         ix86_direct_extern_access into account when generating calls
> > > > > >         to __fentry__()
> > > > >
> > > > > HJ, is the patch OK with you?
> > > >
> > > > LGTM.
> > >
> > > OK then.
> > >
> >
> > Thanks all. Is anything expected of me at this point?
>
> Do you have write access to the repository? If not I can commit the
> patch for you

Yes, please.

> , but you should state this [1] in your patch submission.
>
> [1] https://gcc.gnu.org/contribute.html
>

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Thanks,
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b1d08ecdb3d44729..69b183abb4318b0a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21836,8 +21836,12 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
           break;
         case CM_SMALL_PIC:
         case CM_MEDIUM_PIC:
-          fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
-          break;
+          if (!ix86_direct_extern_access)
+            {
+              fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
+              break;
+            }
+          /* fall through */
         default:
           x86_print_call_or_nop (file, mcount_name);
           break;
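
The effect of the hunk can be sketched outside GCC as a small standalone function. This is an illustration under stated assumptions, not the GCC implementation: `format_profiler_call` and the `SKETCH_*` enum are hypothetical stand-ins for `x86_function_profiler` and GCC's code-model values, the boolean parameter stands in for `ix86_direct_extern_access`, and the direct form `call <name>` is a simplification of what `x86_print_call_or_nop` prints (the real helper may emit a NOP instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the code models relevant to the hunk. */
enum code_model_sketch { SKETCH_SMALL, SKETCH_SMALL_PIC, SKETCH_MEDIUM_PIC };

/* Format the profiling call that the patched switch would select.
   Returns the number of characters snprintf reports. */
static int
format_profiler_call(char *buf, size_t size, enum code_model_sketch model,
                     bool direct_extern_access, const char *mcount_name)
{
    assert(buf != NULL && mcount_name != NULL);

    switch (model) {
    case SKETCH_SMALL_PIC:
    case SKETCH_MEDIUM_PIC:
        /* Only go through the GOT when direct external access is off. */
        if (!direct_extern_access)
            return snprintf(buf, size, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
                            mcount_name);
        /* fall through */
    default:
        /* Simplified direct call; assumption about the helper's output. */
        return snprintf(buf, size, "1:\tcall\t%s\n", mcount_name);
    }
}
```

Calling it with `SKETCH_SMALL_PIC` and `direct_extern_access` set to `false` yields the GOT-indirect form, as before the patch; setting it to `true` falls through to the direct form shared with the non-PIC code models.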