Message ID | 20160620170501.GA18281@intel.com |
---|---|
State | New |
Headers | show |
On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: > Hi, > > This patch implements the alternate code sequence recommended in > > https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI > > to load external function address via GOT slot with > > movq func@GOTPCREL(%rip), %rax > > so that linker won't create an PLT entry for extern function > address. > > Tested on x86-64. OK for trunk? > + else if (ix86_force_load_from_GOT_p (op1)) > + { > + /* Load the external function address via the GOT slot to > + avoid PLT. */ > + op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1), > + (TARGET_64BIT > + ? UNSPEC_GOTPCREL > + : UNSPEC_GOT)); > + op1 = gen_rtx_CONST (Pmode, op1); > + op1 = gen_const_mem (Pmode, op1); > + /* This symbol must be referenced via a load from the Global > + Offset Table. */ > + set_mem_alias_set (op1, ix86_GOT_alias_set ()); > + op1 = convert_to_mode (mode, op1, 1); > + op1 = force_reg (mode, op1); > + emit_insn (gen_rtx_SET (op0, op1)); > + /* Generate a CLOBBER so that there will be no REG_EQUAL note > + on the last insn to prevent cse and fwprop from replacing > + a GOT load with a constant. */ > + rtx tmp = gen_reg_rtx (Pmode); > + emit_clobber (tmp); > + return; Jeff, is this the recommended way to prevent CSE, as far as RTL infrastructure is concerned? I didn't find any example of this approach with other targets. Uros.
On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> Hi, >> >> This patch implements the alternate code sequence recommended in >> >> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI >> >> to load external function address via GOT slot with >> >> movq func@GOTPCREL(%rip), %rax >> >> so that linker won't create an PLT entry for extern function >> address. >> >> Tested on x86-64. OK for trunk? > >> + else if (ix86_force_load_from_GOT_p (op1)) >> + { >> + /* Load the external function address via the GOT slot to >> + avoid PLT. */ >> + op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1), >> + (TARGET_64BIT >> + ? UNSPEC_GOTPCREL >> + : UNSPEC_GOT)); >> + op1 = gen_rtx_CONST (Pmode, op1); >> + op1 = gen_const_mem (Pmode, op1); >> + /* This symbol must be referenced via a load from the Global >> + Offset Table. */ >> + set_mem_alias_set (op1, ix86_GOT_alias_set ()); >> + op1 = convert_to_mode (mode, op1, 1); >> + op1 = force_reg (mode, op1); >> + emit_insn (gen_rtx_SET (op0, op1)); >> + /* Generate a CLOBBER so that there will be no REG_EQUAL note >> + on the last insn to prevent cse and fwprop from replacing >> + a GOT load with a constant. */ >> + rtx tmp = gen_reg_rtx (Pmode); >> + emit_clobber (tmp); >> + return; > > Jeff, is this the recommended way to prevent CSE, as far as RTL > infrastructure is concerned? I didn't find any example of this > approach with other targets. > FWIW, the similar approach is used in ix86_expand_vector_move_misalign, ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general as well as other targets: frv/frv.c: emit_clobber (op0); frv/frv.c: emit_clobber (op1); im32c/m32c.c: /* emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */ s390/s390.c: emit_clobber (addr); s390/s390.md: emit_clobber (reg0); s390/s390.md: emit_clobber (reg1); s390/s390.md: emit_clobber (reg0); s390/s390.md: emit_clobber (reg0); s390/s390.md: emit_clobber (reg1);
On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>> Hi, >>> >>> This patch implements the alternate code sequence recommended in >>> >>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI >>> >>> to load external function address via GOT slot with >>> >>> movq func@GOTPCREL(%rip), %rax >>> >>> so that linker won't create an PLT entry for extern function >>> address. >>> >>> Tested on x86-64. OK for trunk? >> >>> + else if (ix86_force_load_from_GOT_p (op1)) >>> + { >>> + /* Load the external function address via the GOT slot to >>> + avoid PLT. */ >>> + op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1), >>> + (TARGET_64BIT >>> + ? UNSPEC_GOTPCREL >>> + : UNSPEC_GOT)); >>> + op1 = gen_rtx_CONST (Pmode, op1); >>> + op1 = gen_const_mem (Pmode, op1); >>> + /* This symbol must be referenced via a load from the Global >>> + Offset Table. */ >>> + set_mem_alias_set (op1, ix86_GOT_alias_set ()); >>> + op1 = convert_to_mode (mode, op1, 1); >>> + op1 = force_reg (mode, op1); >>> + emit_insn (gen_rtx_SET (op0, op1)); >>> + /* Generate a CLOBBER so that there will be no REG_EQUAL note >>> + on the last insn to prevent cse and fwprop from replacing >>> + a GOT load with a constant. */ >>> + rtx tmp = gen_reg_rtx (Pmode); >>> + emit_clobber (tmp); >>> + return; >> >> Jeff, is this the recommended way to prevent CSE, as far as RTL >> infrastructure is concerned? I didn't find any example of this >> approach with other targets. >> > > FWIW, the similar approach is used in ix86_expand_vector_move_misalign, > ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general > as well as other targets: > > frv/frv.c: emit_clobber (op0); > frv/frv.c: emit_clobber (op1); > im32c/m32c.c: /* emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */ > s390/s390.c: emit_clobber (addr); > s390/s390.md: emit_clobber (reg0); > s390/s390.md: emit_clobber (reg1); > s390/s390.md: emit_clobber (reg0); > s390/s390.md: emit_clobber (reg0); > s390/s390.md: emit_clobber (reg1); These usages mark the whole register as being "clobbered" (=undefined), before only a part of register is written, e.g.: emit_clobber (int_xmm); emit_move_insn (gen_lowpart (DImode, int_xmm), input); They aren't used to prevent unwanted CSE. Uros.
Uros Bizjak <ubizjak@gmail.com> writes: > On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>> Hi, >>>> >>>> This patch implements the alternate code sequence recommended in >>>> >>>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI >>>> >>>> to load external function address via GOT slot with >>>> >>>> movq func@GOTPCREL(%rip), %rax >>>> >>>> so that linker won't create an PLT entry for extern function >>>> address. >>>> >>>> Tested on x86-64. OK for trunk? >>> >>>> + else if (ix86_force_load_from_GOT_p (op1)) >>>> + { >>>> + /* Load the external function address via the GOT slot to >>>> + avoid PLT. */ >>>> + op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1), >>>> + (TARGET_64BIT >>>> + ? UNSPEC_GOTPCREL >>>> + : UNSPEC_GOT)); >>>> + op1 = gen_rtx_CONST (Pmode, op1); >>>> + op1 = gen_const_mem (Pmode, op1); >>>> + /* This symbol must be referenced via a load from the Global >>>> + Offset Table. */ >>>> + set_mem_alias_set (op1, ix86_GOT_alias_set ()); >>>> + op1 = convert_to_mode (mode, op1, 1); >>>> + op1 = force_reg (mode, op1); >>>> + emit_insn (gen_rtx_SET (op0, op1)); >>>> + /* Generate a CLOBBER so that there will be no REG_EQUAL note >>>> + on the last insn to prevent cse and fwprop from replacing >>>> + a GOT load with a constant. */ >>>> + rtx tmp = gen_reg_rtx (Pmode); >>>> + emit_clobber (tmp); >>>> + return; >>> >>> Jeff, is this the recommended way to prevent CSE, as far as RTL >>> infrastructure is concerned? I didn't find any example of this >>> approach with other targets. >>> >> >> FWIW, the similar approach is used in ix86_expand_vector_move_misalign, >> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general >> as well as other targets: >> >> frv/frv.c: emit_clobber (op0); >> frv/frv.c: emit_clobber (op1); >> im32c/m32c.c: /* emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */ >> s390/s390.c: emit_clobber (addr); >> s390/s390.md: emit_clobber (reg0); >> s390/s390.md: emit_clobber (reg1); >> s390/s390.md: emit_clobber (reg0); >> s390/s390.md: emit_clobber (reg0); >> s390/s390.md: emit_clobber (reg1); > > These usages mark the whole register as being "clobbered" > (=undefined), before only a part of register is written, e.g.: > > emit_clobber (int_xmm); > emit_move_insn (gen_lowpart (DImode, int_xmm), input); > > They aren't used to prevent unwanted CSE. Since it's being called in the move expander, I thought the normal way of preventing the constant being rematerialised would be to reject it in the move define_insn predicates. FWIW, I agree that using a clobber for this is going to be fragile. Thanks, Richard
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 9fd14f6..8130161 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -70,6 +70,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, rtx, rtx, rtx, rtx, extern bool constant_address_p (rtx); extern bool legitimate_pic_operand_p (rtx); extern bool legitimate_pic_address_disp_p (rtx); +extern bool ix86_force_load_from_GOT_p (rtx); extern void print_reg (rtx, int, FILE*); extern void ix86_print_operand (FILE *, rtx, int); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 56a5b9c..c8c5081 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -15182,6 +15182,24 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x) return true; } +/* True if operand X should be loaded from GOT. */ + +bool +ix86_force_load_from_GOT_p (rtx x) +{ + /* External function symbol should be loaded via the GOT slot for + -fno-plt. */ + return (!flag_plt + && !flag_pic + && ix86_cmodel != CM_LARGE + && TARGET_64BIT + && !TARGET_PECOFF + && !TARGET_MACHO + && GET_CODE (x) == SYMBOL_REF + && SYMBOL_REF_FUNCTION_P (x) + && !SYMBOL_REF_LOCAL_P (x)); +} + /* Determine if it's legal to put X into the constant pool. This is not possible for the address of thread-local symbols, which is checked above. */ @@ -15560,6 +15578,10 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict) return false; case UNSPEC_GOTPCREL: + gcc_assert (flag_pic + || ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0))); + goto is_legitimate_pic; + case UNSPEC_PCREL: gcc_assert (flag_pic); goto is_legitimate_pic; @@ -18130,6 +18152,12 @@ ix86_print_operand_address_as (FILE *file, rtx addr, } else if (flag_pic) output_pic_addr_const (file, disp, 0); + else if (GET_CODE (disp) == CONST + && GET_CODE (XEXP (disp, 0)) == UNSPEC + && (XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL + || XINT (XEXP (disp, 0), 1) == UNSPEC_GOT) + && ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0))) + output_pic_addr_const (file, XEXP (disp, 0), code); else output_addr_const (file, disp); } @@ -19448,6 +19476,29 @@ ix86_expand_move (machine_mode mode, rtx operands[]) op1 = convert_to_mode (mode, op1, 1); } } + } + else if (ix86_force_load_from_GOT_p (op1)) + { + /* Load the external function address via the GOT slot to + avoid PLT. */ + op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1), + (TARGET_64BIT + ? UNSPEC_GOTPCREL + : UNSPEC_GOT)); + op1 = gen_rtx_CONST (Pmode, op1); + op1 = gen_const_mem (Pmode, op1); + /* This symbol must be referenced via a load from the Global + Offset Table. */ + set_mem_alias_set (op1, ix86_GOT_alias_set ()); + op1 = convert_to_mode (mode, op1, 1); + op1 = force_reg (mode, op1); + emit_insn (gen_rtx_SET (op0, op1)); + /* Generate a CLOBBER so that there will be no REG_EQUAL note + on the last insn to prevent cse and fwprop from replacing + a GOT load with a constant. */ + rtx tmp = gen_reg_rtx (Pmode); + emit_clobber (tmp); + return; } else { diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index b3cf2a3..06a0002 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -149,6 +149,10 @@ (define_predicate "x86_64_immediate_operand" (match_code "const_int,symbol_ref,label_ref,const") { + /* Load the external function address via the GOT slot to avoid PLT. */ + if (ix86_force_load_from_GOT_p (op)) + return false; + if (!TARGET_64BIT) return immediate_operand (op, mode); diff --git a/gcc/testsuite/gcc.target/i386/pr67400-1.c b/gcc/testsuite/gcc.target/i386/pr67400-1.c new file mode 100644 index 0000000..a875b76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +extern void bar (void); + +void * +foo (void) +{ + return &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr67400-2.c b/gcc/testsuite/gcc.target/i386/pr67400-2.c new file mode 100644 index 0000000..9f3f4bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +extern void bar (void); +extern void *p; + +void +foo (void) +{ + p = &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr67400-3.c b/gcc/testsuite/gcc.target/i386/pr67400-3.c new file mode 100644 index 0000000..045974e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-3.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +static void +bar (void) +{ +} + +void * +foo (void) +{ + return &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*\\\$bar," } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr67400-4.c b/gcc/testsuite/gcc.target/i386/pr67400-4.c new file mode 100644 index 0000000..fd373db --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-4.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +extern void bar (void) __attribute__ ((visibility ("hidden"))); + +void * +foo (void) +{ + return &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*\\\$bar," } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr67400-5.c b/gcc/testsuite/gcc.target/i386/pr67400-5.c new file mode 100644 index 0000000..9bb98dc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-5.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +extern void foo (void); +extern void bar (int, int, int, int, int, int, void *); + +void +x (void) +{ + bar (1, 2, 3, 4, 5, 6, foo); +} diff --git a/gcc/testsuite/gcc.target/i386/pr67400-6.c b/gcc/testsuite/gcc.target/i386/pr67400-6.c new file mode 100644 index 0000000..b84196a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr67400-6.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +extern int bar (void); + +int +check (void *p) +{ + return p != &bar; +} + +/* { dg-final { scan-assembler "cmp\(l|q\)\[ \t\]*.*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */