x86-64: Load external function address via GOT slot
diff mbox

Message ID 20160620170501.GA18281@intel.com
State New
Headers show

Commit Message

H.J. Lu June 20, 2016, 5:05 p.m. UTC
Hi,

This patch implements the alternate code sequence recommended in

https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI

to load external function address via GOT slot with

movq func@GOTPCREL(%rip), %rax

so that linker won't create an PLT entry for extern function
address.

Tested on x86-64.  OK for trunk?


H.J.
--
gcc/

	PR target/67400
	* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
	* config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
	(ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_print_operand_address): Support UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_expand_move): Load the external function address via the
	GOT slot if ix86_force_load_from_GOT_p returns true.
	* config/i386/predicates.md (x86_64_immediate_operand): Return
	false if ix86_force_load_from_GOT_p returns true.

gcc/testsuite/

	PR target/67400
	* gcc.target/i386/pr67400-1.c: New test.
	* gcc.target/i386/pr67400-2.c: Likewise.
	* gcc.target/i386/pr67400-3.c: Likewise.
	* gcc.target/i386/pr67400-4.c: Likewise.
	* gcc.target/i386/pr67400-5.c: Likewise.
	* gcc.target/i386/pr67400-6.c: Likewise.
---
 gcc/config/i386/i386-protos.h             |  1 +
 gcc/config/i386/i386.c                    | 51 +++++++++++++++++++++++++++++++
 gcc/config/i386/predicates.md             |  4 +++
 gcc/testsuite/gcc.target/i386/pr67400-1.c | 13 ++++++++
 gcc/testsuite/gcc.target/i386/pr67400-2.c | 14 +++++++++
 gcc/testsuite/gcc.target/i386/pr67400-3.c | 16 ++++++++++
 gcc/testsuite/gcc.target/i386/pr67400-4.c | 13 ++++++++
 gcc/testsuite/gcc.target/i386/pr67400-5.c | 11 +++++++
 gcc/testsuite/gcc.target/i386/pr67400-6.c | 13 ++++++++
 9 files changed, 136 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-6.c

Comments

Uros Bizjak June 20, 2016, 7:13 p.m. UTC | #1
On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> Hi,
>
> This patch implements the alternate code sequence recommended in
>
> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>
> to load external function address via GOT slot with
>
> movq func@GOTPCREL(%rip), %rax
>
> so that linker won't create an PLT entry for extern function
> address.
>
> Tested on x86-64.  OK for trunk?

> +  else if (ix86_force_load_from_GOT_p (op1))
> +    {
> +      /* Load the external function address via the GOT slot to
> +        avoid PLT.  */
> +      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
> +                           (TARGET_64BIT
> +                            ? UNSPEC_GOTPCREL
> +                            : UNSPEC_GOT));
> +      op1 = gen_rtx_CONST (Pmode, op1);
> +      op1 = gen_const_mem (Pmode, op1);
> +      /* This symbol must be referenced via a load from the Global
> +        Offset Table.  */
> +      set_mem_alias_set (op1, ix86_GOT_alias_set ());
> +      op1 = convert_to_mode (mode, op1, 1);
> +      op1 = force_reg (mode, op1);
> +      emit_insn (gen_rtx_SET (op0, op1));
> +      /* Generate a CLOBBER so that there will be no REG_EQUAL note
> +        on the last insn to prevent cse and fwprop from replacing
> +        a GOT load with a constant.  */
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      emit_clobber (tmp);
> +      return;

Jeff, is this the recommended way to prevent CSE, as far as RTL
infrastructure is concerned? I didn't find any example of this
approach with other targets.

Uros.
H.J. Lu June 20, 2016, 7:19 p.m. UTC | #2
On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Hi,
>>
>> This patch implements the alternate code sequence recommended in
>>
>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>
>> to load external function address via GOT slot with
>>
>> movq func@GOTPCREL(%rip), %rax
>>
>> so that linker won't create an PLT entry for extern function
>> address.
>>
>> Tested on x86-64.  OK for trunk?
>
>> +  else if (ix86_force_load_from_GOT_p (op1))
>> +    {
>> +      /* Load the external function address via the GOT slot to
>> +        avoid PLT.  */
>> +      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>> +                           (TARGET_64BIT
>> +                            ? UNSPEC_GOTPCREL
>> +                            : UNSPEC_GOT));
>> +      op1 = gen_rtx_CONST (Pmode, op1);
>> +      op1 = gen_const_mem (Pmode, op1);
>> +      /* This symbol must be referenced via a load from the Global
>> +        Offset Table.  */
>> +      set_mem_alias_set (op1, ix86_GOT_alias_set ());
>> +      op1 = convert_to_mode (mode, op1, 1);
>> +      op1 = force_reg (mode, op1);
>> +      emit_insn (gen_rtx_SET (op0, op1));
>> +      /* Generate a CLOBBER so that there will be no REG_EQUAL note
>> +        on the last insn to prevent cse and fwprop from replacing
>> +        a GOT load with a constant.  */
>> +      rtx tmp = gen_reg_rtx (Pmode);
>> +      emit_clobber (tmp);
>> +      return;
>
> Jeff, is this the recommended way to prevent CSE, as far as RTL
> infrastructure is concerned? I didn't find any example of this
> approach with other targets.
>

FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
as well as other targets:

frv/frv.c:  emit_clobber (op0);
frv/frv.c:  emit_clobber (op1);
im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
s390/s390.c:  emit_clobber (addr);
s390/s390.md:  emit_clobber (reg0);
s390/s390.md:  emit_clobber (reg1);
s390/s390.md:  emit_clobber (reg0);
s390/s390.md:  emit_clobber (reg0);
s390/s390.md:  emit_clobber (reg1);
Uros Bizjak June 20, 2016, 7:26 p.m. UTC | #3
On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>> Hi,
>>>
>>> This patch implements the alternate code sequence recommended in
>>>
>>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>>
>>> to load external function address via GOT slot with
>>>
>>> movq func@GOTPCREL(%rip), %rax
>>>
>>> so that linker won't create an PLT entry for extern function
>>> address.
>>>
>>> Tested on x86-64.  OK for trunk?
>>
>>> +  else if (ix86_force_load_from_GOT_p (op1))
>>> +    {
>>> +      /* Load the external function address via the GOT slot to
>>> +        avoid PLT.  */
>>> +      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>>> +                           (TARGET_64BIT
>>> +                            ? UNSPEC_GOTPCREL
>>> +                            : UNSPEC_GOT));
>>> +      op1 = gen_rtx_CONST (Pmode, op1);
>>> +      op1 = gen_const_mem (Pmode, op1);
>>> +      /* This symbol must be referenced via a load from the Global
>>> +        Offset Table.  */
>>> +      set_mem_alias_set (op1, ix86_GOT_alias_set ());
>>> +      op1 = convert_to_mode (mode, op1, 1);
>>> +      op1 = force_reg (mode, op1);
>>> +      emit_insn (gen_rtx_SET (op0, op1));
>>> +      /* Generate a CLOBBER so that there will be no REG_EQUAL note
>>> +        on the last insn to prevent cse and fwprop from replacing
>>> +        a GOT load with a constant.  */
>>> +      rtx tmp = gen_reg_rtx (Pmode);
>>> +      emit_clobber (tmp);
>>> +      return;
>>
>> Jeff, is this the recommended way to prevent CSE, as far as RTL
>> infrastructure is concerned? I didn't find any example of this
>> approach with other targets.
>>
>
> FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
> as well as other targets:
>
> frv/frv.c:  emit_clobber (op0);
> frv/frv.c:  emit_clobber (op1);
> im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
> s390/s390.c:  emit_clobber (addr);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg1);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg1);

These usages mark the whole register as being "clobbered"
(=undefined), before only a part of register is written, e.g.:

      emit_clobber (int_xmm);
      emit_move_insn (gen_lowpart (DImode, int_xmm), input);

They aren't used to prevent unwanted CSE.

Uros.
Richard Sandiford June 20, 2016, 7:46 p.m. UTC | #4
Uros Bizjak <ubizjak@gmail.com> writes:
> On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>> Hi,
>>>>
>>>> This patch implements the alternate code sequence recommended in
>>>>
>>>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>>>
>>>> to load external function address via GOT slot with
>>>>
>>>> movq func@GOTPCREL(%rip), %rax
>>>>
>>>> so that linker won't create an PLT entry for extern function
>>>> address.
>>>>
>>>> Tested on x86-64.  OK for trunk?
>>>
>>>> +  else if (ix86_force_load_from_GOT_p (op1))
>>>> +    {
>>>> +      /* Load the external function address via the GOT slot to
>>>> +        avoid PLT.  */
>>>> +      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>>>> +                           (TARGET_64BIT
>>>> +                            ? UNSPEC_GOTPCREL
>>>> +                            : UNSPEC_GOT));
>>>> +      op1 = gen_rtx_CONST (Pmode, op1);
>>>> +      op1 = gen_const_mem (Pmode, op1);
>>>> +      /* This symbol must be referenced via a load from the Global
>>>> +        Offset Table.  */
>>>> +      set_mem_alias_set (op1, ix86_GOT_alias_set ());
>>>> +      op1 = convert_to_mode (mode, op1, 1);
>>>> +      op1 = force_reg (mode, op1);
>>>> +      emit_insn (gen_rtx_SET (op0, op1));
>>>> +      /* Generate a CLOBBER so that there will be no REG_EQUAL note
>>>> +        on the last insn to prevent cse and fwprop from replacing
>>>> +        a GOT load with a constant.  */
>>>> +      rtx tmp = gen_reg_rtx (Pmode);
>>>> +      emit_clobber (tmp);
>>>> +      return;
>>>
>>> Jeff, is this the recommended way to prevent CSE, as far as RTL
>>> infrastructure is concerned? I didn't find any example of this
>>> approach with other targets.
>>>
>>
>> FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
>> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
>> as well as other targets:
>>
>> frv/frv.c:  emit_clobber (op0);
>> frv/frv.c:  emit_clobber (op1);
>> im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
>> s390/s390.c:  emit_clobber (addr);
>> s390/s390.md:  emit_clobber (reg0);
>> s390/s390.md:  emit_clobber (reg1);
>> s390/s390.md:  emit_clobber (reg0);
>> s390/s390.md:  emit_clobber (reg0);
>> s390/s390.md:  emit_clobber (reg1);
>
> These usages mark the whole register as being "clobbered"
> (=undefined), before only a part of register is written, e.g.:
>
>       emit_clobber (int_xmm);
>       emit_move_insn (gen_lowpart (DImode, int_xmm), input);
>
> They aren't used to prevent unwanted CSE.

Since it's being called in the move expander, I thought the normal
way of preventing the constant being rematerialised would be to reject
it in the move define_insn predicates.

FWIW, I agree that using a clobber for this is going to be fragile.

Thanks,
Richard

Patch
diff mbox

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9fd14f6..8130161 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -70,6 +70,7 @@  extern bool ix86_expand_set_or_movmem (rtx, rtx, rtx, rtx, rtx, rtx,
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 56a5b9c..c8c5081 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15182,6 +15182,24 @@  ix86_legitimate_constant_p (machine_mode mode, rtx x)
   return true;
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  /* External function symbol should be loaded via the GOT slot for
+     -fno-plt.  */
+  return (!flag_plt
+	  && !flag_pic
+	  && ix86_cmodel != CM_LARGE
+	  && TARGET_64BIT
+	  && !TARGET_PECOFF
+	  && !TARGET_MACHO
+	  && GET_CODE (x) == SYMBOL_REF
+	  && SYMBOL_REF_FUNCTION_P (x)
+	  && !SYMBOL_REF_LOCAL_P (x));
+}
+
 /* Determine if it's legal to put X into the constant pool.  This
    is not possible for the address of thread-local symbols, which
    is checked above.  */
@@ -15560,6 +15578,10 @@  ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
 	    return false;
 
 	  case UNSPEC_GOTPCREL:
+	    gcc_assert (flag_pic
+			|| ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)));
+	    goto is_legitimate_pic;
+
 	  case UNSPEC_PCREL:
 	    gcc_assert (flag_pic);
 	    goto is_legitimate_pic;
@@ -18130,6 +18152,12 @@  ix86_print_operand_address_as (FILE *file, rtx addr,
 	}
       else if (flag_pic)
 	output_pic_addr_const (file, disp, 0);
+      else if (GET_CODE (disp) == CONST
+	       && GET_CODE (XEXP (disp, 0)) == UNSPEC
+	       && (XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+		   || XINT (XEXP (disp, 0), 1) == UNSPEC_GOT)
+	       && ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+	output_pic_addr_const (file, XEXP (disp, 0), code);
       else
 	output_addr_const (file, disp);
     }
@@ -19448,6 +19476,29 @@  ix86_expand_move (machine_mode mode, rtx operands[])
 	      op1 = convert_to_mode (mode, op1, 1);
 	    }
 	}
+   }
+  else if (ix86_force_load_from_GOT_p (op1))
+    {
+      /* Load the external function address via the GOT slot to
+	 avoid PLT.  */
+      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+			    (TARGET_64BIT
+			     ? UNSPEC_GOTPCREL
+			     : UNSPEC_GOT));
+      op1 = gen_rtx_CONST (Pmode, op1);
+      op1 = gen_const_mem (Pmode, op1);
+      /* This symbol must be referenced via a load from the Global
+	 Offset Table.  */
+      set_mem_alias_set (op1, ix86_GOT_alias_set ());
+      op1 = convert_to_mode (mode, op1, 1);
+      op1 = force_reg (mode, op1);
+      emit_insn (gen_rtx_SET (op0, op1));
+      /* Generate a CLOBBER so that there will be no REG_EQUAL note
+	 on the last insn to prevent cse and fwprop from replacing
+	 a GOT load with a constant.  */
+      rtx tmp = gen_reg_rtx (Pmode);
+      emit_clobber (tmp);
+      return;
     }
   else
     {
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index b3cf2a3..06a0002 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -149,6 +149,10 @@ 
 (define_predicate "x86_64_immediate_operand"
   (match_code "const_int,symbol_ref,label_ref,const")
 {
+  /* Load the external function address via the GOT slot to avoid PLT.  */
+  if (ix86_force_load_from_GOT_p (op))
+    return false;
+
   if (!TARGET_64BIT)
     return immediate_operand (op, mode);
 
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-1.c b/gcc/testsuite/gcc.target/i386/pr67400-1.c
new file mode 100644
index 0000000..a875b76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-1.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+extern void bar (void);
+
+void *
+foo (void)
+{
+  return &bar;
+}
+
+/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-2.c b/gcc/testsuite/gcc.target/i386/pr67400-2.c
new file mode 100644
index 0000000..9f3f4bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-2.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+extern void bar (void);
+extern void *p;
+
+void
+foo (void)
+{
+  p = &bar;
+}
+
+/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-3.c b/gcc/testsuite/gcc.target/i386/pr67400-3.c
new file mode 100644
index 0000000..045974e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-3.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+static void
+bar (void)
+{
+}
+
+void *
+foo (void)
+{
+  return &bar;
+}
+
+/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*\\\$bar," } } */
+/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-4.c b/gcc/testsuite/gcc.target/i386/pr67400-4.c
new file mode 100644
index 0000000..fd373db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-4.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+extern void bar (void) __attribute__ ((visibility ("hidden")));
+
+void *
+foo (void)
+{
+  return &bar;
+}
+
+/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*\\\$bar," } } */
+/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-5.c b/gcc/testsuite/gcc.target/i386/pr67400-5.c
new file mode 100644
index 0000000..9bb98dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-5.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+extern void foo (void);
+extern void bar (int, int, int, int, int, int, void *);
+
+void
+x (void)
+{
+  bar (1, 2, 3, 4, 5, 6, foo);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-6.c b/gcc/testsuite/gcc.target/i386/pr67400-6.c
new file mode 100644
index 0000000..b84196a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67400-6.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+extern int bar (void);
+
+int
+check (void *p)
+{
+  return p != &bar;
+}
+
+/* { dg-final { scan-assembler "cmp\(l|q\)\[ \t\]*.*bar@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */