diff mbox series

Oprimize stack_protect_set_1_<mode> followed by a move to the same register (PR target/92841)

Message ID 20191210095735.GY10088@tucnak
State New
Headers show
Series Oprimize stack_protect_set_1_<mode> followed by a move to the same register (PR target/92841) | expand

Commit Message

Jakub Jelinek Dec. 10, 2019, 9:57 a.m. UTC
Hi!

The stack_protect_set_1_<mode> pattern intentionally clears the register it
used as a temporary to read the canary from the register and push it back
on the stack for security reasons, to make sure the stack canary isn't
spilled somewhere else etc.  On the following testcase, we end up with a
weird:
	movq	%fs:40, %rax
	movq	%rax, 24(%rsp)
	xorl	%eax, %eax
	movl	$30, %eax
sequence though, where the reporter rightfully complains it is a waste
to clear the register and immediately set it to something else.

We really don't want to split this into two patterns, because then the
scheduler and whatever other post-RA passes could stick some further
code in between and increase the lifetime of the security sensitive
data in the register.

So, the following patch uses peephole2 to merge the
stack_protect_set_1_<mode> insn with following setter of the same register.
Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
(unless actually emitted as SImode) moves don't overwrite the whole
register, and for simplicity only the most common cases (no XMM/MM etc.
sources).

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
(just check-gcc check-c++-all check-target-libstdc++-v3 with
--target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
testing was done with additional logging whenever the peephole2's kick in.
During the -m64 testing, it kicked in 17682 times, during -m32 testing
6344 times.  Ok for trunk?

2019-12-10  Jakub Jelinek  <jakub@redhat.com>

	PR target/92841
	* config/i386/i386.md (*stack_protect_set_2_<mode>,
	*stack_protect_set_3): New define_insns and corresponding
	define_peephole2s.

	* gcc.target/i386/pr92841.c: New test.


	Jakub

Comments

Andreas Schwab Dec. 10, 2019, 10:02 a.m. UTC | #1
On Dez 10 2019, Jakub Jelinek wrote:

> --- gcc/testsuite/gcc.target/i386/pr92841.c.jj	2019-12-09 19:38:29.572759215 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92841.c	2019-12-09 19:40:59.642492417 +0100
> @@ -0,0 +1,17 @@
> +/* PR target/92841 */
> +/* { dg-do compile { target fstack_protector } } */
> +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> +
> +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> +void bar (int *);
> +
> +void
> +foo (void)
> +{
> +  int e[4];
> +  const struct S *a;
> +  for (a = c; a < c + sizeof (c); a++)
> +    if (a->b)

This accesses c beyond bounds.  Does that invalidate the test?

Andreas.
Jakub Jelinek Dec. 10, 2019, 10:06 a.m. UTC | #2
On Tue, Dec 10, 2019 at 11:02:39AM +0100, Andreas Schwab wrote:
> On Dez 10 2019, Jakub Jelinek wrote:
> 
> > --- gcc/testsuite/gcc.target/i386/pr92841.c.jj	2019-12-09 19:38:29.572759215 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr92841.c	2019-12-09 19:40:59.642492417 +0100
> > @@ -0,0 +1,17 @@
> > +/* PR target/92841 */
> > +/* { dg-do compile { target fstack_protector } } */
> > +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> > +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> > +
> > +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> > +void bar (int *);
> > +
> > +void
> > +foo (void)
> > +{
> > +  int e[4];
> > +  const struct S *a;
> > +  for (a = c; a < c + sizeof (c); a++)
> > +    if (a->b)
> 
> This accesses c beyond bounds.  Does that invalidate the test?

Thanks for noticing, changed into
  for (a = c; a < c + sizeof (c) / sizeof (c[0]); a++)
in my copy, the testcase still FAILs before the patch and PASSes with it.

	Jakub
Jakub Jelinek Dec. 17, 2019, 9:18 a.m. UTC | #3
Hi!

I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0])
testsuite fix Andreas pointed out).

Thanks!

On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote:
> 2019-12-10  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/92841
> 	* config/i386/i386.md (*stack_protect_set_2_<mode>,
> 	*stack_protect_set_3): New define_insns and corresponding
> 	define_peephole2s.
> 
> 	* gcc.target/i386/pr92841.c: New test.

	Jakub
Uros Bizjak Dec. 17, 2019, 9:47 a.m. UTC | #4
On Tue, Dec 10, 2019 at 10:57 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> The stack_protect_set_1_<mode> pattern intentionally clears the register it
> used as a temporary to read the canary from the register and push it back
> on the stack for security reasons, to make sure the stack canary isn't
> spilled somewhere else etc.  On the following testcase, we end up with a
> weird:
>         movq    %fs:40, %rax
>         movq    %rax, 24(%rsp)
>         xorl    %eax, %eax
>         movl    $30, %eax
> sequence though, where the reporter rightfully complains it is a waste
> to clear the register and immediately set it to something else.
>
> We really don't want to split this into two patterns, because then the
> scheduler and whatever other post-RA passes could stick some further
> code in between and increase the lifetime of the security sensitive
> data in the register.
>
> So, the following patch uses peephole2 to merge the
> stack_protect_set_1_<mode> insn with following setter of the same register.
> Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
> (unless actually emitted as SImode) moves don't overwrite the whole
> register, and for simplicity only the most common cases (no XMM/MM etc.
> sources).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> (just check-gcc check-c++-all check-target-libstdc++-v3 with
> --target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
> testing was done with additional logging whenever the peephole2's kick in.
> During the -m64 testing, it kicked in 17682 times, during -m32 testing
> 6344 times.  Ok for trunk?
>
> 2019-12-10  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/92841
>         * config/i386/i386.md (*stack_protect_set_2_<mode>,
>         *stack_protect_set_3): New define_insns and corresponding
>         define_peephole2s.
>
>         * gcc.target/i386/pr92841.c: New test.

OK with a couple of changes below.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-12-05 10:04:05.357235555 +0100
> +++ gcc/config/i386/i386.md     2019-12-09 19:29:31.578885594 +0100
> @@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_<mode
>    "mov{<imodesuffix>}\t{%1, %2|%2, %1}\;mov{<imodesuffix>}\t{%2, %0|%0, %2}\;xor{l}\t%k2, %k2"
>    [(set_attr "type" "multi")])
>
> +;; Patterns and peephole2s to optimize stack_protect_set_1_<mode>
> +;; immediately followed by *mov{s,d}i_internal to the same register,
> +;; where we can avoid the xor{l} above.  We don't split this, so that
> +;; scheduling or anything else doesn't separate the *stack_protect_set*
> +;; pattern from the set of the register that overwrites the register
> +;; with a new value.
> +(define_insn "*stack_protect_set_2_<mode>"
> +  [(set (match_operand:PTR 0 "memory_operand" "=m")
> +       (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
> +                   UNSPEC_SP_SET))
> +   (set (match_operand:SI 1 "register_operand" "=&r")
> +       (match_operand:SI 2 "general_operand" "g"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "reload_completed
> +   && !reg_overlap_mentioned_p (operands[1], operands[2])"
> +{
> +  output_asm_insn ("mov{<imodesuffix>}\t{%3, %<k>1|%<k>1, %3}", operands);
> +  output_asm_insn ("mov{<imodesuffix>}\t{%<k>1, %0|%0, %<k>1}", operands);

This looks much better than the current asm template in
stack_protect_set_1_<mode> and stack_protect_test_1_<mode>. Can you
maybe also change templates there?

> +  if (pic_32bit_operand (operands[2], SImode)
> +      || ix86_use_lea_for_mov (insn, operands + 1))
> +    return "lea{l}\t{%E2, %1|%1, %E2}";
> +  else
> +    return "mov{l}\t{%2, %1|%1, %2}";
> +}
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "24")])
> +
> +(define_peephole2
> + [(parallel [(set (match_operand:PTR 0 "memory_operand")
> +                 (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
> +                             UNSPEC_SP_SET))
> +            (set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
> +            (clobber (reg:CC FLAGS_REG))])
> +  (set (match_operand:SI 3 "general_reg_operand")
> +       (match_operand:SI 4 "general_operand"))]
> + "reload_completed
> +  && REGNO (operands[2]) == REGNO (operands[3])
> +  && !reg_overlap_mentioned_p (operands[3], operands[4])
> +  && (general_reg_operand (operands[4], SImode)
> +      || !register_operand (operands[4], SImode))"
> + [(parallel [(set (match_dup 0)
> +                 (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
> +            (set (match_dup 3) (match_dup 4))
> +            (clobber (reg:CC FLAGS_REG))])])

No need for "reload_completed" in peephole2 patterns. Also (IIRC),
operand predicate is not needed for operand 4 when it is mentioned in
the insn condition.

> +(define_insn "*stack_protect_set_3"
> +  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
> +       (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
> +                  UNSPEC_SP_SET))
> +   (set (match_operand:DI 1 "register_operand" "=&r,r,r")
> +       (match_operand:DI 2 "general_operand" "Z,rem,i"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_64BIT
> +   && reload_completed
> +   && !reg_overlap_mentioned_p (operands[1], operands[2])"
> +{
> +  output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
> +  output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
> +  if (which_alternative == 0)
> +    return "mov{l}\t{%k2, %k1|%k1, %k2}";
> +  else if (which_alternative == 2)
> +    return "movabs{q}\t{%2, %1|%1, %2}";
> +  else if (pic_32bit_operand (operands[2], DImode)
> +          || ix86_use_lea_for_mov (insn, operands + 1))
> +    return "lea{q}\t{%E2, %1|%1, %E2}";
> +  else
> +    return "mov{q}\t{%2, %1|%1, %2}";
> +}
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "24")])
> +
> +(define_peephole2
> + [(parallel [(set (match_operand:DI 0 "memory_operand")
> +                 (unspec:DI [(match_operand:DI 1 "memory_operand")]
> +                            UNSPEC_SP_SET))
> +            (set (match_operand:DI 2 "general_reg_operand") (const_int 0))
> +            (clobber (reg:CC FLAGS_REG))])
> +  (set (match_dup 2) (match_operand:DI 3 "general_operand"))]
> + "TARGET_64BIT
> +  && reload_completed
> +  && !reg_overlap_mentioned_p (operands[2], operands[3])
> +  && (general_reg_operand (operands[3], DImode)
> +      || memory_operand (operands[3], DImode)
> +      || x86_64_zext_immediate_operand (operands[3], DImode)
> +      || x86_64_immediate_operand (operands[3], DImode)
> +      || (CONSTANT_P (operands[3])
> +         && (!flag_pic || LEGITIMATE_PIC_OPERAND_P (operands[3]))))"
> + [(parallel [(set (match_dup 0)
> +                 (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
> +            (set (match_dup 2) (match_dup 3))
> +            (clobber (reg:CC FLAGS_REG))])])

Same here.

>  (define_expand "stack_protect_test"
>    [(match_operand 0 "memory_operand")
>     (match_operand 1 "memory_operand")
> --- gcc/testsuite/gcc.target/i386/pr92841.c.jj  2019-12-09 19:38:29.572759215 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92841.c     2019-12-09 19:40:59.642492417 +0100
> @@ -0,0 +1,17 @@
> +/* PR target/92841 */
> +/* { dg-do compile { target fstack_protector } } */
> +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> +
> +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> +void bar (int *);
> +
> +void
> +foo (void)
> +{
> +  int e[4];
> +  const struct S *a;
> +  for (a = c; a < c + sizeof (c); a++)
> +    if (a->b)
> +      bar (e);
> +}
>
>         Jakub
>
Jan Hubicka Dec. 19, 2019, 3:01 p.m. UTC | #5
> Hi!
> 
> I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0])
> testsuite fix Andreas pointed out).
> 
> Thanks!
> 
> On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote:
> > 2019-12-10  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR target/92841
> > 	* config/i386/i386.md (*stack_protect_set_2_<mode>,
> > 	*stack_protect_set_3): New define_insns and corresponding
> > 	define_peephole2s.
> > 
> > 	* gcc.target/i386/pr92841.c: New test.

I now get build failure of Firefox with LTO due to:

movabsq $.LC12, %rax

which is output by:

(insn:TI 468 3 849 2 (parallel [                                                
            (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)                         
                        (const_int -56 [0xffffffffffffffc8])) [1 D.22910+0 S8 A64])
                (unspec:DI [                                                    
                        (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
                    ] UNSPEC_SP_SET))                                           
            (set (reg:DI 0 ax [157])                                            
                (symbol_ref/f:DI ("*.LC12") [flags 0x2]  <var_decl 0x7ffff56ac870 *.LC12>))
            (clobber (reg:CC 17 flags))                                         
        ]) 1042 {*stack_protect_set_3}                                          
     (expr_list:REG_UNUSED (reg:CC 17 flags)                                    
        (nil)))                                                                 

I suppose we need to be more careful about the symbolic operands for
movabs?

Honza
Jakub Jelinek Dec. 19, 2019, 3:41 p.m. UTC | #6
On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote:
> I now get build failure of Firefox with LTO due to:
> 
> movabsq $.LC12, %rax
> 
> which is output by:
> 
> (insn:TI 468 3 849 2 (parallel [                                                
>             (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)                         
>                         (const_int -56 [0xffffffffffffffc8])) [1 D.22910+0 S8 A64])
>                 (unspec:DI [                                                    
>                         (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
>                     ] UNSPEC_SP_SET))                                           
>             (set (reg:DI 0 ax [157])                                            
>                 (symbol_ref/f:DI ("*.LC12") [flags 0x2]  <var_decl 0x7ffff56ac870 *.LC12>))
>             (clobber (reg:CC 17 flags))                                         
>         ]) 1042 {*stack_protect_set_3}                                          
>      (expr_list:REG_UNUSED (reg:CC 17 flags)                                    
>         (nil)))                                                                 
> 
> I suppose we need to be more careful about the symbolic operands for
> movabs?

Is that -fPIC or -fPIE or -fno-pic?
What were the two insns before peephole2?

	Jakub
Jan Hubicka Dec. 19, 2019, 3:50 p.m. UTC | #7
> On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote:
> > I now get build failure of Firefox with LTO due to:
> > 
> > movabsq $.LC12, %rax
> > 
> > which is output by:
> > 
> > (insn:TI 468 3 849 2 (parallel [                                                
> >             (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)                         
> >                         (const_int -56 [0xffffffffffffffc8])) [1 D.22910+0 S8 A64])
> >                 (unspec:DI [                                                    
> >                         (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
> >                     ] UNSPEC_SP_SET))                                           
> >             (set (reg:DI 0 ax [157])                                            
> >                 (symbol_ref/f:DI ("*.LC12") [flags 0x2]  <var_decl 0x7ffff56ac870 *.LC12>))
> >             (clobber (reg:CC 17 flags))                                         
> >         ]) 1042 {*stack_protect_set_3}                                          
> >      (expr_list:REG_UNUSED (reg:CC 17 flags)                                    
> >         (nil)))                                                                 
> > 
> > I suppose we need to be more careful about the symbolic operands for
> > movabs?
> 
> Is that -fPIC or -fPIE or -fno-pic?
> What were the two insns before peephole2?
It is -fPIC transforming

(insn 9 7 12 2 (parallel [                                                      
            (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)                         
                        (const_int -56 [0xffffffffffffffc8])) [1 D.22910+0 S8 A64])
                (unspec:DI [                                                    
                        (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
                    ] UNSPEC_SP_SET))                                           
            (set (reg:DI 0 ax [157])                                            
                (const_int 0 [0]))                                              
            (clobber (reg:CC 17 flags))                                         
        ]) "/aux/hubicka/firefox2019-trunktest/dist/include/testing/gtest/include/gtest/gtest.h":1538:17 1039 {stack_protect_set_1_di}
     (expr_list:REG_UNUSED (reg:CC 17 flags)                                    
        (expr_list:REG_UNUSED (reg:DI 0 ax [157])                               
            (nil))))                                                             
(note 12 9 13 2 NOTE_INSN_DELETED)                                              
(insn 13 12 426 2 (set (reg:DI 0 ax [159])                                      
        (symbol_ref/f:DI ("*.LC12") [flags 0x2]  <var_decl 0x7ffff56ac870 *.LC12>)) 66 {*movdi_internal}
     (nil))                                                                     

to

(insn 468 7 426 2 (parallel [                                                   
            (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)                         
                        (const_int -56 [0xffffffffffffffc8])) [1 D.22910+0 S8 A64])
                (unspec:DI [                                                    
                        (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[(<address-space-1> long unsigned int *)40B]+0 S8 A64 AS1])
                    ] UNSPEC_SP_SET))                                           
            (set (reg:DI 0 ax [157])                                            
                (symbol_ref/f:DI ("*.LC12") [flags 0x2]  <var_decl 0x7ffff56ac870 *.LC12>))
            (clobber (reg:CC 17 flags))                                         
        ]) -1                                                                   
     (nil))                                                                     

Outputting the move as RIP relative movq would work.
LC12 is string "s" and has nothing to do with stack protecting.
Jakub Jelinek Dec. 19, 2019, 5:23 p.m. UTC | #8
On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> Outputting the move as RIP relative movq would work.
> LC12 is string "s" and has nothing to do with stack protecting.

This should fix it by doing more carefully what *movdi_internal
does.  Will bootstrap/regtest it tonight.

2019-12-19  Jakub Jelinek  <jakub@redhat.com>

	PR target/92841
	* config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
	always use lea{q}, no matter what value which_alternative has.

	* gcc.target/i386/pr92841-2.c: New test.

--- gcc/config/i386/i386.md.jj	2019-12-19 13:31:52.424108730 +0100
+++ gcc/config/i386/i386.md	2019-12-19 18:12:54.064747056 +0100
@@ -19863,12 +19863,13 @@ (define_insn "*stack_protect_set_3"
 {
   output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
   output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
-  if (which_alternative == 0)
+  if (pic_32bit_operand (operands[2], DImode))
+    return "lea{q}\t{%E2, %1|%1, %E2}";
+  else if (which_alternative == 0)
     return "mov{l}\t{%k2, %k1|%k1, %k2}";
   else if (which_alternative == 2)
     return "movabs{q}\t{%2, %1|%1, %2}";
-  else if (pic_32bit_operand (operands[2], DImode)
-	   || ix86_use_lea_for_mov (insn, operands + 1))
+  else if (ix86_use_lea_for_mov (insn, operands + 1))
     return "lea{q}\t{%E2, %1|%1, %E2}";
   else
     return "mov{q}\t{%2, %1|%1, %2}";
--- gcc/testsuite/gcc.target/i386/pr92841-2.c.jj	2019-12-19 18:18:31.295587557 +0100
+++ gcc/testsuite/gcc.target/i386/pr92841-2.c	2019-12-19 18:19:37.800570175 +0100
@@ -0,0 +1,18 @@
+/* PR target/92841 */
+/* { dg-do compile { target { { { *-*-linux* } && lp64 } && fstack_protector } } } */
+/* { dg-options "-O2 -fpic -fstack-protector-strong -masm=att" } */
+/* { dg-final { scan-assembler "leaq\tbuf2\\\(%rip\\\)," } } */
+
+static char buf2[64];
+void bar (char *, char *);
+
+void
+foo ()
+{
+  char buf[64];
+  char *p = buf2;
+  asm ("" : "+a" (p));
+  char *q = buf;
+  asm ("" : "+r" (q));
+  bar (p, q);
+}


	Jakub
Jakub Jelinek Dec. 19, 2019, 11:26 p.m. UTC | #9
On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> > Outputting the move as RIP relative movq would work.
> > LC12 is string "s" and has nothing to do with stack protecting.
> 
> This should fix it by doing more carefully what *movdi_internal
> does.  Will bootstrap/regtest it tonight.

Passed bootstrap/regtest on x86_64-linux and i686-linux.  Ok for trunk?
> 
> 2019-12-19  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/92841
> 	* config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
> 	always use lea{q}, no matter what value which_alternative has.
> 
> 	* gcc.target/i386/pr92841-2.c: New test.

	Jakub
Uros Bizjak Dec. 20, 2019, 7:26 a.m. UTC | #10
On Fri, Dec 20, 2019 at 12:26 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote:
> > On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> > > Outputting the move as RIP relative movq would work.
> > > LC12 is string "s" and has nothing to do with stack protecting.
> >
> > This should fix it by doing more carefully what *movdi_internal
> > does.  Will bootstrap/regtest it tonight.
>
> Passed bootstrap/regtest on x86_64-linux and i686-linux.  Ok for trunk?
> >
> > 2019-12-19  Jakub Jelinek  <jakub@redhat.com>
> >
> >       PR target/92841
> >       * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
> >       always use lea{q}, no matter what value which_alternative has.
> >
> >       * gcc.target/i386/pr92841-2.c: New test.

LGTM.

Thanks,
Uros.

>         Jakub
>
diff mbox series

Patch

--- gcc/config/i386/i386.md.jj	2019-12-05 10:04:05.357235555 +0100
+++ gcc/config/i386/i386.md	2019-12-09 19:29:31.578885594 +0100
@@ -19732,6 +19732,98 @@  (define_insn "@stack_protect_set_1_<mode
   "mov{<imodesuffix>}\t{%1, %2|%2, %1}\;mov{<imodesuffix>}\t{%2, %0|%0, %2}\;xor{l}\t%k2, %k2"
   [(set_attr "type" "multi")])
 
+;; Patterns and peephole2s to optimize stack_protect_set_1_<mode>
+;; immediately followed by *mov{s,d}i_internal to the same register,
+;; where we can avoid the xor{l} above.  We don't split this, so that
+;; scheduling or anything else doesn't separate the *stack_protect_set*
+;; pattern from the set of the register that overwrites the register
+;; with a new value.
+(define_insn "*stack_protect_set_2_<mode>"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+	(unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
+		    UNSPEC_SP_SET))
+   (set (match_operand:SI 1 "register_operand" "=&r")
+	(match_operand:SI 2 "general_operand" "g"))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{<imodesuffix>}\t{%3, %<k>1|%<k>1, %3}", operands);
+  output_asm_insn ("mov{<imodesuffix>}\t{%<k>1, %0|%0, %<k>1}", operands);
+  if (pic_32bit_operand (operands[2], SImode)
+      || ix86_use_lea_for_mov (insn, operands + 1))
+    return "lea{l}\t{%E2, %1|%1, %E2}";
+  else
+    return "mov{l}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+   (set_attr "length" "24")])
+
+(define_peephole2
+ [(parallel [(set (match_operand:PTR 0 "memory_operand")
+		  (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
+			      UNSPEC_SP_SET))
+	     (set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
+	     (clobber (reg:CC FLAGS_REG))])
+  (set (match_operand:SI 3 "general_reg_operand")
+       (match_operand:SI 4 "general_operand"))]
+ "reload_completed
+  && REGNO (operands[2]) == REGNO (operands[3])
+  && !reg_overlap_mentioned_p (operands[3], operands[4])
+  && (general_reg_operand (operands[4], SImode)
+      || !register_operand (operands[4], SImode))"
+ [(parallel [(set (match_dup 0)
+		  (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
+	     (set (match_dup 3) (match_dup 4))
+	     (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn "*stack_protect_set_3"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
+	(unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
+		   UNSPEC_SP_SET))
+   (set (match_operand:DI 1 "register_operand" "=&r,r,r")
+	(match_operand:DI 2 "general_operand" "Z,rem,i"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT
+   && reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
+  if (which_alternative == 0)
+    return "mov{l}\t{%k2, %k1|%k1, %k2}";
+  else if (which_alternative == 2)
+    return "movabs{q}\t{%2, %1|%1, %2}";
+  else if (pic_32bit_operand (operands[2], DImode)
+	   || ix86_use_lea_for_mov (insn, operands + 1))
+    return "lea{q}\t{%E2, %1|%1, %E2}";
+  else
+    return "mov{q}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+   (set_attr "length" "24")])
+
+(define_peephole2
+ [(parallel [(set (match_operand:DI 0 "memory_operand")
+		  (unspec:DI [(match_operand:DI 1 "memory_operand")]
+			     UNSPEC_SP_SET))
+	     (set (match_operand:DI 2 "general_reg_operand") (const_int 0))
+	     (clobber (reg:CC FLAGS_REG))])
+  (set (match_dup 2) (match_operand:DI 3 "general_operand"))]
+ "TARGET_64BIT
+  && reload_completed
+  && !reg_overlap_mentioned_p (operands[2], operands[3])
+  && (general_reg_operand (operands[3], DImode)
+      || memory_operand (operands[3], DImode)
+      || x86_64_zext_immediate_operand (operands[3], DImode)
+      || x86_64_immediate_operand (operands[3], DImode)
+      || (CONSTANT_P (operands[3])
+	  && (!flag_pic || LEGITIMATE_PIC_OPERAND_P (operands[3]))))"
+ [(parallel [(set (match_dup 0)
+		  (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
+	     (set (match_dup 2) (match_dup 3))
+	     (clobber (reg:CC FLAGS_REG))])])
+
 (define_expand "stack_protect_test"
   [(match_operand 0 "memory_operand")
    (match_operand 1 "memory_operand")
--- gcc/testsuite/gcc.target/i386/pr92841.c.jj	2019-12-09 19:38:29.572759215 +0100
+++ gcc/testsuite/gcc.target/i386/pr92841.c	2019-12-09 19:40:59.642492417 +0100
@@ -0,0 +1,17 @@ 
+/* PR target/92841 */
+/* { dg-do compile { target fstack_protector } } */
+/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
+/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
+
+const struct S { int b; } c[] = {30, 12, 20, 0, 11};
+void bar (int *);
+
+void
+foo (void)
+{
+  int e[4];
+  const struct S *a;
+  for (a = c; a < c + sizeof (c); a++)
+    if (a->b)
+      bar (e);
+}