Fix ix86_split_long_move collision handling with TLS (PR target/66470)

Message ID CAFULd4ag80XLNrt0SP=fB8B=ywM=A2k6rP31=VSzz+x7b4mBAQ@mail.gmail.com
State New

Commit Message

Uros Bizjak June 9, 2015, 6:09 p.m. UTC
On Tue, Jun 9, 2015 at 6:21 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote:
>> > something?  Would it be acceptable to just guard the changes in the patch
>> > with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
>> > when to ZERO_EXTEND addr (if needed at all), etc.
>>
>> If you wish, I can take your patch and take it further. -mx32 is a
>> delicate beast...
>
> If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff
> now.
> Note that for -m64/-mx32 it will be much harder to create a reproducer,
> because to trigger the bug one has to convince the register allocator
> to allocate the lhs of the load in certain registers (not that hard),
> but also the index register (to be scaled, also not that hard) and
> also the register holding the tls symbol immediate.  Wonder if one has to
> keep all but the two registers live across the load or something similar.
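
For context, the kind of source under discussion is a multi-word load from a thread-local array through a scaled index. The following is only a minimal sketch under assumptions: the names `tls_values` and `load_tls_element` are hypothetical, this is not the testcase from the PR, and whether it actually reaches the `collisions > 1` path in ix86_split_long_move depends on the register allocator, as described above.

```c
#include <assert.h>

/* Hypothetical thread-local array of 64-bit values.  On a 32-bit x86
   target each element spans two words, so loading one element is a
   multi-word move.  __thread is the GNU C spelling of thread-local
   storage.  */
__thread unsigned long long tls_values[10];

/* Load one element through a scaled index.  With direct segment
   references the address can combine the thread pointer, the symbol
   offset, and the scaled index register.  */
unsigned long long
load_tls_element (long idx)
{
  assert (idx >= 0 && idx < 10);	/* Keep the toy access in bounds.  */
  return tls_values[idx];
}
```

Compiling something like this for a 32-bit target with optimization and inspecting the assembly is one way to see how the two halves of the load are addressed.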

Please find attached a patch that takes your idea slightly further. We
find the (perhaps zero-extended) UNSPEC_TP and copy it for further use.
In its place, we simply put const0_rtx. We know that the address of a
multi-word value has to be offsettable, which in the case of x32 means
that it is NOT a zero-extended address.

Uros.

Comments

Jakub Jelinek June 9, 2015, 7:30 p.m. UTC | #1
On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
> Please find attached a patch that takes your idea slightly further. We
> find the (perhaps zero-extended) UNSPEC_TP and copy it for further use.
> In its place, we simply put const0_rtx. We know that address to

Is that safe?  I mean, the address, even if offsettable, can already have
some immediate (e.g. the offsettable_memref_p predicate seems to just check
that you can plus_constant some small integer and be recognized again), and
if you turn the %gs: into a const0_rtx, the next decompose would fail.
And when you already have the PLUS which has UNSPEC_TP as one of its
arguments, replacing that PLUS with the other argument is IMHO very easy.
Perhaps you are right that there is no need to copy_rtx, supposedly
the rtx shouldn't be shared with anything and thus can be modified in place.
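
To illustrate the difference between the two approaches, here is a toy model in C. It is only a sketch under assumptions: `struct node`, `mk`, `zero_out_tp`, `splice_out_tp`, `count_consts` and `example_address` are hypothetical stand-ins rather than GCC's rtx API, and the tree shape is an assumed example of an address that already carries an immediate.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for RTL codes; not GCC's real rtx.  */
enum code { N_REG, N_CONST, N_TP, N_PLUS };

struct node
{
  enum code code;
  long value;			/* Register number or constant value.  */
  struct node *op[2];		/* Operands; used by N_PLUS only.  */
};

/* Allocate a node.  The toy never frees nodes.  */
struct node *
mk (enum code c, long v, struct node *a, struct node *b)
{
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->code = c;
  n->value = v;
  n->op[0] = a;
  n->op[1] = b;
  return n;
}

/* Approach 1: put a constant 0 where the thread pointer was.  */
int
zero_out_tp (struct node *x)
{
  while (x->code == N_PLUS)
    {
      for (int i = 0; i < 2; i++)
	if (x->op[i]->code == N_TP)
	  {
	    x->op[i] = mk (N_CONST, 0, NULL, NULL);
	    return 1;
	  }
      x = x->op[0];
    }
  return 0;
}

/* Approach 2: replace the PLUS holding the thread pointer with its
   other operand.  */
int
splice_out_tp (struct node **x)
{
  while ((*x)->code == N_PLUS)
    {
      for (int i = 0; i < 2; i++)
	if ((*x)->op[i]->code == N_TP)
	  {
	    *x = (*x)->op[1 - i];
	    return 1;
	  }
      x = &(*x)->op[0];
    }
  return 0;
}

/* Count the constant addends in the address.  */
int
count_consts (const struct node *x)
{
  if (x->code == N_CONST)
    return 1;
  if (x->code == N_PLUS)
    return count_consts (x->op[0]) + count_consts (x->op[1]);
  return 0;
}

/* Build (plus (plus (tp) (reg 1)) (const 8)).  */
struct node *
example_address (void)
{
  struct node *inner = mk (N_PLUS, 0, mk (N_TP, 0, NULL, NULL),
			   mk (N_REG, 1, NULL, NULL));
  return mk (N_PLUS, 0, inner, mk (N_CONST, 8, NULL, NULL));
}
```

In this toy, zeroing out the thread pointer leaves the address with two constant addends, while splicing out the PLUS leaves it with one, which is the shape a decomposer that accepts a single displacement would expect.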

If -mx32 is a non-issue here, then perhaps my initial patch is good enough?

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 224292)
> +++ config/i386/i386.c	(working copy)
> @@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
>  	 Do an lea to the last part and use only one colliding move.  */
>        else if (collisions > 1)
>  	{
> -	  rtx base;
> +	  rtx base, addr, tls_base = NULL_RTX;
>  
>  	  collisions = 1;
>  
> @@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[])
>  	  if (GET_MODE (base) != Pmode)
>  	    base = gen_rtx_REG (Pmode, REGNO (base));
>  
> -	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
> +	  addr = XEXP (part[1][0], 0);
> +	  if (TARGET_TLS_DIRECT_SEG_REFS)
> +	    {
> +	      struct ix86_address parts;
> +	      int ok = ix86_decompose_address (addr, &parts);
> +	      gcc_assert (ok);
> +	      if (parts.seg != SEG_DEFAULT)
> +		{
> +		  /* It is not valid to use %gs: or %fs: in lea,
> +		     so we need to remove it from the address
> +		     used for lea and add it to each individual
> +		     memory load instead.  */
> +		  rtx *x = &addr;
> +                  while (GET_CODE (*x) == PLUS)
> +                    {
> +                      for (i = 0; i < 2; i++)
> +			{
> +			  rtx op = XEXP (*x, i);
> +			  if ((GET_CODE (op) == UNSPEC
> +			     && XINT (op, 1) == UNSPEC_TP)
> +			    || (GET_CODE (op) == ZERO_EXTEND
> +				&& GET_CODE (XEXP (op, 0)) == UNSPEC
> +				&& (XINT (XEXP (op, 0), 1)
> +				    == UNSPEC_TP)))
> +			  {
> +			    tls_base = XEXP (*x, i);
> +			    XEXP (*x, i) = const0_rtx;
> +			    break;
> +			  }
> +			}
> +
> +		      if (tls_base)
> +			break;
> +		      x = &XEXP (*x, 0);
> +		    }
> +		  gcc_assert (tls_base);
> +		}
> +	    }
> +	  emit_insn (gen_rtx_SET (base, addr));
> +	  if (tls_base)
> +	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
>  	  part[1][0] = replace_equiv_address (part[1][0], base);
>  	  for (i = 1; i < nparts; i++)
>  	    {
> +	      if (tls_base)
> +		base = copy_rtx (base);
>  	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
>  	      part[1][i] = replace_equiv_address (part[1][i], tmp);
>  	    }


	Jakub

Uros Bizjak June 10, 2015, 6:06 a.m. UTC | #2
On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
>> Please find attached a patch that takes your idea slightly further. We
>> find the (perhaps zero-extended) UNSPEC_TP and copy it for further use.
>> In its place, we simply put const0_rtx. We know that address to
>
> Is that safe?  I mean, the address, even if offsettable, can already have
> some immediate (e.g. the offsettable_memref_p predicate seems to just check
> that you can plus_constant some small integer and be recognized again), and
> if you turn the %gs: into a const0_rtx, the next decompose would fail.
> And when you already have the PLUS which has UNSPEC_TP as one of its
> arguments, replacing that PLUS with the other argument is IMHO very easy.
> Perhaps you are right that there is no need to copy_rtx, supposedly
> the rtx shouldn't be shared with anything and thus can be modified in place.

Hm, you are right. I was under the impression that decompose_address can
handle multiple CONST_INT addends, which is unfortunately not the
case.

> If -mx32 is a non-issue here, then perhaps my initial patch is good enough?

It looks to me that if you detect and record the zero-extended UNSPEC_TP,
your original patch would also handle -mx32.

Can you please repost your original patch with the above addition?

Thanks,
Uros.

Richard Sandiford June 10, 2015, 6:38 a.m. UTC | #3
Uros Bizjak <ubizjak@gmail.com> writes:
> On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote:
>>> Please find attached a patch that takes your idea slightly further. We
>>> find the (perhaps zero-extended) UNSPEC_TP and copy it for further use.
>>> In its place, we simply put const0_rtx. We know that address to
>>
>> Is that safe?  I mean, the address, even if offsettable, can already have
>> some immediate (e.g. the offsettable_memref_p predicate seems to just check
>> that you can plus_constant some small integer and be recognized again), and
>> if you turn the %gs: into a const0_rtx, the next decompose would fail.
>> And when you already have the PLUS which has UNSPEC_TP as one of its
>> arguments, replacing that PLUS with the other argument is IMHO very easy.
>> Perhaps you are right that there is no need to copy_rtx, supposedly
>> the rtx shouldn't be shared with anything and thus can be modified in place.
>
> Hm, you are right. I was under the impression that decompose_address can
> handle multiple CONST_INT addends, which is unfortunately not the
> case.

That's in some ways a feature though.  I don't think we want to support
multiple offsets, since that implies having more than one representation
for the same address.
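
As a rough illustration of the single-offset idea, here is a toy sketch in C. The names `struct toy_address`, `toy_plus_constant` and `toy_address_equal` are hypothetical and only loosely modelled on a base/index/scale/displacement split; this is not GCC's ix86_address or plus_constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy address: base + index * scale + disp, with exactly one
   displacement field.  */
struct toy_address
{
  int base;			/* Base register number.  */
  int index;			/* Index register number.  */
  int scale;			/* 1, 2, 4 or 8.  */
  long disp;			/* The single constant offset.  */
};

/* Add a constant by folding it into the one displacement, so the
   result never carries a second offset.  */
struct toy_address
toy_plus_constant (struct toy_address a, long c)
{
  assert (a.scale == 1 || a.scale == 2 || a.scale == 4 || a.scale == 8);
  a.disp += c;
  return a;
}

/* With a single displacement, comparing addresses is a plain
   field-by-field comparison.  */
bool
toy_address_equal (struct toy_address a, struct toy_address b)
{
  return (a.base == b.base && a.index == b.index
	  && a.scale == b.scale && a.disp == b.disp);
}
```

Adding 8 and then 4 gives the same value as adding 12 once, so in this toy there is only one way to write a given address.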

Thanks,
Richard

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 224292)
+++ config/i386/i386.c	(working copy)
@@ -22858,7 +22858,7 @@  ix86_split_long_move (rtx operands[])
 	 Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
 	{
-	  rtx base;
+	  rtx base, addr, tls_base = NULL_RTX;
 
 	  collisions = 1;
 
@@ -22869,10 +22869,52 @@  ix86_split_long_move (rtx operands[])
 	  if (GET_MODE (base) != Pmode)
 	    base = gen_rtx_REG (Pmode, REGNO (base));
 
-	  emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+	  addr = XEXP (part[1][0], 0);
+	  if (TARGET_TLS_DIRECT_SEG_REFS)
+	    {
+	      struct ix86_address parts;
+	      int ok = ix86_decompose_address (addr, &parts);
+	      gcc_assert (ok);
+	      if (parts.seg != SEG_DEFAULT)
+		{
+		  /* It is not valid to use %gs: or %fs: in lea,
+		     so we need to remove it from the address
+		     used for lea and add it to each individual
+		     memory load instead.  */
+		  rtx *x = &addr;
+                  while (GET_CODE (*x) == PLUS)
+                    {
+                      for (i = 0; i < 2; i++)
+			{
+			  rtx op = XEXP (*x, i);
+			  if ((GET_CODE (op) == UNSPEC
+			     && XINT (op, 1) == UNSPEC_TP)
+			    || (GET_CODE (op) == ZERO_EXTEND
+				&& GET_CODE (XEXP (op, 0)) == UNSPEC
+				&& (XINT (XEXP (op, 0), 1)
+				    == UNSPEC_TP)))
+			  {
+			    tls_base = XEXP (*x, i);
+			    XEXP (*x, i) = const0_rtx;
+			    break;
+			  }
+			}
+
+		      if (tls_base)
+			break;
+		      x = &XEXP (*x, 0);
+		    }
+		  gcc_assert (tls_base);
+		}
+	    }
+	  emit_insn (gen_rtx_SET (base, addr));
+	  if (tls_base)
+	    base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
 	  part[1][0] = replace_equiv_address (part[1][0], base);
 	  for (i = 1; i < nparts; i++)
 	    {
+	      if (tls_base)
+		base = copy_rtx (base);
 	      tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
 	      part[1][i] = replace_equiv_address (part[1][i], tmp);
 	    }