Use movmem optab to attempt inline expansion of __builtin_memmove()

Message ID 85494829-c454-3448-2951-0cc2a36df507@linux.ibm.com
State New

Commit Message

Aaron Sawdey Sept. 27, 2019, 6:23 p.m. UTC
This is the third piece of my effort to improve inline expansion of memmove. The
first two parts I posted back in June fixed the names of the optab entries
involved so that optab cpymem is used for memcpy() and optab movmem is used for
memmove(). This piece adds support for actually attempting to invoke the movmem
optab to do inline expansion of __builtin_memmove().

Because what needs to be done for memmove() is very similar to memcpy(), I have
just added a bool parm "might_overlap" to several of the functions involved so
the same functions can handle both. The name might_overlap comes from the fact
that if we still have a memmove() call at expand, this means
gimple_fold_builtin_memory_op() was not able to prove that the source and
destination do not overlap.

There are a few places where might_overlap gets used to keep us from trying to
use the by-pieces infrastructure or generate a copy loop, as neither of those
things will work correctly if source and destination overlap.

I've restructured things slightly in emit_block_move_hints() so that we can
try the pattern first if we already know that by-pieces won't work. This way
we can bail out immediately in the might_overlap case.

Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything passes,
is this ok for trunk?


2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>

	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
	(expand_builtin_memcpy): Use might_overlap parm.
	(expand_builtin_mempcpy_args): Use might_overlap parm.
	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
	(expand_builtin_memory_copy_args): Add might_overlap parm.
	* expr.c (emit_block_move_via_cpymem): Rename to
	emit_block_move_via_pattern, add might_overlap parm, use cpymem
	or movmem optab as appropriate.
	(emit_block_move_hints): Add might_overlap parm, do the right
	thing for might_overlap==true.
	* expr.h (emit_block_move_hints): Update prototype.

Comments

Jeff Law Oct. 1, 2019, 9:45 p.m. UTC | #1
On 9/27/19 12:23 PM, Aaron Sawdey wrote:
> This is the third piece of my effort to improve inline expansion of memmove. The
> first two parts I posted back in June fixed the names of the optab entries
> involved so that optab cpymem is used for memcpy() and optab movmem is used for
> memmove(). This piece adds support for actually attempting to invoke the movmem
> optab to do inline expansion of __builtin_memmove().
> 
> Because what needs to be done for memmove() is very similar to memcpy(), I have
> just added a bool parm "might_overlap" to several of the functions involved so
> the same functions can handle both. The name might_overlap comes from the fact
> that if we still have a memmove() call at expand, this means
> gimple_fold_builtin_memory_op() was not able to prove that the source and
> destination do not overlap.
> 
> There are a few places where might_overlap gets used to keep us from trying to
> use the by-pieces infrastructure or generate a copy loop, as neither of those
> things will work correctly if source and destination overlap.
> 
> I've restructured things slightly in emit_block_move_hints() so that we can
> try the pattern first if we already know that by-pieces won't work. This way
> we can bail out immediately in the might_overlap case.
> 
> Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything passes,
> is this ok for trunk?
> 
> 
> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
> 
> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
> 	(expand_builtin_memcpy): Use might_overlap parm.
> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
> 	* expr.c (emit_block_move_via_cpymem): Rename to
> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
> 	or movmem optab as appropriate.
> 	(emit_block_move_hints): Add might_overlap parm, do the right
> 	thing for might_overlap==true.
> 	* expr.h (emit_block_move_hints): Update prototype.
> 
> 
> 
> 
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c	(revision 276131)
> +++ gcc/builtins.c	(working copy)
> @@ -3894,10 +3897,11 @@
>  			&probable_max_size);
>    src_str = c_getstr (src);
> 
> -  /* If SRC is a string constant and block move would be done
> -     by pieces, we can avoid loading the string from memory
> -     and only stored the computed constants.  */
> -  if (src_str
> +  /* If SRC is a string constant and block move would be done by
> +     pieces, we can avoid loading the string from memory and only
> +     stored the computed constants.  I'm not sure if the by pieces
> +     method works if src/dest are overlapping, so avoid that case.  */
> +  if (src_str && !might_overlap
I don't think you need the check here.  c_getstr, when it returns
something useful, is going to be returning a string constant.  Think
read-only literals here.  I'm pretty sure overlap isn't going to be possible.

>        && CONST_INT_P (len_rtx)
>        && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
>        && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
> @@ -3922,7 +3926,7 @@
>        && (retmode == RETURN_BEGIN || target == const0_rtx))
>      method = BLOCK_OP_TAILCALL;
>    bool use_mempcpy_call = (targetm.libc_has_fast_function (BUILT_IN_MEMPCPY)
> -			   && retmode == RETURN_END
> +			   && retmode == RETURN_END && !might_overlap
Put the && !might_overlap on its own line for readability.


> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c	(revision 276131)
> +++ gcc/expr.c	(working copy)
> @@ -1622,13 +1624,29 @@
>        set_mem_size (y, const_size);
>      }
> 
> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
> +  bool pattern_ok = false;
> +
> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
> +    {
> +      pattern_ok =
> +	emit_block_move_via_pattern (x, y, size, align,
> +				     expected_align, expected_size,
> +				     min_size, max_size, probable_max_size,
> +				     might_overlap);
> +      if (!pattern_ok && might_overlap)
> +	{
> +	  /* Do not try any of the other methods below as they are not safe
> +	     for overlapping moves.  */
> +	  *is_move_done = false;
> +	  return retval;
> +	}
> +    }
> +
> +  if (pattern_ok) ;
Drop the semi-colon down to its own line like

if (whatever)
  ;
else if (...)
  something
else if (...)
  something else

With the changes noted above, this is fine for the trunk.

jeff
Aaron Sawdey Oct. 2, 2019, 2:21 p.m. UTC | #2
On 10/1/19 4:45 PM, Jeff Law wrote:
> On 9/27/19 12:23 PM, Aaron Sawdey wrote:
>> This is the third piece of my effort to improve inline expansion of memmove. The
>> first two parts I posted back in June fixed the names of the optab entries
>> involved so that optab cpymem is used for memcpy() and optab movmem is used for
>> memmove(). This piece adds support for actually attempting to invoke the movmem
>> optab to do inline expansion of __builtin_memmove().
>>
>> Because what needs to be done for memmove() is very similar to memcpy(), I have
>> just added a bool parm "might_overlap" to several of the functions involved so
>> the same functions can handle both. The name might_overlap comes from the fact
>> that if we still have a memmove() call at expand, this means
>> gimple_fold_builtin_memory_op() was not able to prove that the source and
>> destination do not overlap.
>>
>> There are a few places where might_overlap gets used to keep us from trying to
>> use the by-pieces infrastructure or generate a copy loop, as neither of those
>> things will work correctly if source and destination overlap.
>>
>> I've restructured things slightly in emit_block_move_hints() so that we can
>> try the pattern first if we already know that by-pieces won't work. This way
>> we can bail out immediately in the might_overlap case.
>>
>> Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything passes,
>> is this ok for trunk?
>>
>>
>> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
>>
>> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>> 	(expand_builtin_memcpy): Use might_overlap parm.
>> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
>> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
>> 	* expr.c (emit_block_move_via_cpymem): Rename to
>> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
>> 	or movmem optab as appropriate.
>> 	(emit_block_move_hints): Add might_overlap parm, do the right
>> 	thing for might_overlap==true.
>> 	* expr.h (emit_block_move_hints): Update prototype.
>>
>>
>>
>>
>> Index: gcc/builtins.c
>> ===================================================================
>> --- gcc/builtins.c	(revision 276131)
>> +++ gcc/builtins.c	(working copy)
>> @@ -3894,10 +3897,11 @@
>>  			&probable_max_size);
>>    src_str = c_getstr (src);
>>
>> -  /* If SRC is a string constant and block move would be done
>> -     by pieces, we can avoid loading the string from memory
>> -     and only stored the computed constants.  */
>> -  if (src_str
>> +  /* If SRC is a string constant and block move would be done by
>> +     pieces, we can avoid loading the string from memory and only
>> +     stored the computed constants.  I'm not sure if the by pieces
>> +     method works if src/dest are overlapping, so avoid that case.  */
>> +  if (src_str && !might_overlap
> I don't think you need the check here.  c_getstr, when it returns
> something useful, is going to be returning a string constant.  Think
> read-only literals here.  I'm pretty sure overlap isn't going to be possible.

After some digging, I agree -- c_getstr() returns a string constant,
which is then used to generate stores of constants.

I've fixed the other issues and also fixed emit_block_move_via_pattern() to
make use of pieces_ok instead of calling can_move_by_pieces() a second
time. The patch I'm actually committing is below.

Thanks for the review!

  Aaron

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 276131)
+++ gcc/builtins.c	(working copy)
@@ -127,7 +127,8 @@
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
 					    rtx target, tree exp,
-					    memop_ret retmode);
+					    memop_ret retmode,
+					    bool might_overlap);
 static rtx expand_builtin_memmove (tree, rtx);
 static rtx expand_builtin_mempcpy (tree, rtx);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, memop_ret);
@@ -3790,7 +3791,7 @@
   check_memop_access (exp, dest, src, len);

   return expand_builtin_memory_copy_args (dest, src, len, target, exp,
-					  /*retmode=*/ RETURN_BEGIN);
+					  /*retmode=*/ RETURN_BEGIN, false);
 }

 /* Check a call EXP to the memmove built-in for validity.
@@ -3797,7 +3798,7 @@
    Return NULL_RTX on both success and failure.  */

 static rtx
-expand_builtin_memmove (tree exp, rtx)
+expand_builtin_memmove (tree exp, rtx target)
 {
   if (!validate_arglist (exp,
  			 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
@@ -3809,7 +3810,8 @@

   check_memop_access (exp, dest, src, len);

-  return NULL_RTX;
+  return expand_builtin_memory_copy_args (dest, src, len, target, exp,
+					  /*retmode=*/ RETURN_BEGIN, true);
 }

 /* Expand a call EXP to the mempcpy builtin.
@@ -3858,7 +3860,8 @@

 static rtx
 expand_builtin_memory_copy_args (tree dest, tree src, tree len,
-				 rtx target, tree exp, memop_ret retmode)
+				 rtx target, tree exp, memop_ret retmode,
+				 bool might_overlap)
 {
   const char *src_str;
   unsigned int src_align = get_pointer_alignment (src);
@@ -3894,9 +3897,12 @@
 			&probable_max_size);
   src_str = c_getstr (src);

-  /* If SRC is a string constant and block move would be done
-     by pieces, we can avoid loading the string from memory
-     and only stored the computed constants.  */
+  /* If SRC is a string constant and block move would be done by
+     pieces, we can avoid loading the string from memory and only
+     stored the computed constants.  This works in the overlap
+     (memmove) case as well because store_by_pieces just generates a
+     series of stores of constants from the string constant returned
+     by c_getstr().  */
   if (src_str
       && CONST_INT_P (len_rtx)
       && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
@@ -3923,6 +3929,7 @@
     method = BLOCK_OP_TAILCALL;
   bool use_mempcpy_call = (targetm.libc_has_fast_function (BUILT_IN_MEMPCPY)
 			   && retmode == RETURN_END
+			   && !might_overlap
 			   && target != const0_rtx);
   if (use_mempcpy_call)
     method = BLOCK_OP_NO_LIBCALL_RET;
@@ -3929,7 +3936,7 @@
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, method,
 				     expected_align, expected_size,
 				     min_size, max_size, probable_max_size,
-				     use_mempcpy_call, &is_move_done);
+				     use_mempcpy_call, &is_move_done, might_overlap);

   /* Bail out when a mempcpy call would be expanded as libcall and when
      we have a target that provides a fast implementation
@@ -3962,7 +3969,7 @@
 			     rtx target, tree orig_exp, memop_ret retmode)
 {
   return expand_builtin_memory_copy_args (dest, src, len, target, orig_exp,
-					  retmode);
+					  retmode, false);
 }

 /* Expand into a movstr instruction, if one is available.  Return NULL_RTX if
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 276131)
+++ gcc/expr.c	(working copy)
@@ -73,9 +73,10 @@
 int cse_not_expected;

 static bool block_move_libcall_safe_for_call_parm (void);
-static bool emit_block_move_via_cpymem (rtx, rtx, rtx, unsigned, unsigned, HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT);
+static bool emit_block_move_via_pattern (rtx, rtx, rtx, unsigned, unsigned,
+					 HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+					 unsigned HOST_WIDE_INT,
+					 unsigned HOST_WIDE_INT, bool);
 static void emit_block_move_via_loop (rtx, rtx, rtx, unsigned);
 static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
@@ -1562,7 +1563,8 @@
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
 		       unsigned HOST_WIDE_INT probable_max_size,
-		       bool bail_out_libcall, bool *is_move_done)
+		       bool bail_out_libcall, bool *is_move_done,
+		       bool might_overlap)
 {
   int may_use_call;
   rtx retval = 0;
@@ -1622,13 +1624,30 @@
       set_mem_size (y, const_size);
     }

-  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
+  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
+  bool pattern_ok = false;
+
+  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
+    {
+      pattern_ok =
+	emit_block_move_via_pattern (x, y, size, align,
+				     expected_align, expected_size,
+				     min_size, max_size, probable_max_size,
+				     might_overlap);
+      if (!pattern_ok && might_overlap)
+	{
+	  /* Do not try any of the other methods below as they are not safe
+	     for overlapping moves.  */
+	  *is_move_done = false;
+	  return retval;
+	}
+    }
+
+  if (pattern_ok)
+    ;
+  else if (CONST_INT_P (size) && pieces_ok)
     move_by_pieces (x, y, INTVAL (size), align, RETURN_BEGIN);
-  else if (emit_block_move_via_cpymem (x, y, size, align,
-				       expected_align, expected_size,
-				       min_size, max_size, probable_max_size))
-    ;
-  else if (may_use_call
+  else if (may_use_call && !might_overlap
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
@@ -1645,7 +1664,8 @@
       retval = emit_block_copy_via_libcall (x, y, size,
 					    method == BLOCK_OP_TAILCALL);
     }
-
+  else if (might_overlap)
+    *is_move_done = false;
   else
     emit_block_move_via_loop (x, y, size, align);

@@ -1721,15 +1741,26 @@
   return true;
 }

-/* A subroutine of emit_block_move.  Expand a cpymem pattern;
-   return true if successful.  */
+/* A subroutine of emit_block_move.  Expand a cpymem or movmem pattern;
+   return true if successful.
+
+   X is the destination of the copy or move.
+   Y is the source of the copy or move.
+   SIZE is the size of the block to be moved.

+   MIGHT_OVERLAP indicates this originated with expansion of a
+   builtin_memmove() and the source and destination blocks may
+   overlap.
+  */
+
 static bool
-emit_block_move_via_cpymem (rtx x, rtx y, rtx size, unsigned int align,
-			    unsigned int expected_align, HOST_WIDE_INT expected_size,
-			    unsigned HOST_WIDE_INT min_size,
-			    unsigned HOST_WIDE_INT max_size,
-			    unsigned HOST_WIDE_INT probable_max_size)
+emit_block_move_via_pattern (rtx x, rtx y, rtx size, unsigned int align,
+			     unsigned int expected_align,
+			     HOST_WIDE_INT expected_size,
+			     unsigned HOST_WIDE_INT min_size,
+			     unsigned HOST_WIDE_INT max_size,
+			     unsigned HOST_WIDE_INT probable_max_size,
+			     bool might_overlap)
 {
   if (expected_align < align)
     expected_align = align;
@@ -1752,7 +1783,11 @@
   FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
     {
       scalar_int_mode mode = mode_iter.require ();
-      enum insn_code code = direct_optab_handler (cpymem_optab, mode);
+      enum insn_code code;
+      if (might_overlap)
+	code = direct_optab_handler (movmem_optab, mode);
+      else
+	code = direct_optab_handler (cpymem_optab, mode);

       if (code != CODE_FOR_nothing
 	  /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	(revision 276131)
+++ gcc/expr.h	(working copy)
@@ -116,7 +116,8 @@
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  bool bail_out_libcall = false,
-				  bool *is_move_done = NULL);
+				  bool *is_move_done = NULL,
+				  bool might_overlap = false);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
Jakub Jelinek Oct. 2, 2019, 10:35 p.m. UTC | #3
On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
> >> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
> >>
> >> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
> >> 	(expand_builtin_memcpy): Use might_overlap parm.
> >> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
> >> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
> >> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
> >> 	* expr.c (emit_block_move_via_cpymem): Rename to
> >> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
> >> 	or movmem optab as appropriate.
> >> 	(emit_block_move_hints): Add might_overlap parm, do the right
> >> 	thing for might_overlap==true.
> >> 	* expr.h (emit_block_move_hints): Update prototype.

> @@ -1622,13 +1624,30 @@
>        set_mem_size (y, const_size);
>      }
> 
> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
> +  bool pattern_ok = false;
> +
> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
...

This change broke rtl checking bootstrap.
You can't use INTVAL on size that isn't CONST_INT_P.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious:

2019-10-03  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/91976
	* expr.c (emit_block_move_hints): Don't call can_move_by_pieces if
	size is not CONST_INT_P, set pieces_ok to false in that case.  Simplify
	CONST_INT_P (size) && pieces_ok to pieces_ok.  Formatting fix.

--- gcc/expr.c.jj	2019-10-02 16:35:20.977451346 +0200
+++ gcc/expr.c	2019-10-02 21:47:54.900724874 +0200
@@ -1624,16 +1624,18 @@ emit_block_move_hints (rtx x, rtx y, rtx
       set_mem_size (y, const_size);
     }
 
-  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
+  bool pieces_ok = false;
+  if (CONST_INT_P (size))
+    pieces_ok = can_move_by_pieces (INTVAL (size), align);
   bool pattern_ok = false;
 
-  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
+  if (!pieces_ok || might_overlap)
     {
-      pattern_ok = 
-	emit_block_move_via_pattern (x, y, size, align,
-				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size,
-				     might_overlap);
+      pattern_ok
+	= emit_block_move_via_pattern (x, y, size, align,
+				       expected_align, expected_size,
+				       min_size, max_size, probable_max_size,
+				       might_overlap);
       if (!pattern_ok && might_overlap)
 	{
 	  /* Do not try any of the other methods below as they are not safe
@@ -1645,7 +1647,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
 
   if (pattern_ok)
     ;
-  else if (CONST_INT_P (size) && pieces_ok)
+  else if (pieces_ok)
     move_by_pieces (x, y, INTVAL (size), align, RETURN_BEGIN);
   else if (may_use_call && !might_overlap
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))


	Jakub
Aaron Sawdey Oct. 2, 2019, 10:44 p.m. UTC | #4
On 10/2/19 5:35 PM, Jakub Jelinek wrote:
> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
>>>> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
>>>>
>>>> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>>>> 	(expand_builtin_memcpy): Use might_overlap parm.
>>>> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
>>>> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>>>> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
>>>> 	* expr.c (emit_block_move_via_cpymem): Rename to
>>>> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
>>>> 	or movmem optab as appropriate.
>>>> 	(emit_block_move_hints): Add might_overlap parm, do the right
>>>> 	thing for might_overlap==true.
>>>> 	* expr.h (emit_block_move_hints): Update prototype.
> 
>> @@ -1622,13 +1624,30 @@
>>        set_mem_size (y, const_size);
>>      }
>>
>> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
>> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
>> +  bool pattern_ok = false;
>> +
>> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
> ...
> 
> This change broke rtl checking bootstrap.
> You can't use INTVAL on size that isn't CONST_INT_P.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> committed to trunk as obvious:

Jakub,
  Sorry about that! Now that you point it out, it's obvious. But what it means
for me is that I need to be in the habit of bootstrapping with --enable-checking=rtl
when I make these changes.

  Aaron
Richard Sandiford Oct. 3, 2019, 10:14 a.m. UTC | #5
Aaron Sawdey <acsawdey@linux.ibm.com> writes:
> On 10/2/19 5:35 PM, Jakub Jelinek wrote:
>> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
>>>>> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
>>>>>
>>>>> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>>>>> 	(expand_builtin_memcpy): Use might_overlap parm.
>>>>> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
>>>>> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>>>>> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
>>>>> 	* expr.c (emit_block_move_via_cpymem): Rename to
>>>>> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
>>>>> 	or movmem optab as appropriate.
>>>>> 	(emit_block_move_hints): Add might_overlap parm, do the right
>>>>> 	thing for might_overlap==true.
>>>>> 	* expr.h (emit_block_move_hints): Update prototype.
>> 
>>> @@ -1622,13 +1624,30 @@
>>>        set_mem_size (y, const_size);
>>>      }
>>>
>>> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
>>> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
>>> +  bool pattern_ok = false;
>>> +
>>> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
>> ...
>> 
>> This change broke rtl checking bootstrap.
>> You can't use INTVAL on size that isn't CONST_INT_P.
>> 
>> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
>> committed to trunk as obvious:
>
> Jakub,
>   Sorry about that! Now that you point it out, it's obvious. But what it means
> for me is that I need to be in the habit of bootstrapping with --enable-checking=rtl
> when I make these changes.

Confusingly, it needs to be --enable-checking=yes,extra,rtl for rtl
to be a pure addition to the default.

I guess it's time for a new entry in the semi-regular series
"should rtl checking be enabled by default for development builds?" :-)
Is it really so expensive that it needs to be off by default?  And do
we gain much from having it off by default when many developers just
enable it anyway?  The status quo seems like an unnecessary trap to me.

"rtl" checking is an everyday checking option that regularly finds
problems, unlike say "df" and "fold" (which very rarely find problems)
and "gcac" (which definitely isn't an everyday option).

Richard
Jakub Jelinek Oct. 3, 2019, 10:20 a.m. UTC | #6
On Thu, Oct 03, 2019 at 11:14:25AM +0100, Richard Sandiford wrote:
> Confusingly, it needs to be --enable-checking=yes,extra,rtl for rtl
> to be a pure addition to the default.
> 
> I guess it's time for a new entry in the semi-regular series
> "should rtl checking be enabled by default for development builds?" :-)
> Is it really so expensive that it needs to be off by default?  And do
> we gain much from having it off by default when many developers just
> enable it anyway?  The status quo seems like an unnecessary trap to me.
> 
> "rtl" checking is an everyday checking option that regularly finds
> problems, unlike say "df" and "fold" (which very rarely find problems)
> and "gcac" (which definitely isn't an everyday option).

rtl checking can be an everyday checking option (it is, e.g., for me), but
I'm not sure it needs to be forced onto all developers; e.g. those
working on FEs or GIMPLE will only very rarely trigger an RTL checking
failure with their changes, and it is quite expensive (mainly because the
insn*.c sources take much longer to compile when they contain RTL checking).
Perhaps we should just recommend --enable-checking=yes,extra,rtl for
those who test RTL/backend patches?

	Jakub
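For anyone following that recommendation, a configure invocation could look like the sketch below (the paths and language list are illustrative assumptions, not taken from the thread):

```shell
# Out-of-tree GCC build with RTL checking added on top of the defaults.
# ~/src/gcc and ~/build/gcc are illustrative paths.
mkdir -p ~/build/gcc && cd ~/build/gcc
~/src/gcc/configure \
    --enable-languages=c,c++ \
    --enable-checking=yes,extra,rtl
make -j"$(nproc)" bootstrap
make -k check
```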
Richard Sandiford Oct. 3, 2019, 10:50 a.m. UTC | #7
Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Oct 03, 2019 at 11:14:25AM +0100, Richard Sandiford wrote:
>> Confusingly, it needs to be --enable-checking=yes,extra,rtl for rtl
>> to be a pure addition to the default.
>> 
>> I guess it's time for a new entry in the semi-regular series
>> "should rtl checking be enabled by default for development builds?" :-)
>> Is it really so expensive that it needs to be off by default?  And do
>> we gain much from having it off by default when many developers just
>> enable it anyway?  The status quo seems like an unnecessary trap to me.
>> 
>> "rtl" checking is an everyday checking option that regularly finds
>> problems, unlike say "df" and "fold" (which very rarely find problems)
>> and "gcac" (which definitely isn't an everyday option).
>
> rtl checking can be everyday checking option (it is e.g. for me), but I'm
> not sure if it needs to be forced onto all the developers, e.g. those
> working on FEs or GIMPLE will only very rarely trigger some RTL checking
> failure with their changes, and it is quite expensive (mainly, the insn*.c
> sources when they contain RTL checking take much longer to compile).
> Perhaps we should be just recommending --enable-checking=yes,extra,rtl for
> those that test RTL/backend patches?

Maybe, but the problem is that someone who doesn't work on RTL/backend
patches very often might not be aware of or think about adding the
option specially.  We'd also benefit if as many CI builds as possible
enabled rtl checking.

So IMO the default should be --enable-checking=yes,extra,rtl so that
we get better coverage.  Then people who want faster bootstrap/test
times can use --enable-checking=yes,extra if they "know" it's safe for
the work they're doing (with that being a recognised thing to do -- it's
OK not to catch latent rtl problems).

Richard
Aaron Sawdey Oct. 3, 2019, 10:22 p.m. UTC | #8
On 10/2/19 5:44 PM, Aaron Sawdey wrote:
> On 10/2/19 5:35 PM, Jakub Jelinek wrote:
>> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
>>>>> 2019-09-27  Aaron Sawdey <acsawdey@linux.ibm.com>
>>>>>
>>>>> 	* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>>>>> 	(expand_builtin_memcpy): Use might_overlap parm.
>>>>> 	(expand_builtin_mempcpy_args): Use might_overlap parm.
>>>>> 	(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>>>>> 	(expand_builtin_memory_copy_args): Add might_overlap parm.
>>>>> 	* expr.c (emit_block_move_via_cpymem): Rename to
>>>>> 	emit_block_move_via_pattern, add might_overlap parm, use cpymem
>>>>> 	or movmem optab as appropriate.
>>>>> 	(emit_block_move_hints): Add might_overlap parm, do the right
>>>>> 	thing for might_overlap==true.
>>>>> 	* expr.h (emit_block_move_hints): Update prototype.
>>
>>> @@ -1622,13 +1624,30 @@
>>>        set_mem_size (y, const_size);
>>>      }
>>>
>>> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
>>> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
>>> +  bool pattern_ok = false;
>>> +
>>> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
>> ...
>>
>> This change broke rtl checking bootstrap.
>> You can't use INTVAL on size that isn't CONST_INT_P.
>>
>> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
>> committed to trunk as obvious:
> 
> Jakub,
>   Sorry about that! Now that you point it out, it's obvious. But what it means
> for me is that I need to be in the habit of bootstrapping with --enable-checking=rtl
> when I make these changes.

I stared at this for a bit and came up with a slightly cleaner fix that is one line shorter:

2019-10-03  Aaron Sawdey <acsawdey@linux.ibm.com>

	* expr.c (emit_block_move_hints): Slightly cleaner fix to
	can_move_by_pieces issue.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 276516)
+++ gcc/expr.c	(working copy)
@@ -1624,9 +1624,8 @@
       set_mem_size (y, const_size);
     }

-  bool pieces_ok = false;
-  if (CONST_INT_P (size))
-    pieces_ok = can_move_by_pieces (INTVAL (size), align);
+  bool pieces_ok = CONST_INT_P (size)
+    && can_move_by_pieces (INTVAL (size), align);
   bool pattern_ok = false;

   if (!pieces_ok || might_overlap)

Bootstrap/regtest (with --enable-checking=yes,rtl,tree) ok on ppc64le (power9),
committed as obvious.
Jakub Jelinek Oct. 4, 2019, 6:27 a.m. UTC | #9
On Thu, Oct 03, 2019 at 05:22:49PM -0500, Aaron Sawdey wrote:
> --- gcc/expr.c	(revision 276516)
> +++ gcc/expr.c	(working copy)
> @@ -1624,9 +1624,8 @@
>        set_mem_size (y, const_size);
>      }
> 
> -  bool pieces_ok = false;
> -  if (CONST_INT_P (size))
> -    pieces_ok = can_move_by_pieces (INTVAL (size), align);
> +  bool pieces_ok = CONST_INT_P (size)
> +    && can_move_by_pieces (INTVAL (size), align);
>    bool pattern_ok = false;
> 
>    if (!pieces_ok || might_overlap)
> 
> Bootstrap/regtest (with --enable-checking=yes,rtl,tree) ok on ppc64le (power9),
> committed as obvious.

The formatting is wrong.  Either it needs to be:
  bool pieces_ok
    = CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align);
or
  bool pieces_ok = (CONST_INT_P (size)
		    && can_move_by_pieces (INTVAL (size), align));

	Jakub

Patch

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 276131)
+++ gcc/builtins.c	(working copy)
@@ -127,7 +127,8 @@ 
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
 					    rtx target, tree exp,
-					    memop_ret retmode);
+					    memop_ret retmode,
+					    bool might_overlap);
 static rtx expand_builtin_memmove (tree, rtx);
 static rtx expand_builtin_mempcpy (tree, rtx);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, memop_ret);
@@ -3790,7 +3791,7 @@ 
   check_memop_access (exp, dest, src, len);

   return expand_builtin_memory_copy_args (dest, src, len, target, exp,
-					  /*retmode=*/ RETURN_BEGIN);
+					  /*retmode=*/ RETURN_BEGIN, false);
 }

 /* Check a call EXP to the memmove built-in for validity.
@@ -3797,7 +3798,7 @@ 
    Return NULL_RTX on both success and failure.  */

 static rtx
-expand_builtin_memmove (tree exp, rtx)
+expand_builtin_memmove (tree exp, rtx target)
 {
   if (!validate_arglist (exp,
  			 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
@@ -3809,7 +3810,8 @@ 

   check_memop_access (exp, dest, src, len);

-  return NULL_RTX;
+  return expand_builtin_memory_copy_args (dest, src, len, target, exp,
+					  /*retmode=*/ RETURN_BEGIN, true);
 }

 /* Expand a call EXP to the mempcpy builtin.
@@ -3858,7 +3860,8 @@ 

 static rtx
 expand_builtin_memory_copy_args (tree dest, tree src, tree len,
-				 rtx target, tree exp, memop_ret retmode)
+				 rtx target, tree exp, memop_ret retmode,
+				 bool might_overlap)
 {
   const char *src_str;
   unsigned int src_align = get_pointer_alignment (src);
@@ -3894,10 +3897,11 @@ 
 			&probable_max_size);
   src_str = c_getstr (src);

-  /* If SRC is a string constant and block move would be done
-     by pieces, we can avoid loading the string from memory
-     and only stored the computed constants.  */
-  if (src_str
+  /* If SRC is a string constant and block move would be done by
+     pieces, we can avoid loading the string from memory and only
+     stored the computed constants.  I'm not sure if the by pieces
+     method works if src/dest are overlapping, so avoid that case.  */
+  if (src_str && !might_overlap
       && CONST_INT_P (len_rtx)
       && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
       && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
@@ -3922,7 +3926,7 @@ 
       && (retmode == RETURN_BEGIN || target == const0_rtx))
     method = BLOCK_OP_TAILCALL;
   bool use_mempcpy_call = (targetm.libc_has_fast_function (BUILT_IN_MEMPCPY)
-			   && retmode == RETURN_END
+			   && retmode == RETURN_END && !might_overlap
 			   && target != const0_rtx);
   if (use_mempcpy_call)
     method = BLOCK_OP_NO_LIBCALL_RET;
@@ -3929,7 +3933,7 @@ 
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, method,
 				     expected_align, expected_size,
 				     min_size, max_size, probable_max_size,
-				     use_mempcpy_call, &is_move_done);
+				     use_mempcpy_call, &is_move_done, might_overlap);

   /* Bail out when a mempcpy call would be expanded as libcall and when
      we have a target that provides a fast implementation
@@ -3962,7 +3966,7 @@ 
 			     rtx target, tree orig_exp, memop_ret retmode)
 {
   return expand_builtin_memory_copy_args (dest, src, len, target, orig_exp,
-					  retmode);
+					  retmode, false);
 }

 /* Expand into a movstr instruction, if one is available.  Return NULL_RTX if
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 276131)
+++ gcc/expr.c	(working copy)
@@ -73,9 +73,10 @@ 
 int cse_not_expected;

 static bool block_move_libcall_safe_for_call_parm (void);
-static bool emit_block_move_via_cpymem (rtx, rtx, rtx, unsigned, unsigned, HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT);
+static bool emit_block_move_via_pattern (rtx, rtx, rtx, unsigned, unsigned,
+					 HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+					 unsigned HOST_WIDE_INT,
+					 unsigned HOST_WIDE_INT, bool);
 static void emit_block_move_via_loop (rtx, rtx, rtx, unsigned);
 static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
@@ -1562,7 +1563,8 @@ 
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
 		       unsigned HOST_WIDE_INT probable_max_size,
-		       bool bail_out_libcall, bool *is_move_done)
+		       bool bail_out_libcall, bool *is_move_done,
+		       bool might_overlap)
 {
   int may_use_call;
   rtx retval = 0;
@@ -1622,13 +1624,29 @@ 
       set_mem_size (y, const_size);
     }

-  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
+  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
+  bool pattern_ok = false;
+
+  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
+    {
+      pattern_ok =
+	emit_block_move_via_pattern (x, y, size, align,
+				     expected_align, expected_size,
+				     min_size, max_size, probable_max_size,
+				     might_overlap);
+      if (!pattern_ok && might_overlap)
+	{
+	  /* Do not try any of the other methods below as they are not safe
+	     for overlapping moves.  */
+	  *is_move_done = false;
+	  return retval;
+	}
+    }
+
+  if (pattern_ok) ;
+  else if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
     move_by_pieces (x, y, INTVAL (size), align, RETURN_BEGIN);
-  else if (emit_block_move_via_cpymem (x, y, size, align,
-				       expected_align, expected_size,
-				       min_size, max_size, probable_max_size))
-    ;
-  else if (may_use_call
+  else if (may_use_call && !might_overlap
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
@@ -1645,7 +1663,8 @@ 
       retval = emit_block_copy_via_libcall (x, y, size,
 					    method == BLOCK_OP_TAILCALL);
     }
-
+  else if (might_overlap)
+    *is_move_done = false;
   else
     emit_block_move_via_loop (x, y, size, align);

@@ -1721,15 +1740,25 @@ 
   return true;
 }

-/* A subroutine of emit_block_move.  Expand a cpymem pattern;
-   return true if successful.  */
+/* A subroutine of emit_block_move.  Expand a cpymem or movmem pattern;
+   return true if successful.
+
+   X is the destination of the copy or move.
+   Y is the source of the copy or move.
+   SIZE is the size of the block to be moved.

+   MIGHT_OVERLAP indicates this originated with expansion of a
+   builtin_memmove() and the source and destination blocks may
+   overlap.
+  */
+
 static bool
-emit_block_move_via_cpymem (rtx x, rtx y, rtx size, unsigned int align,
-			    unsigned int expected_align, HOST_WIDE_INT expected_size,
-			    unsigned HOST_WIDE_INT min_size,
-			    unsigned HOST_WIDE_INT max_size,
-			    unsigned HOST_WIDE_INT probable_max_size)
+emit_block_move_via_pattern (rtx x, rtx y, rtx size, unsigned int align,
+			     unsigned int expected_align, HOST_WIDE_INT expected_size,
+			     unsigned HOST_WIDE_INT min_size,
+			     unsigned HOST_WIDE_INT max_size,
+			     unsigned HOST_WIDE_INT probable_max_size,
+			     bool might_overlap)
 {
   if (expected_align < align)
     expected_align = align;
@@ -1752,7 +1781,11 @@ 
   FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
     {
       scalar_int_mode mode = mode_iter.require ();
-      enum insn_code code = direct_optab_handler (cpymem_optab, mode);
+      enum insn_code code;
+      if (might_overlap)
+	code = direct_optab_handler (movmem_optab, mode);
+      else
+	code = direct_optab_handler (cpymem_optab, mode);

       if (code != CODE_FOR_nothing
 	  /* We don't need MODE to be narrower than BITS_PER_HOST_WIDE_INT
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	(revision 276131)
+++ gcc/expr.h	(working copy)
@@ -116,7 +116,8 @@ 
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  bool bail_out_libcall = false,
-				  bool *is_move_done = NULL);
+				  bool *is_move_done = NULL,
+				  bool might_overlap = false);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);