diff mbox series

Optimize store_expr from STRING_CST [PR95052]

Message ID 20200512081116.GE8462@tucnak
State New
Headers show
Series Optimize store_expr from STRING_CST [PR95052] | expand

Commit Message

Jakub Jelinek May 12, 2020, 8:11 a.m. UTC
Hi!

In the following testcase, store_expr of e.g. 97 bytes long string literal
into 1MB long array is implemented by copying the 97 bytes from .rodata
section, followed by clearing the remaining bytes.  But, as the STRING_CST
has type char[1024*1024], we actually allocate whole 1MB in .rodata section
for it, even when we only use the first 97 bytes from that.

The following patch tweaks it so that if we are going to initialize only the
small part from it, we don't emit all the zeros that we never use after it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-05-12  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/95052
	* expr.c (store_expr): If expr_size is constant and significantly
	larger than TREE_STRING_LENGTH, set temp to just the
	TREE_STRING_LENGTH portion of the STRING_CST.

	* gcc.target/i386/pr95052.c: New test.


	Jakub

Comments

Li, Pan2 via Gcc-patches May 14, 2020, 4:10 p.m. UTC | #1
On Tue, 2020-05-12 at 10:12 +0200, Jakub Jelinek wrote:
> Hi!
> 
> In the following testcase, store_expr of e.g. 97 bytes long string literal
> into 1MB long array is implemented by copying the 97 bytes from .rodata
> section, followed by clearing the remaining bytes.  But, as the STRING_CST
> has type char[1024*1024], we actually allocate whole 1MB in .rodata section
> for it, even when we only use the first 97 bytes from that.
> 
> The following patch tweaks it so that if we are going to initialize only the
> small part from it, we don't emit all the zeros that we never use after it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2020-05-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/95052
> 	* expr.c (store_expr): If expr_size is constant and significantly
> 	larger than TREE_STRING_LENGTH, set temp to just the
> 	TREE_STRING_LENGTH portion of the STRING_CST.
> 
> 	* gcc.target/i386/pr95052.c: New test.
OK.  Initially I had a number of concerns, but this only affects the .rodata
that's used to initialize the real object.  The .rodata bits have no other
purpose and aren't otherwise accessable.

jeff
>
Jakub Jelinek May 14, 2020, 4:51 p.m. UTC | #2
On Thu, May 14, 2020 at 10:10:55AM -0600, Jeff Law wrote:
> On Tue, 2020-05-12 at 10:12 +0200, Jakub Jelinek wrote:
> > Hi!
> > 
> > In the following testcase, store_expr of e.g. 97 bytes long string literal
> > into 1MB long array is implemented by copying the 97 bytes from .rodata
> > section, followed by clearing the remaining bytes.  But, as the STRING_CST
> > has type char[1024*1024], we actually allocate whole 1MB in .rodata section
> > for it, even when we only use the first 97 bytes from that.
> > 
> > The following patch tweaks it so that if we are going to initialize only the
> > small part from it, we don't emit all the zeros that we never use after it.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > 2020-05-12  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR middle-end/95052
> > 	* expr.c (store_expr): If expr_size is constant and significantly
> > 	larger than TREE_STRING_LENGTH, set temp to just the
> > 	TREE_STRING_LENGTH portion of the STRING_CST.
> > 
> > 	* gcc.target/i386/pr95052.c: New test.
> OK.  Initially I had a number of concerns, but this only affects the .rodata
> that's used to initialize the real object.  The .rodata bits have no other
> purpose and aren't otherwise accessable.

I wasn't sure if it wouldn't be safer to add some bool flag set to true by
the new code and then add gcc_assert in all the other paths, like following
incremental patch.  I believe none of the asserts can trigger right now,
but the code is still adjusting what it plans to use as source before actually
only copying fewer bytes from it, so if somebody changes it later...

Thoughts on that?

--- gcc/expr.c.jj	2020-05-14 18:43:46.324145031 +0200
+++ gcc/expr.c	2020-05-14 18:47:19.003950500 +0200
@@ -5583,6 +5583,7 @@ store_expr (tree exp, rtx target, int ca
   rtx temp;
   rtx alt_rtl = NULL_RTX;
   location_t loc = curr_insn_location ();
+  bool shortened_string_cst = false;
 
   if (VOID_TYPE_P (TREE_TYPE (exp)))
     {
@@ -5771,6 +5772,7 @@ store_expr (tree exp, rtx target, int ca
 		= build_index_type (size_int (TREE_STRING_LENGTH (exp) - 1));
 	      TREE_TYPE (rexp) = build_array_type (TREE_TYPE (TREE_TYPE (exp)),
 						   index);
+	      shortened_string_cst = true;
 	    }
 	}
       temp = expand_expr_real (rexp, tmp_target, GET_MODE (target),
@@ -5787,6 +5789,7 @@ store_expr (tree exp, rtx target, int ca
       && TREE_CODE (exp) != ERROR_MARK
       && GET_MODE (target) != TYPE_MODE (TREE_TYPE (exp)))
     {
+      gcc_assert (!shortened_string_cst);
       if (GET_MODE_CLASS (GET_MODE (target))
 	  != GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (exp)))
 	  && known_eq (GET_MODE_BITSIZE (GET_MODE (target)),
@@ -5839,6 +5842,7 @@ store_expr (tree exp, rtx target, int ca
     {
       if (GET_MODE (temp) != GET_MODE (target) && GET_MODE (temp) != VOIDmode)
 	{
+	  gcc_assert (!shortened_string_cst);
 	  if (GET_MODE (target) == BLKmode)
 	    {
 	      /* Handle calls that return BLKmode values in registers.  */
@@ -5924,6 +5928,8 @@ store_expr (tree exp, rtx target, int ca
 		emit_label (label);
 	    }
 	}
+      else if (shortened_string_cst)
+	gcc_unreachable ();
       /* Handle calls that return values in multiple non-contiguous locations.
 	 The Irix 6 ABI has examples of this.  */
       else if (GET_CODE (target) == PARALLEL)
@@ -5953,6 +5959,8 @@ store_expr (tree exp, rtx target, int ca
 	    emit_move_insn (target, temp);
 	}
     }
+  else
+    gcc_assert (!shortened_string_cst);
 
   return NULL_RTX;
 }


	Jakub
Li, Pan2 via Gcc-patches May 15, 2020, 4:32 p.m. UTC | #3
On Thu, 2020-05-14 at 18:51 +0200, Jakub Jelinek wrote:
> On Thu, May 14, 2020 at 10:10:55AM -0600, Jeff Law wrote:
> > On Tue, 2020-05-12 at 10:12 +0200, Jakub Jelinek wrote:
> > > Hi!
> > > 
> > > In the following testcase, store_expr of e.g. 97 bytes long string literal
> > > into 1MB long array is implemented by copying the 97 bytes from .rodata
> > > section, followed by clearing the remaining bytes.  But, as the STRING_CST
> > > has type char[1024*1024], we actually allocate whole 1MB in .rodata section
> > > for it, even when we only use the first 97 bytes from that.
> > > 
> > > The following patch tweaks it so that if we are going to initialize only
> > > the
> > > small part from it, we don't emit all the zeros that we never use after it.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > 
> > > 2020-05-12  Jakub Jelinek  <jakub@redhat.com>
> > > 
> > > 	PR middle-end/95052
> > > 	* expr.c (store_expr): If expr_size is constant and significantly
> > > 	larger than TREE_STRING_LENGTH, set temp to just the
> > > 	TREE_STRING_LENGTH portion of the STRING_CST.
> > > 
> > > 	* gcc.target/i386/pr95052.c: New test.
> > OK.  Initially I had a number of concerns, but this only affects the .rodata
> > that's used to initialize the real object.  The .rodata bits have no other
> > purpose and aren't otherwise accessable.
> 
> I wasn't sure if it wouldn't be safer to add some bool flag set to true by
> the new code and then add gcc_assert in all the other paths, like following
> incremental patch.  I believe none of the asserts can trigger right now,
> but the code is still adjusting what it plans to use as source before actually
> only copying fewer bytes from it, so if somebody changes it later...
> 
> Thoughts on that?
Can't hurt, and debugging the assert tripping is likely a hell of a lot easier
than debugging the resultant incorrect code.   So if it passes, then I'd say go
for it.

jeff
>
diff mbox series

Patch

--- gcc/expr.c.jj	2020-04-16 12:57:16.584912448 +0200
+++ gcc/expr.c	2020-05-11 20:58:46.436785403 +0200
@@ -5749,7 +5749,31 @@  store_expr (tree exp, rtx target, int ca
       /* If we want to use a nontemporal or a reverse order store, force the
 	 value into a register first.  */
       tmp_target = nontemporal || reverse ? NULL_RTX : target;
-      temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
+      tree rexp = exp;
+      if (TREE_CODE (exp) == STRING_CST
+	  && tmp_target == target
+	  && GET_MODE (target) == BLKmode
+	  && TYPE_MODE (TREE_TYPE (exp)) == BLKmode)
+	{
+	  rtx size = expr_size (exp);
+	  if (CONST_INT_P (size)
+	      && size != const0_rtx
+	      && (UINTVAL (size)
+		  > ((unsigned HOST_WIDE_INT) TREE_STRING_LENGTH (exp) + 32)))
+	    {
+	      /* If the STRING_CST has much larger array type than
+		 TREE_STRING_LENGTH, only emit the TREE_STRING_LENGTH part of
+		 it into the rodata section as the code later on will use
+		 memset zero for the remainder anyway.  See PR95052.  */
+	      tmp_target = NULL_RTX;
+	      rexp = copy_node (exp);
+	      tree index
+		= build_index_type (size_int (TREE_STRING_LENGTH (exp) - 1));
+	      TREE_TYPE (rexp) = build_array_type (TREE_TYPE (TREE_TYPE (exp)),
+						   index);
+	    }
+	}
+      temp = expand_expr_real (rexp, tmp_target, GET_MODE (target),
 			       (call_param_p
 				? EXPAND_STACK_PARM : EXPAND_NORMAL),
 			       &alt_rtl, false);
--- gcc/testsuite/gcc.target/i386/pr95052.c.jj	2020-05-11 21:03:05.635238485 +0200
+++ gcc/testsuite/gcc.target/i386/pr95052.c	2020-05-11 21:02:47.473487020 +0200
@@ -0,0 +1,20 @@ 
+/* PR middle-end/95052 */
+/* { dg-do compile } */
+/* { dg-options "-Os -mtune=skylake" } */
+/* Verify we don't waste almost 2 megabytes of .rodata.  */
+/* { dg-final { scan-assembler-not "\.zero\t1048\[0-9]\[0-9]\[0-9]" } } */
+extern void foo (char *, unsigned);
+
+int
+main ()
+{
+  char str[1024 * 1024] =
+    "fooiuhluhpiuhliuhliyfyukyfklyugkiuhpoipoipoipoipoipoipoipoipoipoipoipoipoimipoipiuhoulouihnliuhl";
+  char arr[1024 * 1024] =
+    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 6, 2, 3,
+      4, 5, 6, 7, 8, 9, 0, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6,
+      7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };
+  foo (str, sizeof (str));
+  foo (arr, sizeof (arr));
+  return 0;
+}