
Fix load_gsi computation in store-merging (PR tree-optimization/83047)

Message ID 20171120205821.GM14653@tucnak
State New

Commit Message

Jakub Jelinek Nov. 20, 2017, 8:58 p.m. UTC
Hi!

This is something the bswap pass has already been doing, but not the
new load handling code in store-merging.  Even if all the loads have the
same vuse, they aren't necessarily in the same basic block.  If they are
in multiple bbs, previously we chose one arbitrarily (well, the load
corresponding to the first store in the group); now we at least pick the
one in the last basic block.
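
Concretely, the hazard is the if/return pattern the patch also quotes
in its comment (illustrative only):

  int x = q[0];
  if (x == N) return;
  int y = q[1];
  p[0] = x;
  p[1] = y;

Emitting the wider load at the q[0] load could segfault when q[1] is
not mapped; emitting it at the q[1] load is safe, because the original
code loads q[1] at that point anyway.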

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

It might not be good enough though: if some program has
  int x = q[0]; foo (x); int y = q[1]; p[0] = x; p[1] = y;
and foo conditionally exits, aborts, throws externally or loops forever
in case q[1] might not be mapped, then even when both loads are in the
same bb we might place the wider load at the first load rather than the
second.  I think we'd need to compute uids (perhaps lazily) and compare
which stmt comes last.  Thoughts on that?
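
A hypothetical sketch of that lazy uid comparison (not part of this
patch; it assumes gimple uids are otherwise unused, i.e. zero, in this
pass, and that both stmts are in the same bb):

  /* Lazily number the stmts in BB and return true if A comes after B.  */
  static bool
  stmt_comes_after_p (basic_block bb, gimple *a, gimple *b)
  {
    if (gimple_uid (a) == 0 || gimple_uid (b) == 0)
      {
        unsigned uid = 0;
        for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
             !gsi_end_p (gsi); gsi_next (&gsi))
          gimple_set_uid (gsi_stmt (gsi), ++uid);
      }
    return gimple_uid (a) > gimple_uid (b);
  }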

2017-11-20  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/83047
	* gimple-ssa-store-merging.c
	(imm_store_chain_info::output_merged_store): If the loads with the
	same vuse are in different basic blocks, for load_gsi pick a load
	location that is dominated by the other loads.

	* gcc.dg/pr83047.c: New test.


	Jakub

Comments

Richard Biener Nov. 21, 2017, 8:14 a.m. UTC | #1
On Mon, 20 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> This is something the bswap pass has already been doing, but not the
> new load handling code in store-merging.  Even if all the loads have the
> same vuse, they aren't necessarily in the same basic block.  If they are
> in multiple bbs, previously we chose one arbitrarily (well, the load
> corresponding to the first store in the group); now we at least pick the
> one in the last basic block.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

> It might not be good enough though: if some program has
>   int x = q[0]; foo (x); int y = q[1]; p[0] = x; p[1] = y;
> and foo conditionally exits, aborts, throws externally or loops forever
> in case q[1] might not be mapped, then even when both loads are in the
> same bb we might place the wider load at the first load rather than the
> second.  I think we'd need to compute uids (perhaps lazily) and compare
> which stmt comes last.  Thoughts on that?

What we need to compute is whether we can hoist/sink the loads to the
insert location.  Ideally we'd start by hoisting the loads upwards,
and if we hit a roadblock, try sinking the other loads (that might
work in the case of EH / looping forever).

One slight complication is that AFAIK an externally throwing or
endlessly looping (or memory unmapping) function doesn't necessarily
have a VDEF, or even a VUSE.  So we might not catch all stmts that
serve as a barrier by walking the VUSE -> VDEF chain.
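
To illustrate (hypothetical; assume the call to foo carries no
virtual operands at all):

  int x = q[0];   /* load, VUSE .MEM_2 */
  foo (x);        /* no VUSE and no VDEF -- invisible to the walk */
  int y = q[1];   /* load, VUSE .MEM_2, same vuse as the first load */

Both loads share the same vuse, so a walk from their VUSE to its
defining VDEF never visits the call, even though foo might throw
externally or loop forever.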

We _might_ want to change the rules for when to force a VDEF here,
for simplicity.

With -fnon-call-exceptions even a division might throw externally,
so this rule adjustment might not hold.  But OTOH I don't easily see
how that would serve as a barrier (unless the combined load is
unaligned, in which case later loads might access unmapped memory?).

Richard.

Patch

--- gcc/gimple-ssa-store-merging.c.jj	2017-11-17 08:40:25.000000000 +0100
+++ gcc/gimple-ssa-store-merging.c	2017-11-20 10:37:40.429859947 +0100
@@ -1857,7 +1857,30 @@  imm_store_chain_info::output_merged_stor
       store_immediate_info *infol = group->stores.last ();
       if (gimple_vuse (op.stmt) == gimple_vuse (infol->ops[j].stmt))
 	{
-	  load_gsi[j] = gsi_for_stmt (op.stmt);
+	  /* We can't pick the location randomly; while we've verified
+	     all the loads have the same vuse, they can still be in different
+	     basic blocks and we need to pick the one from the last bb:
+	     int x = q[0];
+	     if (x == N) return;
+	     int y = q[1];
+	     p[0] = x;
+	     p[1] = y;
+	     otherwise if we put the wider load at the q[0] load, we might
+	     segfault if q[1] is not mapped.  */
+	  basic_block bb = gimple_bb (op.stmt);
+	  gimple *ostmt = op.stmt;
+	  store_immediate_info *info;
+	  FOR_EACH_VEC_ELT (group->stores, i, info)
+	    {
+	      gimple *tstmt = info->ops[j].stmt;
+	      basic_block tbb = gimple_bb (tstmt);
+	      if (dominated_by_p (CDI_DOMINATORS, tbb, bb))
+		{
+		  ostmt = tstmt;
+		  bb = tbb;
+		}
+	    }
+	  load_gsi[j] = gsi_for_stmt (ostmt);
 	  load_addr[j]
 	    = force_gimple_operand_1 (unshare_expr (op.base_addr),
 				      &load_seq[j], is_gimple_mem_ref_addr,
--- gcc/testsuite/gcc.dg/pr83047.c.jj	2017-11-20 10:19:26.612657065 +0100
+++ gcc/testsuite/gcc.dg/pr83047.c	2017-11-20 10:24:15.000000000 +0100
@@ -0,0 +1,58 @@ 
+/* PR tree-optimization/83047 */
+/* { dg-do run { target mmap } } */
+/* { dg-options "-O2" } */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+#ifndef MAP_ANON
+#define MAP_ANON 0
+#endif
+#ifndef MAP_FAILED
+#define MAP_FAILED ((void *)-1)
+#endif
+#include <stdlib.h>
+
+__attribute__((noipa)) void
+foo (char *p, char *q, int r)
+{
+  char a = q[0];
+  if (r || a == '\0')
+    return;
+  char b = q[1];
+  p[0] = a;
+  p[1] = b;
+}
+
+int
+main ()
+{
+  char *p = mmap (NULL, 131072, PROT_READ | PROT_WRITE,
+		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (p == MAP_FAILED)
+    return 0;
+  if (munmap (p + 65536, 65536) < 0)
+    return 0;
+  p[0] = 'c';
+  p[1] = 'd';
+  p[65536 - 2] = 'a';
+  p[65536 - 1] = 'b';
+  volatile int r = 1;
+  foo (p, p + 65536 - 2, r);
+  if (p[0] != 'c' || p[1] != 'd')
+    abort ();
+  r = 0;
+  foo (p, p + 65536 - 2, r);
+  if (p[0] != 'a' || p[1] != 'b')
+    abort ();
+  p[0] = 'e';
+  p[1] = 'f';
+  r = 1;
+  foo (p, p + 65536 - 1, r);
+  if (p[0] != 'e' || p[1] != 'f')
+    abort ();
+  return 0;
+}