
Extend GIMPLE store merging to throwing stores

Message ID 2602318.vuf938cG6T@arcturus.home

Commit Message

Eric Botcazou July 26, 2019, 10:53 a.m. UTC
Hi,

one of the effects of -fnon-call-exceptions is that memory accesses are
considered trapping by default, i.e. unless you can prove otherwise.  If, in
addition to this, the code is covered by an exception handler, such memory
accesses are the sources of EH edges, which means that they end their basic
block.  This thwarts a fair number of optimizations, especially local ones,
and GIMPLE store merging is among the worst affected.

So the attached patch adds (minimal) support for merging throwing stores when
-fnon-call-exceptions is enabled and there are active exception handlers in
the function.  The idea is straightforward (record the index number of the
landing pad and merge consecutive stores with the same number), so the meat of
the patch is the technicalities required to first streamline the CFG, then
detect the candidate throwing stores and finally properly update the CFG at
the end.

We have real-world examples for which this helps a lot in Ada, typically when 
you're inlining an elaboration routine (constructor in Ada parlance) into a 
region covered by an exception handler.

Tested on x86_64-suse-linux, OK for the mainline?


2019-07-26  Eric Botcazou  <ebotcazou@adacore.com>

	* tree-eh.h (unsplit_all_eh): Declare.
	* tree-eh.c (maybe_remove_unreachable_handlers): Detect more cases.
	(unsplit_all_eh): Make public.
	* gimple-ssa-store-merging.c: Include cfganal.h cfgcleanup.h except.h.
	(struct store_immediate_info): Add lp_nr field.
	(store_immediate_info::store_immediate_info): Add NR2 parameter and
	initialize lp_nr with it.
	(struct merged_store_group): Add lp_nr and only_constants fields.
	(merged_store_group::merged_store_group): Initialize them.
	(merged_store_group::can_be_merged_into): Deal with them.
	(pass_store_merging): Rename terminate_and_release_chain into
	terminate_and_process_chain.
	(pass_store_merging::terminate_and_process_all_chains): Adjust to above
	renaming and remove useless assertions.
	(pass_store_merging::terminate_all_aliasing_chains): Small tweak.
	(stmts_may_clobber_ref_p): Be prepared for different basic blocks.
	(imm_store_chain_info::coalesce_immediate_stores): Use only_constants
	instead of always recomputing it and compare lp_nr.
	(imm_store_chain_info::output_merged_store): If the group is in an
	active EH region, register new stores if they can throw.  Moreover,
	if the insertion has created new basic blocks, adjust the PHI nodes
	of the post landing pad.
	(imm_store_chain_info::output_merged_stores): If the original stores
	are in an active EH region, deregister them.
	(lhs_valid_for_store_merging_p): Prettify.
	(adjust_bit_pos): New function extracted from...
	(mem_valid_for_store_merging): ...here.  Use it for the base address
	and also for the offset if it is the addition of a constant.
	(lp_nr_for_store): New function.
	(pass_store_merging::process_store): Change return type to bool.
	Call lp_nr_for_store to initialize the store info.  Propagate the
	return status of various called functions to the return value.
	(store_valid_for_store_merging_p): New predicate.
	(enum basic_block_status): New enumeration.
	(get_status_for_store_merging): New function.
	(pass_store_merging::execute): If the function can throw and catch
	non-call exceptions, unsplit the EH edges on entry and clean up the
	CFG on exit if something changed.  Call get_status_for_store_merging
	for every basic block and keep the chains open across basic blocks
	when possible.  Terminate and process open chains at the end, if any.


2019-07-26  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/opt81.adb: New test.
	* gnat.dg/opt81_pkg.ads: New helper.

Comments

Richard Biener July 31, 2019, 10:23 a.m. UTC | #1
On Fri, Jul 26, 2019 at 12:56 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> [...]
>
> Tested on x86_64-suse-linux, OK for the mainline?

+/* Return the index number of the landing pad for STMT, if any.  */
+
+static int
+lp_nr_for_store (gimple *stmt)
+{
+  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
+    return 0;
+
+  if (!stmt_could_throw_p (cfun, stmt))
+    return 0;
+
+  return lookup_stmt_eh_lp (stmt);
+}

Did you add the wrapper as compile-time optimization?  That is,
I don't see why simply calling lookup_stmt_eh_lp wouldn't work?

+  /* If the function can throw and catch non-call exceptions, we'll be trying
+     to merge stores across different basic blocks so we need to first unsplit
+     the EH edges in order to streamline the CFG of the function.  */
+  if (cfun->can_throw_non_call_exceptions && cfun->eh)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      maybe_remove_unreachable_handlers ();
+      changed = unsplit_all_eh ();
+      if (changed)
+       delete_unreachable_blocks ();
+    }

uh, can unsplitting really result in unreachable blocks or does it
merely forget to delete forwarders it made unreachable?  Removing
unreachable handlers is also to make things match better?  Just
wondering how much of this work we could delay to the first
store-merging opportunity with EH we find (but I don't care too much
about -fnon-call-exceptions).

To isolate the details above maybe move this piece into a helper
in tree-eh.c so you also can avoid exporting unsplit_all_eh?

Otherwise looks OK to me.

Thanks,
Richard.

>
> [...]
>
> --
> Eric Botcazou
Eric Botcazou Oct. 1, 2019, 11:05 a.m. UTC | #2
[Thanks for the quick review and sorry for the longish delay]

> +/* Return the index number of the landing pad for STMT, if any.  */
> +
> +static int
> +lp_nr_for_store (gimple *stmt)
> +{
> +  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
> +    return 0;
> +
> +  if (!stmt_could_throw_p (cfun, stmt))
> +    return 0;
> +
> +  return lookup_stmt_eh_lp (stmt);
> +}
> 
> Did you add the wrapper as compile-time optimization?  That is,
> I don't see why simply calling lookup_stmt_eh_lp wouldn't work?

Yes, I added it for C & C++, which both trivially fail the first test.  More
generally, every additional processing step is (directly or indirectly)
guarded by the conjunction cfun->can_throw_non_call_exceptions && cfun->eh
throughout.

> +  /* If the function can throw and catch non-call exceptions, we'll be trying
> +     to merge stores across different basic blocks so we need to first unsplit
> +     the EH edges in order to streamline the CFG of the function.  */
> +  if (cfun->can_throw_non_call_exceptions && cfun->eh)
> +    {
> +      free_dominance_info (CDI_DOMINATORS);
> +      maybe_remove_unreachable_handlers ();
> +      changed = unsplit_all_eh ();
> +      if (changed)
> +       delete_unreachable_blocks ();
> +    }
> 
> uh, can unsplitting really result in unreachable blocks or does it
> merely forget to delete forwarders it made unreachable?

The latter.

> Removing unreachable handlers is also to make things match better?

Nope, only because calculate_dominance_info aborts otherwise below.

> Just wondering how much of this work we could delay to the first
> store-merging opportunity with EH we find (but I don't care too much
> about -fnon-call-exceptions).

This block of code is a manual, stripped down ehcleanup pass.

> To isolate the details above maybe move this piece into a helper
> in tree-eh.c so you also can avoid exporting unsplit_all_eh?

The main point is the unsplitting though, so this would trade an explicit call
for an implicit one.  But what I could do is to rename unsplit_all_eh into
unsplit_all_eh_1 and hide the technicalities in a new unsplit_all_eh.
Richard Biener Oct. 1, 2019, 11:15 a.m. UTC | #3
On Tue, Oct 1, 2019 at 1:05 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> [...]
>
> > To isolate the details above maybe move this piece into a helper
> > in tree-eh.c so you also can avoid exporting unsplit_all_eh?
>
> The main point is the unsplitting though, so this would trade an explicit call
> for an implicit one.  But what I could do is to rename unsplit_all_eh into
> unsplit_all_eh_1 and hide the technicalities in a new unsplit_all_eh.

that works for me - the patch is OK with that change.

Thanks,
Richard.

> --
> Eric Botcazou
Eric Botcazou Oct. 2, 2019, 1:33 p.m. UTC | #4
> that works for me - the patch is OK with that change.

Thanks.  In the end, I went for your solution and the helper is called 
unsplit_eh_edges, which is even more explicit than unsplit_all_eh.
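Given the code quoted earlier in the review and the final naming, the new tree-eh.c helper presumably looks like the following sketch (GCC-internal code, not compilable standalone; the exact committed form may differ):

```c
/* Sketch of unsplit_eh_edges, wrapping unsplit_all_eh: a manual,
   stripped-down ehcleanup pass run before store merging.  */

void
unsplit_eh_edges (void)
{
  /* Dominance info will be stale after the CFG changes.  */
  free_dominance_info (CDI_DOMINATORS);
  /* Needed so that calculate_dominance_info does not abort later.  */
  maybe_remove_unreachable_handlers ();
  /* unsplit_all_eh does not delete the forwarder blocks it makes
     unreachable, hence the explicit cleanup.  */
  if (unsplit_all_eh ())
    delete_unreachable_blocks ();
}
```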


	* tree-eh.h (unsplit_eh_edges): Declare.
	* tree-eh.c (maybe_remove_unreachable_handlers): Detect more cases.
	(unsplit_eh_edges): New function wrapping unsplit_all_eh.
	* gimple-ssa-store-merging.c: Include cfganal.h cfgcleanup.h except.h.
	(struct store_immediate_info): Add lp_nr field.
	(store_immediate_info::store_immediate_info): Add NR2 parameter and
	initialize lp_nr with it.
	(struct merged_store_group): Add lp_nr and only_constants fields.
	(merged_store_group::merged_store_group): Initialize them.
	(merged_store_group::can_be_merged_into): Deal with them.
	(pass_store_merging): Rename terminate_and_release_chain into
	terminate_and_process_chain.
	(pass_store_merging::terminate_and_process_all_chains): Adjust to above
	renaming and remove useless assertions.
	(pass_store_merging::terminate_all_aliasing_chains): Small tweak.
	(stmts_may_clobber_ref_p): Be prepared for different basic blocks.
	(imm_store_chain_info::coalesce_immediate_stores): Use only_constants
	instead of always recomputing it and compare lp_nr.
	(imm_store_chain_info::output_merged_store): If the group is in an
	active EH region, register new stores if they can throw.  Moreover,
	if the insertion has created new basic blocks, adjust the PHI nodes
	of the post landing pad.
	(imm_store_chain_info::output_merged_stores): If the original stores
	are in an active EH region, deregister them.
	(lhs_valid_for_store_merging_p): Prettify.
	(adjust_bit_pos): New function extracted from...
	(mem_valid_for_store_merging): ...here.  Use it for the base address
	and also for the offset if it is the addition of a constant.
	(lp_nr_for_store): New function.
	(pass_store_merging::process_store): Change return type to bool.
	Call lp_nr_for_store to initialize the store info.  Propagate the
	return status of various called functions to the return value.
	(store_valid_for_store_merging_p): New predicate.
	(enum basic_block_status): New enumeration.
	(get_status_for_store_merging): New function.
	(pass_store_merging::execute): If the function can throw and catch
	non-call exceptions, unsplit the EH edges on entry and clean up the
	CFG on exit if something changed.  Call get_status_for_store_merging
	for every basic block and keep the chains open across basic blocks
	when possible.  Terminate and process open chains at the end, if any.

Patch

Index: gimple-ssa-store-merging.c
===================================================================
--- gimple-ssa-store-merging.c	(revision 273765)
+++ gimple-ssa-store-merging.c	(working copy)
@@ -159,7 +159,10 @@ 
 #include "gimple-fold.h"
 #include "stor-layout.h"
 #include "timevar.h"
+#include "cfganal.h"
+#include "cfgcleanup.h"
 #include "tree-cfg.h"
+#include "except.h"
 #include "tree-eh.h"
 #include "target.h"
 #include "gimplify-me.h"
@@ -1374,6 +1377,8 @@  class store_immediate_info
   /* True if ops have been swapped and thus ops[1] represents
      rhs1 of BIT_{AND,IOR,XOR}_EXPR and ops[0] represents rhs2.  */
   bool ops_swapped_p;
+  /* The index number of the landing pad, or 0 if there is none.  */
+  int lp_nr;
   /* Operands.  For BIT_*_EXPR rhs_code both operands are used, otherwise
      just the first one.  */
   store_operand_info ops[2];
@@ -1380,7 +1385,7 @@  class store_immediate_info
   store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
 			gimple *, unsigned int, enum tree_code,
-			struct symbolic_number &, gimple *, bool,
+			struct symbolic_number &, gimple *, bool, int,
 			const store_operand_info &,
 			const store_operand_info &);
 };
@@ -1395,11 +1400,13 @@  store_immediate_info::store_immediate_info (unsign
 					    struct symbolic_number &nr,
 					    gimple *ins_stmtp,
 					    bool bitnotp,
+					    int nr2,
 					    const store_operand_info &op0r,
 					    const store_operand_info &op1r)
   : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
     stmt (st), order (ord), rhs_code (rhscode), n (nr),
-    ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false)
+    ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false),
+    lp_nr (nr2)
 #if __cplusplus >= 201103L
     , ops { op0r, op1r }
 {
@@ -1434,6 +1441,7 @@  class merged_store_group
   bool bit_insertion;
   bool only_constants;
   unsigned int first_nonmergeable_order;
+  int lp_nr;
 
   auto_vec<store_immediate_info *> stores;
   /* We record the first and last original statements in the sequence because
@@ -1861,6 +1869,7 @@  merged_store_group::merged_store_group (store_imme
   bit_insertion = false;
   only_constants = info->rhs_code == INTEGER_CST;
   first_nonmergeable_order = ~0U;
+  lp_nr = info->lp_nr;
   unsigned HOST_WIDE_INT align_bitpos = 0;
   get_object_alignment_1 (gimple_assign_lhs (info->stmt),
 			  &align, &align_bitpos);
@@ -1903,6 +1912,9 @@  merged_store_group::can_be_merged_into (store_imme
   if (info->rhs_code == LROTATE_EXPR)
     return false;
 
+  if (info->lp_nr != lp_nr)
+    return false;
+
   /* The canonical case.  */
   if (info->rhs_code == stores[0]->rhs_code)
     return true;
@@ -2172,10 +2184,10 @@  class pass_store_merging : public gimple_opt_pass
      decisions when going out of SSA).  */
   imm_store_chain_info *m_stores_head;
 
-  void process_store (gimple *);
+  bool process_store (gimple *);
+  bool terminate_and_process_chain (imm_store_chain_info *);
+  bool terminate_all_aliasing_chains (imm_store_chain_info **, gimple *);
   bool terminate_and_process_all_chains ();
-  bool terminate_all_aliasing_chains (imm_store_chain_info **, gimple *);
-  bool terminate_and_release_chain (imm_store_chain_info *);
 }; // class pass_store_merging
 
 /* Terminate and process all recorded chains.  Return true if any changes
@@ -2186,16 +2198,14 @@  pass_store_merging::terminate_and_process_all_chai
 {
   bool ret = false;
   while (m_stores_head)
-    ret |= terminate_and_release_chain (m_stores_head);
+    ret |= terminate_and_process_chain (m_stores_head);
   gcc_assert (m_stores.is_empty ());
-  gcc_assert (m_stores_head == NULL);
-
   return ret;
 }
 
 /* Terminate all chains that are affected by the statement STMT.
    CHAIN_INFO is the chain we should ignore from the checks if
-   non-NULL.  */
+   non-NULL.  Return true if any changes were made.  */
 
 bool
 pass_store_merging::terminate_all_aliasing_chains (imm_store_chain_info
@@ -2232,8 +2242,7 @@  pass_store_merging::terminate_all_aliasing_chains
 		  fprintf (dump_file, "stmt causes chain termination:\n");
 		  print_gimple_stmt (dump_file, stmt, 0);
 		}
-	      terminate_and_release_chain (cur);
-	      ret = true;
+	      ret |= terminate_and_process_chain (cur);
 	      break;
 	    }
 	}
@@ -2247,7 +2256,7 @@  pass_store_merging::terminate_all_aliasing_chains
    entry is removed after the processing in any case.  */
 
 bool
-pass_store_merging::terminate_and_release_chain (imm_store_chain_info *chain_info)
+pass_store_merging::terminate_and_process_chain (imm_store_chain_info *chain_info)
 {
   bool ret = chain_info->terminate_and_process_chain ();
   m_stores.remove (chain_info->base_addr);
@@ -2256,9 +2265,9 @@  bool
 }
 
 /* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
-   may clobber REF.  FIRST and LAST must be in the same basic block and
-   have non-NULL vdef.  We want to be able to sink load of REF across
-   stores between FIRST and LAST, up to right before LAST.  */
+   may clobber REF.  FIRST and LAST must have non-NULL vdef.  We want to
+   be able to sink load of REF across stores between FIRST and LAST, up
+   to right before LAST.  */
 
 bool
 stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
@@ -2269,7 +2278,10 @@  stmts_may_clobber_ref_p (gimple *first, gimple *la
   tree vop = gimple_vdef (last);
   gimple *stmt;
 
-  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
+  /* Return true conservatively if the basic blocks are different.  */
+  if (gimple_bb (first) != gimple_bb (last))
+    return true;
+
   do
     {
       stmt = SSA_NAME_DEF_STMT (vop);
@@ -2285,6 +2297,7 @@  stmts_may_clobber_ref_p (gimple *first, gimple *la
       vop = gimple_vuse (stmt);
     }
   while (stmt != first);
+
   return false;
 }
 
@@ -2758,7 +2771,9 @@  imm_store_chain_info::coalesce_immediate_stores ()
 			 merged_store->start + merged_store->width - 1))
 	{
 	  /* Only allow overlapping stores of constants.  */
-	  if (info->rhs_code == INTEGER_CST && merged_store->only_constants)
+	  if (info->rhs_code == INTEGER_CST
+	      && merged_store->only_constants
+	      && info->lp_nr == merged_store->lp_nr)
 	    {
 	      unsigned int last_order
 		= MAX (merged_store->last_order, info->order);
@@ -4151,6 +4166,9 @@  imm_store_chain_info::output_merged_store (merged_
       gimple_set_vuse (stmt, new_vuse);
       gimple_seq_add_stmt_without_update (&seq, stmt);
 
+      if (group->lp_nr && stmt_could_throw_p (cfun, stmt))
+	add_stmt_to_eh_lp (stmt, group->lp_nr);
+
       tree new_vdef;
       if (i < split_stores.length () - 1)
 	new_vdef = make_ssa_name (gimple_vop (cfun), stmt);
@@ -4174,7 +4192,63 @@  imm_store_chain_info::output_merged_store (merged_
       if (dump_flags & TDF_DETAILS)
 	print_gimple_seq (dump_file, seq, 0, TDF_VOPS | TDF_MEMSYMS);
     }
-  gsi_insert_seq_after (&last_gsi, seq, GSI_SAME_STMT);
+
+  if (group->lp_nr > 0)
+    {
+      /* We're going to insert a sequence of (potentially) throwing stores
+	 into an active EH region.  This means that we're going to create
+	 new basic blocks with EH edges pointing to the post landing pad
+	 and, therefore, to have to update its PHI nodes, if any.  For the
+	 virtual PHI node, we're going to use the VDEFs created above, but
+	 for the other nodes, we need to record the original reaching defs.  */
+      eh_landing_pad lp = get_eh_landing_pad_from_number (group->lp_nr);
+      basic_block lp_bb = label_to_block (cfun, lp->post_landing_pad);
+      basic_block last_bb = gimple_bb (group->last_stmt);
+      edge last_edge = find_edge (last_bb, lp_bb);
+      auto_vec<tree, 16> last_defs;
+      gphi_iterator gpi;
+      for (gpi = gsi_start_phis (lp_bb); !gsi_end_p (gpi); gsi_next (&gpi))
+	{
+	  gphi *phi = gpi.phi ();
+	  tree last_def;
+	  if (virtual_operand_p (gimple_phi_result (phi)))
+	    last_def = NULL_TREE;
+	  else
+	    last_def = gimple_phi_arg_def (phi, last_edge->dest_idx);
+	  last_defs.safe_push (last_def);
+	}
+
+      /* Do the insertion.  Then, if new basic blocks have been created in the
+	 process, rewind the chain of VDEFs create above to walk the new basic
+	 blocks and update the corresponding arguments of the PHI nodes.  */
+      update_modified_stmts (seq);
+      if (gimple_find_sub_bbs (seq, &last_gsi))
+	while (last_vdef != gimple_vuse (group->last_stmt))
+	  {
+	    gimple *stmt = SSA_NAME_DEF_STMT (last_vdef);
+	    if (stmt_could_throw_p (cfun, stmt))
+	      {
+		edge new_edge = find_edge (gimple_bb (stmt), lp_bb);
+		unsigned int i;
+		for (gpi = gsi_start_phis (lp_bb), i = 0;
+		     !gsi_end_p (gpi);
+		     gsi_next (&gpi), i++)
+		  {
+		    gphi *phi = gpi.phi ();
+		    tree new_def;
+		    if (virtual_operand_p (gimple_phi_result (phi)))
+		      new_def = last_vdef;
+		    else
+		      new_def = last_defs[i];
+		    add_phi_arg (phi, new_def, new_edge, UNKNOWN_LOCATION);
+		  }
+	      }
+	    last_vdef = gimple_vuse (stmt);
+	  }
+    }
+  else
+    gsi_insert_seq_after (&last_gsi, seq, GSI_SAME_STMT);
+
   for (int j = 0; j < 2; ++j)
     if (load_seq[j])
       gsi_insert_seq_after (&load_gsi[j], load_seq[j], GSI_SAME_STMT);
@@ -4204,6 +4278,8 @@  imm_store_chain_info::output_merged_stores ()
 	      gimple *stmt = store->stmt;
 	      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
 	      gsi_remove (&gsi, true);
+	      if (store->lp_nr)
+		remove_stmt_from_eh_lp (stmt);
 	      if (stmt != merged_store->last_stmt)
 		{
 		  unlink_stmt_vdef (stmt);
@@ -4256,14 +4332,22 @@  imm_store_chain_info::terminate_and_process_chain
 static bool
 lhs_valid_for_store_merging_p (tree lhs)
 {
-  tree_code code = TREE_CODE (lhs);
-
-  if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF
-      || code == COMPONENT_REF || code == BIT_FIELD_REF
-      || DECL_P (lhs))
+  if (DECL_P (lhs))
     return true;
 
-  return false;
+  switch (TREE_CODE (lhs))
+    {
+    case ARRAY_REF:
+    case ARRAY_RANGE_REF:
+    case BIT_FIELD_REF:
+    case COMPONENT_REF:
+    case MEM_REF:
+      return true;
+    default:
+      return false;
+    }
+
+  gcc_unreachable ();
 }
 
 /* Return true if the tree RHS is a constant we want to consider
@@ -4284,6 +4368,40 @@  rhs_valid_for_store_merging_p (tree rhs)
 	  && native_encode_expr (rhs, NULL, size) != 0);
 }
 
+/* Adjust *PBITPOS, *PBITREGION_START and *PBITREGION_END by BYTE_OFF bytes
+   and return true on success or false on failure.  */
+
+static bool
+adjust_bit_pos (poly_offset_int byte_off,
+		poly_int64  *pbitpos,
+		poly_uint64 *pbitregion_start,
+		poly_uint64 *pbitregion_end)
+{
+  poly_offset_int bit_off = byte_off << LOG2_BITS_PER_UNIT;
+  bit_off += *pbitpos;
+
+  if (known_ge (bit_off, 0) && bit_off.to_shwi (pbitpos))
+    {
+      if (maybe_ne (*pbitregion_end, 0U))
+	{
+	  bit_off = byte_off << LOG2_BITS_PER_UNIT;
+	  bit_off += *pbitregion_start;
+	  if (bit_off.to_uhwi (pbitregion_start))
+	    {
+	      bit_off = byte_off << LOG2_BITS_PER_UNIT;
+	      bit_off += *pbitregion_end;
+	      if (!bit_off.to_uhwi (pbitregion_end))
+		*pbitregion_end = 0;
+	    }
+	  else
+	    *pbitregion_end = 0;
+	}
+      return true;
+    }
+  else
+    return false;
+}
+
 /* If MEM is a memory reference usable for store merging (either as
    store destination or for loads), return the non-NULL base_addr
    and set *PBITSIZE, *PBITPOS, *PBITREGION_START and *PBITREGION_END.
@@ -4328,27 +4446,8 @@  mem_valid_for_store_merging (tree mem, poly_uint64
      PR 23684 and this way we can catch more chains.  */
   else if (TREE_CODE (base_addr) == MEM_REF)
     {
-      poly_offset_int byte_off = mem_ref_offset (base_addr);
-      poly_offset_int bit_off = byte_off << LOG2_BITS_PER_UNIT;
-      bit_off += bitpos;
-      if (known_ge (bit_off, 0) && bit_off.to_shwi (&bitpos))
-	{
-	  if (maybe_ne (bitregion_end, 0U))
-	    {
-	      bit_off = byte_off << LOG2_BITS_PER_UNIT;
-	      bit_off += bitregion_start;
-	      if (bit_off.to_uhwi (&bitregion_start))
-		{
-		  bit_off = byte_off << LOG2_BITS_PER_UNIT;
-		  bit_off += bitregion_end;
-		  if (!bit_off.to_uhwi (&bitregion_end))
-		    bitregion_end = 0;
-		}
-	      else
-		bitregion_end = 0;
-	    }
-	}
-      else
+      if (!adjust_bit_pos (mem_ref_offset (base_addr), &bitpos,
+			   &bitregion_start, &bitregion_end))
 	return NULL_TREE;
       base_addr = TREE_OPERAND (base_addr, 0);
     }
@@ -4361,15 +4460,8 @@  mem_valid_for_store_merging (tree mem, poly_uint64
       base_addr = build_fold_addr_expr (base_addr);
     }
 
-  if (known_eq (bitregion_end, 0U))
+  if (offset)
     {
-      bitregion_start = round_down_to_byte_boundary (bitpos);
-      bitregion_end = bitpos;
-      bitregion_end = round_up_to_byte_boundary (bitregion_end + bitsize);
-    }
-
-  if (offset != NULL_TREE)
-    {
       /* If the access is variable offset then a base decl has to be
 	 address-taken to be able to emit pointer-based stores to it.
 	 ???  We might be able to get away with re-using the original
@@ -4376,14 +4468,26 @@  mem_valid_for_store_merging (tree mem, poly_uint64
 	 base up to the first variable part and then wrapping that inside
 	 a BIT_FIELD_REF.  */
       tree base = get_base_address (base_addr);
-      if (! base
-	  || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
+      if (!base || (DECL_P (base) && !TREE_ADDRESSABLE (base)))
 	return NULL_TREE;
 
+      /* Similarly to above for the base, remove constant from the offset.  */
+      if (TREE_CODE (offset) == PLUS_EXPR
+	  && TREE_CODE (TREE_OPERAND (offset, 1)) == INTEGER_CST
+	  && adjust_bit_pos (wi::to_poly_offset (TREE_OPERAND (offset, 1)),
+			     &bitpos, &bitregion_start, &bitregion_end))
+	offset = TREE_OPERAND (offset, 0);
+
       base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
 			  base_addr, offset);
     }
 
+  if (known_eq (bitregion_end, 0U))
+    {
+      bitregion_start = round_down_to_byte_boundary (bitpos);
+      bitregion_end = round_up_to_byte_boundary (bitpos + bitsize);
+    }
+
   *pbitsize = bitsize;
   *pbitpos = bitpos;
   *pbitregion_start = bitregion_start;
@@ -4446,10 +4550,24 @@  handled_load (gimple *stmt, store_operand_info *op
   return false;
 }
 
+/* Return the index number of the landing pad for STMT, if any.  */
+
+static int
+lp_nr_for_store (gimple *stmt)
+{
+  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
+    return 0;
+
+  if (!stmt_could_throw_p (cfun, stmt))
+    return 0;
+
+  return lookup_stmt_eh_lp (stmt);
+}
+
 /* Record the store STMT for store merging optimization if it can be
-   optimized.  */
+   optimized.  Return true if any changes were made.  */
 
-void
+bool
 pass_store_merging::process_store (gimple *stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
@@ -4460,7 +4578,7 @@  pass_store_merging::process_store (gimple *stmt)
     = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
 				   &bitregion_start, &bitregion_end);
   if (known_eq (bitsize, 0U))
-    return;
+    return false;
 
   bool invalid = (base_addr == NULL_TREE
 		  || (maybe_gt (bitsize,
@@ -4602,15 +4720,13 @@  pass_store_merging::process_store (gimple *stmt)
       || !bitpos.is_constant (&const_bitpos)
       || !bitregion_start.is_constant (&const_bitregion_start)
       || !bitregion_end.is_constant (&const_bitregion_end))
-    {
-      terminate_all_aliasing_chains (NULL, stmt);
-      return;
-    }
+    return terminate_all_aliasing_chains (NULL, stmt);
 
   if (!ins_stmt)
     memset (&n, 0, sizeof (n));
 
   class imm_store_chain_info **chain_info = NULL;
+  bool ret = false;
   if (base_addr)
     chain_info = m_stores.get (base_addr);
 
@@ -4622,7 +4738,8 @@  pass_store_merging::process_store (gimple *stmt)
 				       const_bitregion_start,
 				       const_bitregion_end,
 				       stmt, ord, rhs_code, n, ins_stmt,
-				       bit_not_p, ops[0], ops[1]);
+				       bit_not_p, lp_nr_for_store (stmt),
+				       ops[0], ops[1]);
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "Recording immediate store from stmt:\n");
@@ -4629,7 +4746,7 @@  pass_store_merging::process_store (gimple *stmt)
 	  print_gimple_stmt (dump_file, stmt, 0);
 	}
       (*chain_info)->m_store_info.safe_push (info);
-      terminate_all_aliasing_chains (chain_info, stmt);
+      ret |= terminate_all_aliasing_chains (chain_info, stmt);
       /* If we reach the limit of stores to merge in a chain terminate and
 	 process the chain now.  */
       if ((*chain_info)->m_store_info.length ()
@@ -4638,13 +4755,13 @@  pass_store_merging::process_store (gimple *stmt)
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file,
 		     "Reached maximum number of statements to merge:\n");
-	  terminate_and_release_chain (*chain_info);
+	  ret |= terminate_and_process_chain (*chain_info);
 	}
-      return;
+      return ret;
     }
 
   /* Store aliases any existing chain?  */
-  terminate_all_aliasing_chains (NULL, stmt);
+  ret |= terminate_all_aliasing_chains (NULL, stmt);
   /* Start a new chain.  */
   class imm_store_chain_info *new_chain
     = new imm_store_chain_info (m_stores_head, base_addr);
@@ -4652,7 +4769,8 @@  pass_store_merging::process_store (gimple *stmt)
 				   const_bitregion_start,
 				   const_bitregion_end,
 				   stmt, 0, rhs_code, n, ins_stmt,
-				   bit_not_p, ops[0], ops[1]);
+				   bit_not_p, lp_nr_for_store (stmt),
+				   ops[0], ops[1]);
   new_chain->m_store_info.safe_push (info);
   m_stores.put (base_addr, new_chain);
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -4663,8 +4781,54 @@  pass_store_merging::process_store (gimple *stmt)
       print_generic_expr (dump_file, base_addr);
       fprintf (dump_file, "\n");
     }
+  return ret;
 }
 
+/* Return true if STMT is a store valid for store merging.  */
+
+static bool
+store_valid_for_store_merging_p (gimple *stmt)
+{
+  return gimple_assign_single_p (stmt)
+	 && gimple_vdef (stmt)
+	 && lhs_valid_for_store_merging_p (gimple_assign_lhs (stmt))
+	 && !gimple_has_volatile_ops (stmt);
+}
+
+enum basic_block_status { BB_INVALID, BB_VALID, BB_EXTENDED_VALID };
+
+/* Return the status of basic block BB wrt store merging.  */
+
+static enum basic_block_status
+get_status_for_store_merging (basic_block bb)
+{
+  unsigned int num_statements = 0;
+  gimple_stmt_iterator gsi;
+  edge e;
+
+  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (is_gimple_debug (stmt))
+	continue;
+
+      if (store_valid_for_store_merging_p (stmt) && ++num_statements >= 2)
+	break;
+    }
+
+  if (num_statements == 0)
+    return BB_INVALID;
+
+  if (cfun->can_throw_non_call_exceptions && cfun->eh
+      && store_valid_for_store_merging_p (gimple_seq_last_stmt (bb_seq (bb)))
+      && (e = find_fallthru_edge (bb->succs))
+      && e->dest == bb->next_bb)
+    return BB_EXTENDED_VALID;
+
+  return num_statements >= 2 ? BB_VALID : BB_INVALID;
+}
+
 /* Entry point for the pass.  Go over each basic block recording chains of
    immediate stores.  Upon encountering a terminating statement (as defined
    by stmt_terminates_chain_p) process the recorded stores and emit the widened
@@ -4675,26 +4839,34 @@  pass_store_merging::execute (function *fun)
 {
   basic_block bb;
   hash_set<gimple *> orig_stmts;
+  bool changed = false, open_chains = false;
 
+  /* If the function can throw and catch non-call exceptions, we'll be trying
+     to merge stores across different basic blocks, so we need to first
+     unsplit the EH edges in order to streamline the CFG of the function.  */
+  if (cfun->can_throw_non_call_exceptions && cfun->eh)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      maybe_remove_unreachable_handlers ();
+      changed = unsplit_all_eh ();
+      if (changed)
+	delete_unreachable_blocks ();
+    }
+
   calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
+      const basic_block_status bb_status = get_status_for_store_merging (bb);
       gimple_stmt_iterator gsi;
-      unsigned HOST_WIDE_INT num_statements = 0;
-      /* Record the original statements so that we can keep track of
-	 statements emitted in this pass and not re-process new
-	 statements.  */
-      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+
+      if (open_chains && (bb_status == BB_INVALID || !single_pred_p (bb)))
 	{
-	  if (is_gimple_debug (gsi_stmt (gsi)))
-	    continue;
-
-	  if (++num_statements >= 2)
-	    break;
+	  changed |= terminate_and_process_all_chains ();
+	  open_chains = false;
 	}
 
-      if (num_statements < 2)
+      if (bb_status == BB_INVALID)
 	continue;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -4713,19 +4885,37 @@  pass_store_merging::execute (function *fun)
 	      if (dump_file && (dump_flags & TDF_DETAILS))
 		fprintf (dump_file, "Volatile access terminates "
 				    "all chains\n");
-	      terminate_and_process_all_chains ();
+	      changed |= terminate_and_process_all_chains ();
+	      open_chains = false;
 	      continue;
 	    }
 
-	  if (gimple_assign_single_p (stmt) && gimple_vdef (stmt)
-	      && !stmt_can_throw_internal (cfun, stmt)
-	      && lhs_valid_for_store_merging_p (gimple_assign_lhs (stmt)))
-	    process_store (stmt);
+	  if (store_valid_for_store_merging_p (stmt))
+	    changed |= process_store (stmt);
 	  else
-	    terminate_all_aliasing_chains (NULL, stmt);
+	    changed |= terminate_all_aliasing_chains (NULL, stmt);
 	}
-      terminate_and_process_all_chains ();
+
+      if (bb_status == BB_EXTENDED_VALID)
+	open_chains = true;
+      else
+	{
+	  changed |= terminate_and_process_all_chains ();
+	  open_chains = false;
+	}
     }
+
+  if (open_chains)
+    changed |= terminate_and_process_all_chains ();
+
+  /* If the function can throw and catch non-call exceptions and something
+     changed during the pass, then the CFG has (very likely) changed too.  */
+  if (cfun->can_throw_non_call_exceptions && cfun->eh && changed)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      return TODO_cleanup_cfg;
+    }
+
   return 0;
 }
 
Index: tree-eh.c
===================================================================
--- tree-eh.c	(revision 273765)
+++ tree-eh.c	(working copy)
@@ -4040,15 +4040,14 @@  maybe_remove_unreachable_handlers (void)
 
   if (cfun->eh == NULL)
     return;
-           
+
   FOR_EACH_VEC_SAFE_ELT (cfun->eh->lp_array, i, lp)
-    if (lp && lp->post_landing_pad)
+    if (lp
+	&& (lp->post_landing_pad == NULL_TREE
+	    || label_to_block (cfun, lp->post_landing_pad) == NULL))
       {
-	if (label_to_block (cfun, lp->post_landing_pad) == NULL)
-	  {
-	    remove_unreachable_handlers ();
-	    return;
-	  }
+	remove_unreachable_handlers ();
+	return;
       }
 }
 
@@ -4197,7 +4196,7 @@  unsplit_eh (eh_landing_pad lp)
 
 /* Examine each landing pad block and see if it matches unsplit_eh.  */
 
-static bool
+bool
 unsplit_all_eh (void)
 {
   bool changed = false;
Index: tree-eh.h
===================================================================
--- tree-eh.h	(revision 273765)
+++ tree-eh.h	(working copy)
@@ -52,5 +52,6 @@  extern bool maybe_duplicate_eh_stmt (gimple *, gim
 extern void maybe_remove_unreachable_handlers (void);
 extern bool verify_eh_edges (gimple *);
 extern bool verify_eh_dispatch_edge (geh_dispatch *);
+extern bool unsplit_all_eh (void);
 
 #endif /* GCC_TREE_EH_H */