diff mbox series

Fix PR91522

Message ID alpine.LSU.2.20.1908221533320.32458@zhemvz.fhfr.qr
State New
Headers show
Series Fix PR91522 | expand

Commit Message

Richard Biener Aug. 22, 2019, 1:35 p.m. UTC
This fixes quadraticness in STV and makes

 machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 
95%)      54 kB (  0%)

drop to zero.  Anybody remembers why it is the way it is now?

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2019-08-22  Richard Biener  <rguenther@suse.de>

	PR target/91522
	* config/i386/i386-features.c (scalar_chain::add_insn): Do not
	iterate over all defs of a reg.

Comments

Uros Bizjak Aug. 23, 2019, 6:32 a.m. UTC | #1
On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
>
>
> This fixes quadraticness in STV and makes
>
>  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> 95%)      54 kB (  0%)
>
> drop to zero.  Anybody remembers why it is the way it is now?
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> OK?

Looking at the PR, comment #3 [1], I assume this patch is obsoltete
and will be updated?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3

Uros.

> Thanks,
> Richard.
>
> 2019-08-22  Richard Biener  <rguenther@suse.de>
>
>         PR target/91522
>         * config/i386/i386-features.c (scalar_chain::add_insn): Do not
>         iterate over all defs of a reg.
>
> Index: gcc/config/i386/i386-features.c
> ===================================================================
> --- gcc/config/i386/i386-features.c     (revision 274764)
> +++ gcc/config/i386/i386-features.c     (working copy)
> @@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate
>    df_ref def;
>    for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>      if (!HARD_REGISTER_P (DF_REF_REG (ref)))
> -      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
> -          def;
> -          def = DF_REF_NEXT_REG (def))
> -       analyze_register_chain (candidates, def);
> +      analyze_register_chain (candidates, ref);
>    for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>      if (!DF_REF_REG_MEM_P (ref))
>        analyze_register_chain (candidates, ref);
>
Richard Biener Aug. 23, 2019, 10:57 a.m. UTC | #2
On Fri, 23 Aug 2019, Uros Bizjak wrote:

> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> >
> >
> > This fixes quadraticness in STV and makes
> >
> >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > 95%)      54 kB (  0%)
> >
> > drop to zero.  Anybody remembers why it is the way it is now?
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> and will be updated?
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3

Yes.  I'm still learning how STV operates (and learing DF and RTL...).
The following is a rewrite of the non-TImode chain conversion
according to I think how it should operate als allowing the hunk
that fixes the compile-time and fixing PR91527 on the way
(which I ran into during more extensive testing of the patch myself).

So compared to the state before which I still do not 100% understand
we now do the following.  Chain detection works as before including
recording of all defs (both defined by the insns in the chain and
insns outside) that need copy-in or copy-out operations.

But then the patch changes things as to guarantee that
after the conversion all uses/defs of a pseudo are
of the (subreg:Vmode ..) form or of the original scalar form.
In particular it avoids the need to change any insns that
are not part of the chain (besides emitting the extra copy
instructions).  For this all defs that were marked as needing
copies (thus they have uses/defs both in the chain and outside)
the chain will use a new pseudo that we copy to from scalar sources
and that we copy from for scalar uses.  There's the new defs_map
which records the mapping of old to new reg.  pseudos that are
only used in the chain already are not remapped.

The conversion itself then happens in two stages, first,
in make_vector_copies, we emit the copy-in insns and
allocate all pseudos we need.  Then the rest of the
conversion happens fully inside of convert_insn where
we generate the copy-outs of the insns def, replace
defs and uses according to the mapping and replace uses
and defs with the (subreg:Vmode ..) style.

For PR91527 IRA doesn't like the REG_EQUIV note in

(insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
        (subreg:V4SI (reg:SI 100) 0)) 
"/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 
1248 {movv4si_internal}
     (expr_list:REG_DEAD (reg:SI 100)
        (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
                    (const_int 16 [0x10])) [1 c+0 S4 A64])
            (nil))))

because the SET_DEST is not a REG_P.  I'm not sure if this
is invalid RTL, docs say SET_DEST might be a strict_low_part
or a zero_extract but doesn't mention a subreg.  So I opted
to simply remove equal/equiv notes on insns we convert
and since the above has a REG_DEAD note I took the liberty
to update that according to the mapping (so that would have
been not needed before this patch) rather than dropping it.

Bootstrapped with and without --with-march=westmere (to get
some STV coverage, this included all languages) on 
x88_64-unknown-linux-gnu, testing in progress.

OK if testing succeeds?

It still solves the compile-time issue (which is a latent issue,
btw, and with a carefully crafted testcase can be triggered
since STV exists for DImode chains with !TARGET_64BIT).

Thanks,
Richard.

2019-08-22  Richard Biener  <rguenther@suse.de>

	PR target/91522
	PR target/91527
	* config/i386/i386-features.h (general_scalar_chain::defs_map):
	New member.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::convert_reg): Adjust signature.
	* config/i386/i386-features.c (scalar_chain::add_insn): Do not
	iterate over all defs of a reg.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::make_vector_copies): Populate defs_map,
	place copy only after defs that are used as vectors in the chain.
	(general_scalar_chain::convert_reg): Emit a copy for a specific
	def in a specific instruction.
	(general_scalar_chain::convert_op): All reg uses are converted here.
	(general_scalar_chain::convert_insn): Emit copies for scalar
	uses of defs here.  Replace uses with the copies we created.
	Replace and convert the def.  Adjust REG_DEAD notes, remove
	REG_EQUIV/EQUAL notes.
	(general_scalar_chain::convert_registers): Only handle copies
	into the chain here.

Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c	(revision 274843)
+++ gcc/config/i386/i386-features.c	(working copy)
@@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
      iterates over all refs to look for dual-mode regs.  Instead this
      should be done separately for all regs mentioned in the chain once.  */
   df_ref ref;
-  df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-	   def;
-	   def = DF_REF_NEXT_REG (def))
-	analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);
@@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
   return gain;
 }
 
-/* Replace REG in X with a V2DI subreg of NEW_REG.  */
-
-rtx
-general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
-{
-  if (x == reg)
-    return gen_rtx_SUBREG (vmode, new_reg, 0);
-
-  /* But not in memory addresses.  */
-  if (MEM_P (x))
-    return x;
-
-  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
-  int i, j;
-  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
-    {
-      if (fmt[i] == 'e')
-	XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
-      else if (fmt[i] == 'E')
-	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
-	  XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
-						   reg, new_reg);
-    }
-
-  return x;
-}
-
-/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
-
-void
-general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
-						  rtx reg, rtx new_reg)
-{
-  replace_with_subreg (single_set (insn), reg, new_reg);
-}
-
 /* Insert generated conversion instruction sequence INSNS
    after instruction AFTER.  New BB may be required in case
    instruction has EH region attached.  */
@@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
   rtx vreg = gen_reg_rtx (smode);
   df_ref ref;
 
-  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	start_sequence ();
-	if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
-	  {
-	    rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-	    if (smode == DImode && !TARGET_64BIT)
-	      {
-		emit_move_insn (adjust_address (tmp, SImode, 0),
-				gen_rtx_SUBREG (SImode, reg, 0));
-		emit_move_insn (adjust_address (tmp, SImode, 4),
-				gen_rtx_SUBREG (SImode, reg, 4));
-	      }
-	    else
-	      emit_move_insn (copy_rtx (tmp), reg);
-	    emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-				    gen_gpr_to_xmm_move_src (vmode, tmp)));
-	  }
-	else if (!TARGET_64BIT && smode == DImode)
-	  {
-	    if (TARGET_SSE4_1)
-	      {
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 0)));
-		emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					      gen_rtx_SUBREG (V4SImode, vreg, 0),
-					      gen_rtx_SUBREG (SImode, reg, 4),
-					      GEN_INT (2)));
-	      }
-	    else
-	      {
-		rtx tmp = gen_reg_rtx (DImode);
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 0)));
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 4)));
-		emit_insn (gen_vec_interleave_lowv4si
-			   (gen_rtx_SUBREG (V4SImode, vreg, 0),
-			    gen_rtx_SUBREG (V4SImode, vreg, 0),
-			    gen_rtx_SUBREG (V4SImode, tmp, 0)));
-	      }
-	  }
-	else
-	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-				  gen_gpr_to_xmm_move_src (vmode, reg)));
-	rtx_insn *seq = get_insns ();
-	end_sequence ();
-	rtx_insn *insn = DF_REF_INSN (ref);
-	emit_conversion_insns (seq, insn);
-
-	if (dump_file)
-	  fprintf (dump_file,
-		   "  Copied r%d to a vector register r%d for insn %d\n",
-		   regno, REGNO (vreg), INSN_UID (insn));
-      }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	rtx_insn *insn = DF_REF_INSN (ref);
-	replace_with_subreg_in_insn (insn, reg, vreg);
-
-	if (dump_file)
-	  fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
-		   regno, REGNO (vreg), INSN_UID (insn));
-      }
-}
-
-/* Convert all definitions of register REGNO
-   and fix its uses.  Scalar copies may be created
-   in case register is used in not convertible insn.  */
-
-void
-general_scalar_chain::convert_reg (unsigned regno)
-{
-  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
-  rtx reg = regno_reg_rtx[regno];
-  rtx scopy = NULL_RTX;
-  df_ref ref;
-  bitmap conv;
-
-  conv = BITMAP_ALLOC (NULL);
-  bitmap_copy (conv, insns);
-
-  if (scalar_copy)
-    scopy = gen_reg_rtx (smode);
+  defs_map.put (reg, vreg);
 
+  /* For each insn defining REGNO, see if it is defined by an insn
+     not part of the chain but with uses in insns part of the chain
+     and insert a copy in that case.  */
   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
     {
-      rtx_insn *insn = DF_REF_INSN (ref);
-      rtx def_set = single_set (insn);
-      rtx src = SET_SRC (def_set);
-      rtx reg = DF_REF_REG (ref);
+      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
+	continue;
+      df_link *use;
+      for (use = DF_REF_CHAIN (ref); use; use = use->next)
+	if (!DF_REF_REG_MEM_P (use->ref)
+	    && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+	  break;
+      if (!use)
+	continue;
 
-      if (!MEM_P (src))
+      start_sequence ();
+      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
 	{
-	  replace_with_subreg_in_insn (insn, reg, reg);
-	  bitmap_clear_bit (conv, INSN_UID (insn));
+	  rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+	  if (smode == DImode && !TARGET_64BIT)
+	    {
+	      emit_move_insn (adjust_address (tmp, SImode, 0),
+			      gen_rtx_SUBREG (SImode, reg, 0));
+	      emit_move_insn (adjust_address (tmp, SImode, 4),
+			      gen_rtx_SUBREG (SImode, reg, 4));
+	    }
+	  else
+	    emit_move_insn (copy_rtx (tmp), reg);
+	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+				  gen_gpr_to_xmm_move_src (vmode, tmp)));
 	}
-
-      if (scalar_copy)
+      else if (!TARGET_64BIT && smode == DImode)
 	{
-	  start_sequence ();
-	  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+	  if (TARGET_SSE4_1)
 	    {
-	      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-	      emit_move_insn (tmp, reg);
-	      if (!TARGET_64BIT && smode == DImode)
-		{
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-				  adjust_address (tmp, SImode, 0));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-				  adjust_address (tmp, SImode, 4));
-		}
-	      else
-		emit_move_insn (scopy, copy_rtx (tmp));
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 0)));
+	      emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					    gen_rtx_SUBREG (V4SImode, vreg, 0),
+					    gen_rtx_SUBREG (SImode, reg, 4),
+					    GEN_INT (2)));
 	    }
-	  else if (!TARGET_64BIT && smode == DImode)
+	  else
 	    {
-	      if (TARGET_SSE4_1)
-		{
-		  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
-					      gen_rtvec (1, const0_rtx));
-		  emit_insn
-		    (gen_rtx_SET
-		       (gen_rtx_SUBREG (SImode, scopy, 0),
-			gen_rtx_VEC_SELECT (SImode,
-					    gen_rtx_SUBREG (V4SImode, reg, 0),
-					    tmp)));
-
-		  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
-		  emit_insn
-		    (gen_rtx_SET
-		       (gen_rtx_SUBREG (SImode, scopy, 4),
-			gen_rtx_VEC_SELECT (SImode,
-					    gen_rtx_SUBREG (V4SImode, reg, 0),
-					    tmp)));
-		}
-	      else
-		{
-		  rtx vcopy = gen_reg_rtx (V2DImode);
-		  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-				  gen_rtx_SUBREG (SImode, vcopy, 0));
-		  emit_move_insn (vcopy,
-				  gen_rtx_LSHIFTRT (V2DImode,
-						    vcopy, GEN_INT (32)));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-				  gen_rtx_SUBREG (SImode, vcopy, 0));
-		}
+	      rtx tmp = gen_reg_rtx (DImode);
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 0)));
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 4)));
+	      emit_insn (gen_vec_interleave_lowv4si
+			 (gen_rtx_SUBREG (V4SImode, vreg, 0),
+			  gen_rtx_SUBREG (V4SImode, vreg, 0),
+			  gen_rtx_SUBREG (V4SImode, tmp, 0)));
 	    }
-	  else
-	    emit_move_insn (scopy, reg);
-
-	  rtx_insn *seq = get_insns ();
-	  end_sequence ();
-	  emit_conversion_insns (seq, insn);
-
-	  if (dump_file)
-	    fprintf (dump_file,
-		     "  Copied r%d to a scalar register r%d for insn %d\n",
-		     regno, REGNO (scopy), INSN_UID (insn));
 	}
-    }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
-	  {
-	    rtx_insn *insn = DF_REF_INSN (ref);
+      else
+	emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+				gen_gpr_to_xmm_move_src (vmode, reg)));
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      rtx_insn *insn = DF_REF_INSN (ref);
+      emit_conversion_insns (seq, insn);
 
-	    rtx def_set = single_set (insn);
-	    gcc_assert (def_set);
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Copied r%d to a vector register r%d for insn %d\n",
+		 regno, REGNO (vreg), INSN_UID (insn));
+    }
+}
 
-	    rtx src = SET_SRC (def_set);
-	    rtx dst = SET_DEST (def_set);
+/* Copy the definition SRC of INSN inside the chain to DST for
+   scalar uses outside of the chain.  */
 
-	    if (!MEM_P (dst) || !REG_P (src))
-	      replace_with_subreg_in_insn (insn, reg, reg);
+void
+general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
+{
+  start_sequence ();
+  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+    {
+      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+      emit_move_insn (tmp, src);
+      if (!TARGET_64BIT && smode == DImode)
+	{
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+			  adjust_address (tmp, SImode, 0));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+			  adjust_address (tmp, SImode, 4));
+	}
+      else
+	emit_move_insn (dst, copy_rtx (tmp));
+    }
+  else if (!TARGET_64BIT && smode == DImode)
+    {
+      if (TARGET_SSE4_1)
+	{
+	  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+				      gen_rtvec (1, const0_rtx));
+	  emit_insn
+	      (gen_rtx_SET
+	       (gen_rtx_SUBREG (SImode, dst, 0),
+		gen_rtx_VEC_SELECT (SImode,
+				    gen_rtx_SUBREG (V4SImode, src, 0),
+				    tmp)));
+
+	  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
+	  emit_insn
+	      (gen_rtx_SET
+	       (gen_rtx_SUBREG (SImode, dst, 4),
+		gen_rtx_VEC_SELECT (SImode,
+				    gen_rtx_SUBREG (V4SImode, src, 0),
+				    tmp)));
+	}
+      else
+	{
+	  rtx vcopy = gen_reg_rtx (V2DImode);
+	  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+			  gen_rtx_SUBREG (SImode, vcopy, 0));
+	  emit_move_insn (vcopy,
+			  gen_rtx_LSHIFTRT (V2DImode,
+					    vcopy, GEN_INT (32)));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+			  gen_rtx_SUBREG (SImode, vcopy, 0));
+	}
+    }
+  else
+    emit_move_insn (dst, src);
 
-	    bitmap_clear_bit (conv, INSN_UID (insn));
-	  }
-      }
-    /* Skip debug insns and uninitialized uses.  */
-    else if (DF_REF_CHAIN (ref)
-	     && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
-      {
-	gcc_assert (scopy);
-	replace_rtx (DF_REF_INSN (ref), reg, scopy);
-	df_insn_rescan (DF_REF_INSN (ref));
-      }
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_conversion_insns (seq, insn);
 
-  BITMAP_FREE (conv);
+  if (dump_file)
+    fprintf (dump_file,
+	     "  Copied r%d to a scalar register r%d for insn %d\n",
+	     REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
 /* Convert operand OP in INSN.  We should handle
@@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
     }
   else if (REG_P (*op))
     {
-      /* We may have not converted register usage in case
-	 this register has no definition.  Otherwise it
-	 should be converted in convert_reg.  */
-      df_ref ref;
-      FOR_EACH_INSN_USE (ref, insn)
-	if (DF_REF_REGNO (ref) == REGNO (*op))
-	  {
-	    gcc_assert (!DF_REF_CHAIN (ref));
-	    break;
-	  }
       *op = gen_rtx_SUBREG (vmode, *op, 0);
     }
   else if (CONST_INT_P (*op))
@@ -980,6 +873,32 @@ general_scalar_chain::convert_insn (rtx_
   rtx dst = SET_DEST (def_set);
   rtx subreg;
 
+  /* Generate copies for out-of-chain uses of defs.  */
+  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
+      {
+	df_link *use;
+	for (use = DF_REF_CHAIN (ref); use; use = use->next)
+	  if (DF_REF_REG_MEM_P (use->ref)
+	      || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+	    break;
+	if (use)
+	  convert_reg (insn, DF_REF_REG (ref),
+		       *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
+      }
+
+  /* Replace uses in this insn with the defs we use in the chain.  */
+  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (!DF_REF_REG_MEM_P (ref))
+      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
+	{
+	  /* Also update a corresponding REG_DEAD note.  */
+	  rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
+	  if (note)
+	    XEXP (note, 0) = *vreg;
+	  *DF_REF_REAL_LOC (ref) = *vreg;
+	}
+
   if (MEM_P (dst) && !REG_P (src))
     {
       /* There are no scalar integer instructions and therefore
@@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
       dst = gen_rtx_SUBREG (vmode, tmp, 0);
     }
+  else if (REG_P (dst))
+    {
+      /* Replace the definition with a SUBREG to the definition we
+         use inside the chain.  */
+      rtx *vdef = defs_map.get (dst);
+      if (vdef)
+	dst = *vdef;
+      dst = gen_rtx_SUBREG (vmode, dst, 0);
+      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
+         is a non-REG_P.  So kill those off.  */
+      rtx note = find_reg_equal_equiv_note (insn);
+      if (note)
+	remove_note (insn, note);
+    }
 
   switch (GET_CODE (src))
     {
@@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
     case COMPARE:
       src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
 
-      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
-		  || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
-
-      if (REG_P (src))
-	subreg = gen_rtx_SUBREG (V2DImode, src, 0);
-      else
-	subreg = copy_rtx_if_shared (src);
+      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
+      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
       emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
 						    copy_rtx_if_shared (subreg),
 						    copy_rtx_if_shared (subreg)),
 			insn);
       dst = gen_rtx_REG (CCmode, FLAGS_REG);
-      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
-					       copy_rtx_if_shared (src)),
+      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
+					       copy_rtx_if_shared (subreg)),
 			    UNSPEC_PTEST);
       break;
 
@@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
   df_insn_rescan (insn);
 }
 
+/* Generate copies from defs used by the chain but not defined therein.
+   Also populates defs_map which is used later by convert_insn.  */
+
 void
 general_scalar_chain::convert_registers ()
 {
   bitmap_iterator bi;
   unsigned id;
-
-  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
-    convert_reg (id);
-
-  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
+  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
     make_vector_copies (id);
 }
 
Index: gcc/config/i386/i386-features.h
===================================================================
--- gcc/config/i386/i386-features.h	(revision 274843)
+++ gcc/config/i386/i386-features.h	(working copy)
@@ -171,12 +171,11 @@ class general_scalar_chain : public scal
     : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain ();
  private:
+  hash_map<rtx, rtx> defs_map;
   void mark_dual_mode_def (df_ref def);
-  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
-  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
   void convert_insn (rtx_insn *insn);
   void convert_op (rtx *op, rtx_insn *insn);
-  void convert_reg (unsigned regno);
+  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
   void make_vector_copies (unsigned regno);
   void convert_registers ();
   int vector_const_cost (rtx exp);
Richard Biener Aug. 23, 2019, 12:38 p.m. UTC | #3
On Fri, 23 Aug 2019, Richard Biener wrote:

> On Fri, 23 Aug 2019, Uros Bizjak wrote:
> 
> > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > >
> > > This fixes quadraticness in STV and makes
> > >
> > >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > 95%)      54 kB (  0%)
> > >
> > > drop to zero.  Anybody remembers why it is the way it is now?
> > >
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> > 
> > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > and will be updated?
> > 
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> 
> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> The following is a rewrite of the non-TImode chain conversion
> according to I think how it should operate als allowing the hunk
> that fixes the compile-time and fixing PR91527 on the way
> (which I ran into during more extensive testing of the patch myself).
> 
> So compared to the state before which I still do not 100% understand
> we now do the following.  Chain detection works as before including
> recording of all defs (both defined by the insns in the chain and
> insns outside) that need copy-in or copy-out operations.
> 
> But then the patch changes things as to guarantee that
> after the conversion all uses/defs of a pseudo are
> of the (subreg:Vmode ..) form or of the original scalar form.
> In particular it avoids the need to change any insns that
> are not part of the chain (besides emitting the extra copy
> instructions).  For this all defs that were marked as needing
> copies (thus they have uses/defs both in the chain and outside)
> the chain will use a new pseudo that we copy to from scalar sources
> and that we copy from for scalar uses.  There's the new defs_map
> which records the mapping of old to new reg.  pseudos that are
> only used in the chain already are not remapped.
> 
> The conversion itself then happens in two stages, first,
> in make_vector_copies, we emit the copy-in insns and
> allocate all pseudos we need.  Then the rest of the
> conversion happens fully inside of convert_insn where
> we generate the copy-outs of the insns def, replace
> defs and uses according to the mapping and replace uses
> and defs with the (subreg:Vmode ..) style.
> 
> For PR91527 IRA doesn't like the REG_EQUIV note in
> 
> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
>         (subreg:V4SI (reg:SI 100) 0)) 
> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 
> 1248 {movv4si_internal}
>      (expr_list:REG_DEAD (reg:SI 100)
>         (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
>                     (const_int 16 [0x10])) [1 c+0 S4 A64])
>             (nil))))
> 
> because the SET_DEST is not a REG_P.  I'm not sure if this
> is invalid RTL, docs say SET_DEST might be a strict_low_part
> or a zero_extract but doesn't mention a subreg.  So I opted
> to simply remove equal/equiv notes on insns we convert
> and since the above has a REG_DEAD note I took the liberty
> to update that according to the mapping (so that would have
> been not needed before this patch) rather than dropping it.
> 
> Bootstrapped with and without --with-march=westmere (to get
> some STV coverage, this included all languages) on 
> x88_64-unknown-linux-gnu, testing in progress.
> 
> OK if testing succeeds?

Testing revealed I made an error in general_scalar_chain::convert_insn
failing to move down SET_SRC extraction below replacing with
the defs map.  This showed in 4 execute FAILs in 32bit fortran
testing (only).  Fixed by moving down the whole def_set/src/dst
extraction.

Re-testing on x86_64-unknown-linux-gnu.

Updated patch below.  I'm feeling adventurous and will run
the "westmere" bootstrap with costing disabled (aka always
convert detected chains...).

Richard.

2019-08-23  Richard Biener  <rguenther@suse.de>

	PR target/91522
	PR target/91527
	* config/i386/i386-features.h (general_scalar_chain::defs_map):
	New member.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::convert_reg): Adjust signature.
	* config/i386/i386-features.c (scalar_chain::add_insn): Do not
	iterate over all defs of a reg.
	(general_scalar_chain::replace_with_subreg): Remove.
	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
	(general_scalar_chain::make_vector_copies): Populate defs_map,
	place copy only after defs that are used as vectors in the chain.
	(general_scalar_chain::convert_reg): Emit a copy for a specific
	def in a specific instruction.
	(general_scalar_chain::convert_op): All reg uses are converted here.
	(general_scalar_chain::convert_insn): Emit copies for scalar
	uses of defs here.  Replace uses with the copies we created.
	Replace and convert the def.  Adjust REG_DEAD notes, remove
	REG_EQUIV/EQUAL notes.
	(general_scalar_chain::convert_registers): Only handle copies
	into the chain here.


Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c	(revision 274843)
+++ gcc/config/i386/i386-features.c	(working copy)
@@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
      iterates over all refs to look for dual-mode regs.  Instead this
      should be done separately for all regs mentioned in the chain once.  */
   df_ref ref;
-  df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-	   def;
-	   def = DF_REF_NEXT_REG (def))
-	analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);
@@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
   return gain;
 }
 
-/* Replace REG in X with a V2DI subreg of NEW_REG.  */
-
-rtx
-general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
-{
-  if (x == reg)
-    return gen_rtx_SUBREG (vmode, new_reg, 0);
-
-  /* But not in memory addresses.  */
-  if (MEM_P (x))
-    return x;
-
-  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
-  int i, j;
-  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
-    {
-      if (fmt[i] == 'e')
-	XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
-      else if (fmt[i] == 'E')
-	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
-	  XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
-						   reg, new_reg);
-    }
-
-  return x;
-}
-
-/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
-
-void
-general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
-						  rtx reg, rtx new_reg)
-{
-  replace_with_subreg (single_set (insn), reg, new_reg);
-}
-
 /* Insert generated conversion instruction sequence INSNS
    after instruction AFTER.  New BB may be required in case
    instruction has EH region attached.  */
@@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
   rtx vreg = gen_reg_rtx (smode);
   df_ref ref;
 
-  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	start_sequence ();
-	if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
-	  {
-	    rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-	    if (smode == DImode && !TARGET_64BIT)
-	      {
-		emit_move_insn (adjust_address (tmp, SImode, 0),
-				gen_rtx_SUBREG (SImode, reg, 0));
-		emit_move_insn (adjust_address (tmp, SImode, 4),
-				gen_rtx_SUBREG (SImode, reg, 4));
-	      }
-	    else
-	      emit_move_insn (copy_rtx (tmp), reg);
-	    emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-				    gen_gpr_to_xmm_move_src (vmode, tmp)));
-	  }
-	else if (!TARGET_64BIT && smode == DImode)
-	  {
-	    if (TARGET_SSE4_1)
-	      {
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 0)));
-		emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					      gen_rtx_SUBREG (V4SImode, vreg, 0),
-					      gen_rtx_SUBREG (SImode, reg, 4),
-					      GEN_INT (2)));
-	      }
-	    else
-	      {
-		rtx tmp = gen_reg_rtx (DImode);
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 0)));
-		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
-					    CONST0_RTX (V4SImode),
-					    gen_rtx_SUBREG (SImode, reg, 4)));
-		emit_insn (gen_vec_interleave_lowv4si
-			   (gen_rtx_SUBREG (V4SImode, vreg, 0),
-			    gen_rtx_SUBREG (V4SImode, vreg, 0),
-			    gen_rtx_SUBREG (V4SImode, tmp, 0)));
-	      }
-	  }
-	else
-	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-				  gen_gpr_to_xmm_move_src (vmode, reg)));
-	rtx_insn *seq = get_insns ();
-	end_sequence ();
-	rtx_insn *insn = DF_REF_INSN (ref);
-	emit_conversion_insns (seq, insn);
-
-	if (dump_file)
-	  fprintf (dump_file,
-		   "  Copied r%d to a vector register r%d for insn %d\n",
-		   regno, REGNO (vreg), INSN_UID (insn));
-      }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	rtx_insn *insn = DF_REF_INSN (ref);
-	replace_with_subreg_in_insn (insn, reg, vreg);
-
-	if (dump_file)
-	  fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
-		   regno, REGNO (vreg), INSN_UID (insn));
-      }
-}
-
-/* Convert all definitions of register REGNO
-   and fix its uses.  Scalar copies may be created
-   in case register is used in not convertible insn.  */
-
-void
-general_scalar_chain::convert_reg (unsigned regno)
-{
-  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
-  rtx reg = regno_reg_rtx[regno];
-  rtx scopy = NULL_RTX;
-  df_ref ref;
-  bitmap conv;
-
-  conv = BITMAP_ALLOC (NULL);
-  bitmap_copy (conv, insns);
-
-  if (scalar_copy)
-    scopy = gen_reg_rtx (smode);
+  defs_map.put (reg, vreg);
 
+  /* For each insn defining REGNO, see if it is defined by an insn
+     not part of the chain but with uses in insns part of the chain
+     and insert a copy in that case.  */
   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
     {
-      rtx_insn *insn = DF_REF_INSN (ref);
-      rtx def_set = single_set (insn);
-      rtx src = SET_SRC (def_set);
-      rtx reg = DF_REF_REG (ref);
+      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
+	continue;
+      df_link *use;
+      for (use = DF_REF_CHAIN (ref); use; use = use->next)
+	if (!DF_REF_REG_MEM_P (use->ref)
+	    && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+	  break;
+      if (!use)
+	continue;
 
-      if (!MEM_P (src))
+      start_sequence ();
+      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
 	{
-	  replace_with_subreg_in_insn (insn, reg, reg);
-	  bitmap_clear_bit (conv, INSN_UID (insn));
+	  rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+	  if (smode == DImode && !TARGET_64BIT)
+	    {
+	      emit_move_insn (adjust_address (tmp, SImode, 0),
+			      gen_rtx_SUBREG (SImode, reg, 0));
+	      emit_move_insn (adjust_address (tmp, SImode, 4),
+			      gen_rtx_SUBREG (SImode, reg, 4));
+	    }
+	  else
+	    emit_move_insn (copy_rtx (tmp), reg);
+	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+				  gen_gpr_to_xmm_move_src (vmode, tmp)));
 	}
-
-      if (scalar_copy)
+      else if (!TARGET_64BIT && smode == DImode)
 	{
-	  start_sequence ();
-	  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+	  if (TARGET_SSE4_1)
 	    {
-	      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-	      emit_move_insn (tmp, reg);
-	      if (!TARGET_64BIT && smode == DImode)
-		{
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-				  adjust_address (tmp, SImode, 0));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-				  adjust_address (tmp, SImode, 4));
-		}
-	      else
-		emit_move_insn (scopy, copy_rtx (tmp));
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 0)));
+	      emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					    gen_rtx_SUBREG (V4SImode, vreg, 0),
+					    gen_rtx_SUBREG (SImode, reg, 4),
+					    GEN_INT (2)));
 	    }
-	  else if (!TARGET_64BIT && smode == DImode)
+	  else
 	    {
-	      if (TARGET_SSE4_1)
-		{
-		  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
-					      gen_rtvec (1, const0_rtx));
-		  emit_insn
-		    (gen_rtx_SET
-		       (gen_rtx_SUBREG (SImode, scopy, 0),
-			gen_rtx_VEC_SELECT (SImode,
-					    gen_rtx_SUBREG (V4SImode, reg, 0),
-					    tmp)));
-
-		  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
-		  emit_insn
-		    (gen_rtx_SET
-		       (gen_rtx_SUBREG (SImode, scopy, 4),
-			gen_rtx_VEC_SELECT (SImode,
-					    gen_rtx_SUBREG (V4SImode, reg, 0),
-					    tmp)));
-		}
-	      else
-		{
-		  rtx vcopy = gen_reg_rtx (V2DImode);
-		  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-				  gen_rtx_SUBREG (SImode, vcopy, 0));
-		  emit_move_insn (vcopy,
-				  gen_rtx_LSHIFTRT (V2DImode,
-						    vcopy, GEN_INT (32)));
-		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-				  gen_rtx_SUBREG (SImode, vcopy, 0));
-		}
+	      rtx tmp = gen_reg_rtx (DImode);
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 0)));
+	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
+					  CONST0_RTX (V4SImode),
+					  gen_rtx_SUBREG (SImode, reg, 4)));
+	      emit_insn (gen_vec_interleave_lowv4si
+			 (gen_rtx_SUBREG (V4SImode, vreg, 0),
+			  gen_rtx_SUBREG (V4SImode, vreg, 0),
+			  gen_rtx_SUBREG (V4SImode, tmp, 0)));
 	    }
-	  else
-	    emit_move_insn (scopy, reg);
-
-	  rtx_insn *seq = get_insns ();
-	  end_sequence ();
-	  emit_conversion_insns (seq, insn);
-
-	  if (dump_file)
-	    fprintf (dump_file,
-		     "  Copied r%d to a scalar register r%d for insn %d\n",
-		     regno, REGNO (scopy), INSN_UID (insn));
 	}
-    }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-	if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
-	  {
-	    rtx_insn *insn = DF_REF_INSN (ref);
+      else
+	emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+				gen_gpr_to_xmm_move_src (vmode, reg)));
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      rtx_insn *insn = DF_REF_INSN (ref);
+      emit_conversion_insns (seq, insn);
 
-	    rtx def_set = single_set (insn);
-	    gcc_assert (def_set);
+      if (dump_file)
+	fprintf (dump_file,
+		 "  Copied r%d to a vector register r%d for insn %d\n",
+		 regno, REGNO (vreg), INSN_UID (insn));
+    }
+}
 
-	    rtx src = SET_SRC (def_set);
-	    rtx dst = SET_DEST (def_set);
+/* Copy the definition SRC of INSN inside the chain to DST for
+   scalar uses outside of the chain.  */
 
-	    if (!MEM_P (dst) || !REG_P (src))
-	      replace_with_subreg_in_insn (insn, reg, reg);
+void
+general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
+{
+  start_sequence ();
+  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+    {
+      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+      emit_move_insn (tmp, src);
+      if (!TARGET_64BIT && smode == DImode)
+	{
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+			  adjust_address (tmp, SImode, 0));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+			  adjust_address (tmp, SImode, 4));
+	}
+      else
+	emit_move_insn (dst, copy_rtx (tmp));
+    }
+  else if (!TARGET_64BIT && smode == DImode)
+    {
+      if (TARGET_SSE4_1)
+	{
+	  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+				      gen_rtvec (1, const0_rtx));
+	  emit_insn
+	      (gen_rtx_SET
+	       (gen_rtx_SUBREG (SImode, dst, 0),
+		gen_rtx_VEC_SELECT (SImode,
+				    gen_rtx_SUBREG (V4SImode, src, 0),
+				    tmp)));
+
+	  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
+	  emit_insn
+	      (gen_rtx_SET
+	       (gen_rtx_SUBREG (SImode, dst, 4),
+		gen_rtx_VEC_SELECT (SImode,
+				    gen_rtx_SUBREG (V4SImode, src, 0),
+				    tmp)));
+	}
+      else
+	{
+	  rtx vcopy = gen_reg_rtx (V2DImode);
+	  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+			  gen_rtx_SUBREG (SImode, vcopy, 0));
+	  emit_move_insn (vcopy,
+			  gen_rtx_LSHIFTRT (V2DImode,
+					    vcopy, GEN_INT (32)));
+	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+			  gen_rtx_SUBREG (SImode, vcopy, 0));
+	}
+    }
+  else
+    emit_move_insn (dst, src);
 
-	    bitmap_clear_bit (conv, INSN_UID (insn));
-	  }
-      }
-    /* Skip debug insns and uninitialized uses.  */
-    else if (DF_REF_CHAIN (ref)
-	     && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
-      {
-	gcc_assert (scopy);
-	replace_rtx (DF_REF_INSN (ref), reg, scopy);
-	df_insn_rescan (DF_REF_INSN (ref));
-      }
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_conversion_insns (seq, insn);
 
-  BITMAP_FREE (conv);
+  if (dump_file)
+    fprintf (dump_file,
+	     "  Copied r%d to a scalar register r%d for insn %d\n",
+	     REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
 /* Convert operand OP in INSN.  We should handle
@@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
     }
   else if (REG_P (*op))
     {
-      /* We may have not converted register usage in case
-	 this register has no definition.  Otherwise it
-	 should be converted in convert_reg.  */
-      df_ref ref;
-      FOR_EACH_INSN_USE (ref, insn)
-	if (DF_REF_REGNO (ref) == REGNO (*op))
-	  {
-	    gcc_assert (!DF_REF_CHAIN (ref));
-	    break;
-	  }
       *op = gen_rtx_SUBREG (vmode, *op, 0);
     }
   else if (CONST_INT_P (*op))
@@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
 void
 general_scalar_chain::convert_insn (rtx_insn *insn)
 {
+  /* Generate copies for out-of-chain uses of defs.  */
+  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
+      {
+	df_link *use;
+	for (use = DF_REF_CHAIN (ref); use; use = use->next)
+	  if (DF_REF_REG_MEM_P (use->ref)
+	      || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+	    break;
+	if (use)
+	  convert_reg (insn, DF_REF_REG (ref),
+		       *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
+      }
+
+  /* Replace uses in this insn with the defs we use in the chain.  */
+  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (!DF_REF_REG_MEM_P (ref))
+      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
+	{
+	  /* Also update a corresponding REG_DEAD note.  */
+	  rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
+	  if (note)
+	    XEXP (note, 0) = *vreg;
+	  *DF_REF_REAL_LOC (ref) = *vreg;
+	}
+
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
@@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
       dst = gen_rtx_SUBREG (vmode, tmp, 0);
     }
+  else if (REG_P (dst))
+    {
+      /* Replace the definition with a SUBREG to the definition we
+         use inside the chain.  */
+      rtx *vdef = defs_map.get (dst);
+      if (vdef)
+	dst = *vdef;
+      dst = gen_rtx_SUBREG (vmode, dst, 0);
+      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
+         is a non-REG_P.  So kill those off.  */
+      rtx note = find_reg_equal_equiv_note (insn);
+      if (note)
+	remove_note (insn, note);
+    }
 
   switch (GET_CODE (src))
     {
@@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
     case COMPARE:
       src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
 
-      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
-		  || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
-
-      if (REG_P (src))
-	subreg = gen_rtx_SUBREG (V2DImode, src, 0);
-      else
-	subreg = copy_rtx_if_shared (src);
+      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
+      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
       emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
 						    copy_rtx_if_shared (subreg),
 						    copy_rtx_if_shared (subreg)),
 			insn);
       dst = gen_rtx_REG (CCmode, FLAGS_REG);
-      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
-					       copy_rtx_if_shared (src)),
+      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
+					       copy_rtx_if_shared (subreg)),
 			    UNSPEC_PTEST);
       break;
 
@@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
   df_insn_rescan (insn);
 }
 
+/* Generate copies from defs used by the chain but not defined therein.
+   Also populates defs_map which is used later by convert_insn.  */
+
 void
 general_scalar_chain::convert_registers ()
 {
   bitmap_iterator bi;
   unsigned id;
-
-  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
-    convert_reg (id);
-
-  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
+  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
     make_vector_copies (id);
 }
 
Index: gcc/config/i386/i386-features.h
===================================================================
--- gcc/config/i386/i386-features.h	(revision 274843)
+++ gcc/config/i386/i386-features.h	(working copy)
@@ -171,12 +171,11 @@ class general_scalar_chain : public scal
     : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain ();
  private:
+  hash_map<rtx, rtx> defs_map;
   void mark_dual_mode_def (df_ref def);
-  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
-  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
   void convert_insn (rtx_insn *insn);
   void convert_op (rtx *op, rtx_insn *insn);
-  void convert_reg (unsigned regno);
+  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
   void make_vector_copies (unsigned regno);
   void convert_registers ();
   int vector_const_cost (rtx exp);
Richard Biener Aug. 26, 2019, 8:40 a.m. UTC | #4
On Fri, 23 Aug 2019, Richard Biener wrote:

> On Fri, 23 Aug 2019, Richard Biener wrote:
> 
> > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > 
> > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > >
> > > > This fixes quadraticness in STV and makes
> > > >
> > > >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > > 95%)      54 kB (  0%)
> > > >
> > > > drop to zero.  Anybody remembers why it is the way it is now?
> > > >
> > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > OK?
> > > 
> > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > > and will be updated?
> > > 
> > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> > 
> > Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> > The following is a rewrite of the non-TImode chain conversion
> > according to I think how it should operate als allowing the hunk
> > that fixes the compile-time and fixing PR91527 on the way
> > (which I ran into during more extensive testing of the patch myself).
> > 
> > So compared to the state before which I still do not 100% understand
> > we now do the following.  Chain detection works as before including
> > recording of all defs (both defined by the insns in the chain and
> > insns outside) that need copy-in or copy-out operations.
> > 
> > But then the patch changes things as to guarantee that
> > after the conversion all uses/defs of a pseudo are
> > of the (subreg:Vmode ..) form or of the original scalar form.
> > In particular it avoids the need to change any insns that
> > are not part of the chain (besides emitting the extra copy
> > instructions).  For this all defs that were marked as needing
> > copies (thus they have uses/defs both in the chain and outside)
> > the chain will use a new pseudo that we copy to from scalar sources
> > and that we copy from for scalar uses.  There's the new defs_map
> > which records the mapping of old to new reg.  pseudos that are
> > only used in the chain already are not remapped.
> > 
> > The conversion itself then happens in two stages, first,
> > in make_vector_copies, we emit the copy-in insns and
> > allocate all pseudos we need.  Then the rest of the
> > conversion happens fully inside of convert_insn where
> > we generate the copy-outs of the insns def, replace
> > defs and uses according to the mapping and replace uses
> > and defs with the (subreg:Vmode ..) style.
> > 
> > For PR91527 IRA doesn't like the REG_EQUIV note in
> > 
> > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> >         (subreg:V4SI (reg:SI 100) 0)) 
> > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 
> > 1248 {movv4si_internal}
> >      (expr_list:REG_DEAD (reg:SI 100)
> >         (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> >                     (const_int 16 [0x10])) [1 c+0 S4 A64])
> >             (nil))))
> > 
> > because the SET_DEST is not a REG_P.  I'm not sure if this
> > is invalid RTL, docs say SET_DEST might be a strict_low_part
> > or a zero_extract but doesn't mention a subreg.  So I opted
> > to simply remove equal/equiv notes on insns we convert
> > and since the above has a REG_DEAD note I took the liberty
> > to update that according to the mapping (so that would have
> > been not needed before this patch) rather than dropping it.
> > 
> > Bootstrapped with and without --with-march=westmere (to get
> > some STV coverage, this included all languages) on 
> > x88_64-unknown-linux-gnu, testing in progress.
> > 
> > OK if testing succeeds?
> 
> Testing revealed I made an error in general_scalar_chain::convert_insn
> failing to move down SET_SRC extraction below replacing with
> the defs map.  This showed in 4 execute FAILs in 32bit fortran
> testing (only).  Fixed by moving down the whole def_set/src/dst
> extraction.
> 
> Re-testing on x86_64-unknown-linux-gnu.

Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
run runs into the latent PR91528 building 32bit libada in stage3
for a few sources, I've manually built those with -mno-stv and
bootstrap finishes successfully.  I hope HJ can help with this
dynamic stack-alignment issue.

So - OK for trunk?

As followup we can now remove general_remove_non_convertible_regs
because we can handle defs that cannot be converted just fine
with the patch since we're splitting live-ranges of all defs at
the chain boundary.

Thanks,
Richard.

> Updated patch below.  I'm feeling adventurous and will run
> the "westmere" bootstrap with costing disabled (aka always
> convert detected chains...).
> 
> Richard.
> 
> 2019-08-23  Richard Biener  <rguenther@suse.de>
> 
> 	PR target/91522
> 	PR target/91527
> 	* config/i386/i386-features.h (general_scalar_chain::defs_map):
> 	New member.
> 	(general_scalar_chain::replace_with_subreg): Remove.
> 	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> 	(general_scalar_chain::convert_reg): Adjust signature.
> 	* config/i386/i386-features.c (scalar_chain::add_insn): Do not
> 	iterate over all defs of a reg.
> 	(general_scalar_chain::replace_with_subreg): Remove.
> 	(general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> 	(general_scalar_chain::make_vector_copies): Populate defs_map,
> 	place copy only after defs that are used as vectors in the chain.
> 	(general_scalar_chain::convert_reg): Emit a copy for a specific
> 	def in a specific instruction.
> 	(general_scalar_chain::convert_op): All reg uses are converted here.
> 	(general_scalar_chain::convert_insn): Emit copies for scalar
> 	uses of defs here.  Replace uses with the copies we created.
> 	Replace and convert the def.  Adjust REG_DEAD notes, remove
> 	REG_EQUIV/EQUAL notes.
> 	(general_scalar_chain::convert_registers): Only handle copies
> 	into the chain here.
> 
> 
> Index: gcc/config/i386/i386-features.c
> ===================================================================
> --- gcc/config/i386/i386-features.c	(revision 274843)
> +++ gcc/config/i386/i386-features.c	(working copy)
> @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
>       iterates over all refs to look for dual-mode regs.  Instead this
>       should be done separately for all regs mentioned in the chain once.  */
>    df_ref ref;
> -  df_ref def;
>    for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>      if (!HARD_REGISTER_P (DF_REF_REG (ref)))
> -      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
> -	   def;
> -	   def = DF_REF_NEXT_REG (def))
> -	analyze_register_chain (candidates, def);
> +      analyze_register_chain (candidates, ref);
>    for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>      if (!DF_REF_REG_MEM_P (ref))
>        analyze_register_chain (candidates, ref);
> @@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
>    return gain;
>  }
>  
> -/* Replace REG in X with a V2DI subreg of NEW_REG.  */
> -
> -rtx
> -general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
> -{
> -  if (x == reg)
> -    return gen_rtx_SUBREG (vmode, new_reg, 0);
> -
> -  /* But not in memory addresses.  */
> -  if (MEM_P (x))
> -    return x;
> -
> -  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
> -  int i, j;
> -  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
> -    {
> -      if (fmt[i] == 'e')
> -	XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
> -      else if (fmt[i] == 'E')
> -	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
> -	  XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
> -						   reg, new_reg);
> -    }
> -
> -  return x;
> -}
> -
> -/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
> -
> -void
> -general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
> -						  rtx reg, rtx new_reg)
> -{
> -  replace_with_subreg (single_set (insn), reg, new_reg);
> -}
> -
>  /* Insert generated conversion instruction sequence INSNS
>     after instruction AFTER.  New BB may be required in case
>     instruction has EH region attached.  */
> @@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
>    rtx vreg = gen_reg_rtx (smode);
>    df_ref ref;
>  
> -  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> -    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> -      {
> -	start_sequence ();
> -	if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> -	  {
> -	    rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> -	    if (smode == DImode && !TARGET_64BIT)
> -	      {
> -		emit_move_insn (adjust_address (tmp, SImode, 0),
> -				gen_rtx_SUBREG (SImode, reg, 0));
> -		emit_move_insn (adjust_address (tmp, SImode, 4),
> -				gen_rtx_SUBREG (SImode, reg, 4));
> -	      }
> -	    else
> -	      emit_move_insn (copy_rtx (tmp), reg);
> -	    emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> -				    gen_gpr_to_xmm_move_src (vmode, tmp)));
> -	  }
> -	else if (!TARGET_64BIT && smode == DImode)
> -	  {
> -	    if (TARGET_SSE4_1)
> -	      {
> -		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -					    CONST0_RTX (V4SImode),
> -					    gen_rtx_SUBREG (SImode, reg, 0)));
> -		emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -					      gen_rtx_SUBREG (V4SImode, vreg, 0),
> -					      gen_rtx_SUBREG (SImode, reg, 4),
> -					      GEN_INT (2)));
> -	      }
> -	    else
> -	      {
> -		rtx tmp = gen_reg_rtx (DImode);
> -		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -					    CONST0_RTX (V4SImode),
> -					    gen_rtx_SUBREG (SImode, reg, 0)));
> -		emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> -					    CONST0_RTX (V4SImode),
> -					    gen_rtx_SUBREG (SImode, reg, 4)));
> -		emit_insn (gen_vec_interleave_lowv4si
> -			   (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -			    gen_rtx_SUBREG (V4SImode, vreg, 0),
> -			    gen_rtx_SUBREG (V4SImode, tmp, 0)));
> -	      }
> -	  }
> -	else
> -	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> -				  gen_gpr_to_xmm_move_src (vmode, reg)));
> -	rtx_insn *seq = get_insns ();
> -	end_sequence ();
> -	rtx_insn *insn = DF_REF_INSN (ref);
> -	emit_conversion_insns (seq, insn);
> -
> -	if (dump_file)
> -	  fprintf (dump_file,
> -		   "  Copied r%d to a vector register r%d for insn %d\n",
> -		   regno, REGNO (vreg), INSN_UID (insn));
> -      }
> -
> -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> -      {
> -	rtx_insn *insn = DF_REF_INSN (ref);
> -	replace_with_subreg_in_insn (insn, reg, vreg);
> -
> -	if (dump_file)
> -	  fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
> -		   regno, REGNO (vreg), INSN_UID (insn));
> -      }
> -}
> -
> -/* Convert all definitions of register REGNO
> -   and fix its uses.  Scalar copies may be created
> -   in case register is used in not convertible insn.  */
> -
> -void
> -general_scalar_chain::convert_reg (unsigned regno)
> -{
> -  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
> -  rtx reg = regno_reg_rtx[regno];
> -  rtx scopy = NULL_RTX;
> -  df_ref ref;
> -  bitmap conv;
> -
> -  conv = BITMAP_ALLOC (NULL);
> -  bitmap_copy (conv, insns);
> -
> -  if (scalar_copy)
> -    scopy = gen_reg_rtx (smode);
> +  defs_map.put (reg, vreg);
>  
> +  /* For each insn defining REGNO, see if it is defined by an insn
> +     not part of the chain but with uses in insns part of the chain
> +     and insert a copy in that case.  */
>    for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>      {
> -      rtx_insn *insn = DF_REF_INSN (ref);
> -      rtx def_set = single_set (insn);
> -      rtx src = SET_SRC (def_set);
> -      rtx reg = DF_REF_REG (ref);
> +      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> +	continue;
> +      df_link *use;
> +      for (use = DF_REF_CHAIN (ref); use; use = use->next)
> +	if (!DF_REF_REG_MEM_P (use->ref)
> +	    && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> +	  break;
> +      if (!use)
> +	continue;
>  
> -      if (!MEM_P (src))
> +      start_sequence ();
> +      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>  	{
> -	  replace_with_subreg_in_insn (insn, reg, reg);
> -	  bitmap_clear_bit (conv, INSN_UID (insn));
> +	  rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> +	  if (smode == DImode && !TARGET_64BIT)
> +	    {
> +	      emit_move_insn (adjust_address (tmp, SImode, 0),
> +			      gen_rtx_SUBREG (SImode, reg, 0));
> +	      emit_move_insn (adjust_address (tmp, SImode, 4),
> +			      gen_rtx_SUBREG (SImode, reg, 4));
> +	    }
> +	  else
> +	    emit_move_insn (copy_rtx (tmp), reg);
> +	  emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> +				  gen_gpr_to_xmm_move_src (vmode, tmp)));
>  	}
> -
> -      if (scalar_copy)
> +      else if (!TARGET_64BIT && smode == DImode)
>  	{
> -	  start_sequence ();
> -	  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> +	  if (TARGET_SSE4_1)
>  	    {
> -	      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> -	      emit_move_insn (tmp, reg);
> -	      if (!TARGET_64BIT && smode == DImode)
> -		{
> -		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> -				  adjust_address (tmp, SImode, 0));
> -		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> -				  adjust_address (tmp, SImode, 4));
> -		}
> -	      else
> -		emit_move_insn (scopy, copy_rtx (tmp));
> +	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +					  CONST0_RTX (V4SImode),
> +					  gen_rtx_SUBREG (SImode, reg, 0)));
> +	      emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +					    gen_rtx_SUBREG (V4SImode, vreg, 0),
> +					    gen_rtx_SUBREG (SImode, reg, 4),
> +					    GEN_INT (2)));
>  	    }
> -	  else if (!TARGET_64BIT && smode == DImode)
> +	  else
>  	    {
> -	      if (TARGET_SSE4_1)
> -		{
> -		  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> -					      gen_rtvec (1, const0_rtx));
> -		  emit_insn
> -		    (gen_rtx_SET
> -		       (gen_rtx_SUBREG (SImode, scopy, 0),
> -			gen_rtx_VEC_SELECT (SImode,
> -					    gen_rtx_SUBREG (V4SImode, reg, 0),
> -					    tmp)));
> -
> -		  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> -		  emit_insn
> -		    (gen_rtx_SET
> -		       (gen_rtx_SUBREG (SImode, scopy, 4),
> -			gen_rtx_VEC_SELECT (SImode,
> -					    gen_rtx_SUBREG (V4SImode, reg, 0),
> -					    tmp)));
> -		}
> -	      else
> -		{
> -		  rtx vcopy = gen_reg_rtx (V2DImode);
> -		  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
> -		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> -				  gen_rtx_SUBREG (SImode, vcopy, 0));
> -		  emit_move_insn (vcopy,
> -				  gen_rtx_LSHIFTRT (V2DImode,
> -						    vcopy, GEN_INT (32)));
> -		  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> -				  gen_rtx_SUBREG (SImode, vcopy, 0));
> -		}
> +	      rtx tmp = gen_reg_rtx (DImode);
> +	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +					  CONST0_RTX (V4SImode),
> +					  gen_rtx_SUBREG (SImode, reg, 0)));
> +	      emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> +					  CONST0_RTX (V4SImode),
> +					  gen_rtx_SUBREG (SImode, reg, 4)));
> +	      emit_insn (gen_vec_interleave_lowv4si
> +			 (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +			  gen_rtx_SUBREG (V4SImode, vreg, 0),
> +			  gen_rtx_SUBREG (V4SImode, tmp, 0)));
>  	    }
> -	  else
> -	    emit_move_insn (scopy, reg);
> -
> -	  rtx_insn *seq = get_insns ();
> -	  end_sequence ();
> -	  emit_conversion_insns (seq, insn);
> -
> -	  if (dump_file)
> -	    fprintf (dump_file,
> -		     "  Copied r%d to a scalar register r%d for insn %d\n",
> -		     regno, REGNO (scopy), INSN_UID (insn));
>  	}
> -    }
> -
> -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> -      {
> -	if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
> -	  {
> -	    rtx_insn *insn = DF_REF_INSN (ref);
> +      else
> +	emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> +				gen_gpr_to_xmm_move_src (vmode, reg)));
> +      rtx_insn *seq = get_insns ();
> +      end_sequence ();
> +      rtx_insn *insn = DF_REF_INSN (ref);
> +      emit_conversion_insns (seq, insn);
>  
> -	    rtx def_set = single_set (insn);
> -	    gcc_assert (def_set);
> +      if (dump_file)
> +	fprintf (dump_file,
> +		 "  Copied r%d to a vector register r%d for insn %d\n",
> +		 regno, REGNO (vreg), INSN_UID (insn));
> +    }
> +}
>  
> -	    rtx src = SET_SRC (def_set);
> -	    rtx dst = SET_DEST (def_set);
> +/* Copy the definition SRC of INSN inside the chain to DST for
> +   scalar uses outside of the chain.  */
>  
> -	    if (!MEM_P (dst) || !REG_P (src))
> -	      replace_with_subreg_in_insn (insn, reg, reg);
> +void
> +general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
> +{
> +  start_sequence ();
> +  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> +    {
> +      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> +      emit_move_insn (tmp, src);
> +      if (!TARGET_64BIT && smode == DImode)
> +	{
> +	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> +			  adjust_address (tmp, SImode, 0));
> +	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> +			  adjust_address (tmp, SImode, 4));
> +	}
> +      else
> +	emit_move_insn (dst, copy_rtx (tmp));
> +    }
> +  else if (!TARGET_64BIT && smode == DImode)
> +    {
> +      if (TARGET_SSE4_1)
> +	{
> +	  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> +				      gen_rtvec (1, const0_rtx));
> +	  emit_insn
> +	      (gen_rtx_SET
> +	       (gen_rtx_SUBREG (SImode, dst, 0),
> +		gen_rtx_VEC_SELECT (SImode,
> +				    gen_rtx_SUBREG (V4SImode, src, 0),
> +				    tmp)));
> +
> +	  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> +	  emit_insn
> +	      (gen_rtx_SET
> +	       (gen_rtx_SUBREG (SImode, dst, 4),
> +		gen_rtx_VEC_SELECT (SImode,
> +				    gen_rtx_SUBREG (V4SImode, src, 0),
> +				    tmp)));
> +	}
> +      else
> +	{
> +	  rtx vcopy = gen_reg_rtx (V2DImode);
> +	  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
> +	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> +			  gen_rtx_SUBREG (SImode, vcopy, 0));
> +	  emit_move_insn (vcopy,
> +			  gen_rtx_LSHIFTRT (V2DImode,
> +					    vcopy, GEN_INT (32)));
> +	  emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> +			  gen_rtx_SUBREG (SImode, vcopy, 0));
> +	}
> +    }
> +  else
> +    emit_move_insn (dst, src);
>  
> -	    bitmap_clear_bit (conv, INSN_UID (insn));
> -	  }
> -      }
> -    /* Skip debug insns and uninitialized uses.  */
> -    else if (DF_REF_CHAIN (ref)
> -	     && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
> -      {
> -	gcc_assert (scopy);
> -	replace_rtx (DF_REF_INSN (ref), reg, scopy);
> -	df_insn_rescan (DF_REF_INSN (ref));
> -      }
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  emit_conversion_insns (seq, insn);
>  
> -  BITMAP_FREE (conv);
> +  if (dump_file)
> +    fprintf (dump_file,
> +	     "  Copied r%d to a scalar register r%d for insn %d\n",
> +	     REGNO (src), REGNO (dst), INSN_UID (insn));
>  }
>  
>  /* Convert operand OP in INSN.  We should handle
> @@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
>      }
>    else if (REG_P (*op))
>      {
> -      /* We may have not converted register usage in case
> -	 this register has no definition.  Otherwise it
> -	 should be converted in convert_reg.  */
> -      df_ref ref;
> -      FOR_EACH_INSN_USE (ref, insn)
> -	if (DF_REF_REGNO (ref) == REGNO (*op))
> -	  {
> -	    gcc_assert (!DF_REF_CHAIN (ref));
> -	    break;
> -	  }
>        *op = gen_rtx_SUBREG (vmode, *op, 0);
>      }
>    else if (CONST_INT_P (*op))
> @@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
>  void
>  general_scalar_chain::convert_insn (rtx_insn *insn)
>  {
> +  /* Generate copies for out-of-chain uses of defs.  */
> +  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> +    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
> +      {
> +	df_link *use;
> +	for (use = DF_REF_CHAIN (ref); use; use = use->next)
> +	  if (DF_REF_REG_MEM_P (use->ref)
> +	      || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> +	    break;
> +	if (use)
> +	  convert_reg (insn, DF_REF_REG (ref),
> +		       *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
> +      }
> +
> +  /* Replace uses in this insn with the defs we use in the chain.  */
> +  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> +    if (!DF_REF_REG_MEM_P (ref))
> +      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
> +	{
> +	  /* Also update a corresponding REG_DEAD note.  */
> +	  rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
> +	  if (note)
> +	    XEXP (note, 0) = *vreg;
> +	  *DF_REF_REAL_LOC (ref) = *vreg;
> +	}
> +
>    rtx def_set = single_set (insn);
>    rtx src = SET_SRC (def_set);
>    rtx dst = SET_DEST (def_set);
> @@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
>        emit_conversion_insns (gen_move_insn (dst, tmp), insn);
>        dst = gen_rtx_SUBREG (vmode, tmp, 0);
>      }
> +  else if (REG_P (dst))
> +    {
> +      /* Replace the definition with a SUBREG to the definition we
> +         use inside the chain.  */
> +      rtx *vdef = defs_map.get (dst);
> +      if (vdef)
> +	dst = *vdef;
> +      dst = gen_rtx_SUBREG (vmode, dst, 0);
> +      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
> +         is a non-REG_P.  So kill those off.  */
> +      rtx note = find_reg_equal_equiv_note (insn);
> +      if (note)
> +	remove_note (insn, note);
> +    }
>  
>    switch (GET_CODE (src))
>      {
> @@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
>      case COMPARE:
>        src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
>  
> -      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
> -		  || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
> -
> -      if (REG_P (src))
> -	subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> -      else
> -	subreg = copy_rtx_if_shared (src);
> +      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
> +      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
>        emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
>  						    copy_rtx_if_shared (subreg),
>  						    copy_rtx_if_shared (subreg)),
>  			insn);
>        dst = gen_rtx_REG (CCmode, FLAGS_REG);
> -      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
> -					       copy_rtx_if_shared (src)),
> +      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
> +					       copy_rtx_if_shared (subreg)),
>  			    UNSPEC_PTEST);
>        break;
>  
> @@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
>    df_insn_rescan (insn);
>  }
>  
> +/* Generate copies from defs used by the chain but not defined therein.
> +   Also populates defs_map which is used later by convert_insn.  */
> +
>  void
>  general_scalar_chain::convert_registers ()
>  {
>    bitmap_iterator bi;
>    unsigned id;
> -
> -  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
> -    convert_reg (id);
> -
> -  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
> +  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
>      make_vector_copies (id);
>  }
>  
> Index: gcc/config/i386/i386-features.h
> ===================================================================
> --- gcc/config/i386/i386-features.h	(revision 274843)
> +++ gcc/config/i386/i386-features.h	(working copy)
> @@ -171,12 +171,11 @@ class general_scalar_chain : public scal
>      : scalar_chain (smode_, vmode_) {}
>    int compute_convert_gain ();
>   private:
> +  hash_map<rtx, rtx> defs_map;
>    void mark_dual_mode_def (df_ref def);
> -  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
> -  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
>    void convert_insn (rtx_insn *insn);
>    void convert_op (rtx *op, rtx_insn *insn);
> -  void convert_reg (unsigned regno);
> +  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
>    void make_vector_copies (unsigned regno);
>    void convert_registers ();
>    int vector_const_cost (rtx exp);
>
Uros Bizjak Aug. 26, 2019, 10:06 a.m. UTC | #5
On Mon, Aug 26, 2019 at 10:40 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 23 Aug 2019, Richard Biener wrote:
>
> > On Fri, 23 Aug 2019, Richard Biener wrote:
> >
> > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > >
> > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > >
> > > > > This fixes quadraticness in STV and makes
> > > > >
> > > > >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > > > 95%)      54 kB (  0%)
> > > > >
> > > > > drop to zero.  Anybody remembers why it is the way it is now?
> > > > >
> > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > > >
> > > > > OK?
> > > >
> > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > > > and will be updated?
> > > >
> > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> > >
> > > Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> > > The following is a rewrite of the non-TImode chain conversion
> > > according to I think how it should operate als allowing the hunk
> > > that fixes the compile-time and fixing PR91527 on the way
> > > (which I ran into during more extensive testing of the patch myself).
> > >
> > > So compared to the state before which I still do not 100% understand
> > > we now do the following.  Chain detection works as before including
> > > recording of all defs (both defined by the insns in the chain and
> > > insns outside) that need copy-in or copy-out operations.
> > >
> > > But then the patch changes things as to guarantee that
> > > after the conversion all uses/defs of a pseudo are
> > > of the (subreg:Vmode ..) form or of the original scalar form.
> > > In particular it avoids the need to change any insns that
> > > are not part of the chain (besides emitting the extra copy
> > > instructions).  For this all defs that were marked as needing
> > > copies (thus they have uses/defs both in the chain and outside)
> > > the chain will use a new pseudo that we copy to from scalar sources
> > > and that we copy from for scalar uses.  There's the new defs_map
> > > which records the mapping of old to new reg.  pseudos that are
> > > only used in the chain already are not remapped.
> > >
> > > The conversion itself then happens in two stages, first,
> > > in make_vector_copies, we emit the copy-in insns and
> > > allocate all pseudos we need.  Then the rest of the
> > > conversion happens fully inside of convert_insn where
> > > we generate the copy-outs of the insns def, replace
> > > defs and uses according to the mapping and replace uses
> > > and defs with the (subreg:Vmode ..) style.
> > >
> > > For PR91527 IRA doesn't like the REG_EQUIV note in
> > >
> > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> > >         (subreg:V4SI (reg:SI 100) 0))
> > > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> > > 1248 {movv4si_internal}
> > >      (expr_list:REG_DEAD (reg:SI 100)
> > >         (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> > >                     (const_int 16 [0x10])) [1 c+0 S4 A64])
> > >             (nil))))
> > >
> > > because the SET_DEST is not a REG_P.  I'm not sure if this
> > > is invalid RTL, docs say SET_DEST might be a strict_low_part
> > > or a zero_extract but doesn't mention a subreg.  So I opted
> > > to simply remove equal/equiv notes on insns we convert
> > > and since the above has a REG_DEAD note I took the liberty
> > > to update that according to the mapping (so that would have
> > > been not needed before this patch) rather than dropping it.
> > >
> > > Bootstrapped with and without --with-march=westmere (to get
> > > some STV coverage, this included all languages) on
> > > x88_64-unknown-linux-gnu, testing in progress.
> > >
> > > OK if testing succeeds?
> >
> > Testing revealed I made an error in general_scalar_chain::convert_insn
> > failing to move down SET_SRC extraction below replacing with
> > the defs map.  This showed in 4 execute FAILs in 32bit fortran
> > testing (only).  Fixed by moving down the whole def_set/src/dst
> > extraction.
> >
> > Re-testing on x86_64-unknown-linux-gnu.
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
> run runs into the latent PR91528 building 32bit libada in stage3
> for a few sources, I've manually built those with -mno-stv and
> bootstrap finishes successfully.  I hope HJ can help with this
> dynamic stack-alignment issue.
>
> So - OK for trunk?
>
> As followup we can now remove general_remove_non_convertible_regs
> because we can handle defs that cannot be converted just fine
> with the patch since we're splitting live-ranges of all defs at
> the chain boundary.
>
> Thanks,
> Richard.
>
> > Updated patch below.  I'm feeling adventurous and will run
> > the "westmere" bootstrap with costing disabled (aka always
> > convert detected chains...).
> >
> > Richard.
> >
> > 2019-08-23  Richard Biener  <rguenther@suse.de>
> >
> >       PR target/91522
> >       PR target/91527
> >       * config/i386/i386-features.h (general_scalar_chain::defs_map):
> >       New member.
> >       (general_scalar_chain::replace_with_subreg): Remove.
> >       (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >       (general_scalar_chain::convert_reg): Adjust signature.
> >       * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> >       iterate over all defs of a reg.
> >       (general_scalar_chain::replace_with_subreg): Remove.
> >       (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >       (general_scalar_chain::make_vector_copies): Populate defs_map,
> >       place copy only after defs that are used as vectors in the chain.
> >       (general_scalar_chain::convert_reg): Emit a copy for a specific
> >       def in a specific instruction.
> >       (general_scalar_chain::convert_op): All reg uses are converted here.
> >       (general_scalar_chain::convert_insn): Emit copies for scalar
> >       uses of defs here.  Replace uses with the copies we created.
> >       Replace and convert the def.  Adjust REG_DEAD notes, remove
> >       REG_EQUIV/EQUAL notes.
> >       (general_scalar_chain::convert_registers): Only handle copies
> >       into the chain here.

Rubberstamped with LGTM. It looks you are the master of this domain now ;)

Thanks,
Uros.

> >
> >
> > Index: gcc/config/i386/i386-features.c
> > ===================================================================
> > --- gcc/config/i386/i386-features.c   (revision 274843)
> > +++ gcc/config/i386/i386-features.c   (working copy)
> > @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
> >       iterates over all refs to look for dual-mode regs.  Instead this
> >       should be done separately for all regs mentioned in the chain once.  */
> >    df_ref ref;
> > -  df_ref def;
> >    for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
> >      if (!HARD_REGISTER_P (DF_REF_REG (ref)))
> > -      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
> > -        def;
> > -        def = DF_REF_NEXT_REG (def))
> > -     analyze_register_chain (candidates, def);
> > +      analyze_register_chain (candidates, ref);
> >    for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
> >      if (!DF_REF_REG_MEM_P (ref))
> >        analyze_register_chain (candidates, ref);
> > @@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
> >    return gain;
> >  }
> >
> > -/* Replace REG in X with a V2DI subreg of NEW_REG.  */
> > -
> > -rtx
> > -general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
> > -{
> > -  if (x == reg)
> > -    return gen_rtx_SUBREG (vmode, new_reg, 0);
> > -
> > -  /* But not in memory addresses.  */
> > -  if (MEM_P (x))
> > -    return x;
> > -
> > -  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
> > -  int i, j;
> > -  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
> > -    {
> > -      if (fmt[i] == 'e')
> > -     XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
> > -      else if (fmt[i] == 'E')
> > -     for (j = XVECLEN (x, i) - 1; j >= 0; j--)
> > -       XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
> > -                                                reg, new_reg);
> > -    }
> > -
> > -  return x;
> > -}
> > -
> > -/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
> > -
> > -void
> > -general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
> > -                                               rtx reg, rtx new_reg)
> > -{
> > -  replace_with_subreg (single_set (insn), reg, new_reg);
> > -}
> > -
> >  /* Insert generated conversion instruction sequence INSNS
> >     after instruction AFTER.  New BB may be required in case
> >     instruction has EH region attached.  */
> > @@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
> >    rtx vreg = gen_reg_rtx (smode);
> >    df_ref ref;
> >
> > -  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     start_sequence ();
> > -     if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> > -       {
> > -         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > -         if (smode == DImode && !TARGET_64BIT)
> > -           {
> > -             emit_move_insn (adjust_address (tmp, SImode, 0),
> > -                             gen_rtx_SUBREG (SImode, reg, 0));
> > -             emit_move_insn (adjust_address (tmp, SImode, 4),
> > -                             gen_rtx_SUBREG (SImode, reg, 4));
> > -           }
> > -         else
> > -           emit_move_insn (copy_rtx (tmp), reg);
> > -         emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > -                                 gen_gpr_to_xmm_move_src (vmode, tmp)));
> > -       }
> > -     else if (!TARGET_64BIT && smode == DImode)
> > -       {
> > -         if (TARGET_SSE4_1)
> > -           {
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 0)));
> > -             emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                           gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                           gen_rtx_SUBREG (SImode, reg, 4),
> > -                                           GEN_INT (2)));
> > -           }
> > -         else
> > -           {
> > -             rtx tmp = gen_reg_rtx (DImode);
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 0)));
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 4)));
> > -             emit_insn (gen_vec_interleave_lowv4si
> > -                        (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                         gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                         gen_rtx_SUBREG (V4SImode, tmp, 0)));
> > -           }
> > -       }
> > -     else
> > -       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > -                               gen_gpr_to_xmm_move_src (vmode, reg)));
> > -     rtx_insn *seq = get_insns ();
> > -     end_sequence ();
> > -     rtx_insn *insn = DF_REF_INSN (ref);
> > -     emit_conversion_insns (seq, insn);
> > -
> > -     if (dump_file)
> > -       fprintf (dump_file,
> > -                "  Copied r%d to a vector register r%d for insn %d\n",
> > -                regno, REGNO (vreg), INSN_UID (insn));
> > -      }
> > -
> > -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     rtx_insn *insn = DF_REF_INSN (ref);
> > -     replace_with_subreg_in_insn (insn, reg, vreg);
> > -
> > -     if (dump_file)
> > -       fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
> > -                regno, REGNO (vreg), INSN_UID (insn));
> > -      }
> > -}
> > -
> > -/* Convert all definitions of register REGNO
> > -   and fix its uses.  Scalar copies may be created
> > -   in case register is used in not convertible insn.  */
> > -
> > -void
> > -general_scalar_chain::convert_reg (unsigned regno)
> > -{
> > -  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
> > -  rtx reg = regno_reg_rtx[regno];
> > -  rtx scopy = NULL_RTX;
> > -  df_ref ref;
> > -  bitmap conv;
> > -
> > -  conv = BITMAP_ALLOC (NULL);
> > -  bitmap_copy (conv, insns);
> > -
> > -  if (scalar_copy)
> > -    scopy = gen_reg_rtx (smode);
> > +  defs_map.put (reg, vreg);
> >
> > +  /* For each insn defining REGNO, see if it is defined by an insn
> > +     not part of the chain but with uses in insns part of the chain
> > +     and insert a copy in that case.  */
> >    for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> >      {
> > -      rtx_insn *insn = DF_REF_INSN (ref);
> > -      rtx def_set = single_set (insn);
> > -      rtx src = SET_SRC (def_set);
> > -      rtx reg = DF_REF_REG (ref);
> > +      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > +     continue;
> > +      df_link *use;
> > +      for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > +     if (!DF_REF_REG_MEM_P (use->ref)
> > +         && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> > +       break;
> > +      if (!use)
> > +     continue;
> >
> > -      if (!MEM_P (src))
> > +      start_sequence ();
> > +      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> >       {
> > -       replace_with_subreg_in_insn (insn, reg, reg);
> > -       bitmap_clear_bit (conv, INSN_UID (insn));
> > +       rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > +       if (smode == DImode && !TARGET_64BIT)
> > +         {
> > +           emit_move_insn (adjust_address (tmp, SImode, 0),
> > +                           gen_rtx_SUBREG (SImode, reg, 0));
> > +           emit_move_insn (adjust_address (tmp, SImode, 4),
> > +                           gen_rtx_SUBREG (SImode, reg, 4));
> > +         }
> > +       else
> > +         emit_move_insn (copy_rtx (tmp), reg);
> > +       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > +                               gen_gpr_to_xmm_move_src (vmode, tmp)));
> >       }
> > -
> > -      if (scalar_copy)
> > +      else if (!TARGET_64BIT && smode == DImode)
> >       {
> > -       start_sequence ();
> > -       if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> > +       if (TARGET_SSE4_1)
> >           {
> > -           rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > -           emit_move_insn (tmp, reg);
> > -           if (!TARGET_64BIT && smode == DImode)
> > -             {
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                               adjust_address (tmp, SImode, 0));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                               adjust_address (tmp, SImode, 4));
> > -             }
> > -           else
> > -             emit_move_insn (scopy, copy_rtx (tmp));
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 0)));
> > +           emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                         gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                         gen_rtx_SUBREG (SImode, reg, 4),
> > +                                         GEN_INT (2)));
> >           }
> > -       else if (!TARGET_64BIT && smode == DImode)
> > +       else
> >           {
> > -           if (TARGET_SSE4_1)
> > -             {
> > -               rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> > -                                           gen_rtvec (1, const0_rtx));
> > -               emit_insn
> > -                 (gen_rtx_SET
> > -                    (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                     gen_rtx_VEC_SELECT (SImode,
> > -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
> > -                                         tmp)));
> > -
> > -               tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> > -               emit_insn
> > -                 (gen_rtx_SET
> > -                    (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                     gen_rtx_VEC_SELECT (SImode,
> > -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
> > -                                         tmp)));
> > -             }
> > -           else
> > -             {
> > -               rtx vcopy = gen_reg_rtx (V2DImode);
> > -               emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                               gen_rtx_SUBREG (SImode, vcopy, 0));
> > -               emit_move_insn (vcopy,
> > -                               gen_rtx_LSHIFTRT (V2DImode,
> > -                                                 vcopy, GEN_INT (32)));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                               gen_rtx_SUBREG (SImode, vcopy, 0));
> > -             }
> > +           rtx tmp = gen_reg_rtx (DImode);
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 0)));
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 4)));
> > +           emit_insn (gen_vec_interleave_lowv4si
> > +                      (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                       gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                       gen_rtx_SUBREG (V4SImode, tmp, 0)));
> >           }
> > -       else
> > -         emit_move_insn (scopy, reg);
> > -
> > -       rtx_insn *seq = get_insns ();
> > -       end_sequence ();
> > -       emit_conversion_insns (seq, insn);
> > -
> > -       if (dump_file)
> > -         fprintf (dump_file,
> > -                  "  Copied r%d to a scalar register r%d for insn %d\n",
> > -                  regno, REGNO (scopy), INSN_UID (insn));
> >       }
> > -    }
> > -
> > -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
> > -       {
> > -         rtx_insn *insn = DF_REF_INSN (ref);
> > +      else
> > +     emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > +                             gen_gpr_to_xmm_move_src (vmode, reg)));
> > +      rtx_insn *seq = get_insns ();
> > +      end_sequence ();
> > +      rtx_insn *insn = DF_REF_INSN (ref);
> > +      emit_conversion_insns (seq, insn);
> >
> > -         rtx def_set = single_set (insn);
> > -         gcc_assert (def_set);
> > +      if (dump_file)
> > +     fprintf (dump_file,
> > +              "  Copied r%d to a vector register r%d for insn %d\n",
> > +              regno, REGNO (vreg), INSN_UID (insn));
> > +    }
> > +}
> >
> > -         rtx src = SET_SRC (def_set);
> > -         rtx dst = SET_DEST (def_set);
> > +/* Copy the definition SRC of INSN inside the chain to DST for
> > +   scalar uses outside of the chain.  */
> >
> > -         if (!MEM_P (dst) || !REG_P (src))
> > -           replace_with_subreg_in_insn (insn, reg, reg);
> > +void
> > +general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
> > +{
> > +  start_sequence ();
> > +  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> > +    {
> > +      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > +      emit_move_insn (tmp, src);
> > +      if (!TARGET_64BIT && smode == DImode)
> > +     {
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> > +                       adjust_address (tmp, SImode, 0));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> > +                       adjust_address (tmp, SImode, 4));
> > +     }
> > +      else
> > +     emit_move_insn (dst, copy_rtx (tmp));
> > +    }
> > +  else if (!TARGET_64BIT && smode == DImode)
> > +    {
> > +      if (TARGET_SSE4_1)
> > +     {
> > +       rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> > +                                   gen_rtvec (1, const0_rtx));
> > +       emit_insn
> > +           (gen_rtx_SET
> > +            (gen_rtx_SUBREG (SImode, dst, 0),
> > +             gen_rtx_VEC_SELECT (SImode,
> > +                                 gen_rtx_SUBREG (V4SImode, src, 0),
> > +                                 tmp)));
> > +
> > +       tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> > +       emit_insn
> > +           (gen_rtx_SET
> > +            (gen_rtx_SUBREG (SImode, dst, 4),
> > +             gen_rtx_VEC_SELECT (SImode,
> > +                                 gen_rtx_SUBREG (V4SImode, src, 0),
> > +                                 tmp)));
> > +     }
> > +      else
> > +     {
> > +       rtx vcopy = gen_reg_rtx (V2DImode);
> > +       emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> > +                       gen_rtx_SUBREG (SImode, vcopy, 0));
> > +       emit_move_insn (vcopy,
> > +                       gen_rtx_LSHIFTRT (V2DImode,
> > +                                         vcopy, GEN_INT (32)));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> > +                       gen_rtx_SUBREG (SImode, vcopy, 0));
> > +     }
> > +    }
> > +  else
> > +    emit_move_insn (dst, src);
> >
> > -         bitmap_clear_bit (conv, INSN_UID (insn));
> > -       }
> > -      }
> > -    /* Skip debug insns and uninitialized uses.  */
> > -    else if (DF_REF_CHAIN (ref)
> > -          && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
> > -      {
> > -     gcc_assert (scopy);
> > -     replace_rtx (DF_REF_INSN (ref), reg, scopy);
> > -     df_insn_rescan (DF_REF_INSN (ref));
> > -      }
> > +  rtx_insn *seq = get_insns ();
> > +  end_sequence ();
> > +  emit_conversion_insns (seq, insn);
> >
> > -  BITMAP_FREE (conv);
> > +  if (dump_file)
> > +    fprintf (dump_file,
> > +          "  Copied r%d to a scalar register r%d for insn %d\n",
> > +          REGNO (src), REGNO (dst), INSN_UID (insn));
> >  }
> >
> >  /* Convert operand OP in INSN.  We should handle
> > @@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
> >      }
> >    else if (REG_P (*op))
> >      {
> > -      /* We may have not converted register usage in case
> > -      this register has no definition.  Otherwise it
> > -      should be converted in convert_reg.  */
> > -      df_ref ref;
> > -      FOR_EACH_INSN_USE (ref, insn)
> > -     if (DF_REF_REGNO (ref) == REGNO (*op))
> > -       {
> > -         gcc_assert (!DF_REF_CHAIN (ref));
> > -         break;
> > -       }
> >        *op = gen_rtx_SUBREG (vmode, *op, 0);
> >      }
> >    else if (CONST_INT_P (*op))
> > @@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
> >  void
> >  general_scalar_chain::convert_insn (rtx_insn *insn)
> >  {
> > +  /* Generate copies for out-of-chain uses of defs.  */
> > +  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> > +    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
> > +      {
> > +     df_link *use;
> > +     for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > +       if (DF_REF_REG_MEM_P (use->ref)
> > +           || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> > +         break;
> > +     if (use)
> > +       convert_reg (insn, DF_REF_REG (ref),
> > +                    *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
> > +      }
> > +
> > +  /* Replace uses in this insn with the defs we use in the chain.  */
> > +  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> > +    if (!DF_REF_REG_MEM_P (ref))
> > +      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
> > +     {
> > +       /* Also update a corresponding REG_DEAD note.  */
> > +       rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
> > +       if (note)
> > +         XEXP (note, 0) = *vreg;
> > +       *DF_REF_REAL_LOC (ref) = *vreg;
> > +     }
> > +
> >    rtx def_set = single_set (insn);
> >    rtx src = SET_SRC (def_set);
> >    rtx dst = SET_DEST (def_set);
> > @@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
> >        emit_conversion_insns (gen_move_insn (dst, tmp), insn);
> >        dst = gen_rtx_SUBREG (vmode, tmp, 0);
> >      }
> > +  else if (REG_P (dst))
> > +    {
> > +      /* Replace the definition with a SUBREG to the definition we
> > +         use inside the chain.  */
> > +      rtx *vdef = defs_map.get (dst);
> > +      if (vdef)
> > +     dst = *vdef;
> > +      dst = gen_rtx_SUBREG (vmode, dst, 0);
> > +      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
> > +         is a non-REG_P.  So kill those off.  */
> > +      rtx note = find_reg_equal_equiv_note (insn);
> > +      if (note)
> > +     remove_note (insn, note);
> > +    }
> >
> >    switch (GET_CODE (src))
> >      {
> > @@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
> >      case COMPARE:
> >        src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
> >
> > -      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
> > -               || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
> > -
> > -      if (REG_P (src))
> > -     subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> > -      else
> > -     subreg = copy_rtx_if_shared (src);
> > +      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
> > +      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> >        emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
> >                                                   copy_rtx_if_shared (subreg),
> >                                                   copy_rtx_if_shared (subreg)),
> >                       insn);
> >        dst = gen_rtx_REG (CCmode, FLAGS_REG);
> > -      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
> > -                                            copy_rtx_if_shared (src)),
> > +      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
> > +                                            copy_rtx_if_shared (subreg)),
> >                           UNSPEC_PTEST);
> >        break;
> >
> > @@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
> >    df_insn_rescan (insn);
> >  }
> >
> > +/* Generate copies from defs used by the chain but not defined therein.
> > +   Also populates defs_map which is used later by convert_insn.  */
> > +
> >  void
> >  general_scalar_chain::convert_registers ()
> >  {
> >    bitmap_iterator bi;
> >    unsigned id;
> > -
> > -  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
> > -    convert_reg (id);
> > -
> > -  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
> > +  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
> >      make_vector_copies (id);
> >  }
> >
> > Index: gcc/config/i386/i386-features.h
> > ===================================================================
> > --- gcc/config/i386/i386-features.h   (revision 274843)
> > +++ gcc/config/i386/i386-features.h   (working copy)
> > @@ -171,12 +171,11 @@ class general_scalar_chain : public scal
> >      : scalar_chain (smode_, vmode_) {}
> >    int compute_convert_gain ();
> >   private:
> > +  hash_map<rtx, rtx> defs_map;
> >    void mark_dual_mode_def (df_ref def);
> > -  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
> > -  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
> >    void convert_insn (rtx_insn *insn);
> >    void convert_op (rtx *op, rtx_insn *insn);
> > -  void convert_reg (unsigned regno);
> > +  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
> >    void make_vector_copies (unsigned regno);
> >    void convert_registers ();
> >    int vector_const_cost (rtx exp);
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
Iain Sandoe Aug. 28, 2019, 1:26 p.m. UTC | #6
> On 26 Aug 2019, at 11:06, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Mon, Aug 26, 2019 at 10:40 AM Richard Biener <rguenther@suse.de> wrote:
>> 
>> On Fri, 23 Aug 2019, Richard Biener wrote:
>> 
>>> On Fri, 23 Aug 2019, Richard Biener wrote:
>>> 
>>>> On Fri, 23 Aug 2019, Uros Bizjak wrote:
>>>> 
>>>>> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
>>>>>> 
>>>>>> 
>>>>>> This fixes quadraticness in STV and makes
>>>>>> 
>>>>>> machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
>>>>>> 95%)      54 kB (  0%)
>>>>>> 
>>>>>> drop to zero.  Anybody remembers why it is the way it is now?
>>>>>> 
>>>>>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>>>>> 
>>>>>> OK?
>>>>> 
>>>>> Looking at the PR, comment #3 [1], I assume this patch is obsoltete
>>>>> and will be updated?
>>>>> 
>>>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
>>>> 
>>>> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
>>>> The following is a rewrite of the non-TImode chain conversion
>>>> according to I think how it should operate als allowing the hunk
>>>> that fixes the compile-time and fixing PR91527 on the way
>>>> (which I ran into during more extensive testing of the patch myself).
>>>> 
>>>> So compared to the state before which I still do not 100% understand
>>>> we now do the following.  Chain detection works as before including
>>>> recording of all defs (both defined by the insns in the chain and
>>>> insns outside) that need copy-in or copy-out operations.
>>>> 
>>>> But then the patch changes things as to guarantee that
>>>> after the conversion all uses/defs of a pseudo are
>>>> of the (subreg:Vmode ..) form or of the original scalar form.
>>>> In particular it avoids the need to change any insns that
>>>> are not part of the chain (besides emitting the extra copy
>>>> instructions).  For this all defs that were marked as needing
>>>> copies (thus they have uses/defs both in the chain and outside)
>>>> the chain will use a new pseudo that we copy to from scalar sources
>>>> and that we copy from for scalar uses.  There's the new defs_map
>>>> which records the mapping of old to new reg.  pseudos that are
>>>> only used in the chain already are not remapped.
>>>> 
>>>> The conversion itself then happens in two stages, first,
>>>> in make_vector_copies, we emit the copy-in insns and
>>>> allocate all pseudos we need.  Then the rest of the
>>>> conversion happens fully inside of convert_insn where
>>>> we generate the copy-outs of the insns def, replace
>>>> defs and uses according to the mapping and replace uses
>>>> and defs with the (subreg:Vmode ..) style.
>>>> 
>>>> For PR91527 IRA doesn't like the REG_EQUIV note in
>>>> 
>>>> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
>>>>        (subreg:V4SI (reg:SI 100) 0))
>>>> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
>>>> 1248 {movv4si_internal}
>>>>     (expr_list:REG_DEAD (reg:SI 100)
>>>>        (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
>>>>                    (const_int 16 [0x10])) [1 c+0 S4 A64])
>>>>            (nil))))
>>>> 
>>>> because the SET_DEST is not a REG_P.  I'm not sure if this
>>>> is invalid RTL, docs say SET_DEST might be a strict_low_part
>>>> or a zero_extract but doesn't mention a subreg.  So I opted
>>>> to simply remove equal/equiv notes on insns we convert
>>>> and since the above has a REG_DEAD note I took the liberty
>>>> to update that according to the mapping (so that would have
>>>> been not needed before this patch) rather than dropping it.
>>>> 
>>>> Bootstrapped with and without --with-march=westmere (to get
>>>> some STV coverage, this included all languages) on
>>>> x88_64-unknown-linux-gnu, testing in progress.
>>>> 
>>>> OK if testing succeeds?
>>> 
>>> Testing revealed I made an error in general_scalar_chain::convert_insn
>>> failing to move down SET_SRC extraction below replacing with
>>> the defs map.  This showed in 4 execute FAILs in 32bit fortran
>>> testing (only).  Fixed by moving down the whole def_set/src/dst
>>> extraction.
>>> 
>>> Re-testing on x86_64-unknown-linux-gnu.
>> 
>> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
>> run runs into the latent PR91528 building 32bit libada in stage3
>> for a few sources, I've manually built those with -mno-stv and
>> bootstrap finishes successfully.  I hope HJ can help with this
>> dynamic stack-alignment issue.
>> 
>> So - OK for trunk?
>> 
>> As followup we can now remove general_remove_non_convertible_regs
>> because we can handle defs that cannot be converted just fine
>> with the patch since we're splitting live-ranges of all defs at
>> the chain boundary.
>> 
>> Thanks,
>> Richard.
>> 
>>> Updated patch below.  I'm feeling adventurous and will run
>>> the "westmere" bootstrap with costing disabled (aka always
>>> convert detected chains...).
>>> 
>>> Richard.
>>> 
>>> 2019-08-23  Richard Biener  <rguenther@suse.de>
>>> 
>>>      PR target/91522
>>>      PR target/91527
>>>      * config/i386/i386-features.h (general_scalar_chain::defs_map):
>>>      New member.
>>>      (general_scalar_chain::replace_with_subreg): Remove.
>>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>>>      (general_scalar_chain::convert_reg): Adjust signature.
>>>      * config/i386/i386-features.c (scalar_chain::add_insn): Do not
>>>      iterate over all defs of a reg.
>>>      (general_scalar_chain::replace_with_subreg): Remove.
>>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>>>      (general_scalar_chain::make_vector_copies): Populate defs_map,
>>>      place copy only after defs that are used as vectors in the chain.
>>>      (general_scalar_chain::convert_reg): Emit a copy for a specific
>>>      def in a specific instruction.
>>>      (general_scalar_chain::convert_op): All reg uses are converted here.
>>>      (general_scalar_chain::convert_insn): Emit copies for scalar
>>>      uses of defs here.  Replace uses with the copies we created.
>>>      Replace and convert the def.  Adjust REG_DEAD notes, remove
>>>      REG_EQUIV/EQUAL notes.
>>>      (general_scalar_chain::convert_registers): Only handle copies
>>>      into the chain here.
> 
> Rubberstamped with LGTM. It looks you are the master of this domain now ;)

This  breaks bootstrap for i686-darwin (and most likely is the cause of bootstrap fail on 
i686-linux i686-linux-gnu at https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html 
et al) 
It gives a bunch of compare errors spread around the tree - so no specific pointer there.

There’s a secondary fail overlaying it between 274933-or so and 274980 which confused
my initial search.

Iain




> 
> Thanks,
> Uros.
> 
>>> 
>>> 
>>> Index: gcc/config/i386/i386-features.c
>>> ===================================================================
>>> --- gcc/config/i386/i386-features.c   (revision 274843)
>>> +++ gcc/config/i386/i386-features.c   (working copy)
>>> @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
>>>      iterates over all refs to look for dual-mode regs.  Instead this
>>>      should be done separately for all regs mentioned in the chain once.  */
>>>   df_ref ref;
>>> -  df_ref def;
>>>   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>>>     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
>>> -      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
>>> -        def;
>>> -        def = DF_REF_NEXT_REG (def))
>>> -     analyze_register_chain (candidates, def);
>>> +      analyze_register_chain (candidates, ref);
>>>   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>>>     if (!DF_REF_REG_MEM_P (ref))
>>>       analyze_register_chain (candidates, ref);
>>> @@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
>>>   return gain;
>>> }
>>> 
>>> -/* Replace REG in X with a V2DI subreg of NEW_REG.  */
>>> -
>>> -rtx
>>> -general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
>>> -{
>>> -  if (x == reg)
>>> -    return gen_rtx_SUBREG (vmode, new_reg, 0);
>>> -
>>> -  /* But not in memory addresses.  */
>>> -  if (MEM_P (x))
>>> -    return x;
>>> -
>>> -  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
>>> -  int i, j;
>>> -  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
>>> -    {
>>> -      if (fmt[i] == 'e')
>>> -     XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
>>> -      else if (fmt[i] == 'E')
>>> -     for (j = XVECLEN (x, i) - 1; j >= 0; j--)
>>> -       XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
>>> -                                                reg, new_reg);
>>> -    }
>>> -
>>> -  return x;
>>> -}
>>> -
>>> -/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
>>> -
>>> -void
>>> -general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
>>> -                                               rtx reg, rtx new_reg)
>>> -{
>>> -  replace_with_subreg (single_set (insn), reg, new_reg);
>>> -}
>>> -
>>> /* Insert generated conversion instruction sequence INSNS
>>>    after instruction AFTER.  New BB may be required in case
>>>    instruction has EH region attached.  */
>>> @@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
>>>   rtx vreg = gen_reg_rtx (smode);
>>>   df_ref ref;
>>> 
>>> -  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> -    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> -      {
>>> -     start_sequence ();
>>> -     if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>>> -       {
>>> -         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> -         if (smode == DImode && !TARGET_64BIT)
>>> -           {
>>> -             emit_move_insn (adjust_address (tmp, SImode, 0),
>>> -                             gen_rtx_SUBREG (SImode, reg, 0));
>>> -             emit_move_insn (adjust_address (tmp, SImode, 4),
>>> -                             gen_rtx_SUBREG (SImode, reg, 4));
>>> -           }
>>> -         else
>>> -           emit_move_insn (copy_rtx (tmp), reg);
>>> -         emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> -                                 gen_gpr_to_xmm_move_src (vmode, tmp)));
>>> -       }
>>> -     else if (!TARGET_64BIT && smode == DImode)
>>> -       {
>>> -         if (TARGET_SSE4_1)
>>> -           {
>>> -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                                         CONST0_RTX (V4SImode),
>>> -                                         gen_rtx_SUBREG (SImode, reg, 0)));
>>> -             emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                                           gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                                           gen_rtx_SUBREG (SImode, reg, 4),
>>> -                                           GEN_INT (2)));
>>> -           }
>>> -         else
>>> -           {
>>> -             rtx tmp = gen_reg_rtx (DImode);
>>> -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                                         CONST0_RTX (V4SImode),
>>> -                                         gen_rtx_SUBREG (SImode, reg, 0)));
>>> -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
>>> -                                         CONST0_RTX (V4SImode),
>>> -                                         gen_rtx_SUBREG (SImode, reg, 4)));
>>> -             emit_insn (gen_vec_interleave_lowv4si
>>> -                        (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                         gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> -                         gen_rtx_SUBREG (V4SImode, tmp, 0)));
>>> -           }
>>> -       }
>>> -     else
>>> -       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> -                               gen_gpr_to_xmm_move_src (vmode, reg)));
>>> -     rtx_insn *seq = get_insns ();
>>> -     end_sequence ();
>>> -     rtx_insn *insn = DF_REF_INSN (ref);
>>> -     emit_conversion_insns (seq, insn);
>>> -
>>> -     if (dump_file)
>>> -       fprintf (dump_file,
>>> -                "  Copied r%d to a vector register r%d for insn %d\n",
>>> -                regno, REGNO (vreg), INSN_UID (insn));
>>> -      }
>>> -
>>> -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> -      {
>>> -     rtx_insn *insn = DF_REF_INSN (ref);
>>> -     replace_with_subreg_in_insn (insn, reg, vreg);
>>> -
>>> -     if (dump_file)
>>> -       fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
>>> -                regno, REGNO (vreg), INSN_UID (insn));
>>> -      }
>>> -}
>>> -
>>> -/* Convert all definitions of register REGNO
>>> -   and fix its uses.  Scalar copies may be created
>>> -   in case register is used in not convertible insn.  */
>>> -
>>> -void
>>> -general_scalar_chain::convert_reg (unsigned regno)
>>> -{
>>> -  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
>>> -  rtx reg = regno_reg_rtx[regno];
>>> -  rtx scopy = NULL_RTX;
>>> -  df_ref ref;
>>> -  bitmap conv;
>>> -
>>> -  conv = BITMAP_ALLOC (NULL);
>>> -  bitmap_copy (conv, insns);
>>> -
>>> -  if (scalar_copy)
>>> -    scopy = gen_reg_rtx (smode);
>>> +  defs_map.put (reg, vreg);
>>> 
>>> +  /* For each insn defining REGNO, see if it is defined by an insn
>>> +     not part of the chain but with uses in insns part of the chain
>>> +     and insert a copy in that case.  */
>>>   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>>     {
>>> -      rtx_insn *insn = DF_REF_INSN (ref);
>>> -      rtx def_set = single_set (insn);
>>> -      rtx src = SET_SRC (def_set);
>>> -      rtx reg = DF_REF_REG (ref);
>>> +      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> +     continue;
>>> +      df_link *use;
>>> +      for (use = DF_REF_CHAIN (ref); use; use = use->next)
>>> +     if (!DF_REF_REG_MEM_P (use->ref)
>>> +         && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
>>> +       break;
>>> +      if (!use)
>>> +     continue;
>>> 
>>> -      if (!MEM_P (src))
>>> +      start_sequence ();
>>> +      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>>>      {
>>> -       replace_with_subreg_in_insn (insn, reg, reg);
>>> -       bitmap_clear_bit (conv, INSN_UID (insn));
>>> +       rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> +       if (smode == DImode && !TARGET_64BIT)
>>> +         {
>>> +           emit_move_insn (adjust_address (tmp, SImode, 0),
>>> +                           gen_rtx_SUBREG (SImode, reg, 0));
>>> +           emit_move_insn (adjust_address (tmp, SImode, 4),
>>> +                           gen_rtx_SUBREG (SImode, reg, 4));
>>> +         }
>>> +       else
>>> +         emit_move_insn (copy_rtx (tmp), reg);
>>> +       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> +                               gen_gpr_to_xmm_move_src (vmode, tmp)));
>>>      }
>>> -
>>> -      if (scalar_copy)
>>> +      else if (!TARGET_64BIT && smode == DImode)
>>>      {
>>> -       start_sequence ();
>>> -       if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
>>> +       if (TARGET_SSE4_1)
>>>          {
>>> -           rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> -           emit_move_insn (tmp, reg);
>>> -           if (!TARGET_64BIT && smode == DImode)
>>> -             {
>>> -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
>>> -                               adjust_address (tmp, SImode, 0));
>>> -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
>>> -                               adjust_address (tmp, SImode, 4));
>>> -             }
>>> -           else
>>> -             emit_move_insn (scopy, copy_rtx (tmp));
>>> +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                                       CONST0_RTX (V4SImode),
>>> +                                       gen_rtx_SUBREG (SImode, reg, 0)));
>>> +           emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                                         gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                                         gen_rtx_SUBREG (SImode, reg, 4),
>>> +                                         GEN_INT (2)));
>>>          }
>>> -       else if (!TARGET_64BIT && smode == DImode)
>>> +       else
>>>          {
>>> -           if (TARGET_SSE4_1)
>>> -             {
>>> -               rtx tmp = gen_rtx_PARALLEL (VOIDmode,
>>> -                                           gen_rtvec (1, const0_rtx));
>>> -               emit_insn
>>> -                 (gen_rtx_SET
>>> -                    (gen_rtx_SUBREG (SImode, scopy, 0),
>>> -                     gen_rtx_VEC_SELECT (SImode,
>>> -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
>>> -                                         tmp)));
>>> -
>>> -               tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
>>> -               emit_insn
>>> -                 (gen_rtx_SET
>>> -                    (gen_rtx_SUBREG (SImode, scopy, 4),
>>> -                     gen_rtx_VEC_SELECT (SImode,
>>> -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
>>> -                                         tmp)));
>>> -             }
>>> -           else
>>> -             {
>>> -               rtx vcopy = gen_reg_rtx (V2DImode);
>>> -               emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
>>> -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
>>> -                               gen_rtx_SUBREG (SImode, vcopy, 0));
>>> -               emit_move_insn (vcopy,
>>> -                               gen_rtx_LSHIFTRT (V2DImode,
>>> -                                                 vcopy, GEN_INT (32)));
>>> -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
>>> -                               gen_rtx_SUBREG (SImode, vcopy, 0));
>>> -             }
>>> +           rtx tmp = gen_reg_rtx (DImode);
>>> +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                                       CONST0_RTX (V4SImode),
>>> +                                       gen_rtx_SUBREG (SImode, reg, 0)));
>>> +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
>>> +                                       CONST0_RTX (V4SImode),
>>> +                                       gen_rtx_SUBREG (SImode, reg, 4)));
>>> +           emit_insn (gen_vec_interleave_lowv4si
>>> +                      (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                       gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> +                       gen_rtx_SUBREG (V4SImode, tmp, 0)));
>>>          }
>>> -       else
>>> -         emit_move_insn (scopy, reg);
>>> -
>>> -       rtx_insn *seq = get_insns ();
>>> -       end_sequence ();
>>> -       emit_conversion_insns (seq, insn);
>>> -
>>> -       if (dump_file)
>>> -         fprintf (dump_file,
>>> -                  "  Copied r%d to a scalar register r%d for insn %d\n",
>>> -                  regno, REGNO (scopy), INSN_UID (insn));
>>>      }
>>> -    }
>>> -
>>> -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> -      {
>>> -     if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
>>> -       {
>>> -         rtx_insn *insn = DF_REF_INSN (ref);
>>> +      else
>>> +     emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> +                             gen_gpr_to_xmm_move_src (vmode, reg)));
>>> +      rtx_insn *seq = get_insns ();
>>> +      end_sequence ();
>>> +      rtx_insn *insn = DF_REF_INSN (ref);
>>> +      emit_conversion_insns (seq, insn);
>>> 
>>> -         rtx def_set = single_set (insn);
>>> -         gcc_assert (def_set);
>>> +      if (dump_file)
>>> +     fprintf (dump_file,
>>> +              "  Copied r%d to a vector register r%d for insn %d\n",
>>> +              regno, REGNO (vreg), INSN_UID (insn));
>>> +    }
>>> +}
>>> 
>>> -         rtx src = SET_SRC (def_set);
>>> -         rtx dst = SET_DEST (def_set);
>>> +/* Copy the definition SRC of INSN inside the chain to DST for
>>> +   scalar uses outside of the chain.  */
>>> 
>>> -         if (!MEM_P (dst) || !REG_P (src))
>>> -           replace_with_subreg_in_insn (insn, reg, reg);
>>> +void
>>> +general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
>>> +{
>>> +  start_sequence ();
>>> +  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
>>> +    {
>>> +      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> +      emit_move_insn (tmp, src);
>>> +      if (!TARGET_64BIT && smode == DImode)
>>> +     {
>>> +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
>>> +                       adjust_address (tmp, SImode, 0));
>>> +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
>>> +                       adjust_address (tmp, SImode, 4));
>>> +     }
>>> +      else
>>> +     emit_move_insn (dst, copy_rtx (tmp));
>>> +    }
>>> +  else if (!TARGET_64BIT && smode == DImode)
>>> +    {
>>> +      if (TARGET_SSE4_1)
>>> +     {
>>> +       rtx tmp = gen_rtx_PARALLEL (VOIDmode,
>>> +                                   gen_rtvec (1, const0_rtx));
>>> +       emit_insn
>>> +           (gen_rtx_SET
>>> +            (gen_rtx_SUBREG (SImode, dst, 0),
>>> +             gen_rtx_VEC_SELECT (SImode,
>>> +                                 gen_rtx_SUBREG (V4SImode, src, 0),
>>> +                                 tmp)));
>>> +
>>> +       tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
>>> +       emit_insn
>>> +           (gen_rtx_SET
>>> +            (gen_rtx_SUBREG (SImode, dst, 4),
>>> +             gen_rtx_VEC_SELECT (SImode,
>>> +                                 gen_rtx_SUBREG (V4SImode, src, 0),
>>> +                                 tmp)));
>>> +     }
>>> +      else
>>> +     {
>>> +       rtx vcopy = gen_reg_rtx (V2DImode);
>>> +       emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
>>> +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
>>> +                       gen_rtx_SUBREG (SImode, vcopy, 0));
>>> +       emit_move_insn (vcopy,
>>> +                       gen_rtx_LSHIFTRT (V2DImode,
>>> +                                         vcopy, GEN_INT (32)));
>>> +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
>>> +                       gen_rtx_SUBREG (SImode, vcopy, 0));
>>> +     }
>>> +    }
>>> +  else
>>> +    emit_move_insn (dst, src);
>>> 
>>> -         bitmap_clear_bit (conv, INSN_UID (insn));
>>> -       }
>>> -      }
>>> -    /* Skip debug insns and uninitialized uses.  */
>>> -    else if (DF_REF_CHAIN (ref)
>>> -          && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
>>> -      {
>>> -     gcc_assert (scopy);
>>> -     replace_rtx (DF_REF_INSN (ref), reg, scopy);
>>> -     df_insn_rescan (DF_REF_INSN (ref));
>>> -      }
>>> +  rtx_insn *seq = get_insns ();
>>> +  end_sequence ();
>>> +  emit_conversion_insns (seq, insn);
>>> 
>>> -  BITMAP_FREE (conv);
>>> +  if (dump_file)
>>> +    fprintf (dump_file,
>>> +          "  Copied r%d to a scalar register r%d for insn %d\n",
>>> +          REGNO (src), REGNO (dst), INSN_UID (insn));
>>> }
>>> 
>>> /* Convert operand OP in INSN.  We should handle
>>> @@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
>>>     }
>>>   else if (REG_P (*op))
>>>     {
>>> -      /* We may have not converted register usage in case
>>> -      this register has no definition.  Otherwise it
>>> -      should be converted in convert_reg.  */
>>> -      df_ref ref;
>>> -      FOR_EACH_INSN_USE (ref, insn)
>>> -     if (DF_REF_REGNO (ref) == REGNO (*op))
>>> -       {
>>> -         gcc_assert (!DF_REF_CHAIN (ref));
>>> -         break;
>>> -       }
>>>       *op = gen_rtx_SUBREG (vmode, *op, 0);
>>>     }
>>>   else if (CONST_INT_P (*op))
>>> @@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
>>> void
>>> general_scalar_chain::convert_insn (rtx_insn *insn)
>>> {
>>> +  /* Generate copies for out-of-chain uses of defs.  */
>>> +  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
>>> +    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
>>> +      {
>>> +     df_link *use;
>>> +     for (use = DF_REF_CHAIN (ref); use; use = use->next)
>>> +       if (DF_REF_REG_MEM_P (use->ref)
>>> +           || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
>>> +         break;
>>> +     if (use)
>>> +       convert_reg (insn, DF_REF_REG (ref),
>>> +                    *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
>>> +      }
>>> +
>>> +  /* Replace uses in this insn with the defs we use in the chain.  */
>>> +  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
>>> +    if (!DF_REF_REG_MEM_P (ref))
>>> +      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
>>> +     {
>>> +       /* Also update a corresponding REG_DEAD note.  */
>>> +       rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
>>> +       if (note)
>>> +         XEXP (note, 0) = *vreg;
>>> +       *DF_REF_REAL_LOC (ref) = *vreg;
>>> +     }
>>> +
>>>   rtx def_set = single_set (insn);
>>>   rtx src = SET_SRC (def_set);
>>>   rtx dst = SET_DEST (def_set);
>>> @@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
>>>       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
>>>       dst = gen_rtx_SUBREG (vmode, tmp, 0);
>>>     }
>>> +  else if (REG_P (dst))
>>> +    {
>>> +      /* Replace the definition with a SUBREG to the definition we
>>> +         use inside the chain.  */
>>> +      rtx *vdef = defs_map.get (dst);
>>> +      if (vdef)
>>> +     dst = *vdef;
>>> +      dst = gen_rtx_SUBREG (vmode, dst, 0);
>>> +      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
>>> +         is a non-REG_P.  So kill those off.  */
>>> +      rtx note = find_reg_equal_equiv_note (insn);
>>> +      if (note)
>>> +     remove_note (insn, note);
>>> +    }
>>> 
>>>   switch (GET_CODE (src))
>>>     {
>>> @@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
>>>     case COMPARE:
>>>       src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
>>> 
>>> -      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
>>> -               || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
>>> -
>>> -      if (REG_P (src))
>>> -     subreg = gen_rtx_SUBREG (V2DImode, src, 0);
>>> -      else
>>> -     subreg = copy_rtx_if_shared (src);
>>> +      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
>>> +      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
>>>       emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
>>>                                                  copy_rtx_if_shared (subreg),
>>>                                                  copy_rtx_if_shared (subreg)),
>>>                      insn);
>>>       dst = gen_rtx_REG (CCmode, FLAGS_REG);
>>> -      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
>>> -                                            copy_rtx_if_shared (src)),
>>> +      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
>>> +                                            copy_rtx_if_shared (subreg)),
>>>                          UNSPEC_PTEST);
>>>       break;
>>> 
>>> @@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
>>>   df_insn_rescan (insn);
>>> }
>>> 
>>> +/* Generate copies from defs used by the chain but not defined therein.
>>> +   Also populates defs_map which is used later by convert_insn.  */
>>> +
>>> void
>>> general_scalar_chain::convert_registers ()
>>> {
>>>   bitmap_iterator bi;
>>>   unsigned id;
>>> -
>>> -  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
>>> -    convert_reg (id);
>>> -
>>> -  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
>>> +  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
>>>     make_vector_copies (id);
>>> }
>>> 
>>> Index: gcc/config/i386/i386-features.h
>>> ===================================================================
>>> --- gcc/config/i386/i386-features.h   (revision 274843)
>>> +++ gcc/config/i386/i386-features.h   (working copy)
>>> @@ -171,12 +171,11 @@ class general_scalar_chain : public scal
>>>     : scalar_chain (smode_, vmode_) {}
>>>   int compute_convert_gain ();
>>>  private:
>>> +  hash_map<rtx, rtx> defs_map;
>>>   void mark_dual_mode_def (df_ref def);
>>> -  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
>>> -  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
>>>   void convert_insn (rtx_insn *insn);
>>>   void convert_op (rtx *op, rtx_insn *insn);
>>> -  void convert_reg (unsigned regno);
>>> +  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
>>>   void make_vector_copies (unsigned regno);
>>>   void convert_registers ();
>>>   int vector_const_cost (rtx exp);
>>> 
>> 
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>> Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
Richard Biener Aug. 28, 2019, 1:54 p.m. UTC | #7
On Wed, 28 Aug 2019, Iain Sandoe wrote:

> 
> 
> 
> > On 26 Aug 2019, at 11:06, Uros Bizjak <ubizjak@gmail.com> wrote:
> > 
> > On Mon, Aug 26, 2019 at 10:40 AM Richard Biener <rguenther@suse.de> wrote:
> >> 
> >> On Fri, 23 Aug 2019, Richard Biener wrote:
> >> 
> >>> On Fri, 23 Aug 2019, Richard Biener wrote:
> >>> 
> >>>> On Fri, 23 Aug 2019, Uros Bizjak wrote:
> >>>> 
> >>>>> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> This fixes quadraticness in STV and makes
> >>>>>> 
> >>>>>> machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> >>>>>> 95%)      54 kB (  0%)
> >>>>>> 
> >>>>>> drop to zero.  Anybody remembers why it is the way it is now?
> >>>>>> 
> >>>>>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >>>>>> 
> >>>>>> OK?
> >>>>> 
> >>>>> Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> >>>>> and will be updated?
> >>>>> 
> >>>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> >>>> 
> >>>> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> >>>> The following is a rewrite of the non-TImode chain conversion
> >>>> according to I think how it should operate als allowing the hunk
> >>>> that fixes the compile-time and fixing PR91527 on the way
> >>>> (which I ran into during more extensive testing of the patch myself).
> >>>> 
> >>>> So compared to the state before which I still do not 100% understand
> >>>> we now do the following.  Chain detection works as before including
> >>>> recording of all defs (both defined by the insns in the chain and
> >>>> insns outside) that need copy-in or copy-out operations.
> >>>> 
> >>>> But then the patch changes things as to guarantee that
> >>>> after the conversion all uses/defs of a pseudo are
> >>>> of the (subreg:Vmode ..) form or of the original scalar form.
> >>>> In particular it avoids the need to change any insns that
> >>>> are not part of the chain (besides emitting the extra copy
> >>>> instructions).  For this all defs that were marked as needing
> >>>> copies (thus they have uses/defs both in the chain and outside)
> >>>> the chain will use a new pseudo that we copy to from scalar sources
> >>>> and that we copy from for scalar uses.  There's the new defs_map
> >>>> which records the mapping of old to new reg.  pseudos that are
> >>>> only used in the chain already are not remapped.
> >>>> 
> >>>> The conversion itself then happens in two stages, first,
> >>>> in make_vector_copies, we emit the copy-in insns and
> >>>> allocate all pseudos we need.  Then the rest of the
> >>>> conversion happens fully inside of convert_insn where
> >>>> we generate the copy-outs of the insns def, replace
> >>>> defs and uses according to the mapping and replace uses
> >>>> and defs with the (subreg:Vmode ..) style.
> >>>> 
> >>>> For PR91527 IRA doesn't like the REG_EQUIV note in
> >>>> 
> >>>> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> >>>>        (subreg:V4SI (reg:SI 100) 0))
> >>>> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> >>>> 1248 {movv4si_internal}
> >>>>     (expr_list:REG_DEAD (reg:SI 100)
> >>>>        (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> >>>>                    (const_int 16 [0x10])) [1 c+0 S4 A64])
> >>>>            (nil))))
> >>>> 
> >>>> because the SET_DEST is not a REG_P.  I'm not sure if this
> >>>> is invalid RTL, docs say SET_DEST might be a strict_low_part
> >>>> or a zero_extract but doesn't mention a subreg.  So I opted
> >>>> to simply remove equal/equiv notes on insns we convert
> >>>> and since the above has a REG_DEAD note I took the liberty
> >>>> to update that according to the mapping (so that would have
> >>>> been not needed before this patch) rather than dropping it.
> >>>> 
> >>>> Bootstrapped with and without --with-march=westmere (to get
> >>>> some STV coverage, this included all languages) on
> >>>> x88_64-unknown-linux-gnu, testing in progress.
> >>>> 
> >>>> OK if testing succeeds?
> >>> 
> >>> Testing revealed I made an error in general_scalar_chain::convert_insn
> >>> failing to move down SET_SRC extraction below replacing with
> >>> the defs map.  This showed in 4 execute FAILs in 32bit fortran
> >>> testing (only).  Fixed by moving down the whole def_set/src/dst
> >>> extraction.
> >>> 
> >>> Re-testing on x86_64-unknown-linux-gnu.
> >> 
> >> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
> >> run runs into the latent PR91528 building 32bit libada in stage3
> >> for a few sources, I've manually built those with -mno-stv and
> >> bootstrap finishes successfully.  I hope HJ can help with this
> >> dynamic stack-alignment issue.
> >> 
> >> So - OK for trunk?
> >> 
> >> As followup we can now remove general_remove_non_convertible_regs
> >> because we can handle defs that cannot be converted just fine
> >> with the patch since we're splitting live-ranges of all defs at
> >> the chain boundary.
> >> 
> >> Thanks,
> >> Richard.
> >> 
> >>> Updated patch below.  I'm feeling adventurous and will run
> >>> the "westmere" bootstrap with costing disabled (aka always
> >>> convert detected chains...).
> >>> 
> >>> Richard.
> >>> 
> >>> 2019-08-23  Richard Biener  <rguenther@suse.de>
> >>> 
> >>>      PR target/91522
> >>>      PR target/91527
> >>>      * config/i386/i386-features.h (general_scalar_chain::defs_map):
> >>>      New member.
> >>>      (general_scalar_chain::replace_with_subreg): Remove.
> >>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >>>      (general_scalar_chain::convert_reg): Adjust signature.
> >>>      * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> >>>      iterate over all defs of a reg.
> >>>      (general_scalar_chain::replace_with_subreg): Remove.
> >>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >>>      (general_scalar_chain::make_vector_copies): Populate defs_map,
> >>>      place copy only after defs that are used as vectors in the chain.
> >>>      (general_scalar_chain::convert_reg): Emit a copy for a specific
> >>>      def in a specific instruction.
> >>>      (general_scalar_chain::convert_op): All reg uses are converted here.
> >>>      (general_scalar_chain::convert_insn): Emit copies for scalar
> >>>      uses of defs here.  Replace uses with the copies we created.
> >>>      Replace and convert the def.  Adjust REG_DEAD notes, remove
> >>>      REG_EQUIV/EQUAL notes.
> >>>      (general_scalar_chain::convert_registers): Only handle copies
> >>>      into the chain here.
> > 
> > Rubberstamped with LGTM. It looks you are the master of this domain now ;)
> 
> This  breaks bootstrap for i686-darwin (and most likely is the cause of bootstrap fail on 
> i686-linux i686-linux-gnu at https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html 
> et al) 
> It gives a bunch of compare errors spread around the tree - so no specific pointer there.
> 
> There’s a secondary fail overlaying it between 274933-or so and 274980 which confused
> my initial search.

Please file a bugreport.  Don't have time today to dig into but will do
tomorrow (but now at least try to reproduce).

Richard.
Uros Bizjak Aug. 28, 2019, 1:58 p.m. UTC | #8
On Wed, Aug 28, 2019 at 3:54 PM Richard Biener <rguenther@suse.de> wrote:
> > >>> 2019-08-23  Richard Biener  <rguenther@suse.de>
> > >>>
> > >>>      PR target/91522
> > >>>      PR target/91527
> > >>>      * config/i386/i386-features.h (general_scalar_chain::defs_map):
> > >>>      New member.
> > >>>      (general_scalar_chain::replace_with_subreg): Remove.
> > >>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> > >>>      (general_scalar_chain::convert_reg): Adjust signature.
> > >>>      * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> > >>>      iterate over all defs of a reg.
> > >>>      (general_scalar_chain::replace_with_subreg): Remove.
> > >>>      (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> > >>>      (general_scalar_chain::make_vector_copies): Populate defs_map,
> > >>>      place copy only after defs that are used as vectors in the chain.
> > >>>      (general_scalar_chain::convert_reg): Emit a copy for a specific
> > >>>      def in a specific instruction.
> > >>>      (general_scalar_chain::convert_op): All reg uses are converted here.
> > >>>      (general_scalar_chain::convert_insn): Emit copies for scalar
> > >>>      uses of defs here.  Replace uses with the copies we created.
> > >>>      Replace and convert the def.  Adjust REG_DEAD notes, remove
> > >>>      REG_EQUIV/EQUAL notes.
> > >>>      (general_scalar_chain::convert_registers): Only handle copies
> > >>>      into the chain here.
> > >
> > > Rubberstamped with LGTM. It looks you are the master of this domain now ;)
> >
> > This  breaks bootstrap for i686-darwin (and most likely is the cause of bootstrap fail on
> > i686-linux i686-linux-gnu at https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html
> > et al)
> > It gives a bunch of compare errors spread around the tree - so no specific pointer there.
> >
> > There’s a secondary fail overlaying it between 274933-or so and 274980 which confused
> > my initial search.
>
> Please file a bugreport.  Don't have time today to dig into but will do
> tomorrow (but now at least try to reproduce).

Looking at HJ's testreports, it looks configuring  for i686-linux
--with-fpmath=sse should be enough to trigger comparison failure.

Uros.
H.J. Lu May 9, 2020, 7:21 p.m. UTC | #9
On Fri, Aug 23, 2019 at 3:57 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Fri, 23 Aug 2019, Uros Bizjak wrote:
>
> > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > >
> > > This fixes quadraticness in STV and makes
> > >
> > >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 (
> > > 95%)      54 kB (  0%)
> > >
> > > drop to zero.  Anybody remembers why it is the way it is now?
> > >
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> >
> > Looking at the PR, comment #3 [1], I assume this patch is obsoltete
> > and will be updated?
> >
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
>
> Yes.  I'm still learning how STV operates (and learing DF and RTL...).
> The following is a rewrite of the non-TImode chain conversion
> according to I think how it should operate als allowing the hunk
> that fixes the compile-time and fixing PR91527 on the way
> (which I ran into during more extensive testing of the patch myself).
>
> So compared to the state before which I still do not 100% understand
> we now do the following.  Chain detection works as before including
> recording of all defs (both defined by the insns in the chain and
> insns outside) that need copy-in or copy-out operations.
>
> But then the patch changes things as to guarantee that
> after the conversion all uses/defs of a pseudo are
> of the (subreg:Vmode ..) form or of the original scalar form.
> In particular it avoids the need to change any insns that
> are not part of the chain (besides emitting the extra copy
> instructions).  For this all defs that were marked as needing
> copies (thus they have uses/defs both in the chain and outside)
> the chain will use a new pseudo that we copy to from scalar sources
> and that we copy from for scalar uses.  There's the new defs_map
> which records the mapping of old to new reg.  pseudos that are
> only used in the chain already are not remapped.
>
> The conversion itself then happens in two stages, first,
> in make_vector_copies, we emit the copy-in insns and
> allocate all pseudos we need.  Then the rest of the
> conversion happens fully inside of convert_insn where
> we generate the copy-outs of the insns def, replace
> defs and uses according to the mapping and replace uses
> and defs with the (subreg:Vmode ..) style.
>
> For PR91527 IRA doesn't like the REG_EQUIV note in
>
> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
>         (subreg:V4SI (reg:SI 100) 0))
> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> 1248 {movv4si_internal}
>      (expr_list:REG_DEAD (reg:SI 100)
>         (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
>                     (const_int 16 [0x10])) [1 c+0 S4 A64])
>             (nil))))
>
> because the SET_DEST is not a REG_P.  I'm not sure if this
> is invalid RTL, docs say SET_DEST might be a strict_low_part
> or a zero_extract but doesn't mention a subreg.  So I opted
> to simply remove equal/equiv notes on insns we convert
> and since the above has a REG_DEAD note I took the liberty
> to update that according to the mapping (so that would have
> been not needed before this patch) rather than dropping it.
>
> Bootstrapped with and without --with-march=westmere (to get
> some STV coverage, this included all languages) on
> x88_64-unknown-linux-gnu, testing in progress.
>
> OK if testing succeeds?
>
> It still solves the compile-time issue (which is a latent issue,
> btw, and with a carefully crafted testcase can be triggered
> since STV exists for DImode chains with !TARGET_64BIT).
>
> Thanks,
> Richard.
>
> 2019-08-22  Richard Biener  <rguenther@suse.de>
>
>         PR target/91522
>         PR target/91527
>         * config/i386/i386-features.h (general_scalar_chain::defs_map):
>         New member.
>         (general_scalar_chain::replace_with_subreg): Remove.
>         (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>         (general_scalar_chain::convert_reg): Adjust signature.
>         * config/i386/i386-features.c (scalar_chain::add_insn): Do not
>         iterate over all defs of a reg.
>         (general_scalar_chain::replace_with_subreg): Remove.
>         (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>         (general_scalar_chain::make_vector_copies): Populate defs_map,
>         place copy only after defs that are used as vectors in the chain.
>         (general_scalar_chain::convert_reg): Emit a copy for a specific
>         def in a specific instruction.
>         (general_scalar_chain::convert_op): All reg uses are converted here.
>         (general_scalar_chain::convert_insn): Emit copies for scalar
>         uses of defs here.  Replace uses with the copies we created.
>         Replace and convert the def.  Adjust REG_DEAD notes, remove
>         REG_EQUIV/EQUAL notes.
>         (general_scalar_chain::convert_registers): Only handle copies
>         into the chain here.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95021


H.J.
diff mbox series

Patch

Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c	(revision 274764)
+++ gcc/config/i386/i386-features.c	(working copy)
@@ -419,10 +419,7 @@  scalar_chain::add_insn (bitmap candidate
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-	   def;
-	   def = DF_REF_NEXT_REG (def))
-	analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);