
[2/6] combine: If recog fails, try again with zero_ext{ract,end} simplified

Message ID 00854266aef1cbae6d3620f78122e6131c37b73c.1431268135.git.segher@kernel.crashing.org
State New

Commit Message

Segher Boessenkool May 10, 2015, 4:13 p.m. UTC
Combine has its own ideas of what is "canonical" RTL, forcing all
backends to have special patterns in their machine description for the
"more simplified" patterns combine often creates, even though the
backend already has patterns for a more general form.  Backends that
do not implement those patterns get less well optimised code.

This patch lifts that burden for two cases: combine often converts
an AND (with, say, 0xff) to a ZERO_EXTEND of a SUBREG; and an LSHIFTRT
followed by an AND to a ZERO_EXTRACT.  This is perfectly helpful for
e.g. MEMs, but not nice if you have instructions to do more generic
masking (like PowerPC rlwinm, and similar on some other archs).
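
Purely as an editorial illustration (not part of the patch or the original
message), here is a minimal standalone C sketch of the value-level
equivalences behind those two conversions, assuming a 32-bit unsigned int:

  /* x & 0xff is the value of zero-extending the low byte of x, and
     (x >> 8) & 0xf is the value of zero-extracting 4 bits starting at
     bit 8; these are the equivalences behind combine's rewriting of the
     AND forms into ZERO_EXTEND / ZERO_EXTRACT.  */
  #include <assert.h>

  int main (void)
  {
    unsigned int x = 0x12345678u;
    assert ((x & 0xffu) == (unsigned int) (unsigned char) x);
    assert (((x >> 8) & 0xfu) == 0x6u);
    return 0;
  }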

With this patch, if recog_for_combine fails, and there are any
ZERO_EXT* in the pattern to be matched, it tries again with those
expressed as AND etc.  If that also fails it rolls back the changes,
because it might still match after e.g. splitting, and we want to
try the ZERO_EXT* for that as well.

Tested on powerpc-linux, before and after removing many patterns
from the machine description, and checked that the only changes in
the bootstrapped compiler are new and removed functions.

I'll also test on x86_64-linux before committing.


Segher


2015-05-10  Segher Boessenkool   <segher@kernel.crashing.org>

	* combine.c (recog_for_combine_1): New function, factored out
	from recog_for_combine.
	(change_zero_ext): New function.
	(recog_for_combine): If recog fails, try again with the pattern
	modified by change_zero_ext; if that still fails, restore the
	pattern.

---
 gcc/combine.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 107 insertions(+), 13 deletions(-)

Comments

Jeff Law May 11, 2015, 4:15 a.m. UTC | #1
On 05/10/2015 10:13 AM, Segher Boessenkool wrote:
> Combine has its own ideas of what is "canonical" RTL, forcing all
> backends to have special patterns in their machine description for the
> "more simplified" patterns combine often creates, even though the
> backend already has patterns for a more general form.  Backends that
> do not implement those patterns get less well optimised code.
>
> This patch lifts that burden for two cases: combine often converts
> an AND (with, say, 0xff) to a ZERO_EXTEND of a SUBREG; and an LSHIFTRT
> followed by an AND to a ZERO_EXTRACT.  This is perfectly helpful for
> e.g. MEMs, but not nice if you have instructions to do more generic
> masking (like PowerPC rlwinm, and similar on some other archs).
>
> With this patch, if recog_for_combine fails, and there are any
> ZERO_EXT* in the pattern to be matched, it tries again with those
> expressed as AND etc.  If that also fails it rolls back the changes,
> because it might still match after e.g. splitting, and we want to
> try the ZERO_EXT* for that as well.
>
> Tested on powerpc-linux, before and after removing many patterns
> from the machine description, and checked that the only changes in
> the bootstrapped compiler are new and removed functions.
>
> I'll also test on x86_64-linux before committing.
>
>
> Segher
>
>
> 2015-05-10  Segher Boessenkool   <segher@kernel.crashing.org>
>
> 	* combine.c (recog_for_combine_1): New function, factored out
> 	from recog_for_combine.
> 	(change_zero_ext): New function.
> 	(recog_for_combine): If recog fails, try again with the pattern
> 	modified by change_zero_ext; if that still fails, restore the
> 	pattern.
I like it.  The extensions are the most obvious candidates to attack, 
but I wonder if there are others (like the whole "ASHIFT vs MULT" stuff 
we were recently looking at for ARM).


jeff
Segher Boessenkool May 11, 2015, 6:18 a.m. UTC | #2
On Sun, May 10, 2015 at 10:15:34PM -0600, Jeff Law wrote:
> >	(recog_for_combine): If recog fails, try again with the pattern
> >	modified by change_zero_ext; if that still fails, restore the
> >	pattern.
> I like it.  The extensions are the most obvious candidates to attack, 
> but I wonder if there are others (like the whole "ASHIFT vs MULT" stuff 
> we were recently looking at for ARM).

Yeah, I thought about that as well.  But that case, MULT instead of shift,
outside of MEM, is simply non-canonical RTL; and combine created it on
purpose.  It just shouldn't.

There certainly will be other cases though; hopefully not overlapping,
I don't want to recog an exponentially expanding number of times ;-)
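
As a purely illustrative aside (not part of this thread's patch): the
ASHIFT and MULT shapes compute the same value; inside a MEM address the
MULT form is the canonical RTL, outside one the shift is.  A trivial
standalone C check of the value equivalence:

  #include <assert.h>

  int main (void)
  {
    unsigned int x = 12345u;
    /* ashift by 2 and mult by 4 give the same value; only the RTL
       shape that combine hands to recog differs.  */
    assert ((x << 2) == (x * 4u));
    return 0;
  }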


Segher
Kyrylo Tkachov May 12, 2015, 8:20 a.m. UTC | #3
Hi Segher,

On 10/05/15 17:13, Segher Boessenkool wrote:
> Combine has its own ideas of what is "canonical" RTL, forcing all
> backends to have special patterns in their machine description for the
> "more simplified" patterns combine often creates, even though the
> backend already has patterns for a more general form.  Backends that
> do not implement those patterns get less well optimised code.
>
> This patch lifts that burden for two cases: combine often converts
> an AND (with, say, 0xff) to a ZERO_EXTEND of a SUBREG; and an LSHIFTRT
> followed by an AND to a ZERO_EXTRACT.  This is perfectly helpful for
> e.g. MEMs, but not nice if you have instructions to do more generic
> masking (like PowerPC rlwinm, and similar on some other archs).
>
> With this patch, if recog_for_combine fails, and there are any
> ZERO_EXT* in the pattern to be matched, it tries again with those
> expressed as AND etc.  If that also fails it rolls back the changes,
> because it might still match after e.g. splitting, and we want to
> try the ZERO_EXT* for that as well.
>
> Tested on powerpc-linux, before and after removing many patterns
> from the machine description, and checked that the only changes in
> the bootstrapped compiler are new and removed functions.


Does this patch mean we can remove any patterns in
the backend that look like:

-  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
-       (zero_extract:SI (match_operand:SI 1 "gpc_reg_operand" "r")
-                        (match_operand:SI 2 "const_int_operand" "i")
-                        (match_operand:SI 3 "const_int_operand" "i")))]


as long as we have an equivalent and-with-mask pattern?

Thanks,
Kyrill


>
> I'll also test on x86_64-linux before committing.
>
>
> Segher
>
>
> 2015-05-10  Segher Boessenkool   <segher@kernel.crashing.org>
>
> 	* combine.c (recog_for_combine_1): New function, factored out
> 	from recog_for_combine.
> 	(change_zero_ext): New function.
> 	(recog_for_combine): If recog fails, try again with the pattern
> 	modified by change_zero_ext; if that still fails, restore the
> 	pattern.
>
> ---
>   gcc/combine.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 107 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 1e4d65e..896d9d2 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -10849,21 +10849,11 @@ simplify_shift_const (rtx x, enum rtx_code code, machine_mode result_mode,
>   }
>   
>   
> -/* Like recog, but we receive the address of a pointer to a new pattern.
> -   We try to match the rtx that the pointer points to.
> -   If that fails, we may try to modify or replace the pattern,
> -   storing the replacement into the same pointer object.
> -
> -   Modifications include deletion or addition of CLOBBERs.
> -
> -   PNOTES is a pointer to a location where any REG_UNUSED notes added for
> -   the CLOBBERs are placed.
> -
> -   The value is the final insn code from the pattern ultimately matched,
> -   or -1.  */
> +/* A subroutine of recog_for_combine.  See there for arguments and
> +   return value.  */
>   
>   static int
> -recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
> +recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
>   {
>     rtx pat = *pnewpat;
>     rtx pat_without_clobbers;
> @@ -11010,6 +11000,110 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
>   
>     return insn_code_number;
>   }
> +
> +/* Change every ZERO_EXTRACT and ZERO_EXTEND of a SUBREG that can be
> +   expressed as an AND and maybe an LSHIFTRT, to that formulation.
> +   Return whether anything was so changed.  */
> +
> +static bool
> +change_zero_ext (rtx *src)
> +{
> +  bool changed = false;
> +
> +  subrtx_ptr_iterator::array_type array;
> +  FOR_EACH_SUBRTX_PTR (iter, array, src, NONCONST)
> +    {
> +      rtx x = **iter;
> +      machine_mode mode = GET_MODE (x);
> +      int size;
> +
> +      if (GET_CODE (x) == ZERO_EXTRACT
> +	  && CONST_INT_P (XEXP (x, 1))
> +	  && CONST_INT_P (XEXP (x, 2))
> +	  && GET_MODE (XEXP (x, 0)) == mode)
> +	{
> +	  size = INTVAL (XEXP (x, 1));
> +
> +	  int start = INTVAL (XEXP (x, 2));
> +	  if (BITS_BIG_ENDIAN)
> +	    start = GET_MODE_PRECISION (mode) - size - start;
> +
> +	  x = gen_rtx_LSHIFTRT (mode, XEXP (x, 0), GEN_INT (start));
> +	}
> +      else if (GET_CODE (x) == ZERO_EXTEND
> +	       && GET_CODE (XEXP (x, 0)) == SUBREG
> +	       && GET_MODE (SUBREG_REG (XEXP (x, 0))) == mode
> +	       && subreg_lowpart_p (XEXP (x, 0)))
> +	{
> +	  size = GET_MODE_PRECISION (GET_MODE (XEXP (x, 0)));
> +	  x = SUBREG_REG (XEXP (x, 0));
> +	}
> +      else
> +	continue;
> +
> +      unsigned HOST_WIDE_INT mask = 1;
> +      mask <<= size;
> +      mask--;
> +
> +      x = gen_rtx_AND (mode, x, GEN_INT (mask));
> +
> +      SUBST (**iter, x);
> +      changed = true;
> +    }
> +
> +  return changed;
> +}
> +
> +/* Like recog, but we receive the address of a pointer to a new pattern.
> +   We try to match the rtx that the pointer points to.
> +   If that fails, we may try to modify or replace the pattern,
> +   storing the replacement into the same pointer object.
> +
> +   Modifications include deletion or addition of CLOBBERs.  If the
> +   instruction will still not match, we change ZERO_EXTEND and ZERO_EXTRACT
> +   to the equivalent AND and perhaps LSHIFTRT patterns, and try with that
> +   (and undo if that fails).
> +
> +   PNOTES is a pointer to a location where any REG_UNUSED notes added for
> +   the CLOBBERs are placed.
> +
> +   The value is the final insn code from the pattern ultimately matched,
> +   or -1.  */
> +
> +static int
> +recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
> +{
> +  rtx pat = PATTERN (insn);
> +  int insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
> +  if (insn_code_number >= 0 || check_asm_operands (pat))
> +    return insn_code_number;
> +
> +  void *marker = get_undo_marker ();
> +  bool changed = false;
> +
> +  if (GET_CODE (pat) == SET)
> +    changed = change_zero_ext (&SET_SRC (pat));
> +  else if (GET_CODE (pat) == PARALLEL)
> +    {
> +      int i;
> +      for (i = 0; i < XVECLEN (pat, 0); i++)
> +	{
> +	  rtx set = XVECEXP (pat, 0, i);
> +	  if (GET_CODE (set) == SET)
> +	    changed |= change_zero_ext (&SET_SRC (set));
> +	}
> +    }
> +
> +  if (changed)
> +    {
> +      insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
> +
> +      if (insn_code_number < 0)
> +	undo_to_marker (marker);
> +    }
> +
> +  return insn_code_number;
> +}
>   
>   /* Like gen_lowpart_general but for use by combine.  In combine it
>      is not possible to create any new pseudoregs.  However, it is
Segher Boessenkool May 12, 2015, 11:13 a.m. UTC | #4
On Tue, May 12, 2015 at 09:20:18AM +0100, Kyrill Tkachov wrote:
> Does this patch mean we can remove any patterns in
> the backend that look like:
> 
> -  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> -       (zero_extract:SI (match_operand:SI 1 "gpc_reg_operand" "r")
> -                        (match_operand:SI 2 "const_int_operand" "i")
> -                        (match_operand:SI 3 "const_int_operand" "i")))]
> 
> 
> as long as we have an equivalent and-with-mask pattern?

Yes, exactly.  But you can also keep them; combine will match either form.

Older / CISC targets will usually want the zero_ext* forms; newer / RISC
targets will more often write the same operation as an AND (and LSHIFTRT).

You can also have both in the same target: for example, many targets
will want to write their plain zero_extend patterns as zero_extend,
because you want to allow both mem and reg in the same pattern, for
reload.
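
As a purely illustrative sketch (not part of the patch; the function name
rotate_and_mask is made up here, and the real rlwinm mask encoding is
ignored), the rlwinm-style rotate-and-mask operation mentioned earlier can
be modelled in plain C, with both the zero_extend and the shift-plus-AND
forms falling out as special cases, assuming 32-bit unsigned int:

  #include <assert.h>

  /* Rotate x left by sh (0..31) and AND with mask: a simplified C model
     of a rotate-and-mask instruction like PowerPC rlwinm.  */
  static unsigned int
  rotate_and_mask (unsigned int x, unsigned int sh, unsigned int mask)
  {
    unsigned int rot = (x << sh) | (x >> ((32 - sh) & 31));
    return rot & mask;
  }

  int main (void)
  {
    unsigned int x = 0x12345678u;

    /* zero_extend of the low byte: rotate by 0, mask 0xff.  */
    assert (rotate_and_mask (x, 0, 0xffu) == (x & 0xffu));

    /* zero_extract of 4 bits at bit 8: rotate left by 24 (i.e. right
       by 8), mask 0xf.  */
    assert (rotate_and_mask (x, 24, 0xfu) == ((x >> 8) & 0xfu));

    return 0;
  }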


Segher

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 1e4d65e..896d9d2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10849,21 +10849,11 @@  simplify_shift_const (rtx x, enum rtx_code code, machine_mode result_mode,
 }
 
 
-/* Like recog, but we receive the address of a pointer to a new pattern.
-   We try to match the rtx that the pointer points to.
-   If that fails, we may try to modify or replace the pattern,
-   storing the replacement into the same pointer object.
-
-   Modifications include deletion or addition of CLOBBERs.
-
-   PNOTES is a pointer to a location where any REG_UNUSED notes added for
-   the CLOBBERs are placed.
-
-   The value is the final insn code from the pattern ultimately matched,
-   or -1.  */
+/* A subroutine of recog_for_combine.  See there for arguments and
+   return value.  */
 
 static int
-recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
+recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
 {
   rtx pat = *pnewpat;
   rtx pat_without_clobbers;
@@ -11010,6 +11000,110 @@  recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
 
   return insn_code_number;
 }
+
+/* Change every ZERO_EXTRACT and ZERO_EXTEND of a SUBREG that can be
+   expressed as an AND and maybe an LSHIFTRT, to that formulation.
+   Return whether anything was so changed.  */
+
+static bool
+change_zero_ext (rtx *src)
+{
+  bool changed = false;
+
+  subrtx_ptr_iterator::array_type array;
+  FOR_EACH_SUBRTX_PTR (iter, array, src, NONCONST)
+    {
+      rtx x = **iter;
+      machine_mode mode = GET_MODE (x);
+      int size;
+
+      if (GET_CODE (x) == ZERO_EXTRACT
+	  && CONST_INT_P (XEXP (x, 1))
+	  && CONST_INT_P (XEXP (x, 2))
+	  && GET_MODE (XEXP (x, 0)) == mode)
+	{
+	  size = INTVAL (XEXP (x, 1));
+
+	  int start = INTVAL (XEXP (x, 2));
+	  if (BITS_BIG_ENDIAN)
+	    start = GET_MODE_PRECISION (mode) - size - start;
+
+	  x = gen_rtx_LSHIFTRT (mode, XEXP (x, 0), GEN_INT (start));
+	}
+      else if (GET_CODE (x) == ZERO_EXTEND
+	       && GET_CODE (XEXP (x, 0)) == SUBREG
+	       && GET_MODE (SUBREG_REG (XEXP (x, 0))) == mode
+	       && subreg_lowpart_p (XEXP (x, 0)))
+	{
+	  size = GET_MODE_PRECISION (GET_MODE (XEXP (x, 0)));
+	  x = SUBREG_REG (XEXP (x, 0));
+	}
+      else
+	continue;
+
+      unsigned HOST_WIDE_INT mask = 1;
+      mask <<= size;
+      mask--;
+
+      x = gen_rtx_AND (mode, x, GEN_INT (mask));
+
+      SUBST (**iter, x);
+      changed = true;
+    }
+
+  return changed;
+}
+
+/* Like recog, but we receive the address of a pointer to a new pattern.
+   We try to match the rtx that the pointer points to.
+   If that fails, we may try to modify or replace the pattern,
+   storing the replacement into the same pointer object.
+
+   Modifications include deletion or addition of CLOBBERs.  If the
+   instruction will still not match, we change ZERO_EXTEND and ZERO_EXTRACT
+   to the equivalent AND and perhaps LSHIFTRT patterns, and try with that
+   (and undo if that fails).
+
+   PNOTES is a pointer to a location where any REG_UNUSED notes added for
+   the CLOBBERs are placed.
+
+   The value is the final insn code from the pattern ultimately matched,
+   or -1.  */
+
+static int
+recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
+{
+  rtx pat = PATTERN (insn);
+  int insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
+  if (insn_code_number >= 0 || check_asm_operands (pat))
+    return insn_code_number;
+
+  void *marker = get_undo_marker ();
+  bool changed = false;
+
+  if (GET_CODE (pat) == SET)
+    changed = change_zero_ext (&SET_SRC (pat));
+  else if (GET_CODE (pat) == PARALLEL)
+    {
+      int i;
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	{
+	  rtx set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) == SET)
+	    changed |= change_zero_ext (&SET_SRC (set));
+	}
+    }
+
+  if (changed)
+    {
+      insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
+
+      if (insn_code_number < 0)
+	undo_to_marker (marker);
+    }
+
+  return insn_code_number;
+}
 
 /* Like gen_lowpart_general but for use by combine.  In combine it
    is not possible to create any new pseudoregs.  However, it is