From patchwork Tue Jun 15 16:26:21 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Matz <matz@suse.de>
X-Patchwork-Id: 55763
Return-Path: 
 <gcc-patches-return-265303-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id A214CB7DA8
	for <incoming@patchwork.ozlabs.org>;
	Wed, 16 Jun 2010 02:26:57 +1000 (EST)
Received: (qmail 30178 invoked by alias); 15 Jun 2010 16:26:54 -0000
Received: (qmail 29800 invoked by uid 22791); 15 Jun 2010 16:26:43 -0000
X-SWARE-Spam-Status: No, hits=-1.7 required=5.0	tests=AWL, BAYES_50, TW_CP,
	TW_VB, T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by
	sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
	Tue, 15 Jun 2010 16:26:25 +0000
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.221.2])	by
	mx2.suse.de (Postfix) with ESMTP id 1EBD08A95F	for
	<gcc-patches@gcc.gnu.org>; Tue, 15 Jun 2010 18:26:22 +0200 (CEST)
Date: Tue, 15 Jun 2010 18:26:21 +0200 (CEST)
From: Michael Matz <matz@suse.de>
To: gcc-patches@gcc.gnu.org
Subject: Speed up genattrtab
Message-ID: <Pine.LNX.4.64.1006151807190.15724@wotan.suse.de>
MIME-Version: 1.0
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi,

okay, let's try again, after many years.  Maybe this time :)

This speeds up genattrtab to be no time issue during bootstrap anymore.
Over the years I worked on many approaches to this.  My first one was to 
throttle down the optimization, then I completely removed the 
optimization, then I implemented different kinds of optimizations, then I 
combined them with the throttled down ones, and now I'm back to more or 
less only throttling the optimizations.

Obviously switching off all optimizations creates the fastest combination 
of genattrtab+compiling insn-attrtab.c.  But it has some effect on the 
overall speed of the compiler.  Not too bad as I rectified this a bit, but 
maybe too much to be acceptable to everyone.  But for all development 
during the last years I used a so modified genattrtab, it's really nice :)

Now, after much benchmarking last week I'm proposing the below patch.  It 
does not get rid of all optimizations, but throttles it significantly 
(plus reorders the order of computation so that lowering the limits 
doesn't have too much effect for small attributes).

Numbers follow.  First the architecture, then the version of genattrtab, 
then four numbers:
 gen_u == seconds to run an optimized (!) genattrtab
 st1_1 == seconds to compile generated insn-attrtab.c with an optimized 
          cc1
 big_u == seconds to compile an artificial piece of code that generates
          large functions, many loops, scheduling opportunities
 kde_u == seconds to compile kdecore.cc, a one-file variant of an older 
          version of libkdecore.

The genattrtab versions are: clean == as in SVN, try3 == no call to 
optimize_attrs, otherwise same as proposed patch, try == the proposed 
variant.  These measurements were taken on genattrtab versions that didn't 
contain the latest changes to support enum attributes, but those have no 
speed effect (I've checked for some combinations).

arch     name   gen_u  st1_u  big_u  kde_u
alpha    clean  0      0.75   43.21  32.52
alpha    try3   0      0.26   44.19  32.85
alpha    try    0      0.35   43.26  32.50
arm      clean  6      19.66  49.25  37.89
arm      try3   0      1.78   50.04  38.00
arm      try    2      2.35   49.85  38.10
crisv32  clean  0      0.21   36.17  27.33
crisv32  try3   0      0.15   36.49  27.53
crisv32  try    0      0.23   36.21  27.43
hppa     clean  0      1.11   46.77  31.97
hppa     try3   0      0.58   46.85  32.00
hppa     try    0      0.64   46.97  31.84
i386     clean  38     34.25  33.51  29.93
i386     try3   1      1.99   34.26  30.64
i386     try    6      2.31   33.78  30.12
ia64     clean  1      1.88   66.55  49.81
ia64     try3   0      0.71   67.08  50.33
ia64     try    0      0.95   66.62  49.81
mips     clean  74     17.08  51.23
mips     try3   0      1.74   52.11
mips     try    4      2.29   50.77
powerpc  clean  56     48.59  49.74  34.15
powerpc  try3   0      2.59   50.60  34.97
powerpc  try    5      2.17   49.38  34.71
s390x    clean  0      1.82   47.26  32.83
s390x    try3   0      0.62   47.63  33.75
s390x    try    0      0.78   47.41  33.64
sh       clean  0      1.46   50.99  38.09
sh       try3   0      0.68   51.05  38.30
sh       try    0      0.91   50.79  38.13
sparc    clean  0      1.11   44.21  32.98
sparc    try3   0      0.57   44.45  33.22
sparc    try    0      0.57   43.27  32.93
x86_64   clean  52     43.81  28.78  28.72
x86_64   try3   1      2.16   29.22  29.40
x86_64   try    6      2.98   28.96  28.96

(mips wasn't able to compile kdecore.cc).  This is all cross compilers to 
$arch-linux, all running on the same host machine (a x86_64-linux iCore7 
machine).  As said, I'm proposing "try", so compare the first and third 
numbers.  It will hugely help i386, mips, powerpc and x86_64, and arm a 
bit; the others aren't a problem right now anyway.  The speed difference 
of the compiler is acceptable I think, actually even speeding up the 
compiler sometimes (probably cache effects, because the .text size of 
insn-attrtab is _much_ smaller) or being in the noise.

So, if included, we go from 95 seconds to 9 seconds for x86_64 for an 
optimized cc1, the difference will be even larger for stage2 (using an 
possibly unoptimized cc1).

As said, I'm bootstrapping with variants of this since years, but of 
course I'm regstrapping this currently on x86_64-linux.  Okay for trunk?


Ciao,
Michael.

Index: genattrtab.c
===================================================================
--- genattrtab.c	(revision 160784)
+++ genattrtab.c	(working copy)
@@ -241,6 +241,7 @@ static const char *length_str;
 static const char *delay_type_str;
 static const char *delay_1_0_str;
 static const char *num_delay_slots_str;
+static const char *insn_code_str;
 
 /* Simplify an expression.  Only call the routine if there is something to
    simplify.  */
@@ -1621,6 +1622,58 @@ write_length_unit_log (void)
   printf ("EXPORTED_CONST int length_unit_log = %u;\n", length_unit_log);
 }
 
+/* Compute approximate cost of the expression.  Used to decide whether
+   expression is cheap enough for inline.  */
+static int
+attr_rtx_cost (rtx x)
+{
+  int cost = 1;
+  enum rtx_code code;
+  if (!x)
+    return 0;
+  code = GET_CODE (x);
+  switch (code)
+    {
+    case MATCH_OPERAND:
+      if (XSTR (x, 1)[0])
+	return 10;
+      else
+	return 1;
+
+    case EQ_ATTR_ALT:
+      return 1;
+
+    case EQ_ATTR:
+      /* Alternatives don't result into function call.  */
+      if (!strcmp_check (XSTR (x, 0), alternative_name)
+          || !strcmp_check (XSTR (x, 0), insn_code_str))
+	return 1;
+      else
+	return 5;
+    default:
+      {
+	int i, j;
+	const char *fmt = GET_RTX_FORMAT (code);
+	for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	  {
+	    switch (fmt[i])
+	      {
+	      case 'V':
+	      case 'E':
+		for (j = 0; j < XVECLEN (x, i); j++)
+		  cost += attr_rtx_cost (XVECEXP (x, i, j));
+		break;
+	      case 'e':
+		cost += attr_rtx_cost (XEXP (x, i));
+		break;
+	      }
+	  }
+      }
+      break;
+    }
+  return cost;
+}
+
 /* Take a COND expression and see if any of the conditions in it can be
    simplified.  If any are known true or known false for the particular insn
    code, the COND can be further simplified.
@@ -1744,6 +1797,111 @@ simplify_cond (rtx exp, int insn_code, i
   return ret;
 }
 
+/* Taken a COND expression EXP and a value AV, and returns a possibly
+   optimized variant of EXP.  This is similar to simplify_cond, but
+   instead of optimizing for just one instruction we optimize for the
+   set of instructions found in AV.  */
+
+static rtx
+simplify_cond_insn_list (rtx exp, struct attr_value *av)
+{
+  int i;
+  /* We store the desired contents here,
+     then build a new expression if they don't match EXP.  */
+  rtx defval = XEXP (exp, 1);
+  rtx new_defval = XEXP (exp, 1);
+  int len = XVECLEN (exp, 0);
+  rtx *tests;
+  int allsame = 1;
+  rtx ret;
+
+  if (av->has_asm_insn)
+    return exp;
+
+  tests = XNEWVEC (rtx, len);
+  /* This lets us free all storage allocated below, if appropriate.  */
+  obstack_finish (rtl_obstack);
+
+  memcpy (tests, XVEC (exp, 0)->elem, len * sizeof (rtx));
+
+  /* See if default value needs simplification.  */
+  if (GET_CODE (defval) == COND)
+    new_defval = simplify_cond_insn_list (defval, av);
+
+  for (i = 0; i < len; i += 2)
+    {
+      rtx newtest, newval;
+      struct insn_ent *ie;
+      int oldcost;
+
+      oldcost = attr_rtx_cost (XVECEXP (exp, 0, i));
+      tests[i] = false_rtx;
+      for (ie = av->first_insn; ie != 0; ie = ie->next)
+        {
+	  rtx orig_test = XVECEXP (exp, 0, i);
+	  gcc_assert (ie->def->insn_code >= 0);
+	  newtest = simplify_test_exp_in_temp (orig_test, ie->def->insn_code,
+	  				       ie->def->insn_index);
+	  /* We couldn't simplify the expression for this insn code,
+	     so it makes no sense to try further, as syntactically we
+	     can't shorten the test anymore.  Further, also the
+	     simplifications done up to now can be removed.  */
+          if (newtest == orig_test)
+	    {
+	      tests[i] = orig_test;
+	      break;
+	    }
+	  if (newtest != false_rtx)
+	    {
+	      newtest = attr_rtx (AND,
+				  attr_eq (insn_code_str,
+					   attr_numeral (ie->def->insn_code)),
+				  newtest);
+	      tests[i] = insert_right_side (IOR, tests[i], newtest, -2, -2);
+	    }
+        }
+
+      if (attr_rtx_cost (tests[i]) > oldcost)
+        tests[i] = XVECEXP (exp, 0, i);
+
+      /* Simplify this test.  */
+      newtest = simplify_test_exp_in_temp (tests[i], -2, -2);
+      tests[i] = newtest;
+
+      newval = tests[i + 1];
+      /* See if this value may need simplification.  */
+      if (GET_CODE (newval) == COND)
+	newval = simplify_cond_insn_list (newval, av);
+
+      tests[i + 1] = newval;
+    }
+
+  /* See if we changed anything.  */
+  if (len != XVECLEN (exp, 0) || new_defval != XEXP (exp, 1))
+    allsame = 0;
+  else
+    for (i = 0; i < len; i++)
+      if (! attr_equal_p (tests[i], XVECEXP (exp, 0, i)))
+	{
+	  allsame = 0;
+	  break;
+	}
+
+  if (allsame)
+    ret = exp;
+  else
+    {
+      rtx newexp = rtx_alloc (COND);
+
+      XVEC (newexp, 0) = rtvec_alloc (len);
+      memcpy (XVEC (newexp, 0)->elem, tests, len * sizeof (rtx));
+      XEXP (newexp, 1) = new_defval;
+      ret = newexp;
+    }
+  free (tests);
+  return ret;
+}
+
 /* Remove an insn entry from an attribute value.  */
 
 static void
@@ -2216,57 +2374,6 @@ simplify_or_tree (rtx exp, rtx *pterm, i
   return exp;
 }
 
-/* Compute approximate cost of the expression.  Used to decide whether
-   expression is cheap enough for inline.  */
-static int
-attr_rtx_cost (rtx x)
-{
-  int cost = 0;
-  enum rtx_code code;
-  if (!x)
-    return 0;
-  code = GET_CODE (x);
-  switch (code)
-    {
-    case MATCH_OPERAND:
-      if (XSTR (x, 1)[0])
-	return 10;
-      else
-	return 0;
-
-    case EQ_ATTR_ALT:
-      return 0;
-
-    case EQ_ATTR:
-      /* Alternatives don't result into function call.  */
-      if (!strcmp_check (XSTR (x, 0), alternative_name))
-	return 0;
-      else
-	return 5;
-    default:
-      {
-	int i, j;
-	const char *fmt = GET_RTX_FORMAT (code);
-	for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-	  {
-	    switch (fmt[i])
-	      {
-	      case 'V':
-	      case 'E':
-		for (j = 0; j < XVECLEN (x, i); j++)
-		  cost += attr_rtx_cost (XVECEXP (x, i, j));
-		break;
-	      case 'e':
-		cost += attr_rtx_cost (XEXP (x, i));
-		break;
-	      }
-	  }
-      }
-      break;
-    }
-  return cost;
-}
-
 /* Simplify test expression and use temporary obstack in order to avoid
    memory bloat.  Use ATTR_IND_SIMPLIFIED to avoid unnecessary simplifications
    and avoid unnecessary copying if possible.  */
@@ -2599,6 +2706,25 @@ simplify_test_exp (rtx exp, int insn_cod
 	  return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index);
 	}
 
+      /* Similarly,
+	    convert (ior (and (y) (x))
+			 (and (z) (x)))
+	    to      (and (ior (y) (z))
+			 (x))
+         Note that we want the common term to stay at the end.
+       */
+
+      else if (GET_CODE (left) == AND && GET_CODE (right) == AND
+	       && attr_equal_p (XEXP (left, 1), XEXP (right, 1)))
+	{
+	  newexp = attr_rtx (IOR, XEXP (left, 0), XEXP (right, 0));
+
+	  left = newexp;
+	  right = XEXP (right, 1);
+	  newexp = attr_rtx (AND, left, right);
+	  return SIMPLIFY_TEST_EXP (newexp, insn_code, insn_index);
+	}
+
       /* See if all or all but one of the insn's alternatives are specified
 	 in this tree.  Optimize if so.  */
 
@@ -2702,6 +2828,13 @@ simplify_test_exp (rtx exp, int insn_cod
 	  break;
 	}
 
+      if (XSTR (exp, 0) == insn_code_str)
+        {
+	  if (insn_code >= 0)
+	    newexp = atoi (XSTR (exp, 1)) == insn_code ? true_rtx : false_rtx;
+	  break;
+	}
+
       /* Look at the value for this insn code in the specified attribute.
 	 We normally can replace this comparison with the condition that
 	 would give this insn the values being tested for.  */
@@ -2734,7 +2867,7 @@ simplify_test_exp (rtx exp, int insn_cod
 	      x = evaluate_eq_attr (exp, attr, av->value,
 				    insn_code, insn_index);
 	      x = SIMPLIFY_TEST_EXP (x, insn_code, insn_index);
-	      if (attr_rtx_cost(x) < 20)
+	      if (attr_rtx_cost(x) < 7)
 		return x;
 	    }
 	}
@@ -2754,6 +2887,131 @@ simplify_test_exp (rtx exp, int insn_cod
   return newexp;
 }
 
+/* Return 1 if any EQ_ATTR subexpression of P refers to ATTR,
+   otherwise return 0.  */
+
+static int
+tests_attr_p (rtx p, struct attr_desc *attr)
+{
+  const char *fmt;
+  int i, ie, j, je;
+
+  if (GET_CODE (p) == EQ_ATTR)
+    {
+      if (XSTR (p, 0) != attr->name)
+	return 0;
+      return 1;
+    }
+
+  fmt = GET_RTX_FORMAT (GET_CODE (p));
+  ie = GET_RTX_LENGTH (GET_CODE (p));
+  for (i = 0; i < ie; i++)
+    {
+      switch (*fmt++)
+	{
+	case 'e':
+	  if (tests_attr_p (XEXP (p, i), attr))
+	    return 1;
+	  break;
+
+	case 'E':
+	  je = XVECLEN (p, i);
+	  for (j = 0; j < je; ++j)
+	    if (tests_attr_p (XVECEXP (p, i, j), attr))
+	      return 1;
+	  break;
+	}
+    }
+
+  return 0;
+}
+
+/* Calculate a topological sorting of all attributes so that
+   all attributes only depend on attributes in front of it.
+   Place the result in *RET (which is a pointer to an array of
+   attr_desc pointers), and return the size of that array.  */
+
+static int
+get_attr_order (struct attr_desc ***ret)
+{
+  int i, j;
+  int num = 0;
+  struct attr_desc *attr;
+  struct attr_desc **all, **sorted;
+  char *handled;
+  for (i = 0; i < MAX_ATTRS_INDEX; i++)
+    for (attr = attrs[i]; attr; attr = attr->next)
+      num++;
+  all = XNEWVEC (struct attr_desc *, num);
+  sorted = XNEWVEC (struct attr_desc *, num);
+  handled = XCNEWVEC (char, num);
+  num = 0;
+  for (i = 0; i < MAX_ATTRS_INDEX; i++)
+    for (attr = attrs[i]; attr; attr = attr->next)
+      all[num++] = attr;
+
+  j = 0;
+  for (i = 0; i < num; i++)
+    if (all[i]->is_const)
+      handled[i] = 1, sorted[j++] = all[i];
+
+  /* We have only few attributes hence we can live with the inner
+     loop being O(n^2), unlike the normal fast variants of topological
+     sorting.  */
+  while (j < num)
+    {
+      for (i = 0; i < num; i++)
+	if (!handled[i])
+	  {
+	    /* Let's see if I depends on anything interesting.  */
+	    int k;
+	    for (k = 0; k < num; k++)
+	      if (!handled[k])
+		{
+		  struct attr_value *av;
+		  for (av = all[i]->first_value; av; av = av->next)
+		    if (av->num_insns != 0)
+		      if (tests_attr_p (av->value, all[k]))
+			break;
+
+		  if (av)
+		    /* Something in I depends on K.  */
+		    break;
+		}
+	    if (k == num)
+	      {
+		/* Nothing in I depended on anything intersting, so
+		   it's done.  */
+		handled[i] = 1;
+		sorted[j++] = all[i];
+	      }
+	  }
+    }
+
+  for (j = 0; j < num; j++)
+    {
+      struct attr_desc *attr2;
+      struct attr_value *av;
+
+      attr = sorted[j];
+      fprintf (stderr, "%s depends on: ", attr->name);
+      for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+	for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+	  if (!attr2->is_const)
+	    for (av = attr->first_value; av; av = av->next)
+	      if (av->num_insns != 0)
+		if (tests_attr_p (av->value, attr2))
+		  {
+		    fprintf (stderr, "%s, ", attr2->name);
+		    break;
+		  }
+      fprintf (stderr, "\n");
+    }
+  free (all);
+  *ret = sorted;
+  return num;
+}
+
 /* Optimize the attribute lists by seeing if we can determine conditional
    values from the known values of other attributes.  This will save subroutine
    calls during the compilation.  */
@@ -2768,6 +3026,8 @@ optimize_attrs (void)
   int i;
   struct attr_value_list *ivbuf;
   struct attr_value_list *iv;
+  struct attr_desc **topsort;
+  int topnum;
 
   /* For each insn code, make a list of all the insn_ent's for it,
      for all values for all attributes.  */
@@ -2783,18 +3043,22 @@ optimize_attrs (void)
 
   iv = ivbuf = XNEWVEC (struct attr_value_list, num_insn_ents);
 
-  for (i = 0; i < MAX_ATTRS_INDEX; i++)
-    for (attr = attrs[i]; attr; attr = attr->next)
-      for (av = attr->first_value; av; av = av->next)
-	for (ie = av->first_insn; ie; ie = ie->next)
-	  {
-	    iv->attr = attr;
-	    iv->av = av;
-	    iv->ie = ie;
-	    iv->next = insn_code_values[ie->def->insn_code];
-	    insn_code_values[ie->def->insn_code] = iv;
-	    iv++;
-	  }
+  /* Create the chain of insn*attr values such that we see dependend
+     attributes after their dependencies.  As we use a stack via the
+     next pointers start from the end of the topological order.  */
+  topnum = get_attr_order (&topsort);
+  for (i = topnum - 1; i >= 0; i--)
+    for (av = topsort[i]->first_value; av; av = av->next)
+      for (ie = av->first_insn; ie; ie = ie->next)
+	{
+	  iv->attr = topsort[i];
+	  iv->av = av;
+	  iv->ie = ie;
+	  iv->next = insn_code_values[ie->def->insn_code];
+	  insn_code_values[ie->def->insn_code] = iv;
+	  iv++;
+	}
+  free (topsort);
 
   /* Sanity check on num_insn_ents.  */
   gcc_assert (iv == ivbuf + num_insn_ents);
@@ -2829,7 +3093,15 @@ optimize_attrs (void)
 	    }
 
 	  rtl_obstack = old;
-	  if (newexp != av->value)
+	  /* If we created a new value for this instruction, and it's
+	     cheaper than the old value, and overall cheap, use that
+	     one as specific value for the current instruction.
+	     The last test is to avoid exploding the get_attr_ function
+	     sizes for no much gain.  */
+	  if (newexp != av->value
+	      && attr_rtx_cost (newexp) < attr_rtx_cost (av->value)
+	      && attr_rtx_cost (newexp) < 26
+	     )
 	    {
 	      newexp = attr_copy_rtx (newexp);
 	      remove_insn_ent (av, ie);
@@ -3332,6 +3604,12 @@ write_test_expr (rtx exp, int flags)
 	  break;
 	}
 
+      if (XSTR (exp, 0) == insn_code_str)
+        {
+	  printf ("insn_code == %s", XSTR (exp, 1));
+	  break;
+	}
+
       attr = find_attr (&XSTR (exp, 0), 0);
       gcc_assert (attr);
 
@@ -3618,10 +3896,80 @@ walk_attr_value (rtx exp)
       }
 }
 
-/* Write out a function to obtain the attribute for a given INSN.  */
+/* If a subexpression of P refers to ATTR, write out C code to retrieve
+   the value of that attribute storing it in a local variable.  */
+
+static int
+write_expr_attr_cache (rtx p, struct attr_desc *attr, int indent)
+{
+  const char *fmt;
+  int i, ie, j, je;
+
+  if (GET_CODE (p) == EQ_ATTR)
+    {
+      if (XSTR (p, 0) != attr->name)
+	return 0;
+
+      write_indent (indent);
+      if (attr->enum_name)
+	printf ("  enum %s ", attr->enum_name);
+      else if (!attr->is_numeric)
+	printf ("  enum attr_%s ", attr->name);
+      else
+	printf ("  int ");
+
+      printf ("attr_%s ATTRIBUTE_UNUSED = get_attr_%s (insn);\n",
+	      attr->name, attr->name);
+      return 1;
+    }
+
+  fmt = GET_RTX_FORMAT (GET_CODE (p));
+  ie = GET_RTX_LENGTH (GET_CODE (p));
+  for (i = 0; i < ie; i++)
+    {
+      switch (*fmt++)
+	{
+	case 'e':
+	  if (write_expr_attr_cache (XEXP (p, i), attr, indent))
+	    return 1;
+	  break;
+
+	case 'E':
+	  je = XVECLEN (p, i);
+	  for (j = 0; j < je; ++j)
+	    if (write_expr_attr_cache (XVECEXP (p, i, j), attr, indent))
+	      return 1;
+	  break;
+	}
+    }
+
+  return 0;
+}
+
+/* Given a VALUE (possibly having EQ_ATTR tests in subexpression)
+   write out C code to retrieve the value of all attributes used
+   in tests embedded therein.  */
 
 static void
-write_attr_get (struct attr_desc *attr)
+write_cache_used_attr_for_value (rtx value, int indent)
+{
+  struct attr_desc *attr2;
+  int i;
+
+  for (i = 0; i < MAX_ATTRS_INDEX; ++i)
+    for (attr2 = attrs[i]; attr2; attr2 = attr2->next)
+      if (!attr2->is_const)
+	write_expr_attr_cache (value, attr2, indent);
+}
+
+/* Write out C code to calculate the value of ATTR per instruction.
+   PREFIX and SUFFIX are used to delimit the 'return' statement delivering
+   the value to our caller.  Either it will form a real return statement,
+   or an accumulator update.  */
+
+static void
+write_attr_switch (struct attr_desc *attr, int indent, const char *prefix,
+		   const char *suffix)
 {
   struct attr_value *av, *common_av;
 
@@ -3629,6 +3977,32 @@ write_attr_get (struct attr_desc *attr)
      switch we will generate.  */
   common_av = find_most_used (attr);
 
+  write_indent (indent);
+  printf ("{\n");
+  write_indent (indent);
+  printf ("  int insn_code = recog_memoized (insn);\n");
+  write_indent (indent);
+  printf ("  switch (insn_code)\n");
+  write_indent (indent);
+  printf ("    {\n");
+
+  for (av = attr->first_value; av; av = av->next)
+    if (av != common_av)
+      write_attr_case (attr, av, 1, prefix, suffix, indent + 4, true_rtx);
+
+  write_attr_case (attr, common_av, 0, prefix, suffix, indent + 4, true_rtx);
+  
+  write_indent (indent);
+  printf ("    }\n");
+  write_indent (indent);
+  printf ("}\n\n");
+}
+
+/* Write out a function to obtain the attribute for a given INSN.  */
+
+static void
+write_attr_get (struct attr_desc *attr)
+{
   /* Write out start of function, then all values with explicit `case' lines,
      then a `default', then the value with the most uses.  */
   if (attr->enum_name)
@@ -3646,6 +4020,7 @@ write_attr_get (struct attr_desc *attr)
     printf ("get_attr_%s (rtx insn ATTRIBUTE_UNUSED)\n", attr->name);
   else
     {
+      struct attr_value *av;
       printf ("get_attr_%s (void)\n", attr->name);
       printf ("{\n");
 
@@ -3656,22 +4031,13 @@ write_attr_get (struct attr_desc *attr)
 			  av->first_insn->def->insn_index);
 	else if (av->num_insns != 0)
 	  write_attr_set (attr, 2, av->value, "return", ";",
-			  true_rtx, -2, 0);
+			  true_rtx, -2, -2);
 
       printf ("}\n\n");
       return;
     }
 
-  printf ("{\n");
-  printf ("  switch (recog_memoized (insn))\n");
-  printf ("    {\n");
-
-  for (av = attr->first_value; av; av = av->next)
-    if (av != common_av)
-      write_attr_case (attr, av, 1, "return", ";", 4, true_rtx);
-
-  write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx);
-  printf ("    }\n}\n\n");
+  write_attr_switch (attr, 0, "return", ";");
 }
 
 /* Given an AND tree of known true terms (because we are inside an `if' with
@@ -3727,6 +4093,10 @@ write_attr_set (struct attr_desc *attr,
 	  rtx testexp;
 	  rtx inner_true;
 
+	  /* Reset our_known_true after some time to not accumulate
+	     too much cruft (slowing down genattrtab).  */
+	  if ((i & 31) == 0)
+	    our_known_true = known_true;
 	  testexp = eliminate_known_true (our_known_true,
 					  XVECEXP (value, 0, i),
 					  insn_code, insn_index);
@@ -3755,7 +4125,7 @@ write_attr_set (struct attr_desc *attr,
 	  write_indent (indent);
 	  printf ("%sif ", first_if ? "" : "else ");
 	  first_if = 0;
-	  write_test_expr (testexp, 0);
+	  write_test_expr (testexp, 2);
 	  printf ("\n");
 	  write_indent (indent + 2);
 	  printf ("{\n");
@@ -3820,6 +4190,7 @@ write_attr_case (struct attr_desc *attr,
 		 int write_case_lines, const char *prefix, const char *suffix,
 		 int indent, rtx known_true)
 {
+  rtx opt_val;
   if (av->num_insns == 0)
     return;
 
@@ -3843,9 +4214,33 @@ write_attr_case (struct attr_desc *attr,
       printf ("default:\n");
     }
 
+  indent += 2;
+  write_indent (indent);
+  printf ("{\n");
+
+  opt_val = av->value;
+  /* If we have multiple but only few instructions associated with
+     this value we can possibly optimize this.  */
+  if (GET_CODE (opt_val) == COND && av->num_insns > 1 && av->num_insns < 10)
+    opt_val = simplify_cond_insn_list (opt_val, av);
+  while (GET_CODE (opt_val) == COND)
+    {
+      rtx newexp2;
+      if (av->num_insns == 1)
+	newexp2 = simplify_cond (opt_val, av->first_insn->def->insn_code,
+				 av->first_insn->def->insn_index);
+      else
+	newexp2 = simplify_cond (opt_val, -2, -2);
+      if (newexp2 == opt_val)
+	break;
+      opt_val = newexp2;
+    }
+
   /* See what we have to do to output this value.  */
   must_extract = must_constrain = address_used = 0;
-  walk_attr_value (av->value);
+  walk_attr_value (opt_val);
+
+  write_cache_used_attr_for_value (opt_val, indent);
 
   if (must_constrain)
     {
@@ -3859,11 +4254,11 @@ write_attr_case (struct attr_desc *attr,
     }
 
   if (av->num_insns == 1)
-    write_attr_set (attr, indent + 2, av->value, prefix, suffix,
+    write_attr_set (attr, indent + 2, opt_val, prefix, suffix,
 		    known_true, av->first_insn->def->insn_code,
 		    av->first_insn->def->insn_index);
   else
-    write_attr_set (attr, indent + 2, av->value, prefix, suffix,
+    write_attr_set (attr, indent + 2, opt_val, prefix, suffix,
 		    known_true, -2, 0);
 
   if (strncmp (prefix, "return", 6))
@@ -3871,6 +4266,8 @@ write_attr_case (struct attr_desc *attr,
       write_indent (indent + 2);
       printf ("break;\n");
     }
+  write_indent (indent);
+  printf ("}\n");
   printf ("\n");
 }
 
@@ -3993,7 +4390,6 @@ write_eligible_delay (const char *kind)
   char str[50];
   const char *pstr;
   struct attr_desc *attr;
-  struct attr_value *av, *common_av;
   int i;
 
   /* Compute the maximum number of delay slots required.  We use the delay
@@ -4026,19 +4422,11 @@ write_eligible_delay (const char *kind)
     {
       attr = find_attr (&delay_type_str, 0);
       gcc_assert (attr);
-      common_av = find_most_used (attr);
-
-      printf ("  insn = delay_insn;\n");
-      printf ("  switch (recog_memoized (insn))\n");
-      printf ("    {\n");
 
       sprintf (str, " * %d;\n      break;", max_slots);
-      for (av = attr->first_value; av; av = av->next)
-	if (av != common_av)
-	  write_attr_case (attr, av, 1, "slot +=", str, 4, true_rtx);
 
-      write_attr_case (attr, common_av, 0, "slot +=", str, 4, true_rtx);
-      printf ("    }\n\n");
+      printf ("  insn = delay_insn;\n");
+      write_attr_switch (attr, 2, "slot +=", str);
 
       /* Ensure matched.  Otherwise, shouldn't have been called.  */
       printf ("  gcc_assert (slot >= %d);\n\n", max_slots);
@@ -4047,20 +4435,10 @@ write_eligible_delay (const char *kind)
   /* If just one type of delay slot, write simple switch.  */
   if (num_delays == 1 && max_slots == 1)
     {
-      printf ("  insn = candidate_insn;\n");
-      printf ("  switch (recog_memoized (insn))\n");
-      printf ("    {\n");
-
       attr = find_attr (&delay_1_0_str, 0);
       gcc_assert (attr);
-      common_av = find_most_used (attr);
-
-      for (av = attr->first_value; av; av = av->next)
-	if (av != common_av)
-	  write_attr_case (attr, av, 1, "return", ";", 4, true_rtx);
-
-      write_attr_case (attr, common_av, 0, "return", ";", 4, true_rtx);
-      printf ("    }\n");
+      printf ("  insn = candidate_insn;\n");
+      write_attr_switch (attr, 2, "return", ";");
     }
 
   else
@@ -4076,21 +4454,12 @@ write_eligible_delay (const char *kind)
 	  {
 	    printf ("    case %d:\n",
 		    (i / 3) + (num_delays == 1 ? 0 : delay->num * max_slots));
-	    printf ("      switch (recog_memoized (insn))\n");
-	    printf ("\t{\n");
 
 	    sprintf (str, "*%s_%d_%d", kind, delay->num, i / 3);
 	    pstr = str;
 	    attr = find_attr (&pstr, 0);
 	    gcc_assert (attr);
-	    common_av = find_most_used (attr);
-
-	    for (av = attr->first_value; av; av = av->next)
-	      if (av != common_av)
-		write_attr_case (attr, av, 1, "return", ";", 8, true_rtx);
-
-	    write_attr_case (attr, common_av, 0, "return", ";", 8, true_rtx);
-	    printf ("      }\n");
+	    write_attr_switch (attr, 6, "return", ";");
 	  }
 
       printf ("    default:\n");
@@ -4133,7 +4502,7 @@ find_attr (const char **name_p, int crea
 
   /* Before we resort to using `strcmp', see if the string address matches
      anywhere.  In most cases, it should have been canonicalized to do so.  */
-  if (name == alternative_name)
+  if (name == alternative_name || name == insn_code_str)
     return NULL;
 
   index = name[0] & (MAX_ATTRS_INDEX - 1);
@@ -4458,6 +4827,7 @@ main (int argc, char **argv)
   delay_type_str = DEF_ATTR_STRING ("*delay_type");
   delay_1_0_str = DEF_ATTR_STRING ("*delay_1_0");
   num_delay_slots_str = DEF_ATTR_STRING ("*num_delay_slots");
+  insn_code_str = DEF_ATTR_STRING ("insn_code");
 
   printf ("/* Generated automatically by the program `genattrtab'\n\
 from the machine description file `md'.  */\n\n");
@@ -4573,7 +4943,7 @@ from the machine description file `md'.
   /* Construct extra attributes for `length'.  */
   make_length_attrs ();
 
-  /* Perform any possible optimizations to speed up compilation.  */
+  /* Perform some optimizations to speed up compilation.  */
   optimize_attrs ();
 
   /* Now write out all the `gen_attr_...' routines.  Do these before the