diff mbox

new sign/zero extension elimination pass

Message ID AANLkTi==BHdr3ZZNt_99tu4tbGWqz8xJQf1m4FkDMxz4@mail.gmail.com
State New
Headers show

Commit Message

Andrew Pinski Oct. 28, 2010, 7:19 p.m. UTC
On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>> I created a new sign/zero extension elimination pass.
>>
>> The motivating example for this pass is:
>
> In the above case fwprop could do the majority of the work.  In fact
> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
> does not replace it.  I think you could extend fwprop to the correct
> thing.

Something like the attached patch.  I have not bootstrap/tested it yet
but it works for your simple example.
This allows us not to add another pass.

Thanks,
Andrew Pinski

Comments

Tom de Vries Oct. 28, 2010, 7:30 p.m. UTC | #1
Andrew,

Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>> In the above case fwprop could do the majority of the work.  In fact
>> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
>> does not replace it.  I think you could extend fwprop to the correct
>> thing.
> 
> Something like the attached patch.  I have not bootstrap/tested it yet
> but it works for your simple example.
> This allows us not to add another pass.
> 
> Thanks,
> Andrew Pinski
> 

thanks for this patch.

I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
that for the example with which I submitted the pass initially, it would make
sense to handle it in fwprop. However, I also think that for the example
mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
wouldn't work, so we still need the new pass.

Thanks,
- Tom
Andrew Pinski Oct. 28, 2010, 7:41 p.m. UTC | #2
On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries <tom@codesourcery.com> wrote:
> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
> that for the example with which I submitted the pass initially, it would make
> sense to handle it in fwprop. However, I also think that for the example
> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
> wouldn't work, so we still need the new pass.

Well when I look at the tree dumps I see:
(short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0])

So we should be optimizing this at the tree level such that we don't
see extra sign extends there.  We would optimize it such that it looks
like:
(short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0]

I think we need a tree combiner for that and fold to do the folding.
See PR 14844 for another case of the problem.

-- Pinski
Paolo Bonzini Oct. 28, 2010, 9:12 p.m. UTC | #3
On 10/28/2010 09:19 PM, Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski<pinskia@gmail.com>  wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries<tom@codesourcery.com>  wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>>
>> In the above case fwprop could do the majority of the work.  In fact
>> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
>> does not replace it.  I think you could extend fwprop to the correct
>> thing.
>
> Something like the attached patch.

If you can bootstrap it, I think this should go in independent of the 
pass in the thread.

Paolo
Tom de Vries Oct. 29, 2010, 2:23 p.m. UTC | #4
Andrew,

Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries <tom@codesourcery.com> wrote:
>> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
>> that for the example with which I submitted the pass initially, it would make
>> sense to handle it in fwprop. However, I also think that for the example
>> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
>> wouldn't work, so we still need the new pass.
> 
> Well when I look at the tree dumps I see:
> (short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0])
> 

I'm not able to reproduce that tree. Can you tell me how you got that one?

I compiled the example http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0 with
MIPS as default target:
...
$ rm -f *.c.* ; gcc -O3 test2.c -S -fdump-rtl-all -fdump-tree-all
...

and grepped for a pattern matching the tree code you mention:
...
$ grep '(short unsigned int)' test2.c.* | grep '+'
test2.c.003t.original:  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short
unsigned int) d1);
...

The only match is 003t.original, which looks like:
...
{
  int d0 = (int) (*d)[0] + (int) (*d)[1];
  int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];

    int d0 = (int) (*d)[0] + (int) (*d)[1];
    int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];
  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short unsigned int) d1);
  (*d)[1] = (int16_t) ((short unsigned int) d0 - (short unsigned int) d1);
}
...

I did a clean checkout and build for x86_64 build this morning. Same tree as for
MIPS, no other match for the grep.

The tree you mention looks like '(short unsigned int) d0' combined with the def
of d0.

> So we should be optimizing this at the tree level such that we don't
> see extra sign extends there.  We would optimize it such that it looks
> like:
> (short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0]
> 
> I think we need a tree combiner for that and fold to do the folding.
> See PR 14844 for another case of the problem.
> 

Thanks for the links to PR 14844 and the tree combiner (PR 15459). Do you know
of an updated (to gimple) version of the tree combiner? I would like to try it
on this example.

If I look at the representation after cselim, the 2 uses of d0 are cse-ed:
...
{
  int d1;
  int d0;
  int16_t D.2096;
  short unsigned int D.2095;
  int16_t D.2094;
  short unsigned int D.2093;
  short unsigned int D.2092;
  short unsigned int D.2091;
  int D.2090;
  int16_t D.2089;
  int D.2088;
  int16_t D.2087;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2087_8 = MEM[(int16_t[2] *)d_1(D) + 4B][0];
  D.2088_9 = (int) D.2087_8;
  D.2089_11 = MEM[(int16_t[2] *)d_1(D) + 4B][1];
  D.2090_12 = (int) D.2089_11;
  d1_13 = D.2088_9 + D.2090_12;
  D.2091_14 = (short unsigned int) d0_6;
  D.2092_15 = (short unsigned int) d1_13;
  D.2093_16 = D.2091_14 + D.2092_15;
  D.2094_17 = (int16_t) D.2093_16;
  *d_1(D)[0] = D.2094_17;
  D.2095_20 = D.2091_14 - D.2092_15;
  D.2096_21 = (int16_t) D.2095_20;
  *d_1(D)[1] = D.2096_21;
  return;
}
...

So if I filter out all the statements related to d0 we get:
...
{
  int d0;
  short unsigned int D.2091;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2091_14 = (short unsigned int) d0_6;
  ...
}
...

I think a tree combiner would work here like you suggested, although it would
have to combine 6 gimple statements to get there.

But I wonder, the rtl combiner has a weak point: it only combines chains of
instructions that have no other uses of the intermediate results (PR 18395/1).
AFAIU the source code of the tree combiner, it suffers from the same problem. Is
my understanding correct there?

Thanks,
- Tom
diff mbox

Patch

Index: fwprop.c
===================================================================
--- fwprop.c	(revision 166031)
+++ fwprop.c	(working copy)
@@ -543,8 +543,14 @@  propagate_rtx_1 (rtx *px, rtx old_rtx, r
 	  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
           if (op0 == XEXP (x, 0))
 	    return true;
-	  tem = simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
-				     SUBREG_BYTE (x));
+	  tem = simplify_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
+				 SUBREG_BYTE (x));
+	  if (!tem)
+	    return false;
+	  else if (GET_CODE (tem) == SUBREG && REG_P (SUBREG_REG (tem)))
+	    valid_ops = true;
+	  else if (REG_P (tem))
+	    valid_ops = true;
 	}
       break;