diff mbox

[x86] Fix pblendv expand.

Message ID CAOvf_xxs3bOHcPCy_RyUQ7Tw=+KRodLZSidHgg7wPW=+WKi4qA@mail.gmail.com
State New
Headers show

Commit Message

Evgeny Stupachenko Dec. 8, 2014, 11:33 p.m. UTC
Hi,

The patch fix pblendv expand.
The bug was uncovered when permutation operands are constants.
In this case we init target register for expand_vec_perm_1 with
constant and then rewrite the target with constant for
expand_vec_perm_pblend.

The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
-flto -march=corei7.

Bootstrap and make check passed.

Is it ok?

Evgeny

2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>

gcc/
        * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
        expand_vec_perm_1 target.

Comments

Uros Bizjak Dec. 9, 2014, 8:57 a.m. UTC | #1
On Tue, Dec 9, 2014 at 12:33 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:

> The patch fix pblendv expand.
> The bug was uncovered when permutation operands are constants.
> In this case we init target register for expand_vec_perm_1 with
> constant and then rewrite the target with constant for
> expand_vec_perm_pblend.
>
> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
> -flto -march=corei7.
>
> Bootstrap and make check passed.
>
> Is it ok?

Please add a testcase.

Uros.

>
> Evgeny
>
> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>
> gcc/
>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>         expand_vec_perm_1 target.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index eafc15a..5a914ad 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>      dcopy.op0 = dcopy.op1 = d->op1;
>    else
>      dcopy.op0 = dcopy.op1 = d->op0;
> +  dcopy.target = gen_reg_rtx (vmode);
>    dcopy.one_operand_p = true;
>
>    for (i = 0; i < nelt; ++i)
Uros Bizjak Dec. 9, 2014, 9:06 a.m. UTC | #2
On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

>> The patch fix pblendv expand.
>> The bug was uncovered when permutation operands are constants.
>> In this case we init target register for expand_vec_perm_1 with
>> constant and then rewrite the target with constant for
>> expand_vec_perm_pblend.
>>
>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>> -flto -march=corei7.
>>
>> Bootstrap and make check passed.
>>
>> Is it ok?
>
> Please add a testcase.

Also, it surprises me that we enter expand_vec_perm_pblendv with
uninitialized (?) target. Does your patch only papers over a real
problem up the call chain (hard to say without a testcase)?

Uros.

>
>>
>> Evgeny
>>
>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>> gcc/
>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>         expand_vec_perm_1 target.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index eafc15a..5a914ad 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>      dcopy.op0 = dcopy.op1 = d->op1;
>>    else
>>      dcopy.op0 = dcopy.op1 = d->op0;
>> +  dcopy.target = gen_reg_rtx (vmode);
>>    dcopy.one_operand_p = true;
>>
>>    for (i = 0; i < nelt; ++i)
Evgeny Stupachenko Dec. 9, 2014, 1:29 p.m. UTC | #3
The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
    {
      vd->e[i].mode = VOIDmode;
      vd->e[i].oldest_regno = i;
      vd->e[i].next_regno = INVALID_REGNUM;
    }

It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.

Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.

First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)

Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)

Below is rtl dump of buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
        (const_vector:V4SI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 206 205 208 (set (reg:V4SI 769)
        (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
A128])) ../regrename.c:1171 -1
     (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 32 [0x20])
                (const_int 33 [0x21])
                (const_int 34 [0x22])
                (const_int 35 [0x23])
            ])
        (nil)))

(insn 208 206 207 (set (reg:V4SI 770)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
                (reg:V4SI 769))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 770)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 209 207 210 (set (reg:V4SI 771)
        (const_vector:V4SI [
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 771)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 211 210 212 (set (reg:V8HI 772)
        (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (const_int 48 [0x30]))) ../regrename.c:1171 -1
     (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>>
>>> gcc/
>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>         expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>    else
>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>    dcopy.one_operand_p = true;
>>>
>>>    for (i = 0; i < nelt; ++i)
Evgeny Stupachenko Dec. 9, 2014, 3:59 p.m. UTC | #4
I've tried to get smaller reproducer, however currently it is
complicated as several functions in GCC.
However patch is fixing spec2006 benchmark. Shouldn't that be enough
for regression testing?

Thanks,
Evgeny

On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The case comes from spec2006 403.gcc (or old GCC itself).
>
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
>     {
>       vd->e[i].mode = VOIDmode;
>       vd->e[i].oldest_regno = i;
>       vd->e[i].next_regno = INVALID_REGNUM;
>     }
>
> It is vectorized and only then completely peeled.
> Only after peeling all stored values become constant.
>
> Currently expand_vec_perm_pblendv works as following:
> Let d.target, d.op0, dop1 be permutation parameters.
>
> First we permute an operand (d.op1 or d.op0) and then blend it with
> other argument:
>
> d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, d.target)
> (if d.op0 equal to d.target this is buggy)
>
> Patch make it more accurate:
> tmp = gen_reg_rtx (vmode)
> tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, tmp)
> (Here d.op0 can be equal to d.target)
>
> Below is rtl dump of buggy case:
>
> (183t.optimized)
> ...
> vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
> 35 }, { 0, 4, 0, 1 }>;
> vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
> 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
> ...
> (184r.expand)
> ...
> (insn 205 204 206 (set (reg:V4SI 768)
>         (const_vector:V4SI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 206 205 208 (set (reg:V4SI 769)
>         (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
> A128])) ../regrename.c:1171 -1
>      (expr_list:REG_EQUAL (const_vector:V4SI [
>                 (const_int 32 [0x20])
>                 (const_int 33 [0x21])
>                 (const_int 34 [0x22])
>                 (const_int 35 [0x23])
>             ])
>         (nil)))
>
> (insn 208 206 207 (set (reg:V4SI 770)
>         (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
>                 (reg:V4SI 769))
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 4 [0x4])
>                     (const_int 1 [0x1])
>                     (const_int 5 [0x5])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 770)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 209 207 210 (set (reg:V4SI 771)
>         (const_vector:V4SI [
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 771)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 3 [0x3])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
> (insn 211 210 212 (set (reg:V8HI 772)
>         (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (const_int 48 [0x30]))) ../regrename.c:1171 -1
>      (nil))
> ...
>
> On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>>>> The patch fix pblendv expand.
>>>> The bug was uncovered when permutation operands are constants.
>>>> In this case we init target register for expand_vec_perm_1 with
>>>> constant and then rewrite the target with constant for
>>>> expand_vec_perm_pblend.
>>>>
>>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>>> -flto -march=corei7.
>>>>
>>>> Bootstrap and make check passed.
>>>>
>>>> Is it ok?
>>>
>>> Please add a testcase.
>>
>> Also, it surprises me that we enter expand_vec_perm_pblendv with
>> uninitialized (?) target. Does your patch only papers over a real
>> problem up the call chain (hard to say without a testcase)?
>>
>> Uros.
>>
>>>
>>>>
>>>> Evgeny
>>>>
>>>> 2014-12-09  Evgeny Stupachenko  <evstupac@gmail.com>
>>>>
>>>> gcc/
>>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>>         expand_vec_perm_1 target.
>>>>
>>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>>> index eafc15a..5a914ad 100644
>>>> --- a/gcc/config/i386/i386.c
>>>> +++ b/gcc/config/i386/i386.c
>>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>>    else
>>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>>    dcopy.one_operand_p = true;
>>>>
>>>>    for (i = 0; i < nelt; ++i)
Richard Henderson Dec. 9, 2014, 4:20 p.m. UTC | #5
On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
> However patch is fixing spec2006 benchmark. Shouldn't that be enough
> for regression testing?
> 

No.  Spec is not free.


r~
Jakub Jelinek Dec. 9, 2014, 4:23 p.m. UTC | #6
On Tue, Dec 09, 2014 at 06:59:27PM +0300, Evgeny Stupachenko wrote:
> I've tried to get smaller reproducer, however currently it is
> complicated as several functions in GCC.

Just add a gcc_assert where you are changing the code, testing for
what you want to avoid.
Then just delta reduce it or creduce it.

	Jakub
Evgeny Stupachenko Dec. 9, 2014, 4:38 p.m. UTC | #7
I mean that there are a lot of people tracking spec2006 stability and
therefore the issue should be on track in future.
And that I can create the test case, but it would be as big as several
GCC functions.
Will work on reducing the test case.


On Tue, Dec 9, 2014 at 7:20 PM, Richard Henderson <rth@redhat.com> wrote:
> On 12/09/2014 07:59 AM, Evgeny Stupachenko wrote:
>> However patch is fixing spec2006 benchmark. Shouldn't that be enough
>> for regression testing?
>>
>
> No.  Spec is not free.
>
>
> r~
diff mbox

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..5a914ad 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -47546,6 +47546,7 @@  expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
     dcopy.op0 = dcopy.op1 = d->op1;
   else
     dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.target = gen_reg_rtx (vmode);
   dcopy.one_operand_p = true;

   for (i = 0; i < nelt; ++i)