diff mbox

[i386] Add prefixes avoidance tuning for silvermont target

Message ID 20140715110155.GA48158@msticlxl57.ims.intel.com
State New
Headers show

Commit Message

Ilya Enkovich July 15, 2014, 11:01 a.m. UTC
On 15 Jul 10:42, Uros Bizjak wrote:
> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 
> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
> >>>>> but not fully disallowed.
> >>>>
> >>>> As said earlier, you can try "Ya*x" as a constraint.
> >>>
> >>> I tried it. It does not seem to affect allocation much. I do not see
> >>> any gain on targeted tests.
> >>
> >> Strange, because the documentation claims:
> >>
> >> '*'
> >>      Says that the following character should be ignored when choosing
> >>      register preferences.  '*' has no effect on the meaning of the
> >>      constraint as a constraint, and no effect on reloading.  For LRA
> >>      '*' additionally disparages slightly the alternative if the
> >>      following character matches the operand.
> >>
> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's
> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
> >> needed for some MMX alternatives, where we didn't want RA to allocate
> >> a MMX register when the value could be passed in integer regs, but the
> >> value was still allowed in MMX register.
> >
> > That's is what my patch already does, but with '?' instead of '!'.
> 
> Yes, I know. The problem is, that Ya*x type conditional allocation
> worked OK in the past for "not preferred, but still alowed regclass"
> registers, There are several patterns in i386.md that live by this
> premise, including movsf_internal and movdf_internal. If this approach
> doesn't work anymore, then we have to either figure out what is the
> reason, or invent a new strategy that will be applicable to all cases.
> 
> Can you please post a small test that illustrates the case where Ya,!x
> works, but Ya*x doesn't?

It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.

Patch1:



Here are results of looking for pmovzxwd in resulting binaries:
#objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76
#objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
0
#objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76

Therefore I make a conclusion that Yr*x does not really differ much from x.


Thanks,
Ilya

> 
> I have added Vlad to CC for his opinion on this matter. This is
> clearly a RA issue that has to be resolved before your patch is
> committed.
> 
> Thanks,
> Uros.

Comments

Uros Bizjak July 15, 2014, 1:38 p.m. UTC | #1
On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> On 15 Jul 10:42, Uros Bizjak wrote:
>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>
>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>> >>>>> but not fully disallowed.
>> >>>>
>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>> >>>
>> >>> I tried it. It does not seem to affect allocation much. I do not see
>> >>> any gain on targeted tests.
>> >>
>> >> Strange, because the documentation claims:
>> >>
>> >> '*'
>> >>      Says that the following character should be ignored when choosing
>> >>      register preferences.  '*' has no effect on the meaning of the
>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>> >>      '*' additionally disparages slightly the alternative if the
>> >>      following character matches the operand.
>> >>
>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's
>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>> >> a MMX register when the value could be passed in integer regs, but the
>> >> value was still allowed in MMX register.
>> >
>> > That's is what my patch already does, but with '?' instead of '!'.
>>
>> Yes, I know. The problem is, that Ya*x type conditional allocation
>> worked OK in the past for "not preferred, but still alowed regclass"
>> registers, There are several patterns in i386.md that live by this
>> premise, including movsf_internal and movdf_internal. If this approach
>> doesn't work anymore, then we have to either figure out what is the
>> reason, or invent a new strategy that will be applicable to all cases.
>>
>> Can you please post a small test that illustrates the case where Ya,!x
>> works, but Ya*x doesn't?
>
> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.
>
> Patch1:
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d907353..6b03b72 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -11852,10 +11852,10 @@
>     (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_<code>v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>         (any_extend:V4SI
>           (vec_select:V4HI
> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>             (parallel [(const_int 0) (const_int 1)
>                        (const_int 2) (const_int 3)]))))]
>    "TARGET_SSE4_1"
>
> Patch2:
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d907353..b3721c4 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -11852,10 +11852,10 @@
>     (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_<code>v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>         (any_extend:V4SI
>           (vec_select:V4HI
> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>             (parallel [(const_int 0) (const_int 1)
>                        (const_int 2) (const_int 3)]))))]
>    "TARGET_SSE4_1"
>
>
> Here are results of looking for pmovzxwd in resulting binaries:
> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 76
> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 0
> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 76
>
> Therefore I make a conclusion that Yr*x does not really differ much from x.

Just FTR:

Using "Yr,*x" is also a viable option:

#objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep
"xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
0

I believe that the above is the way to go with LRA. Vladimir, what do you think?

Uros.
Ilya Enkovich Nov. 5, 2014, 9:35 a.m. UTC | #2
Hi,

Having stage1 close to end, may we make some decision regarding this
patch? Having a couple of working variants, may we choose and use one
of them?

Thanks,
Ilya

2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> On 15 Jul 10:42, Uros Bizjak wrote:
>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>
>>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>> >>>>> but not fully disallowed.
>>> >>>>
>>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>>> >>>
>>> >>> I tried it. It does not seem to affect allocation much. I do not see
>>> >>> any gain on targeted tests.
>>> >>
>>> >> Strange, because the documentation claims:
>>> >>
>>> >> '*'
>>> >>      Says that the following character should be ignored when choosing
>>> >>      register preferences.  '*' has no effect on the meaning of the
>>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>>> >>      '*' additionally disparages slightly the alternative if the
>>> >>      following character matches the operand.
>>> >>
>>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's
>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>>> >> a MMX register when the value could be passed in integer regs, but the
>>> >> value was still allowed in MMX register.
>>> >
>>> > That's is what my patch already does, but with '?' instead of '!'.
>>>
>>> Yes, I know. The problem is, that Ya*x type conditional allocation
>>> worked OK in the past for "not preferred, but still alowed regclass"
>>> registers, There are several patterns in i386.md that live by this
>>> premise, including movsf_internal and movdf_internal. If this approach
>>> doesn't work anymore, then we have to either figure out what is the
>>> reason, or invent a new strategy that will be applicable to all cases.
>>>
>>> Can you please post a small test that illustrates the case where Ya,!x
>>> works, but Ya*x doesn't?
>>
>> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.
>>
>> Patch1:
>>
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index d907353..6b03b72 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -11852,10 +11852,10 @@
>>     (set_attr "mode" "OI")])
>>
>>  (define_insn "sse4_1_<code>v4hiv4si2"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>>         (any_extend:V4SI
>>           (vec_select:V4HI
>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>>             (parallel [(const_int 0) (const_int 1)
>>                        (const_int 2) (const_int 3)]))))]
>>    "TARGET_SSE4_1"
>>
>> Patch2:
>>
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index d907353..b3721c4 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -11852,10 +11852,10 @@
>>     (set_attr "mode" "OI")])
>>
>>  (define_insn "sse4_1_<code>v4hiv4si2"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>>         (any_extend:V4SI
>>           (vec_select:V4HI
>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>>             (parallel [(const_int 0) (const_int 1)
>>                        (const_int 2) (const_int 3)]))))]
>>    "TARGET_SSE4_1"
>>
>>
>> Here are results of looking for pmovzxwd in resulting binaries:
>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 76
>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 0
>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 76
>>
>> Therefore I make a conclusion that Yr*x does not really differ much from x.
>
> Just FTR:
>
> Using "Yr,*x" is also a viable option:
>
> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep
> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
> 0
>
> I believe that the above is the way to go with LRA. Vladimir, what do you think?
>
> Uros.
Uros Bizjak Nov. 5, 2014, 10 a.m. UTC | #3
On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> Having stage1 close to end, may we make some decision regarding this
> patch? Having a couple of working variants, may we choose and use one
> of them?

I propose to wait for Vlad for an update about his plans on register
preference algorythm that would fix this (and other "Ya*r"-type
issues).

In the absence of the fix, we'll go with "Yr,*x".

Uros.

>
> Thanks,
> Ilya
>
> 2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubizjak@gmail.com>:
>> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> On 15 Jul 10:42, Uros Bizjak wrote:
>>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>
>>>> >>>>> Also fully restrict xmm8-15 does not seem right.  It is just costly
>>>> >>>>> but not fully disallowed.
>>>> >>>>
>>>> >>>> As said earlier, you can try "Ya*x" as a constraint.
>>>> >>>
>>>> >>> I tried it. It does not seem to affect allocation much. I do not see
>>>> >>> any gain on targeted tests.
>>>> >>
>>>> >> Strange, because the documentation claims:
>>>> >>
>>>> >> '*'
>>>> >>      Says that the following character should be ignored when choosing
>>>> >>      register preferences.  '*' has no effect on the meaning of the
>>>> >>      constraint as a constraint, and no effect on reloading.  For LRA
>>>> >>      '*' additionally disparages slightly the alternative if the
>>>> >>      following character matches the operand.
>>>> >>
>>>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's
>>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
>>>> >> needed for some MMX alternatives, where we didn't want RA to allocate
>>>> >> a MMX register when the value could be passed in integer regs, but the
>>>> >> value was still allowed in MMX register.
>>>> >
>>>> > That's is what my patch already does, but with '?' instead of '!'.
>>>>
>>>> Yes, I know. The problem is, that Ya*x type conditional allocation
>>>> worked OK in the past for "not preferred, but still alowed regclass"
>>>> registers, There are several patterns in i386.md that live by this
>>>> premise, including movsf_internal and movdf_internal. If this approach
>>>> doesn't work anymore, then we have to either figure out what is the
>>>> reason, or invent a new strategy that will be applicable to all cases.
>>>>
>>>> Can you please post a small test that illustrates the case where Ya,!x
>>>> works, but Ya*x doesn't?
>>>
>>> It's hard to compose a small testcase which will have SSE4 instructions generated with required register usage.  I use tcpjumbo test from TCPmark for initial check of how my patch works.  This test has a lot of pmovzxwd instructions generated and many of them use xmm8-15.  I tried two versions of a simple patch which modifies only pmovzxwd instruction.
>>>
>>> Patch1:
>>>
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index d907353..6b03b72 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -11852,10 +11852,10 @@
>>>     (set_attr "mode" "OI")])
>>>
>>>  (define_insn "sse4_1_<code>v4hiv4si2"
>>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
>>>         (any_extend:V4SI
>>>           (vec_select:V4HI
>>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
>>>             (parallel [(const_int 0) (const_int 1)
>>>                        (const_int 2) (const_int 3)]))))]
>>>    "TARGET_SSE4_1"
>>>
>>> Patch2:
>>>
>>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>>> index d907353..b3721c4 100644
>>> --- a/gcc/config/i386/sse.md
>>> +++ b/gcc/config/i386/sse.md
>>> @@ -11852,10 +11852,10 @@
>>>     (set_attr "mode" "OI")])
>>>
>>>  (define_insn "sse4_1_<code>v4hiv4si2"
>>> -  [(set (match_operand:V4SI 0 "register_operand" "=x")
>>> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
>>>         (any_extend:V4SI
>>>           (vec_select:V4HI
>>> -           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
>>> +           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
>>>             (parallel [(const_int 0) (const_int 1)
>>>                        (const_int 2) (const_int 3)]))))]
>>>    "TARGET_SSE4_1"
>>>
>>>
>>> Here are results of looking for pmovzxwd in resulting binaries:
>>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 76
>>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 0
>>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>>> 76
>>>
>>> Therefore I make a conclusion that Yr*x does not really differ much from x.
>>
>> Just FTR:
>>
>> Using "Yr,*x" is also a viable option:
>>
>> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep
>> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
>> 0
>>
>> I believe that the above is the way to go with LRA. Vladimir, what do you think?
>>
>> Uros.
diff mbox

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..6b03b72 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@ 
    (set_attr "mode" "OI")])

 (define_insn "sse4_1_<code>v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
        (any_extend:V4SI
          (vec_select:V4HI
-           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+           (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"

Patch2:

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..b3721c4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@ 
    (set_attr "mode" "OI")])

 (define_insn "sse4_1_<code>v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=x")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
        (any_extend:V4SI
          (vec_select:V4HI
-           (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+           (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)]))))]
   "TARGET_SSE4_1"