diff mbox series

[v3] vect: Recog mul_highpart pattern

Message ID 46838de4-3d92-a270-e71a-73fbe923d306@linux.ibm.com
State New
Headers show
Series [v3] vect: Recog mul_highpart pattern | expand

Commit Message

Kewen.Lin July 15, 2021, 7:06 a.m. UTC
on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 下午2:38, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>
>> Yes.
>>
> 
> Thanks for confirming!  The related patch v2 is attached and the testing
> is ongoing.
> 

It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:

XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

They weren't exposed in the testing run with the previous patch which
doesn't use IFN way.  By investigating it, the difference comes from
the different costing on MULT_HIGHPART_EXPR and IFN_MULH.

For MULT_HIGHPART_EXPR, it's costed by 16 from below call:

	case MULT_EXPR:
	case WIDEN_MULT_EXPR:
	case MULT_HIGHPART_EXPR:
	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);

While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
becomes profitable and the expected vectorization happens.

One conservative fix seems to make IFN_MULH costing go through the
unique cost interface for multiplication, that is:

      case CFN_MULH:
        stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
        break;

As the test case marks the checks as "xfail", probably it's good to
revisit the costing on mul_highpart to ensure it's not priced more.

The attached patch also addressed Richard S.'s review comments on
two reformatting hunks.  Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
	* internal-fn.def (IFN_MULH): New internal function.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
	recog normal multiply highpart as IFN_MULH.
	* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
	function CFN_MULH.
---
 gcc/config/i386/i386.c   |  3 +++
 gcc/internal-fn.c        |  1 +
 gcc/internal-fn.def      |  2 ++
 gcc/tree-vect-patterns.c | 38 ++++++++++++++++++++++++++++----------
 4 files changed, 34 insertions(+), 10 deletions(-)

Comments

Uros Bizjak July 15, 2021, 7:17 a.m. UTC | #1
On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> > on 2021/7/14 下午2:38, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> on 2021/7/13 下午8:42, Richard Biener wrote:
> >>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>
> >> Yes.
> >>
> >
> > Thanks for confirming!  The related patch v2 is attached and the testing
> > is ongoing.
> >
>
> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

These XFAILs should be removed after your patch.

This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
is actually not needed.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696

Uros.

> They weren't exposed in the testing run with the previous patch which
> doesn't use IFN way.  By investigating it, the difference comes from
> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>
> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>
>         case MULT_EXPR:
>         case WIDEN_MULT_EXPR:
>         case MULT_HIGHPART_EXPR:
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>
> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
> becomes profitable and the expected vectorization happens.
>
> One conservative fix seems to make IFN_MULH costing go through the
> unique cost interface for multiplication, that is:
>
>       case CFN_MULH:
>         stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>         break;
>
> As the test case marks the checks as "xfail", probably it's good to
> revisit the costing on mul_highpart to ensure it's not priced more.
>
> The attached patch also addressed Richard S.'s review comments on
> two reformatting hunks.  Is it ok for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>         * internal-fn.def (IFN_MULH): New internal function.
>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>         recog normal multiply highpart as IFN_MULH.
>         * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>         function CFN_MULH.
Kewen.Lin July 15, 2021, 8:04 a.m. UTC | #2
Hi Uros,

on 2021/7/15 下午3:17, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
>>> on 2021/7/14 下午2:38, Richard Biener wrote:
>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>
>>>> Yes.
>>>>
>>>
>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>> is ongoing.
>>>
>>
>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> 
> These XFAILs should be removed after your patch.
> 
I'm curious whether it's intentional not to specify -fno-vect-cost-model
for this test case.  As noted above, this case is sensitive on how we
cost mult_highpart.  Without cost modeling, the XFAILs can be removed
only with this mul_highpart pattern support, no matter how we model it
(x86 part of this patch exists or not).

> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> is actually not needed.
> 

Thanks for the information!  The justification for the x86 part is that:
the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
optab support, i386 port has already customized costing for 
MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
support), if we don't follow the same way for IFN_MULH, I'm worried that
we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
a right thing (we shouldn't cost it specially), it at least means we
have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
has direct mul_highpart optab support, I think they should be costed
consistently.  Does it sound reasonable?

BR,
Kewen

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
> 
> Uros.
> 
>> They weren't exposed in the testing run with the previous patch which
>> doesn't use IFN way.  By investigating it, the difference comes from
>> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>>
>> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>>
>>         case MULT_EXPR:
>>         case WIDEN_MULT_EXPR:
>>         case MULT_HIGHPART_EXPR:
>>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>>
>> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
>> becomes profitable and the expected vectorization happens.
>>
>> One conservative fix seems to make IFN_MULH costing go through the
>> unique cost interface for multiplication, that is:
>>
>>       case CFN_MULH:
>>         stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>>         break;
>>
>> As the test case marks the checks as "xfail", probably it's good to
>> revisit the costing on mul_highpart to ensure it's not priced more.
>>
>> The attached patch also addressed Richard S.'s review comments on
>> two reformatting hunks.  Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>>         * internal-fn.def (IFN_MULH): New internal function.
>>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>         recog normal multiply highpart as IFN_MULH.
>>         * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>>         function CFN_MULH.
Uros Bizjak July 15, 2021, 8:23 a.m. UTC | #3
On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Uros,
>
> on 2021/7/15 下午3:17, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> >>> on 2021/7/14 下午2:38, Richard Biener wrote:
> >>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>
> >>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
> >>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>
> >>>> Yes.
> >>>>
> >>>
> >>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>> is ongoing.
> >>>
> >>
> >> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >
> > These XFAILs should be removed after your patch.
> >
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
>
> > This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> > is actually not needed.
> >
>
> Thanks for the information!  The justification for the x86 part is that:
> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> optab support, i386 port has already customized costing for
> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> support), if we don't follow the same way for IFN_MULH, I'm worried that
> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> a right thing (we shouldn't cost it specially), it at least means we
> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> has direct mul_highpart optab support, I think they should be costed
> consistently.  Does it sound reasonable?

Ah, I was under impression that i386 part was introduced to avoid
generation of PMULHW instructions in the testcases above (to keep
XFAILs). Based on your explanation - yes, the costing function should
be the same. So, the x86 part is OK.

Thanks,
Uros.
Kewen.Lin July 15, 2021, 8:40 a.m. UTC | #4
on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote:
> Hi Uros,
> 
> on 2021/7/15 下午3:17, Uros Bizjak wrote:
>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
>>>> on 2021/7/14 下午2:38, Richard Biener wrote:
>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>
>>>>> Yes.
>>>>>
>>>>
>>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>>> is ongoing.
>>>>
>>>
>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>
>> These XFAILs should be removed after your patch.
>>
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
> 
>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>> is actually not needed.
>>
> 
> Thanks for the information!  The justification for the x86 part is that:
> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> optab support, i386 port has already customized costing for 
> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> support), if we don't follow the same way for IFN_MULH, I'm worried that
> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> a right thing (we shouldn't cost it specially), it at least means we
> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> has direct mul_highpart optab support, I think they should be costed
> consistently.  Does it sound reasonable?
> 

Hi Richard(s),

This possibly inconsistent handling problem seems like a counter example
better to use a new IFN rather than the existing tree_code, it seems hard
to maintain (should remember to keep consistent for its handlings).  ;)
From this perspective, maybe it's better to move backward to use tree_code
and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
costing issue Richi pointed out before)?

What do you think?

BR,
Kewen
Kewen.Lin July 15, 2021, 8:49 a.m. UTC | #5
on 2021/7/15 下午4:23, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Uros,
>>
>> on 2021/7/15 下午3:17, Uros Bizjak wrote:
>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
>>>>> on 2021/7/14 下午2:38, Richard Biener wrote:
>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>
>>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>
>>>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>>>> is ongoing.
>>>>>
>>>>
>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>>
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>
>>> These XFAILs should be removed after your patch.
>>>
>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>> for this test case.  As noted above, this case is sensitive on how we
>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
>> only with this mul_highpart pattern support, no matter how we model it
>> (x86 part of this patch exists or not).
>>
>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>>> is actually not needed.
>>>
>>
>> Thanks for the information!  The justification for the x86 part is that:
>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
>> optab support, i386 port has already customized costing for
>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
>> support), if we don't follow the same way for IFN_MULH, I'm worried that
>> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
>> a right thing (we shouldn't cost it specially), it at least means we
>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
>> has direct mul_highpart optab support, I think they should be costed
>> consistently.  Does it sound reasonable?
> 
> Ah, I was under impression that i386 part was introduced to avoid
> generation of PMULHW instructions in the testcases above (to keep
> XFAILs). Based on your explanation - yes, the costing function should
> be the same. So, the x86 part is OK.
> 

Thanks!  It does have the effect to keep XFAILs.  ;)  I guess the case
doesn't care about the costing much just like most vectorization cases?
If so, do you want me to remove the xfails with one extra option 
"-fno-vect-cost-model" along with this patch?

BR,
Kewen
Uros Bizjak July 15, 2021, 9:41 a.m. UTC | #6
V čet., 15. jul. 2021 10:49 je oseba Kewen.Lin <linkw@linux.ibm.com>
napisala:

> on 2021/7/15 下午4:23, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi Uros,
> >>
> >> on 2021/7/15 下午3:17, Uros Bizjak wrote:
> >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> >>>>> on 2021/7/14 下午2:38, Richard Biener wrote:
> >>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com>
> wrote:
> >>>>>>>
> >>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
> >>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com>
> wrote:
> >>>>>>
> >>>>>>> I guess the proposed IFN would be directly mapped for
> [us]mul_highpart?
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>
> >>>>> Thanks for confirming!  The related patch v2 is attached and the
> testing
> >>>>> is ongoing.
> >>>>>
> >>>>
> >>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as
> below:
> >>>>
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>
> >>> These XFAILs should be removed after your patch.
> >>>
> >> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> >> for this test case.  As noted above, this case is sensitive on how we
> >> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> >> only with this mul_highpart pattern support, no matter how we model it
> >> (x86 part of this patch exists or not).
> >>
> >>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >>> is actually not needed.
> >>>
> >>
> >> Thanks for the information!  The justification for the x86 part is that:
> >> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> >> optab support, i386 port has already customized costing for
> >> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> >> support), if we don't follow the same way for IFN_MULH, I'm worried that
> >> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> >> a right thing (we shouldn't cost it specially), it at least means we
> >> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> >> has direct mul_highpart optab support, I think they should be costed
> >> consistently.  Does it sound reasonable?
> >
> > Ah, I was under impression that i386 part was introduced to avoid
> > generation of PMULHW instructions in the testcases above (to keep
> > XFAILs). Based on your explanation - yes, the costing function should
> > be the same. So, the x86 part is OK.
> >
>
> Thanks!  It does have the effect to keep XFAILs.  ;)  I guess the case
> doesn't care about the costing much just like most vectorization cases?
> If so, do you want me to remove the xfails with one extra option
> "-fno-vect-cost-model" along with this patch.
>

Yes, please do so. The testcase cares only about PMULHW generation.

Thanks,
Uros.


> BR,
> Kewen
>
Richard Biener July 15, 2021, 11:58 a.m. UTC | #7
On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote:
> > Hi Uros,
> >
> > on 2021/7/15 下午3:17, Uros Bizjak wrote:
> >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> >>>> on 2021/7/14 下午2:38, Richard Biener wrote:
> >>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> on 2021/7/13 下午8:42, Richard Biener wrote:
> >>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>
> >>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>
> >>>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>>> is ongoing.
> >>>>
> >>>
> >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>>
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>
> >> These XFAILs should be removed after your patch.
> >>
> > I'm curious whether it's intentional not to specify -fno-vect-cost-model
> > for this test case.  As noted above, this case is sensitive on how we
> > cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> > only with this mul_highpart pattern support, no matter how we model it
> > (x86 part of this patch exists or not).
> >
> >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >> is actually not needed.
> >>
> >
> > Thanks for the information!  The justification for the x86 part is that:
> > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> > optab support, i386 port has already customized costing for
> > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> > support), if we don't follow the same way for IFN_MULH, I'm worried that
> > we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> > a right thing (we shouldn't cost it specially), it at least means we
> > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> > has direct mul_highpart optab support, I think they should be costed
> > consistently.  Does it sound reasonable?
> >
>
> Hi Richard(s),
>
> This possibly inconsistent handling problem seems like a counter example
> better to use a new IFN rather than the existing tree_code, it seems hard
> to maintain (should remember to keep consistent for its handlings).  ;)
> From this perspective, maybe it's better to move backward to use tree_code
> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
> costing issue Richi pointed out before)?
>
> What do you think?

No, whenever we want to do code generation based on machine
capabilities the canonical way to test for those is to look at optabs
and then it's most natural to keep that 1:1 relation and emit
internal function calls which directly map to supported optabs
instead of going back to some tree codes.

When targets "lie" and provide expanders for something they can
only emulate then they have to compensate in their costing.
But as I understand this isn't the case for x86 here.

Now, in this case we already have the MULT_HIGHPART_EXPR tree,
so yes, it might make sense to use that instead of introducing an
alternate way via the direct internal function.  Somebody decided
that MULT_HIGHPART is generic enough to warrant this - but I
see that expand_mult_highpart can fail unless can_mult_highpart_p
and this is exactly one of the cases we want to avoid - either
we can handle something generally in which case it can be a
tree code or we can't, then it should be 1:1 tied to optabs at best
(mult_highpart has scalar support only for the direct optab,
vector support also for widen_mult).

Richard.

>
> BR,
> Kewen
diff mbox series

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a93128fa0a4..1dd9108353c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22559,6 +22559,9 @@  ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 				   mode == SFmode ? ix86_cost->fmass
 				   : ix86_cost->fmasd);
 	break;
+      case CFN_MULH:
+	stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
+	break;
       default:
 	break;
       }
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index fb8b43d1ce2..b1b4289357c 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3703,6 +3703,7 @@  first_commutative_argument (internal_fn fn)
     case IFN_FNMS:
     case IFN_AVG_FLOOR:
     case IFN_AVG_CEIL:
+    case IFN_MULH:
     case IFN_MULHS:
     case IFN_MULHRS:
     case IFN_FMIN:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..ed6d7de1680 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -169,6 +169,8 @@  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
 			      savg_ceil, uavg_ceil, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST | ECF_NOTHROW, first,
+			      smul_highpart, umul_highpart, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhs, umulhs, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7a..ada89d7060b 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1896,8 +1896,15 @@  vect_recog_over_widening_pattern (vec_info *vinfo,
 
    1) Multiply high with scaling
      TYPE res = ((TYPE) a * (TYPE) b) >> c;
+     Here, c is bitsize (TYPE) / 2 - 1.
+
    2) ... or also with rounding
      TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
+     Here, d is bitsize (TYPE) / 2 - 2.
+
+   3) Normal multiply high
+     TYPE res = ((TYPE) a * (TYPE) b) >> e;
+     Here, e is bitsize (TYPE) / 2.
 
    where only the bottom half of res is used.  */
 
@@ -1942,7 +1949,6 @@  vect_recog_mulhs_pattern (vec_info *vinfo,
   stmt_vec_info mulh_stmt_info;
   tree scale_term;
   internal_fn ifn;
-  unsigned int expect_offset;
 
   /* Check for the presence of the rounding term.  */
   if (gimple_assign_rhs_code (rshift_input_stmt) == PLUS_EXPR)
@@ -1991,25 +1997,37 @@  vect_recog_mulhs_pattern (vec_info *vinfo,
 
       /* Get the scaling term.  */
       scale_term = gimple_assign_rhs2 (plus_input_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+	return NULL;
+
+      /* Check pattern 2).  */
+      if (wi::to_widest (scale_term) + target_precision + 2
+	  != TYPE_PRECISION (lhs_type))
+	return NULL;
 
-      expect_offset = target_precision + 2;
       ifn = IFN_MULHRS;
     }
   else
     {
       mulh_stmt_info = rshift_input_stmt_info;
       scale_term = gimple_assign_rhs2 (last_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+	return NULL;
 
-      expect_offset = target_precision + 1;
-      ifn = IFN_MULHS;
+      /* Check for pattern 1).  */
+      if (wi::to_widest (scale_term) + target_precision + 1
+	  == TYPE_PRECISION (lhs_type))
+	ifn = IFN_MULHS;
+      /* Check for pattern 3).  */
+      else if (wi::to_widest (scale_term) + target_precision
+	       == TYPE_PRECISION (lhs_type))
+	ifn = IFN_MULH;
+      else
+	return NULL;
     }
 
-  /* Check that the scaling factor is correct.  */
-  if (TREE_CODE (scale_term) != INTEGER_CST
-      || wi::to_widest (scale_term) + expect_offset
-	   != TYPE_PRECISION (lhs_type))
-    return NULL;
-
   /* Check whether the scaling input term can be seen as two widened
      inputs multiplied together.  */
   vect_unpromoted_value unprom_mult[2];