[Vect] Fix costing for vector shifts

Message ID	e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com
State	New
Headers	show Return-Path: <gcc-patches-return-515341-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:references:in-reply-to :content-type:mime-version; q=dns; s=default; b=Mm70F7FyJU9fEp3I 02y+D1y82fa64MuqctMJuJQJLvjs1EVUK08z9X17ky7UtGxa0TF53B3DMWxKIHu7 6VLNyV2HfAzW5afgRCDDFntXpinJLJRZEI6hJez12XVCVGyTpo6hFY+P7Ug0wFsv tF6WDhzNww0TosXK/wZWdTWoyrw= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; From: Sudakshina Das <Sudi.Das@arm.com> To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org> CC: Richard Sandiford <Richard.Sandiford@arm.com>, Richard Biener <richard.guenther@gmail.com> Subject: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts Date: Fri, 6 Dec 2019 14:05:05 +0000 Message-ID: <e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com> References: <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com> In-Reply-To: <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com> Authentication-Results-Original: spf=none (sender IP is ) smtp.mailfrom=Sudi.Das@arm.com; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) Content-Type: multipart/mixed; boundary="_002_e6689a2ae8891aa2eabf43ccc3d0af52armcom_" MIME-Version: 1.0 Original-Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Sudi.Das@arm.com; X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5EUR03FT029.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 4bea6748-8a99-4c28-edbf-08d77a554b97
Series	[Vect] Fix costing for vector shifts \| expand [Vect] Fix costing for vector shifts

Message ID

e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:references:in-reply-to
	:content-type:mime-version; q=dns; s=default; b=Mm70F7FyJU9fEp3I
	02y+D1y82fa64MuqctMJuJQJLvjs1EVUK08z9X17ky7UtGxa0TF53B3DMWxKIHu7
	6VLNyV2HfAzW5afgRCDDFntXpinJLJRZEI6hJez12XVCVGyTpo6hFY+P7Ug0wFsv
	tF6WDhzNww0TosXK/wZWdTWoyrw=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
	63.35.35.123 as permitted sender)
	receiver=protection.outlook.com; client-ip=63.35.35.123;
	helo=64aa7808-outbound-1.mta.getcheckrecipient.com; 
From: Sudakshina Das <Sudi.Das@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Sandiford <Richard.Sandiford@arm.com>,
	Richard Biener	<richard.guenther@gmail.com>
Subject: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Date: Fri, 6 Dec 2019 14:05:05 +0000
Message-ID: <e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com>
References: <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com>
In-Reply-To: <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com>
Authentication-Results-Original: spf=none (sender IP is )
	smtp.mailfrom=Sudi.Das@arm.com; 
received-spf: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
X-Microsoft-Antispam-Message-Info-Original: TbGWdarMKf2rlQD/79cyD4k88cwvJGzU+zT46N1/pCrndybps4NOmOAKZTviZR+dzdWTEDgOvLSFEnj8hX5qW2fwXjXKza6REjI28XdHkCaTPuhD/83EqmUBXQIPcy2M+gewsV75KoZzMNYFadZn18DNhi9UqRFOQ2PKUX3pWS2wGrjrWHfQBrqJ8ORrIZGveBfB48JubCCXCPqne00GI6N590E1CafFPKShDKbv1NzmGWXOz8sa1fE7xUAhaB7METU64s6Gvj59Vg3sQ9b8YX+LC5jwOrCVm2rAGel8PijwYHm6UAUXsSqxFfafcWaNSAr0xCkqf25SLZFBgigPS1fWdaNtGlIYAwZrzeNm0kZb2XGLYd1puEu45tKl998q2JQmHWcYy+tU1QIQtwbLbHvXncGN41MTOFxWqsLnZH7a8TpSLVPgRdsVwtj4MP/LJxpTrnP3YRdAtEiG4OuY+fKjx5XP74X6GIXQHOsa82ZDyT4jDKbMjOHdY5pTW8GW
Content-Type: multipart/mixed;
	boundary="_002_e6689a2ae8891aa2eabf43ccc3d0af52armcom_"
MIME-Version: 1.0
Original-Authentication-Results: spf=none (sender IP is )
	smtp.mailfrom=Sudi.Das@arm.com; 
X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5EUR03FT029.eop-EUR03.prod.protection.outlook.com
X-MS-Office365-Filtering-Correlation-Id-Prvs: 4bea6748-8a99-4c28-edbf-08d77a554b97
X-IsSubscribed: yes

Series

[Vect] Fix costing for vector shifts | expand

Commit Message

Sudakshina Das Dec. 6, 2019, 2:05 p.m. UTC

Hi

While looking at the vectorization for following example, we realized 
that even though vectorizable_shift function was distinguishing vector 
shifted by vector from vector shifted by scalar, while modeling the cost 
it would always add the cost of building a vector constant despite not 
needing it for vector shifted by scalar.

This patch fixes this by using scalar_shift_arg to determine whether we 
need to build a vector for the second operand or not. This reduces 
prologue cost as shown in the test.

Build and regression tests pass on aarch64-none-elf and 
x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
Spec2017 for AArch64.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
	vect_model_simple_cost call on scalar_shift_arg.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

	* gcc.dg/vect/vect-shift-5.c: New test.

Is this ok for trunk?

Thanks
Sudi

Comments

Jeff Law Dec. 7, 2019, 5:44 p.m. UTC | #1

On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> Hi
> 
> While looking at the vectorization for following example, we
> realized 
> that even though vectorizable_shift function was distinguishing
> vector 
> shifted by vector from vector shifted by scalar, while modeling the
> cost 
> it would always add the cost of building a vector constant despite
> not 
> needing it for vector shifted by scalar.
> 
> This patch fixes this by using scalar_shift_arg to determine whether
> we 
> need to build a vector for the second operand or not. This reduces 
> prologue cost as shown in the test.
> 
> Build and regression tests pass on aarch64-none-elf and 
> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
> Spec2017 for AArch64.
> 
> gcc/ChangeLog:
> 
> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> 	    Richard Sandiford  <richard.sandiford@arm.com>
> 
> 	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> 	vect_model_simple_cost call on scalar_shift_arg.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> 
> 	* gcc.dg/vect/vect-shift-5.c: New test.
It's a bit borderline, but it's really just twiddling a cost, so OK.

jeff

Sudakshina Das Dec. 9, 2019, 10:23 a.m. UTC | #2

Hi Jeff

On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for following example, we
>> realized
>> that even though vectorizable_shift function was distinguishing
>> vector
>> shifted by vector from vector shifted by scalar, while modeling the
>> cost
>> it would always add the cost of building a vector constant despite
>> not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether
>> we
>> need to build a vector for the second operand or not. This reduces
>> prologue cost as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>> 	    Richard Sandiford  <richard.sandiford@arm.com>
>>
>> 	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>> 	vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>
>> 	* gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.

Thanks :) Committed as r279114.

Sudi

> 
> jeff
>

Christophe Lyon Dec. 10, 2019, 9:01 a.m. UTC | #3

Hi,

On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>
> Hi Jeff
>
> On 07/12/2019 17:44, Jeff Law wrote:
> > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> >> Hi
> >>
> >> While looking at the vectorization for following example, we
> >> realized
> >> that even though vectorizable_shift function was distinguishing
> >> vector
> >> shifted by vector from vector shifted by scalar, while modeling the
> >> cost
> >> it would always add the cost of building a vector constant despite
> >> not
> >> needing it for vector shifted by scalar.
> >>
> >> This patch fixes this by using scalar_shift_arg to determine whether
> >> we
> >> need to build a vector for the second operand or not. This reduces
> >> prologue cost as shown in the test.
> >>
> >> Build and regression tests pass on aarch64-none-elf and
> >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
> >> Spec2017 for AArch64.
> >>

Looks like you didn't check on arm, where I can see that the new testcase fails:
FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
scan-tree-dump vect "vectorizable_shift
===[\\n\\r][^\\n]*prologue_cost = 0"
FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
"vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"

Seen on arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16

Christophe

> >> gcc/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> >>          Richard Sandiford  <richard.sandiford@arm.com>
> >>
> >>      * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> >>      vect_model_simple_cost call on scalar_shift_arg.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> >>
> >>      * gcc.dg/vect/vect-shift-5.c: New test.
> > It's a bit borderline, but it's really just twiddling a cost, so OK.
>
> Thanks :) Committed as r279114.
>
> Sudi
>
> >
> > jeff
> >
>

Sudakshina Das Dec. 10, 2019, 10:43 a.m. UTC | #4

Hi Christophe

On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
> 
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> While looking at the vectorization for following example, we
>>>> realized
>>>> that even though vectorizable_shift function was distinguishing
>>>> vector
>>>> shifted by vector from vector shifted by scalar, while modeling the
>>>> cost
>>>> it would always add the cost of building a vector constant despite
>>>> not
>>>> needing it for vector shifted by scalar.
>>>>
>>>> This patch fixes this by using scalar_shift_arg to determine whether
>>>> we
>>>> need to build a vector for the second operand or not. This reduces
>>>> prologue cost as shown in the test.
>>>>
>>>> Build and regression tests pass on aarch64-none-elf and
>>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>>>> Spec2017 for AArch64.
>>>>
> 
> Looks like you didn't check on arm, where I can see that the new testcase fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
> 
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> 
> Christophe

Thanks for reporting this. There is already a bugzilla report PR92870 
for powerpc that I am looking at. Apologies I couldn't find your email 
address there to add you to the cc list.

Thanks
Sudi

> 
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>>>           Richard Sandiford  <richard.sandiford@arm.com>
>>>>
>>>>       * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>>>       vect_model_simple_cost call on scalar_shift_arg.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>>>
>>>>       * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 0000000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+  int sum = 0;
+  for(int i = 0; i < 4; i++)
+    {
+      sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+	     + ((arr[2][i] >> 12) * 7)  + ((arr[3][i] >> 13) ^ 43);
+    }
+    return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@  vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     {
       STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
       DUMP_VECT_SCOPE ("vectorizable_shift");
-      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+      vect_model_simple_cost (stmt_info, ncopies, dt,
+			      scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
       return true;
     }

[Vect] Fix costing for vector shifts

Commit Message

Comments

Patch