diff mbox series

[Vect] Fix costing for vector shifts

Message ID e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com
State New
Headers show
Series [Vect] Fix costing for vector shifts | expand

Commit Message

Sudakshina Das Dec. 6, 2019, 2:05 p.m. UTC
Hi

While looking at the vectorization for following example, we realized 
that even though vectorizable_shift function was distinguishing vector 
shifted by vector from vector shifted by scalar, while modeling the cost 
it would always add the cost of building a vector constant despite not 
needing it for vector shifted by scalar.

This patch fixes this by using scalar_shift_arg to determine whether we 
need to build a vector for the second operand or not. This reduces 
prologue cost as shown in the test.

Build and regression tests pass on aarch64-none-elf and 
x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
Spec2017 for AArch64.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
	vect_model_simple_cost call on scalar_shift_arg.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

	* gcc.dg/vect/vect-shift-5.c: New test.

Is this ok for trunk?

Thanks
Sudi

Comments

Jeff Law Dec. 7, 2019, 5:44 p.m. UTC | #1
On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> Hi
> 
> While looking at the vectorization for following example, we
> realized 
> that even though vectorizable_shift function was distinguishing
> vector 
> shifted by vector from vector shifted by scalar, while modeling the
> cost 
> it would always add the cost of building a vector constant despite
> not 
> needing it for vector shifted by scalar.
> 
> This patch fixes this by using scalar_shift_arg to determine whether
> we 
> need to build a vector for the second operand or not. This reduces 
> prologue cost as shown in the test.
> 
> Build and regression tests pass on aarch64-none-elf and 
> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
> Spec2017 for AArch64.
> 
> gcc/ChangeLog:
> 
> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> 	    Richard Sandiford  <richard.sandiford@arm.com>
> 
> 	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> 	vect_model_simple_cost call on scalar_shift_arg.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> 
> 	* gcc.dg/vect/vect-shift-5.c: New test.
It's a bit borderline, but it's really just twiddling a cost, so OK.

jeff
Sudakshina Das Dec. 9, 2019, 10:23 a.m. UTC | #2
Hi Jeff

On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for following example, we
>> realized
>> that even though vectorizable_shift function was distinguishing
>> vector
>> shifted by vector from vector shifted by scalar, while modeling the
>> cost
>> it would always add the cost of building a vector constant despite
>> not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether
>> we
>> need to build a vector for the second operand or not. This reduces
>> prologue cost as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>> 	    Richard Sandiford  <richard.sandiford@arm.com>
>>
>> 	* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>> 	vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>
>> 	* gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.

Thanks :) Committed as r279114.

Sudi

> 
> jeff
>
Christophe Lyon Dec. 10, 2019, 9:01 a.m. UTC | #3
Hi,

On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>
> Hi Jeff
>
> On 07/12/2019 17:44, Jeff Law wrote:
> > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> >> Hi
> >>
> >> While looking at the vectorization for following example, we
> >> realized
> >> that even though vectorizable_shift function was distinguishing
> >> vector
> >> shifted by vector from vector shifted by scalar, while modeling the
> >> cost
> >> it would always add the cost of building a vector constant despite
> >> not
> >> needing it for vector shifted by scalar.
> >>
> >> This patch fixes this by using scalar_shift_arg to determine whether
> >> we
> >> need to build a vector for the second operand or not. This reduces
> >> prologue cost as shown in the test.
> >>
> >> Build and regression tests pass on aarch64-none-elf and
> >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
> >> Spec2017 for AArch64.
> >>

Looks like you didn't check on arm, where I can see that the new testcase fails:
FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
scan-tree-dump vect "vectorizable_shift
===[\\n\\r][^\\n]*prologue_cost = 0"
FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
"vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"

Seen on arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16

Christophe

> >> gcc/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> >>          Richard Sandiford  <richard.sandiford@arm.com>
> >>
> >>      * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> >>      vect_model_simple_cost call on scalar_shift_arg.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
> >>
> >>      * gcc.dg/vect/vect-shift-5.c: New test.
> > It's a bit borderline, but it's really just twiddling a cost, so OK.
>
> Thanks :) Committed as r279114.
>
> Sudi
>
> >
> > jeff
> >
>
Sudakshina Das Dec. 10, 2019, 10:43 a.m. UTC | #4
Hi Christophe

On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
> 
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> While looking at the vectorization for following example, we
>>>> realized
>>>> that even though vectorizable_shift function was distinguishing
>>>> vector
>>>> shifted by vector from vector shifted by scalar, while modeling the
>>>> cost
>>>> it would always add the cost of building a vector constant despite
>>>> not
>>>> needing it for vector shifted by scalar.
>>>>
>>>> This patch fixes this by using scalar_shift_arg to determine whether
>>>> we
>>>> need to build a vector for the second operand or not. This reduces
>>>> prologue cost as shown in the test.
>>>>
>>>> Build and regression tests pass on aarch64-none-elf and
>>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>>>> Spec2017 for AArch64.
>>>>
> 
> Looks like you didn't check on arm, where I can see that the new testcase fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
> 
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> 
> Christophe

Thanks for reporting this. There is already a bugzilla report PR92870 
for powerpc that I am looking at. Apologies I couldn't find your email 
address there to add you to the cc list.

Thanks
Sudi

> 
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>>>           Richard Sandiford  <richard.sandiford@arm.com>
>>>>
>>>>       * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>>>       vect_model_simple_cost call on scalar_shift_arg.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>
>>>>
>>>>       * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 0000000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+  int sum = 0;
+  for(int i = 0; i < 4; i++)
+    {
+      sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+	     + ((arr[2][i] >> 12) * 7)  + ((arr[3][i] >> 13) ^ 43);
+    }
+    return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@  vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     {
       STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
       DUMP_VECT_SCOPE ("vectorizable_shift");
-      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+      vect_model_simple_cost (stmt_info, ncopies, dt,
+			      scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
       return true;
     }