Message ID | e6689a2a-e889-1aa2-eabf-43ccc3d0af52@arm.com |
---|---|
State | New |
Headers | show |
Series | [Vect] Fix costing for vector shifts | expand |
On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: > Hi > > While looking at the vectorization for following example, we > realized > that even though vectorizable_shift function was distinguishing > vector > shifted by vector from vector shifted by scalar, while modeling the > cost > it would always add the cost of building a vector constant despite > not > needing it for vector shifted by scalar. > > This patch fixes this by using scalar_shift_arg to determine whether > we > need to build a vector for the second operand or not. This reduces > prologue cost as shown in the test. > > Build and regression tests pass on aarch64-none-elf and > x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in > Spec2017 for AArch64. > > gcc/ChangeLog: > > 2019-xx-xx Sudakshina Das <sudi.das@arm.com> > Richard Sandiford <richard.sandiford@arm.com> > > * tree-vect-stmt.c (vectorizable_shift): Condition ndts for > vect_model_simple_cost call on scalar_shift_arg. > > gcc/testsuite/ChangeLog: > > 2019-xx-xx Sudakshina Das <sudi.das@arm.com> > > * gcc.dg/vect/vect-shift-5.c: New test. It's a bit borderline, but it's really just twiddling a cost, so OK. jeff
Hi Jeff On 07/12/2019 17:44, Jeff Law wrote: > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: >> Hi >> >> While looking at the vectorization for following example, we >> realized >> that even though vectorizable_shift function was distinguishing >> vector >> shifted by vector from vector shifted by scalar, while modeling the >> cost >> it would always add the cost of building a vector constant despite >> not >> needing it for vector shifted by scalar. >> >> This patch fixes this by using scalar_shift_arg to determine whether >> we >> need to build a vector for the second operand or not. This reduces >> prologue cost as shown in the test. >> >> Build and regression tests pass on aarch64-none-elf and >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >> Spec2017 for AArch64. >> >> gcc/ChangeLog: >> >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> >> Richard Sandiford <richard.sandiford@arm.com> >> >> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >> vect_model_simple_cost call on scalar_shift_arg. >> >> gcc/testsuite/ChangeLog: >> >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> >> >> * gcc.dg/vect/vect-shift-5.c: New test. > It's a bit borderline, but it's really just twiddling a cost, so OK. Thanks :) Committed as r279114. Sudi > > jeff >
Hi, On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote: > > Hi Jeff > > On 07/12/2019 17:44, Jeff Law wrote: > > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: > >> Hi > >> > >> While looking at the vectorization for following example, we > >> realized > >> that even though vectorizable_shift function was distinguishing > >> vector > >> shifted by vector from vector shifted by scalar, while modeling the > >> cost > >> it would always add the cost of building a vector constant despite > >> not > >> needing it for vector shifted by scalar. > >> > >> This patch fixes this by using scalar_shift_arg to determine whether > >> we > >> need to build a vector for the second operand or not. This reduces > >> prologue cost as shown in the test. > >> > >> Build and regression tests pass on aarch64-none-elf and > >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in > >> Spec2017 for AArch64. > >> Looks like you didn't check on arm, where I can see that the new testcase fails: FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects scan-tree-dump vect "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0" FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0" Seen on arm-none-linux-gnueabihf --with-mode arm --with-cpu cortex-a9 --with-fpu neon-fp16 Christophe > >> gcc/ChangeLog: > >> > >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> > >> Richard Sandiford <richard.sandiford@arm.com> > >> > >> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for > >> vect_model_simple_cost call on scalar_shift_arg. > >> > >> gcc/testsuite/ChangeLog: > >> > >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> > >> > >> * gcc.dg/vect/vect-shift-5.c: New test. > > It's a bit borderline, but it's really just twiddling a cost, so OK. > > Thanks :) Committed as r279114. > > Sudi > > > > > jeff > > >
Hi Christophe On 10/12/2019 09:01, Christophe Lyon wrote: > Hi, > > On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote: >> >> Hi Jeff >> >> On 07/12/2019 17:44, Jeff Law wrote: >>> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: >>>> Hi >>>> >>>> While looking at the vectorization for following example, we >>>> realized >>>> that even though vectorizable_shift function was distinguishing >>>> vector >>>> shifted by vector from vector shifted by scalar, while modeling the >>>> cost >>>> it would always add the cost of building a vector constant despite >>>> not >>>> needing it for vector shifted by scalar. >>>> >>>> This patch fixes this by using scalar_shift_arg to determine whether >>>> we >>>> need to build a vector for the second operand or not. This reduces >>>> prologue cost as shown in the test. >>>> >>>> Build and regression tests pass on aarch64-none-elf and >>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >>>> Spec2017 for AArch64. >>>> > > Looks like you didn't check on arm, where I can see that the new testcase fails: > FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects > scan-tree-dump vect "vectorizable_shift > ===[\\n\\r][^\\n]*prologue_cost = 0" > FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect > "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0" > > Seen on arm-none-linux-gnueabihf > --with-mode arm > --with-cpu cortex-a9 > --with-fpu neon-fp16 > > Christophe Thanks for reporting this. There is already a bugzilla report PR92870 for powerpc that I am looking at. Apologies I couldn't find your email address there to add you to the cc list. Thanks Sudi > >>>> gcc/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> >>>> Richard Sandiford <richard.sandiford@arm.com> >>>> >>>> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >>>> vect_model_simple_cost call on scalar_shift_arg. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com> >>>> >>>> * gcc.dg/vect/vect-shift-5.c: New test. >>> It's a bit borderline, but it's really just twiddling a cost, so OK. >> >> Thanks :) Committed as r279114. >> >> Sudi >> >>> >>> jeff >>> >>
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c new file mode 100644 index 0000000..c1fd4f2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_int } */ + +typedef unsigned int uint32_t; +typedef short unsigned int uint16_t; + +int foo (uint32_t arr[4][4]) +{ + int sum = 0; + for(int i = 0; i < 4; i++) + { + sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53) + + ((arr[2][i] >> 12) * 7) + ((arr[3][i] >> 13) ^ 43); + } + return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1; +} + +/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */ diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 2cb6b15..396ff15 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, { STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type; DUMP_VECT_SCOPE ("vectorizable_shift"); - vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec); + vect_model_simple_cost (stmt_info, ncopies, dt, + scalar_shift_arg ? 1 : ndts, slp_node, cost_vec); return true; }