Message ID | CY1PR1201MB1098A879C8612260029E00CC8F490@CY1PR1201MB1098.namprd12.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote: > Hi Richard, > >> -----Original Message----- >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> Sent: Tuesday, May 17, 2016 5:40 PM >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan >> <Venkataramanan.Kumar@amd.com> wrote: >> > Hi Richard, >> > >> > I created the patch by passing -b option to git. Now the patch is more >> readable. >> > >> > As per your suggestion I tried to fix the PR by splitting the SLP store group at >> vector boundary after the SLP tree is built. >> > >> > Boot strap PASSED on x86_64. >> > Checked the patch with check_GNU_style.sh. >> > >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it >> generated 2 more vzeroupper. >> > As recommended I adjusted the test case by adding -fno-tree-slp-vectorize >> to make it as expected after loop vectorization. >> > >> > The following tests are now passing. >> > >> > ------ Snip----- >> > Tests that now work, but didn't before: >> > >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects scan-tree-dump-times >> > slp2 "basic block vectorized" 1 >> > >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block >> > vectorized" 1 >> > >> > New tests that PASS: >> > >> > gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c >> > -flto -ffat-lto-objects (test for excess errors) >> > >> > ------ Snip----- >> > >> > ChangeLog >> > >> > 2016-05-14 Venkataramanan Kumar >> <Venkataramanan.kumar@amd.com> >> > PR tree-optimization/58135 >> > * tree-vect-slp.c: When group size is not multiple of vector size, >> > allow splitting of store group at vector boundary. >> > >> > Test suite ChangeLog >> > 2016-05-14 Venkataramanan Kumar >> <Venkataramanan.kumar@amd.com> >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. >> > * gcc.dg/vect/pr58135.c: Add new. >> > * gfortran.dg/pr46519-1.f: Adjust test case. >> > >> > The attached patch Ok for trunk? >> >> >> Please avoid the excessive vertical space around the vect_build_slp_tree call. > Yes fixed in the attached patch. >> >> + /* Calculate the unrolling factor. */ >> + unrolling_factor = least_common_multiple >> + (nunits, group_size) / group_size; >> ... >> + else >> { >> /* Calculate the unrolling factor based on the smallest type. */ >> if (max_nunits > nunits) >> - unrolling_factor = least_common_multiple (max_nunits, group_size) >> - / group_size; >> + unrolling_factor >> + = least_common_multiple (max_nunits, >> + group_size)/group_size; >> >> please compute the "correct" unroll factor immediately and move the >> "unrolling of BB required" error into the if() case by post-poning the nunits < >> group_size check (and use max_nunits here). >> > Yes fixed in the attached patch. > >> + if (is_a <bb_vec_info> (vinfo) >> + && nunits < group_size >> + && unrolling_factor != 1 >> + && is_a <bb_vec_info> (vinfo)) >> + { >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "Build SLP failed: store group " >> + "size not a multiple of the vector size " >> + "in basic block SLP\n"); >> + /* Fatal mismatch. */ >> + matches[nunits] = false; >> >> this is too pessimistic - you want to add the extra 'false' at group_size / >> max_nunits * max_nunits. > Yes fixed in attached patch. > >> >> It looks like you leak 'node' in the if () path as well. You need >> >> vect_free_slp_tree (node); >> loads.release (); >> >> thus treat it as a failure case. > > Yes fixed. I added an else part before scalar_stmts.release call for the case when SLP tree is not built. This avoids double freeing. > Bootstrapped and reg tested on X86_64. > > Ok for trunk ? + /*Calculate the unrolling factor based on the smallest type. */ + unrolling_factor Space after /* missing. + unrolling_factor + = least_common_multiple (max_nunits, group_size)/group_size; spaces before/after the '/' + if (max_nunits > nunits + && unrolling_factor != 1 + && is_a <bb_vec_info> (vinfo)) { if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "Build SLP failed: unrolling required in basic" - " block SLP\n"); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "Build SLP failed: unrolling " + "required in basic block SLP\n"); vect_free_slp_tree (node); loads.release (); return false; } + if (is_a <bb_vec_info> (vinfo) + && max_nunits < group_size + && unrolling_factor != 1) please merge these. I think you have a not covered case of max_nunits == nunits, max_nunits > group_size. So do if (unrolling_factor != 1 && is_a <...>) { if (max_nunits >= group_size) { first error } ... signal fatal mismatch instead... } else { Ok with that change. Thanks, Richard. >> >> Thanks, >> Richard. >> >> > Regards, >> > Venkat. >> > > > Regards, > Venkat.
Hi Richard, > -----Original Message----- > From: Richard Biener [mailto:richard.guenther@gmail.com] > Sent: Thursday, May 19, 2016 4:08 PM > To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [Patch V2] Fix SLP PR58135. > > On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan > <Venkataramanan.Kumar@amd.com> wrote: > > Hi Richard, > > > >> -----Original Message----- > >> From: Richard Biener [mailto:richard.guenther@gmail.com] > >> Sent: Tuesday, May 17, 2016 5:40 PM > >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: [Patch V2] Fix SLP PR58135. > >> > >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan > >> <Venkataramanan.Kumar@amd.com> wrote: > >> > Hi Richard, > >> > > >> > I created the patch by passing -b option to git. Now the patch is > >> > more > >> readable. > >> > > >> > As per your suggestion I tried to fix the PR by splitting the SLP > >> > store group at > >> vector boundary after the SLP tree is built. > >> > > >> > Boot strap PASSED on x86_64. > >> > Checked the patch with check_GNU_style.sh. > >> > > >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence > >> > it > >> generated 2 more vzeroupper. > >> > As recommended I adjusted the test case by adding > >> > -fno-tree-slp-vectorize > >> to make it as expected after loop vectorization. > >> > > >> > The following tests are now passing. > >> > > >> > ------ Snip----- > >> > Tests that now work, but didn't before: > >> > > >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects > >> > scan-tree-dump-times > >> > slp2 "basic block vectorized" 1 > >> > > >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block > >> > vectorized" 1 > >> > > >> > New tests that PASS: > >> > > >> > gcc.dg/vect/pr58135.c (test for excess errors) > >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess > >> > errors) > >> > > >> > ------ Snip----- > >> > > >> > ChangeLog > >> > > >> > 2016-05-14 Venkataramanan Kumar > >> <Venkataramanan.kumar@amd.com> > >> > PR tree-optimization/58135 > >> > * tree-vect-slp.c: When group size is not multiple of vector size, > >> > allow splitting of store group at vector boundary. > >> > > >> > Test suite ChangeLog > >> > 2016-05-14 Venkataramanan Kumar > >> <Venkataramanan.kumar@amd.com> > >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. > >> > * gcc.dg/vect/pr58135.c: Add new. > >> > * gfortran.dg/pr46519-1.f: Adjust test case. > >> > > >> > The attached patch Ok for trunk? > >> > >> > >> Please avoid the excessive vertical space around the vect_build_slp_tree > call. > > Yes fixed in the attached patch. > >> > >> + /* Calculate the unrolling factor. */ > >> + unrolling_factor = least_common_multiple > >> + (nunits, group_size) / group_size; > >> ... > >> + else > >> { > >> /* Calculate the unrolling factor based on the smallest type. */ > >> if (max_nunits > nunits) > >> - unrolling_factor = least_common_multiple (max_nunits, group_size) > >> - / group_size; > >> + unrolling_factor > >> + = least_common_multiple (max_nunits, > >> + group_size)/group_size; > >> > >> please compute the "correct" unroll factor immediately and move the > >> "unrolling of BB required" error into the if() case by post-poning > >> the nunits < group_size check (and use max_nunits here). > >> > > Yes fixed in the attached patch. > > > >> + if (is_a <bb_vec_info> (vinfo) > >> + && nunits < group_size > >> + && unrolling_factor != 1 > >> + && is_a <bb_vec_info> (vinfo)) > >> + { > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> + "Build SLP failed: store group " > >> + "size not a multiple of the vector size " > >> + "in basic block SLP\n"); > >> + /* Fatal mismatch. */ > >> + matches[nunits] = false; > >> > >> this is too pessimistic - you want to add the extra 'false' at > >> group_size / max_nunits * max_nunits. > > Yes fixed in attached patch. > > > >> > >> It looks like you leak 'node' in the if () path as well. You need > >> > >> vect_free_slp_tree (node); > >> loads.release (); > >> > >> thus treat it as a failure case. > > > > Yes fixed. I added an else part before scalar_stmts.release call for the case > when SLP tree is not built. This avoids double freeing. > > Bootstrapped and reg tested on X86_64. > > > > Ok for trunk ? > > + /*Calculate the unrolling factor based on the smallest type. */ > + unrolling_factor > > Space after /* missing. > > + unrolling_factor > + = least_common_multiple (max_nunits, group_size)/group_size; > > spaces before/after the '/' > > + if (max_nunits > nunits > + && unrolling_factor != 1 > + && is_a <bb_vec_info> (vinfo)) > { > if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "Build SLP failed: unrolling required in basic" > - " block SLP\n"); > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, > + vect_location, > + "Build SLP failed: unrolling " > + "required in basic block SLP\n"); > vect_free_slp_tree (node); > loads.release (); > return false; > } > > + if (is_a <bb_vec_info> (vinfo) > + && max_nunits < group_size > + && unrolling_factor != 1) > > please merge these. I think you have a not covered case of max_nunits == > nunits, max_nunits > group_size. > So do > > if (unrolling_factor != 1 > && is_a <...>) > { > if (max_nunits >= group_size) > { > first error > } > ... signal fatal mismatch instead... > } > else > { > > Ok with that change. Thank you for the review. I updated and patch and up streamed it to trunk. Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582 Regards, Venkat. > > Thanks, > Richard. > > >> > >> Thanks, > >> Richard. > >> > >> > Regards, > >> > Venkat. > >> > > > > > Regards, > > Venkat.
Hi Venkat, On 23 May 2016 at 11:54, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote: > Hi Richard, > >> -----Original Message----- >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> Sent: Thursday, May 19, 2016 4:08 PM >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan >> <Venkataramanan.Kumar@amd.com> wrote: >> > Hi Richard, >> > >> >> -----Original Message----- >> >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> >> Sent: Tuesday, May 17, 2016 5:40 PM >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> >> Cc: gcc-patches@gcc.gnu.org >> >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan >> >> <Venkataramanan.Kumar@amd.com> wrote: >> >> > Hi Richard, >> >> > >> >> > I created the patch by passing -b option to git. Now the patch is >> >> > more >> >> readable. >> >> > >> >> > As per your suggestion I tried to fix the PR by splitting the SLP >> >> > store group at >> >> vector boundary after the SLP tree is built. >> >> > >> >> > Boot strap PASSED on x86_64. >> >> > Checked the patch with check_GNU_style.sh. >> >> > >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence >> >> > it >> >> generated 2 more vzeroupper. >> >> > As recommended I adjusted the test case by adding >> >> > -fno-tree-slp-vectorize >> >> to make it as expected after loop vectorization. >> >> > >> >> > The following tests are now passing. >> >> > >> >> > ------ Snip----- >> >> > Tests that now work, but didn't before: >> >> > >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects >> >> > scan-tree-dump-times >> >> > slp2 "basic block vectorized" 1 >> >> > >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block >> >> > vectorized" 1 >> >> > >> >> > New tests that PASS: >> >> > >> >> > gcc.dg/vect/pr58135.c (test for excess errors) >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess >> >> > errors) >> >> > >> >> > ------ Snip----- >> >> > >> >> > ChangeLog >> >> > >> >> > 2016-05-14 Venkataramanan Kumar >> >> <Venkataramanan.kumar@amd.com> >> >> > PR tree-optimization/58135 >> >> > * tree-vect-slp.c: When group size is not multiple of vector size, >> >> > allow splitting of store group at vector boundary. >> >> > >> >> > Test suite ChangeLog >> >> > 2016-05-14 Venkataramanan Kumar >> >> <Venkataramanan.kumar@amd.com> >> >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. >> >> > * gcc.dg/vect/pr58135.c: Add new. >> >> > * gfortran.dg/pr46519-1.f: Adjust test case. >> >> > >> >> > The attached patch Ok for trunk? >> >> >> >> >> >> Please avoid the excessive vertical space around the vect_build_slp_tree >> call. >> > Yes fixed in the attached patch. >> >> >> >> + /* Calculate the unrolling factor. */ >> >> + unrolling_factor = least_common_multiple >> >> + (nunits, group_size) / group_size; >> >> ... >> >> + else >> >> { >> >> /* Calculate the unrolling factor based on the smallest type. */ >> >> if (max_nunits > nunits) >> >> - unrolling_factor = least_common_multiple (max_nunits, group_size) >> >> - / group_size; >> >> + unrolling_factor >> >> + = least_common_multiple (max_nunits, >> >> + group_size)/group_size; >> >> >> >> please compute the "correct" unroll factor immediately and move the >> >> "unrolling of BB required" error into the if() case by post-poning >> >> the nunits < group_size check (and use max_nunits here). >> >> >> > Yes fixed in the attached patch. >> > >> >> + if (is_a <bb_vec_info> (vinfo) >> >> + && nunits < group_size >> >> + && unrolling_factor != 1 >> >> + && is_a <bb_vec_info> (vinfo)) >> >> + { >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> >> + "Build SLP failed: store group " >> >> + "size not a multiple of the vector size " >> >> + "in basic block SLP\n"); >> >> + /* Fatal mismatch. */ >> >> + matches[nunits] = false; >> >> >> >> this is too pessimistic - you want to add the extra 'false' at >> >> group_size / max_nunits * max_nunits. >> > Yes fixed in attached patch. >> > >> >> >> >> It looks like you leak 'node' in the if () path as well. You need >> >> >> >> vect_free_slp_tree (node); >> >> loads.release (); >> >> >> >> thus treat it as a failure case. >> > >> > Yes fixed. I added an else part before scalar_stmts.release call for the case >> when SLP tree is not built. This avoids double freeing. >> > Bootstrapped and reg tested on X86_64. >> > This patch is causing regressions on armeb: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/236591/report-build-info.html The following fortran tests now fail at runtime: gfortran.dg/c_f_pointer_logical.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test gfortran.dg/c_f_pointer_logical.f03 -O3 -g execution test gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test Can you have a look? Thanks, Christophe. >> > Ok for trunk ? >> >> + /*Calculate the unrolling factor based on the smallest type. */ >> + unrolling_factor >> >> Space after /* missing. >> >> + unrolling_factor >> + = least_common_multiple (max_nunits, group_size)/group_size; >> >> spaces before/after the '/' >> >> + if (max_nunits > nunits >> + && unrolling_factor != 1 >> + && is_a <bb_vec_info> (vinfo)) >> { >> if (dump_enabled_p ()) >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> - "Build SLP failed: unrolling required in basic" >> - " block SLP\n"); >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, >> + vect_location, >> + "Build SLP failed: unrolling " >> + "required in basic block SLP\n"); >> vect_free_slp_tree (node); >> loads.release (); >> return false; >> } >> >> + if (is_a <bb_vec_info> (vinfo) >> + && max_nunits < group_size >> + && unrolling_factor != 1) >> >> please merge these. I think you have a not covered case of max_nunits == >> nunits, max_nunits > group_size. >> So do >> >> if (unrolling_factor != 1 >> && is_a <...>) >> { >> if (max_nunits >= group_size) >> { >> first error >> } >> ... signal fatal mismatch instead... >> } >> else >> { >> >> Ok with that change. > > Thank you for the review. I updated and patch and up streamed it to trunk. > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582 > > Regards, > Venkat. > >> >> Thanks, >> Richard. >> >> >> >> >> Thanks, >> >> Richard. >> >> >> >> > Regards, >> >> > Venkat. >> >> > >> > >> > Regards, >> > Venkat.
Hi Christophe, > -----Original Message----- > From: Christophe Lyon [mailto:christophe.lyon@linaro.org] > Sent: Tuesday, May 24, 2016 8:45 PM > To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> > Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [Patch V2] Fix SLP PR58135. > > Hi Venkat, > > > On 23 May 2016 at 11:54, Kumar, Venkataramanan > <Venkataramanan.Kumar@amd.com> wrote: > > Hi Richard, > > > >> -----Original Message----- > >> From: Richard Biener [mailto:richard.guenther@gmail.com] > >> Sent: Thursday, May 19, 2016 4:08 PM > >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: [Patch V2] Fix SLP PR58135. > >> > >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan > >> <Venkataramanan.Kumar@amd.com> wrote: > >> > Hi Richard, > >> > > >> >> -----Original Message----- > >> >> From: Richard Biener [mailto:richard.guenther@gmail.com] > >> >> Sent: Tuesday, May 17, 2016 5:40 PM > >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> > >> >> Cc: gcc-patches@gcc.gnu.org > >> >> Subject: Re: [Patch V2] Fix SLP PR58135. > >> >> > >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan > >> >> <Venkataramanan.Kumar@amd.com> wrote: > >> >> > Hi Richard, > >> >> > > >> >> > I created the patch by passing -b option to git. Now the patch > >> >> > is more > >> >> readable. > >> >> > > >> >> > As per your suggestion I tried to fix the PR by splitting the > >> >> > SLP store group at > >> >> vector boundary after the SLP tree is built. > >> >> > > >> >> > Boot strap PASSED on x86_64. > >> >> > Checked the patch with check_GNU_style.sh. > >> >> > > >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. > >> >> > Hence it > >> >> generated 2 more vzeroupper. > >> >> > As recommended I adjusted the test case by adding > >> >> > -fno-tree-slp-vectorize > >> >> to make it as expected after loop vectorization. > >> >> > > >> >> > The following tests are now passing. > >> >> > > >> >> > ------ Snip----- > >> >> > Tests that now work, but didn't before: > >> >> > > >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects > >> >> > scan-tree-dump-times > >> >> > slp2 "basic block vectorized" 1 > >> >> > > >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block > >> >> > vectorized" 1 > >> >> > > >> >> > New tests that PASS: > >> >> > > >> >> > gcc.dg/vect/pr58135.c (test for excess errors) > >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess > >> >> > errors) > >> >> > > >> >> > ------ Snip----- > >> >> > > >> >> > ChangeLog > >> >> > > >> >> > 2016-05-14 Venkataramanan Kumar > >> >> <Venkataramanan.kumar@amd.com> > >> >> > PR tree-optimization/58135 > >> >> > * tree-vect-slp.c: When group size is not multiple of vector size, > >> >> > allow splitting of store group at vector boundary. > >> >> > > >> >> > Test suite ChangeLog > >> >> > 2016-05-14 Venkataramanan Kumar > >> >> <Venkataramanan.kumar@amd.com> > >> >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. > >> >> > * gcc.dg/vect/pr58135.c: Add new. > >> >> > * gfortran.dg/pr46519-1.f: Adjust test case. > >> >> > > >> >> > The attached patch Ok for trunk? > >> >> > >> >> > >> >> Please avoid the excessive vertical space around the > >> >> vect_build_slp_tree > >> call. > >> > Yes fixed in the attached patch. > >> >> > >> >> + /* Calculate the unrolling factor. */ > >> >> + unrolling_factor = least_common_multiple > >> >> + (nunits, group_size) / group_size; > >> >> ... > >> >> + else > >> >> { > >> >> /* Calculate the unrolling factor based on the smallest type. */ > >> >> if (max_nunits > nunits) > >> >> - unrolling_factor = least_common_multiple (max_nunits, > group_size) > >> >> - / group_size; > >> >> + unrolling_factor > >> >> + = least_common_multiple (max_nunits, > >> >> + group_size)/group_size; > >> >> > >> >> please compute the "correct" unroll factor immediately and move > >> >> the "unrolling of BB required" error into the if() case by > >> >> post-poning the nunits < group_size check (and use max_nunits here). > >> >> > >> > Yes fixed in the attached patch. > >> > > >> >> + if (is_a <bb_vec_info> (vinfo) > >> >> + && nunits < group_size > >> >> + && unrolling_factor != 1 > >> >> + && is_a <bb_vec_info> (vinfo)) > >> >> + { > >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> >> + "Build SLP failed: store group " > >> >> + "size not a multiple of the vector size " > >> >> + "in basic block SLP\n"); > >> >> + /* Fatal mismatch. */ > >> >> + matches[nunits] = false; > >> >> > >> >> this is too pessimistic - you want to add the extra 'false' at > >> >> group_size / max_nunits * max_nunits. > >> > Yes fixed in attached patch. > >> > > >> >> > >> >> It looks like you leak 'node' in the if () path as well. You need > >> >> > >> >> vect_free_slp_tree (node); > >> >> loads.release (); > >> >> > >> >> thus treat it as a failure case. > >> > > >> > Yes fixed. I added an else part before scalar_stmts.release call > >> > for the case > >> when SLP tree is not built. This avoids double freeing. > >> > Bootstrapped and reg tested on X86_64. > >> > > > This patch is causing regressions on armeb: > http://people.linaro.org/~christophe.lyon/cross- > validation/gcc/trunk/236591/report-build-info.html > The following fortran tests now fail at runtime: > > gfortran.dg/c_f_pointer_logical.f03 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > gfortran.dg/c_f_pointer_logical.f03 -O3 -g execution test > gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test > > Can you have a look? > Sure. I will take a look. > Thanks, > > Christophe. > > > > >> > Ok for trunk ? > >> > >> + /*Calculate the unrolling factor based on the smallest type. */ > >> + unrolling_factor > >> > >> Space after /* missing. > >> > >> + unrolling_factor > >> + = least_common_multiple (max_nunits, group_size)/group_size; > >> > >> spaces before/after the '/' > >> > >> + if (max_nunits > nunits > >> + && unrolling_factor != 1 > >> + && is_a <bb_vec_info> (vinfo)) > >> { > >> if (dump_enabled_p ()) > >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >> - "Build SLP failed: unrolling required in basic" > >> - " block SLP\n"); > >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, > >> + vect_location, > >> + "Build SLP failed: unrolling " > >> + "required in basic block SLP\n"); > >> vect_free_slp_tree (node); > >> loads.release (); > >> return false; > >> } > >> > >> + if (is_a <bb_vec_info> (vinfo) > >> + && max_nunits < group_size > >> + && unrolling_factor != 1) > >> > >> please merge these. I think you have a not covered case of > >> max_nunits == nunits, max_nunits > group_size. > >> So do > >> > >> if (unrolling_factor != 1 > >> && is_a <...>) > >> { > >> if (max_nunits >= group_size) > >> { > >> first error > >> } > >> ... signal fatal mismatch instead... > >> } > >> else > >> { > >> > >> Ok with that change. > > > > Thank you for the review. I updated and patch and up streamed it to trunk. > > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582 > > > > Regards, > > Venkat. Regards, Venkat.
On 25 May 2016 at 05:29, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote: > Hi Christophe, > >> -----Original Message----- >> From: Christophe Lyon [mailto:christophe.lyon@linaro.org] >> Sent: Tuesday, May 24, 2016 8:45 PM >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> Hi Venkat, >> >> >> On 23 May 2016 at 11:54, Kumar, Venkataramanan >> <Venkataramanan.Kumar@amd.com> wrote: >> > Hi Richard, >> > >> >> -----Original Message----- >> >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> >> Sent: Thursday, May 19, 2016 4:08 PM >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> >> Cc: gcc-patches@gcc.gnu.org >> >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> >> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan >> >> <Venkataramanan.Kumar@amd.com> wrote: >> >> > Hi Richard, >> >> > >> >> >> -----Original Message----- >> >> >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> >> >> Sent: Tuesday, May 17, 2016 5:40 PM >> >> >> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> >> >> >> Cc: gcc-patches@gcc.gnu.org >> >> >> Subject: Re: [Patch V2] Fix SLP PR58135. >> >> >> >> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan >> >> >> <Venkataramanan.Kumar@amd.com> wrote: >> >> >> > Hi Richard, >> >> >> > >> >> >> > I created the patch by passing -b option to git. Now the patch >> >> >> > is more >> >> >> readable. >> >> >> > >> >> >> > As per your suggestion I tried to fix the PR by splitting the >> >> >> > SLP store group at >> >> >> vector boundary after the SLP tree is built. >> >> >> > >> >> >> > Boot strap PASSED on x86_64. >> >> >> > Checked the patch with check_GNU_style.sh. >> >> >> > >> >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. >> >> >> > Hence it >> >> >> generated 2 more vzeroupper. >> >> >> > As recommended I adjusted the test case by adding >> >> >> > -fno-tree-slp-vectorize >> >> >> to make it as expected after loop vectorization. >> >> >> > >> >> >> > The following tests are now passing. >> >> >> > >> >> >> > ------ Snip----- >> >> >> > Tests that now work, but didn't before: >> >> >> > >> >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects >> >> >> > scan-tree-dump-times >> >> >> > slp2 "basic block vectorized" 1 >> >> >> > >> >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block >> >> >> > vectorized" 1 >> >> >> > >> >> >> > New tests that PASS: >> >> >> > >> >> >> > gcc.dg/vect/pr58135.c (test for excess errors) >> >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess >> >> >> > errors) >> >> >> > >> >> >> > ------ Snip----- >> >> >> > >> >> >> > ChangeLog >> >> >> > >> >> >> > 2016-05-14 Venkataramanan Kumar >> >> >> <Venkataramanan.kumar@amd.com> >> >> >> > PR tree-optimization/58135 >> >> >> > * tree-vect-slp.c: When group size is not multiple of vector size, >> >> >> > allow splitting of store group at vector boundary. >> >> >> > >> >> >> > Test suite ChangeLog >> >> >> > 2016-05-14 Venkataramanan Kumar >> >> >> <Venkataramanan.kumar@amd.com> >> >> >> > * gcc.dg/vect/bb-slp-19.c: Remove XFAIL. >> >> >> > * gcc.dg/vect/pr58135.c: Add new. >> >> >> > * gfortran.dg/pr46519-1.f: Adjust test case. >> >> >> > >> >> >> > The attached patch Ok for trunk? >> >> >> >> >> >> >> >> >> Please avoid the excessive vertical space around the >> >> >> vect_build_slp_tree >> >> call. >> >> > Yes fixed in the attached patch. >> >> >> >> >> >> + /* Calculate the unrolling factor. */ >> >> >> + unrolling_factor = least_common_multiple >> >> >> + (nunits, group_size) / group_size; >> >> >> ... >> >> >> + else >> >> >> { >> >> >> /* Calculate the unrolling factor based on the smallest type. */ >> >> >> if (max_nunits > nunits) >> >> >> - unrolling_factor = least_common_multiple (max_nunits, >> group_size) >> >> >> - / group_size; >> >> >> + unrolling_factor >> >> >> + = least_common_multiple (max_nunits, >> >> >> + group_size)/group_size; >> >> >> >> >> >> please compute the "correct" unroll factor immediately and move >> >> >> the "unrolling of BB required" error into the if() case by >> >> >> post-poning the nunits < group_size check (and use max_nunits here). >> >> >> >> >> > Yes fixed in the attached patch. >> >> > >> >> >> + if (is_a <bb_vec_info> (vinfo) >> >> >> + && nunits < group_size >> >> >> + && unrolling_factor != 1 >> >> >> + && is_a <bb_vec_info> (vinfo)) >> >> >> + { >> >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> >> >> + "Build SLP failed: store group " >> >> >> + "size not a multiple of the vector size " >> >> >> + "in basic block SLP\n"); >> >> >> + /* Fatal mismatch. */ >> >> >> + matches[nunits] = false; >> >> >> >> >> >> this is too pessimistic - you want to add the extra 'false' at >> >> >> group_size / max_nunits * max_nunits. >> >> > Yes fixed in attached patch. >> >> > >> >> >> >> >> >> It looks like you leak 'node' in the if () path as well. You need >> >> >> >> >> >> vect_free_slp_tree (node); >> >> >> loads.release (); >> >> >> >> >> >> thus treat it as a failure case. >> >> > >> >> > Yes fixed. I added an else part before scalar_stmts.release call >> >> > for the case >> >> when SLP tree is not built. This avoids double freeing. >> >> > Bootstrapped and reg tested on X86_64. >> >> > >> >> This patch is causing regressions on armeb: >> http://people.linaro.org/~christophe.lyon/cross- >> validation/gcc/trunk/236591/report-build-info.html >> The following fortran tests now fail at runtime: >> >> gfortran.dg/c_f_pointer_logical.f03 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution test >> gfortran.dg/c_f_pointer_logical.f03 -O3 -g execution test >> gfortran.dg/intrinsic_pack_1.f90 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution test >> gfortran.dg/intrinsic_pack_1.f90 -O3 -g execution test >> >> Can you have a look? >> > > Sure. I will take a look. > Thanks, I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71270 to keep track of this. Christophe >> Thanks, >> >> Christophe. >> >> >> >> >> > Ok for trunk ? >> >> >> >> + /*Calculate the unrolling factor based on the smallest type. */ >> >> + unrolling_factor >> >> >> >> Space after /* missing. >> >> >> >> + unrolling_factor >> >> + = least_common_multiple (max_nunits, group_size)/group_size; >> >> >> >> spaces before/after the '/' >> >> >> >> + if (max_nunits > nunits >> >> + && unrolling_factor != 1 >> >> + && is_a <bb_vec_info> (vinfo)) >> >> { >> >> if (dump_enabled_p ()) >> >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> >> - "Build SLP failed: unrolling required in basic" >> >> - " block SLP\n"); >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, >> >> + vect_location, >> >> + "Build SLP failed: unrolling " >> >> + "required in basic block SLP\n"); >> >> vect_free_slp_tree (node); >> >> loads.release (); >> >> return false; >> >> } >> >> >> >> + if (is_a <bb_vec_info> (vinfo) >> >> + && max_nunits < group_size >> >> + && unrolling_factor != 1) >> >> >> >> please merge these. I think you have a not covered case of >> >> max_nunits == nunits, max_nunits > group_size. >> >> So do >> >> >> >> if (unrolling_factor != 1 >> >> && is_a <...>) >> >> { >> >> if (max_nunits >= group_size) >> >> { >> >> first error >> >> } >> >> ... signal fatal mismatch instead... >> >> } >> >> else >> >> { >> >> >> >> Ok with that change. >> > >> > Thank you for the review. I updated and patch and up streamed it to trunk. >> > Ref: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=236582 >> > >> > Regards, >> > Venkat. > > Regards, > Venkat.
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c index 42cd294..c282155 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c @@ -53,5 +53,5 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/pr58135.c b/gcc/testsuite/gcc.dg/vect/pr58135.c new file mode 100644 index 0000000..ca25000 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58135.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +int a[100]; +void foo () +{ + a[0] = a[1] = a[2] = a[3] = a[4]= 0; +} + +/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */ diff --git a/gcc/testsuite/gfortran.dg/pr46519-1.f b/gcc/testsuite/gfortran.dg/pr46519-1.f index 51c64b8..46be9f5 100644 --- a/gcc/testsuite/gfortran.dg/pr46519-1.f +++ b/gcc/testsuite/gfortran.dg/pr46519-1.f @@ -1,5 +1,5 @@ ! { dg-do compile { target i?86-*-* x86_64-*-* } } -! { dg-options "-O3 -mavx -mvzeroupper -mtune=generic -dp" } +! { dg-options "-O3 -mavx -mvzeroupper -fno-tree-slp-vectorize -mtune=generic -dp" } PROGRAM MG3XDEMO INTEGER LM, NM, NV, NR, NIT diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index d713848..3babfcd 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -1754,18 +1754,6 @@ vect_analyze_slp_instance (vec_info *vinfo, } nunits = TYPE_VECTOR_SUBPARTS (vectype); - /* Calculate the unrolling factor. */ - unrolling_factor = least_common_multiple (nunits, group_size) / group_size; - if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "Build SLP failed: unrolling required in basic" - " block SLP\n"); - - return false; - } - /* Create a node (a root of the SLP tree) for the packed grouped stores. */ scalar_stmts.create (group_size); next = stmt; @@ -1801,26 +1789,44 @@ vect_analyze_slp_instance (vec_info *vinfo, /* Build the tree for the SLP instance. */ bool *matches = XALLOCAVEC (bool, group_size); unsigned npermutes = 0; - if ((node = vect_build_slp_tree (vinfo, scalar_stmts, group_size, + node = vect_build_slp_tree (vinfo, scalar_stmts, group_size, &max_nunits, &loads, matches, &npermutes, - NULL, max_tree_size)) != NULL) + NULL, max_tree_size); + if (node != NULL) { - /* Calculate the unrolling factor based on the smallest type. */ - if (max_nunits > nunits) - unrolling_factor = least_common_multiple (max_nunits, group_size) - / group_size; + /*Calculate the unrolling factor based on the smallest type. */ + unrolling_factor + = least_common_multiple (max_nunits, group_size)/group_size; - if (unrolling_factor != 1 && is_a <bb_vec_info> (vinfo)) + if (max_nunits > nunits + && unrolling_factor != 1 + && is_a <bb_vec_info> (vinfo)) { if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "Build SLP failed: unrolling required in basic" - " block SLP\n"); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "Build SLP failed: unrolling " + "required in basic block SLP\n"); vect_free_slp_tree (node); loads.release (); return false; } + if (is_a <bb_vec_info> (vinfo) + && max_nunits < group_size + && unrolling_factor != 1) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: store group " + "size not a multiple of the vector size " + "in basic block SLP\n"); + /* Fatal mismatch. */ + matches[group_size/max_nunits * max_nunits] = false; + vect_free_slp_tree (node); + loads.release (); + } + else + { /* Create a new SLP instance. */ new_instance = XNEW (struct _slp_instance); SLP_INSTANCE_TREE (new_instance) = node; @@ -1842,8 +1848,8 @@ vect_analyze_slp_instance (vec_info *vinfo, (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0])); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (load_node), j, load) { - int load_place - = vect_get_place_in_interleaving_chain (load, first_stmt); + int load_place = vect_get_place_in_interleaving_chain + (load, first_stmt); gcc_assert (load_place != -1); if (load_place != j) this_load_permuted = true; @@ -1873,7 +1879,8 @@ vect_analyze_slp_instance (vec_info *vinfo, dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "Build SLP failed: unsupported load " "permutation "); - dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0); + dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, + TDF_SLIM, stmt, 0); dump_printf (MSG_MISSED_OPTIMIZATION, "\n"); } vect_free_slp_instance (new_instance); @@ -1881,7 +1888,7 @@ vect_analyze_slp_instance (vec_info *vinfo, } } - /* If the loads and stores can be handled with load/store-lane + /* If the loads and stores can be handled with load/store-lan instructions do not generate this SLP instance. */ if (is_a <loop_vec_info> (vinfo) && loads_permuted @@ -1893,7 +1900,8 @@ vect_analyze_slp_instance (vec_info *vinfo, gimple *first_stmt = GROUP_FIRST_ELEMENT (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0])); stmt_vec_info stmt_vinfo = vinfo_for_stmt (first_stmt); - /* Use SLP for strided accesses (or if we can't load-lanes). */ + /* Use SLP for strided accesses (or if we + can't load-lanes). */ if (STMT_VINFO_STRIDED_P (stmt_vinfo) || ! vect_load_lanes_supported (STMT_VINFO_VECTYPE (stmt_vinfo), @@ -1922,11 +1930,14 @@ vect_analyze_slp_instance (vec_info *vinfo, return true; } - + } + else + { /* Failed to SLP. */ /* Free the allocated memory. */ scalar_stmts.release (); loads.release (); + } /* For basic block SLP, try to break the group up into multiples of the vector size. */