Message ID | alpine.LSU.2.11.1511271511510.4884@t29.fhfr.qr |
---|---|
State | New |
On 27/11/15 14:13, Richard Biener wrote:
>
> The following fixes the excessive peeling for gaps we do when doing
> SLP now that I removed most of the restrictions on having gaps in
> the first place.
>
> This should make low-trip vectorized loops more efficient (sth
> also the combine-epilogue-with-vectorized-body-by-masking patches
> claim to do).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2015-11-27  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/68559
> 	* tree-vect-data-refs.c (vect_analyze_group_access_1): Move
> 	peeling for gap checks ...
> 	* tree-vect-stmts.c (vectorizable_load): ... here and relax
> 	for SLP.
> 	* tree-vect-loop.c (vect_analyze_loop_2): Re-set
> 	LOOP_VINFO_PEELING_FOR_GAPS before re-trying without SLP.
>
> 	* gcc.dg/vect/slp-perm-4.c: Adjust again.
> 	* gcc.dg/vect/pr45752.c: Likewise.

Since this, we have

FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0

on aarch64 platforms (aarch64-none-linux-gnu, aarch64-none-elf, aarch64_be-none-elf).

Thanks,
Alan
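For readers new to the term, here is a minimal sketch of the situation that "peeling for gaps" guards against, assuming a plain C loop that reads two of every three `int`s. The parameter names `groupsize` and `last_accessed_element` are borrowed from the variables in the patch, and reading the latter as "group positions up to and including the last one read" is an assumption; `sum_pairs`, `last_scalar_index` and `last_group_index` are hypothetical helpers that only illustrate the index arithmetic, not GCC code.

```c
#include <assert.h>

/* Hypothetical scalar loop with an interleaved (grouped) load: each
   iteration reads positions 0 and 1 of a three-element group, so
   position 2 is a gap at the end of the group.  */
int sum_pairs (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[3 * i] + a[3 * i + 1];
  return s;
}

/* Highest index the scalar loop reads after n >= 1 iterations, assuming
   last_accessed_element counts group positions up to and including the
   last one that is read.  */
long last_scalar_index (long n, long groupsize, long last_accessed_element)
{
  assert (n >= 1 && last_accessed_element >= 1
          && last_accessed_element <= groupsize);
  return groupsize * (n - 1) + last_accessed_element - 1;
}

/* Highest index touched if the final group is read in full, trailing gap
   included.  */
long last_group_index (long n, long groupsize)
{
  assert (n >= 1 && groupsize >= 1);
  return groupsize * n - 1;
}
```

With n = 4, groupsize 3 and last_accessed_element 2, the scalar loop never reads past index 10, while a full read of the last group touches index 11, one element past the end of an 11-element array. Running the final iteration in a scalar epilogue loop keeps vector code away from that trailing gap, at the cost of executing some iterations in scalar code, which is presumably why excessive peeling hurts low-trip-count loops.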
On Wed, 2 Dec 2015, Alan Lawrence wrote:

> Since this, we have
>
> FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects scan-tree-dump-times vect
> "gaps requires scalar epilogue loop" 0
> FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar
> epilogue loop" 0
>
> on aarch64 platforms (aarch64-none-linux-gnu, aarch64-none-elf,
> aarch64_be-none-elf).

Can you open a bug and attach -details vectorizer dumps?

Richard.
```diff
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 230998)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -2190,6 +2190,7 @@ again:
     = init_cost (LOOP_VINFO_LOOP (loop_vinfo));
   /* Reset assorted flags.  */
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
+  LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = false;
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = 0;
 
   goto start_over;
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 231005)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -2166,10 +2166,6 @@ vect_analyze_group_access_1 (struct data
   HOST_WIDE_INT dr_step = -1;
   HOST_WIDE_INT groupsize, last_accessed_element = 1;
   bool slp_impossible = false;
-  struct loop *loop = NULL;
-
-  if (loop_vinfo)
-    loop = LOOP_VINFO_LOOP (loop_vinfo);
 
   /* For interleaving, GROUPSIZE is STEP counted in elements, i.e., the
      size of the interleaving group (including gaps).  */
@@ -2227,24 +2223,6 @@ vect_analyze_group_access_1 (struct data
               dump_printf (MSG_NOTE, "\n");
             }
 
-          if (loop_vinfo)
-            {
-              if (dump_enabled_p ())
-                dump_printf_loc (MSG_NOTE, vect_location,
-                                 "Data access with gaps requires scalar "
-                                 "epilogue loop\n");
-              if (loop->inner)
-                {
-                  if (dump_enabled_p ())
-                    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                     "Peeling for outer loop is not"
-                                     " supported\n");
-                  return false;
-                }
-
-              LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
-            }
-
           return true;
         }
 
@@ -2399,29 +2377,6 @@ vect_analyze_group_access_1 (struct data
           if (bb_vinfo)
             BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt);
         }
-
-      /* If there is a gap in the end of the group or the group size cannot
-         be made a multiple of the vector element count then we access excess
-         elements in the last iteration and thus need to peel that off.  */
-      if (loop_vinfo
-          && (groupsize - last_accessed_element > 0
-              || exact_log2 (groupsize) == -1))
-
-        {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "Data access with gaps requires scalar "
-                             "epilogue loop\n");
-          if (loop->inner)
-            {
-              if (dump_enabled_p ())
-                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                 "Peeling for outer loop is not supported\n");
-              return false;
-            }
-
-          LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
-        }
     }
 
   return true;
```
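The check removed from `vect_analyze_group_access_1` in the last hunk can be read in isolation. Below is a standalone restatement of that predicate, assuming `exact_log2` behaves like GCC's (returning -1 when its argument is not a power of two). `exact_log2_sketch` and `old_gap_peeling_condition` are hypothetical names, and the `loop_vinfo` test, the dump messages and the `loop->inner` bail-out are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's exact_log2: log2 (x) if x is a power
   of two, otherwise -1 (including for x == 0).  */
int exact_log2_sketch (unsigned long x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while (x > 1)
    {
      x >>= 1;
      log++;
    }
  return log;
}

/* Restates the condition deleted from vect_analyze_group_access_1: peel
   for gaps if the group ends in a gap, or if its size is not a power of
   two (the source comment's "cannot be made a multiple of the vector
   element count").  */
bool old_gap_peeling_condition (long groupsize, long last_accessed_element)
{
  assert (groupsize > 0 && last_accessed_element <= groupsize);
  return groupsize - last_accessed_element > 0
         || exact_log2_sketch ((unsigned long) groupsize) == -1;
}
```

Under this predicate a group of three elements that are all read (groupsize 3, last_accessed_element 3) still forced a scalar epilogue loop, simply because 3 is not a power of two. Per the ChangeLog, the patch moves the peeling-for-gaps decision into `vectorizable_load` in tree-vect-stmts.c and relaxes it for SLP.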