Message ID | mptmudbvexc.fsf@arm.com |
---|---|
State | New |
Series | Vector epilogues vs. mixed vector sizes |

On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> With a later patch I saw a case in which we peeled a single iteration
> for gaps but didn't need to peel further iterations to make up a full
> vector. We then tried to vectorise the single-iteration epilogue.

But when peeling for gaps we peel off a full vector iteration and thus
have possibly VF-1 iterations in the epilogue, enough for vectorizing
with VF/2?

>
> 2019-11-04 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
>         the epilogue if there are peeled iterations for it to handle.
>
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c 2019-11-04 15:18:26.684592505 +0000
> +++ gcc/tree-vect-loop.c 2019-11-04 15:18:36.608524542 +0000
> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
>        vect_epilogues = (!loop->simdlen
>                          && loop->inner == NULL
>                          && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
> +                        && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>                          /* For now only allow one epilogue loop. */
>                          && first_loop_vinfo->epilogue_vinfos.is_empty ());
>

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> With a later patch I saw a case in which we peeled a single iteration
>> for gaps but didn't need to peel further iterations to make up a full
>> vector. We then tried to vectorise the single-iteration epilogue.
>
> But when peeling for gaps we peel off a full vector iteration and thus
> have possibly VF-1 iterations in the epilogue, enough for vectorizing
> with VF/2?

Peeling for gaps just means we need to peel off one final scalar
iteration. Often that means we need to peel more to keep the vector
loop operating on a multiple of VF, but if so, that additional peeling
counts as LOOP_VINFO_PEELING_FOR_NITER.

If we have a VF of 32 and a known iteration count of 65, we can peel a
single iteration for gaps without having to peel any more. (Obviously
we'd peel that iteration anyway if we didn't have to peel it for gaps.)
And when using fully-masked/predicated loops, peeling one iteration for
gaps doesn't force us to peel more, even if the iteration count isn't
known.

Thanks,
Richard

>>
>> 2019-11-04 Richard Sandiford <richard.sandiford@arm.com>
>>
>> gcc/
>>         * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
>>         the epilogue if there are peeled iterations for it to handle.
>>
>> Index: gcc/tree-vect-loop.c
>> ===================================================================
>> --- gcc/tree-vect-loop.c 2019-11-04 15:18:26.684592505 +0000
>> +++ gcc/tree-vect-loop.c 2019-11-04 15:18:36.608524542 +0000
>> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
>>        vect_epilogues = (!loop->simdlen
>>                          && loop->inner == NULL
>>                          && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
>> +                        && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>>                          /* For now only allow one epilogue loop. */
>>                          && first_loop_vinfo->epilogue_vinfos.is_empty ());
>>
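
The VF 32 / iteration count 65 case above can be worked through with a small model. This is only a sketch for a known iteration count and a loop that is not fully masked: the struct peel_split and the function split_iterations are hypothetical names invented for illustration, not GCC internals, and the real vectorizer also has to account for things such as alignment peeling that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how a known iteration count is divided between
   the vector loop and the epilogue.  Names are illustrative only.  */
struct peel_split
{
  unsigned vector_iters;    /* Iterations of the vector loop.  */
  unsigned epilogue_iters;  /* Scalar iterations left for the epilogue.  */
  bool peeling_for_niter;   /* Extra peeling needed to reach a multiple
                               of VF, beyond the iteration for gaps.  */
};

static struct peel_split
split_iterations (unsigned niters, unsigned vf, bool peel_for_gaps)
{
  unsigned gap_iters = peel_for_gaps ? 1 : 0;
  assert (vf > 0 && niters >= gap_iters);

  /* Iterations the vector loop may consume once the final scalar
     iteration (if any) has been reserved for gaps.  */
  unsigned avail = niters - gap_iters;

  struct peel_split s;
  s.vector_iters = avail / vf;
  s.epilogue_iters = niters - s.vector_iters * vf;
  s.peeling_for_niter = (avail % vf) != 0;
  return s;
}
```

In this model, split_iterations (65, 32, true) gives two vector iterations and a one-iteration epilogue with peeling_for_niter false, which is the single-iteration epilogue the patch avoids trying to vectorize. With 70 iterations the epilogue has six iterations and peeling_for_niter is true.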

On Wed, Nov 6, 2019 at 1:22 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> With a later patch I saw a case in which we peeled a single iteration
> >> for gaps but didn't need to peel further iterations to make up a full
> >> vector. We then tried to vectorise the single-iteration epilogue.
> >
> > But when peeling for gaps we peel off a full vector iteration and thus
> > have possibly VF-1 iterations in the epilogue, enough for vectorizing
> > with VF/2?
>
> Peeling for gaps just means we need to peel off one final scalar
> iteration. Often that means we need to peel more to keep the vector
> loop operating on a multiple of VF, but if so, that additional peeling
> counts as LOOP_VINFO_PEELING_FOR_NITER.
>
> If we have a VF of 32 and a known iteration count of 65, we can peel a
> single iteration for gaps without having to peel any more. (Obviously
> we'd peel that iteration anyway if we didn't have to peel it for gaps.)
> And when using fully-masked/predicated loops, peeling one iteration for
> gaps doesn't force us to peel more, even if the iteration count isn't
> known.

For sure, when we do not have any epilogue it's pointless to try to
vectorize it. It seems LOOP_VINFO_PEELING_FOR_NITER is set in
"interesting" ways; deciphering it seems to show that when we have an
epilogue but not LOOP_VINFO_PEELING_FOR_NITER, that epilogue always
has a single iteration only.

So, OK ...

Richard.

> Thanks,
> Richard
>
> >>
> >> 2019-11-04 Richard Sandiford <richard.sandiford@arm.com>
> >>
> >> gcc/
> >>         * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
> >>         the epilogue if there are peeled iterations for it to handle.
> >>
> >> Index: gcc/tree-vect-loop.c
> >> ===================================================================
> >> --- gcc/tree-vect-loop.c 2019-11-04 15:18:26.684592505 +0000
> >> +++ gcc/tree-vect-loop.c 2019-11-04 15:18:36.608524542 +0000
> >> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
> >>        vect_epilogues = (!loop->simdlen
> >>                          && loop->inner == NULL
> >>                          && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
> >> +                        && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> >>                          /* For now only allow one epilogue loop. */
> >>                          && first_loop_vinfo->epilogue_vinfos.is_empty ());
> >>

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c 2019-11-04 15:18:26.684592505 +0000
+++ gcc/tree-vect-loop.c 2019-11-04 15:18:36.608524542 +0000
@@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
       vect_epilogues = (!loop->simdlen
                         && loop->inner == NULL
                         && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
+                        && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
                         /* For now only allow one epilogue loop. */
                         && first_loop_vinfo->epilogue_vinfos.is_empty ());
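
The review's observation, that an epilogue without LOOP_VINFO_PEELING_FOR_NITER has only a single iteration, can be checked by brute force in a simplified model. The sketch below assumes a known iteration count, no alignment peeling and no fully-masked loops, and it treats "peeling for niter" as "the iterations left after reserving one for gaps are not a multiple of VF". The function name is hypothetical and none of this is taken from the GCC sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative check, not GCC code: return true if, for every known
   iteration count from 1 to MAX_NITERS, an epilogue that exists
   without "peeling for niter" has exactly one iteration.  */
static bool
lone_epilogue_is_single_iteration (unsigned max_niters, unsigned vf)
{
  assert (vf > 0);
  for (unsigned niters = 1; niters <= max_niters; niters++)
    for (unsigned gap_iters = 0; gap_iters <= 1; gap_iters++)
      {
        /* Iterations the vector loop may consume after reserving the
           final scalar iteration for gaps, if one is needed.  */
        unsigned avail = niters - gap_iters;
        unsigned epilogue_iters = niters - (avail / vf) * vf;
        bool peeling_for_niter = (avail % vf) != 0;

        if (epilogue_iters > 0 && !peeling_for_niter
            && epilogue_iters != 1)
          return false;
      }
  return true;
}
```

Under these assumptions the only way to have an epilogue without niter peeling is the single iteration peeled for gaps, so the added condition only skips epilogues that are not worth vectorizing.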