Message ID | BANLkTi=kwCb71z5_n6U1P497uggA5boMyw@mail.gmail.com |
---|---|
State | New |
On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 1 June 2011 12:42, Richard Guenther <richard.guenther@gmail.com> wrote:
>
>> Did you think about moving pass_optimize_widening_mul before
>> loop optimizations? Does that pass catch the cases you are
>> teaching the pattern recognizer? I think we should try to expose
>> these more complicated instructions to loop optimizers.
>>
>
> pass_optimize_widening_mul doesn't catch these cases, but I can try to
> teach it instead of the vectorizer.
> I am now testing
>
> Index: passes.c
> ===================================================================
> --- passes.c (revision 174391)
> +++ passes.c (working copy)
> @@ -870,6 +870,7 @@
>   NEXT_PASS (pass_split_crit_edges);
>   NEXT_PASS (pass_pre);
>   NEXT_PASS (pass_sink_code);
> + NEXT_PASS (pass_optimize_widening_mul);
>   NEXT_PASS (pass_tree_loop);
>   {
>   struct opt_pass **p = &pass_tree_loop.pass.sub;
> @@ -934,7 +935,6 @@
>   NEXT_PASS (pass_forwprop);
>   NEXT_PASS (pass_phiopt);
>   NEXT_PASS (pass_fold_builtins);
> - NEXT_PASS (pass_optimize_widening_mul);
>   NEXT_PASS (pass_tail_calls);
>   NEXT_PASS (pass_rename_ssa_copies);
>   NEXT_PASS (pass_uncprop);
>
> to see how it affects other loop optimizations (vectorizer pattern
> tests obviously fail).

Thanks. I would hope that we eventually can get rid of the
pattern recognizer ... at least for SSE there is also always
a scalar variant instruction for each vectorized one.

Richard.
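For reference, the basic shape of a widening multiplication in C source is a multiply whose operands are converted from a narrower type before multiplying. The loop below is a hypothetical illustration (not one of the testcases from this thread, and simpler than the cases Ira is teaching the pattern recognizer). Because both inputs come from `int16_t`, every product fits in `int32_t`, so a compiler may use a 16x16->32 widening multiply; whether `pass_optimize_widening_mul` or the vectorizer actually does so depends on the GCC version and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: each 32-bit product is computed from two
   16-bit inputs.  The conversions to int32_t make this a candidate for
   a single widening multiply instead of "extend, extend, multiply".  */
void widen_mult_loop (const int16_t *a, const int16_t *b,
                      int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}
```

The `(int32_t)` conversions matter: 300 * 300 = 90000 does not fit in 16 bits, so the result is only correct because the multiply is done in the wider type.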
Richard Guenther <richard.guenther@gmail.com> writes:
> Thanks. I would hope that we eventually can get rid of the
> pattern recognizer ... at least for SSE there is also always
> a scalar variant instruction for each vectorized one.

AFAIK, that isn't true for ARM and NEON. E.g. I don't know of a single
instruction that does the scalar equivalent of things like VADDHN
(add values and narrow to high half), VSUBL.U32 (subtract two values
and extend the result), etc.

FWIW, I think MIPS only has minimum and maximum operations for paired
floats, not for single floats or doubles. I don't have the manuals to
hand to check though.

It's probably OK for the particular case of widening multiplications.
It sounded like you were making a more general statement though.
If so, I think we should try to avoid assuming that every vectorisable
operation has an equivalent scalar machine instruction.

Richard
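To make the NEON examples concrete, here is a scalar model of what one lane of VADDHN.I32 and VSUBL.U32 computes. It is based on my reading of the ARM NEON instruction descriptions (an assumption; the ARM Architecture Reference Manual is the authority), and the function names are hypothetical. In scalar C each lane takes two operations: an add followed by taking the high half, or a zero-extension followed by a subtract in the wider type.

```c
#include <stdint.h>

/* Model of one lane of VADDHN.I32 (hypothetical helper): add two 32-bit
   elements modulo 2^32, then keep the most significant 16 bits.  */
static uint16_t vaddhn_i32_lane (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;          /* wraps modulo 2^32 */
  return (uint16_t) (sum >> 16); /* narrow to the high half */
}

/* Model of one lane of VSUBL.U32 (hypothetical helper): zero-extend both
   32-bit elements to 64 bits, then subtract in the 64-bit type.  */
static uint64_t vsubl_u32_lane (uint32_t a, uint32_t b)
{
  return (uint64_t) a - (uint64_t) b; /* wraps modulo 2^64 */
}
```

Note that the widening happens before the subtraction in this model: `vsubl_u32_lane (0, 1)` gives `UINT64_MAX`, whereas a 32-bit subtract followed by extension would give `0xFFFFFFFF`.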
On Mon, Jun 6, 2011 at 3:04 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> Thanks. I would hope that we eventually can get rid of the
>> pattern recognizer ... at least for SSE there is also always
>> a scalar variant instruction for each vectorized one.
>
> AFAIK, that isn't true for ARM and NEON. E.g. I don't know of a single
> instruction that does the scalar equivalent of things like VADDHN
> (add values and narrow to high half), VSUBL.U32 (subtract two values
> and extend the result), etc.
>
> FWIW, I think MIPS only has minimum and maximum operations for paired
> floats, not for single floats or doubles. I don't have the manuals to
> hand to check though.
>
> It's probably OK for the particular case of widening multiplications.
> It sounded like you were making a more general statement though.
> If so, I think we should try to avoid assuming that every vectorisable
> operation has an equivalent scalar machine instruction.

Hmm, too bad ;) Yes, I was suggesting that we assume that.

I guess for now we can go with the vectorizer pattern matching
enhancement and re-visit re-ordering the passes later (I don't
have time right now to look into the reported issue).

Thanks,
Richard.

> Richard
>
Index: passes.c
===================================================================
--- passes.c (revision 174391)
+++ passes.c (working copy)
@@ -870,6 +870,7 @@
  NEXT_PASS (pass_split_crit_edges);
  NEXT_PASS (pass_pre);
  NEXT_PASS (pass_sink_code);
+ NEXT_PASS (pass_optimize_widening_mul);
  NEXT_PASS (pass_tree_loop);
  {
  struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -934,7 +935,6 @@
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_phiopt);
  NEXT_PASS (pass_fold_builtins);
- NEXT_PASS (pass_optimize_widening_mul);
  NEXT_PASS (pass_tail_calls);
  NEXT_PASS (pass_rename_ssa_copies);
  NEXT_PASS (pass_uncprop);
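The point of the reordering (Richard's "expose these more complicated instructions to loop optimizers") can be illustrated with a deliberately tiny toy model. Everything here is hypothetical and far simpler than GCC's pass manager or GIMPLE. There is one "statement" and two stand-in passes, where `toy_widening_mul` plays `pass_optimize_widening_mul` and `toy_tree_loop` plays a loop optimizer with no pattern recognizer of its own. It shows only the motivation for running the widening-multiply pass before `pass_tree_loop`, not the measured behaviour of the patch, which as noted in the thread breaks the existing vectorizer pattern tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR (hypothetical): the loop body is a single multiply.  */
enum toy_op { TOY_MULT, TOY_WIDEN_MULT };

struct toy_stmt {
  enum toy_op op;
  bool operands_extended; /* both operands converted from a narrower type */
};

struct toy_state {
  struct toy_stmt stmt;
  bool vectorized;
};

typedef void (*toy_pass_fn) (struct toy_state *);

/* Stand-in for pass_optimize_widening_mul: fold "extend, extend,
   multiply" into one widening multiply.  */
static void toy_widening_mul (struct toy_state *s)
{
  if (s->stmt.op == TOY_MULT && s->stmt.operands_extended)
    s->stmt.op = TOY_WIDEN_MULT;
}

/* Stand-in for a loop optimizer without its own pattern recognizer:
   it only handles the loop if it already sees a widening multiply.  */
static void toy_tree_loop (struct toy_state *s)
{
  s->vectorized = (s->stmt.op == TOY_WIDEN_MULT);
}

/* Run the passes in the given order on a fresh copy of the loop body
   and report whether the toy loop optimizer handled it.  */
static bool toy_run_pipeline (const toy_pass_fn *passes, size_t n)
{
  struct toy_state s = { { TOY_MULT, true }, false };
  for (size_t i = 0; i < n; i++)
    passes[i] (&s);
  return s.vectorized;
}
```

With the order before the patch, `{ toy_tree_loop, toy_widening_mul }`, `toy_run_pipeline` returns false because the loop pass only ever sees a plain multiply. With the patched order, `{ toy_widening_mul, toy_tree_loop }`, it returns true.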