Improve detection of widening multiplication in the vectorizer

Message ID BANLkTi=kwCb71z5_n6U1P497uggA5boMyw@mail.gmail.com
State New

Commit Message

Ira Rosen June 1, 2011, 11:37 a.m. UTC
On 1 June 2011 12:42, Richard Guenther <richard.guenther@gmail.com> wrote:

> Did you think about moving pass_optimize_widening_mul before
> loop optimizations?  Does that pass catch the cases you are
> teaching the pattern recognizer?  I think we should try to expose
> these more complicated instructions to loop optimizers.
>

pass_optimize_widening_mul doesn't catch these cases, but I can try to
teach it instead of the vectorizer.
I am now testing


to see how it affects other loop optimizations (vectorizer pattern
tests obviously fail).

Thanks,
Ira

> Thanks,
> Richard.
>
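For readers unfamiliar with the term, a minimal sketch of the basic shape of a widening multiplication follows. The function name `widen_mult` and the choice of 16-bit inputs are assumptions made for illustration; the specific cases that the vectorizer patch targets are not shown in this thread, so this is only the textbook form, not a reproduction of those cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: multiply 16-bit inputs into 32-bit results.
   Each product is computed in the wider type, so it cannot overflow
   the narrow input type.  */
void
widen_mult (const int16_t *a, const int16_t *b, int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}
```

Whether a loop like this is recognized as a widening multiplication by the scalar pass or by the vectorizer's pattern recognizer is exactly what the pass ordering discussed here would affect.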

Comments

Richard Biener June 1, 2011, 12:14 p.m. UTC | #1
On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 1 June 2011 12:42, Richard Guenther <richard.guenther@gmail.com> wrote:
>
>> Did you think about moving pass_optimize_widening_mul before
>> loop optimizations?  Does that pass catch the cases you are
>> teaching the pattern recognizer?  I think we should try to expose
>> these more complicated instructions to loop optimizers.
>>
>
> pass_optimize_widening_mul doesn't catch these cases, but I can try to
> teach it instead of the vectorizer.
> I am now testing
>
> Index: passes.c
> ===================================================================
> --- passes.c    (revision 174391)
> +++ passes.c    (working copy)
> @@ -870,6 +870,7 @@
>       NEXT_PASS (pass_split_crit_edges);
>       NEXT_PASS (pass_pre);
>       NEXT_PASS (pass_sink_code);
> +      NEXT_PASS (pass_optimize_widening_mul);
>       NEXT_PASS (pass_tree_loop);
>        {
>          struct opt_pass **p = &pass_tree_loop.pass.sub;
> @@ -934,7 +935,6 @@
>       NEXT_PASS (pass_forwprop);
>       NEXT_PASS (pass_phiopt);
>       NEXT_PASS (pass_fold_builtins);
> -      NEXT_PASS (pass_optimize_widening_mul);
>       NEXT_PASS (pass_tail_calls);
>       NEXT_PASS (pass_rename_ssa_copies);
>       NEXT_PASS (pass_uncprop);
>
> to see how it affects other loop optimizations (vectorizer pattern
> tests obviously fail).

Thanks.  I would hope that we eventually can get rid of the
pattern recognizer ... at least for SSE there is also always
a scalar variant instruction for each vectorized one.

Richard.

Richard Sandiford June 6, 2011, 1:04 p.m. UTC | #2
Richard Guenther <richard.guenther@gmail.com> writes:
> Thanks.  I would hope that we eventually can get rid of the
> pattern recognizer ... at least for SSE there is also always
> a scalar variant instruction for each vectorized one.

AFAIK, that isn't true for ARM and NEON.  E.g. I don't know of a single
instruction that does the scalar equivalent of things like VADDHN
(add values and narrow to high half), VSUBL.U32 (subtract two values
and extend the result), etc.

FWIW, I think MIPS only has minimum and maximum operations for paired
floats, not for single floats or doubles.  I don't have the manuals to
hand to check though.

It's probably OK for the particular case of widening multiplications.
It sounded like you were making a more general statement though.
If so, I think we should try to avoid assuming that every vectorisable
operation has an equivalent scalar machine instruction.
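To make the NEON examples above concrete, here is a rough sketch of what a single lane of each operation computes when written as scalar C. The helper names `addhn_u32` and `subl_u32` are hypothetical, and the lane semantics are an assumption based on the parenthetical descriptions above (VADDHN on 32-bit elements, VSUBL.U32 on unsigned 32-bit elements). The point is only that each takes several scalar steps rather than one instruction.

```c
#include <stdint.h>

/* Hypothetical scalar analogue of one VADDHN lane on 32-bit elements:
   add with wrap-around, then keep the high 16 bits of the sum.  */
static inline uint16_t
addhn_u32 (uint32_t a, uint32_t b)
{
  return (uint16_t) ((uint32_t) (a + b) >> 16);
}

/* Hypothetical scalar analogue of one VSUBL.U32 lane:
   zero-extend both operands to 64 bits, then subtract.  */
static inline uint64_t
subl_u32 (uint32_t a, uint32_t b)
{
  return (uint64_t) a - (uint64_t) b;
}
```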

Richard

Richard Biener June 6, 2011, 2:28 p.m. UTC | #3
On Mon, Jun 6, 2011 at 3:04 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> Thanks.  I would hope that we eventually can get rid of the
>> pattern recognizer ... at least for SSE there is also always
>> a scalar variant instruction for each vectorized one.
>
> AFAIK, that isn't true for ARM and NEON.  E.g. I don't know of a single
> instruction that does the scalar equivalent of things like VADDHN
> (add values and narrow to high half), VSUBL.U32 (subtract two values
> and extend the result), etc.
>
> FWIW, I think MIPS only has minimum and maximum operations for paired
> floats, not for single floats or doubles.  I don't have the manuals to
> hand to check though.
>
> It's probably OK for the particular case of widening multiplications.
> It sounded like you were making a more general statement though.
> If so, I think we should try to avoid assuming that every vectorisable
> operation has an equivalent scalar machine instruction.

Hmm, too bad ;)  Yes, I was suggesting that we assume that.  I guess
for now we can go with the vectorizer pattern matching enhancement
and re-visit re-ordering the passes later (I don't have time right now to
look into the reported issue).

Thanks,
Richard.

> Richard
>

Patch

Index: passes.c
===================================================================
--- passes.c    (revision 174391)
+++ passes.c    (working copy)
@@ -870,6 +870,7 @@ 
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
        {
          struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -934,7 +935,6 @@ 
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);