diff mbox

RFC [1/3] divmod transform v2

Message ID CAAgBjMkqv15VNxN17v9WUeNTU5OgZ3bAmzJQ+u3hHaGbevsVOg@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni Oct. 16, 2016, 5:59 a.m. UTC
Hi,
After approval from Bernd Schmidt, I committed the patch to remove
optab functions for
sdivmod_optab and udivmod_optab in optabs.def, which removes the block
for divmod patch.

This patch is mostly the same as previous one, except it drops
targeting __udivmoddi4() because
it gave undefined reference link error for calling __udivmoddi4() on
aarch64-linux-gnu.
It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
isn't needed for the target
(it was a bug in my patch that called __udivmoddi4() even though
aarch64 supported hardware div).

However this makes me wonder if it's guaranteed that __udivmoddi4()
will be available for a target if it doesn't have hardware div and
divmod insn and doesn't have target-specific libfunc for
DImode divmod ? To be conservative, the attached patch doesn't
generate call to __udivmoddi4.

Passes bootstrap+test on x86_64-unknown-linux.
Cross-tested on arm*-*-*, aarch64*-*-*.
Verified that there are no regressions with SPEC2006 on
x86_64-unknown-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Kugan Vivekanandarajah  <kuganv@linaro.org>
	    Jim Wilson  <jim.wilson@linaro.org>

	    * target.def: New hook expand_divmod_libfunc.
	    * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
	    * doc/tm.texi: Regenerate.
	    * internal-fn.def: Add new entry for DIVMOD ifn.
	    * internal-fn.c (expand_DIVMOD): New.
	    * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
	    targhooks.h.
	    (widen_mul_stats): Add new field divmod_calls_inserted.
	    (target_supports_divmod_p): New.
	    (divmod_candidate_p): Likewise.
	    (convert_to_divmod): Likewise.
	    (pass_optimize_widening_mul::execute): Call
	    calculate_dominance_info(), renumber_gimple_stmt_uids() at
	    beginning of function. Call convert_to_divmod()
	    and record stats for divmod.

Comments

Prathamesh Kulkarni Oct. 16, 2016, 8:37 a.m. UTC | #1
On 16 October 2016 at 11:29, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> Hi,
> After approval from Bernd Schmidt, I committed the patch to remove
> optab functions for
> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> for divmod patch.
>
> This patch is mostly the same as previous one, except it drops
> targeting __udivmoddi4() because
> it gave undefined reference link error for calling __udivmoddi4() on
> aarch64-linux-gnu.
> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> isn't needed for the target
> (it was a bug in my patch that called __udivmoddi4() even though
> aarch64 supported hardware div).
>
> However this makes me wonder if it's guaranteed that __udivmoddi4()
> will be available for a target if it doesn't have hardware div and
> divmod insn and doesn't have target-specific libfunc for
> DImode divmod ? To be conservative, the attached patch doesn't
> generate call to __udivmoddi4.
>
> Passes bootstrap+test on x86_64-unknown-linux.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> Verified that there are no regressions with SPEC2006 on
I mean no regressions from correctness perspective, I didn't benchmark
the patch.
> x86_64-unknown-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh
Jeff Law Oct. 17, 2016, 9:16 p.m. UTC | #2
On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> This patch is mostly the same as previous one, except it drops
> targeting __udivmoddi4() because it gave undefined reference link
> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
> needed for the target (it was a bug in my patch that called
> __udivmoddi4() even though aarch64 supported hardware div).
This touches on the one high level question I had.  Namely what is the 
code generation strategy if the hardware has a div, but not divmod.

ISTM in that case I think we want to use the div instruction and 
synthesize mod from that result rather than relying on a software 
divmod.  So it looks like you ought to be doing the right thing for that 
case now based on your comment above.
>
> However this makes me wonder if it's guaranteed that __udivmoddi4()
> will be available for a target if it doesn't have hardware div and
> divmod insn and doesn't have target-specific libfunc for DImode
> divmod ? To be conservative, the attached patch doesn't generate call
> to __udivmoddi4.
I don't think that's a safe assumption.  Addition of the divmod routines 
into libgcc is controlled by the target and have to be explicitly added 
AFAICT.

So on a target like the fr30 which has no div or mod insn and doesn't 
define the right bits in libgcc, there is no divmod libcall available. 
(On these targets there's a div libcall and a mod libcall, but not a 
combined one).

I don't even think we have a way of knowing in the compiler if the 
target has enabled divmod support in libgcc.

ISTM that for now we have to limit to cases where we have a divmod 
insn/libcall defined.

jeff
Prathamesh Kulkarni Oct. 18, 2016, 5:23 a.m. UTC | #3
On 18 October 2016 at 02:46, Jeff Law <law@redhat.com> wrote:
> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>>
>> This patch is mostly the same as previous one, except it drops
>> targeting __udivmoddi4() because it gave undefined reference link
>> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
>> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
>> needed for the target (it was a bug in my patch that called
>> __udivmoddi4() even though aarch64 supported hardware div).
>
> This touches on the one high level question I had.  Namely what is the code
> generation strategy if the hardware has a div, but not divmod.
The divmod transform isn't enabled if target supports hardware div in the same
or wider mode even if divmod libfunc is available for the given mode.
>
> ISTM in that case I think we want to use the div instruction and synthesize
> mod from that result rather than relying on a software divmod.  So it looks
> like you ought to be doing the right thing for that case now based on your
> comment above.
>>
>>
>> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> will be available for a target if it doesn't have hardware div and
>> divmod insn and doesn't have target-specific libfunc for DImode
>> divmod ? To be conservative, the attached patch doesn't generate call
>> to __udivmoddi4.
>
> I don't think that's a safe assumption.  Addition of the divmod routines
> into libgcc is controlled by the target and have to be explicitly added
> AFAICT.
>
> So on a target like the fr30 which has no div or mod insn and doesn't define
> the right bits in libgcc, there is no divmod libcall available. (On these
> targets there's a div libcall and a mod libcall, but not a combined one).
Thanks. I had erroneously  assumed __udivimoddi4() was available for all targets
because it was defined in libgcc2.c and generated call to it as "last
resort" for unsigned
DImode case if target didn't support hardware div and divmod insn and
didn't have
target-specific divmod libfunc for DImode.
The reason why it generated undefined reference on aarch64-linux-gnu
was because I
didn't properly check in the patch if target supported hardware div,
and ended up generating call to
__udivmoddi4() even though aarch64 has hardware div. I rectified that
before posting the
patch.
>
> I don't even think we have a way of knowing in the compiler if the target
> has enabled divmod support in libgcc.
Well the arm and c6x backends register target-specific divmod libfunc via
set_optab_libfunc(). I suppose that's sufficient to know if target has
divmod enabled
in libgcc ?
>
> ISTM that for now we have to limit to cases where we have a divmod
> insn/libcall defined.
Indeed. The transform is enabled only if the target has hardware divmod insn
or divmod libfunc (in the libfunc case, no hardware div insn).
Please see divmod_candidate_p() in the patch for cases when the
transform gets enabled.

Thanks,
Prathamesh
>
> jeff
>
Richard Biener Oct. 18, 2016, 8:25 a.m. UTC | #4
On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:

> On 18 October 2016 at 02:46, Jeff Law <law@redhat.com> wrote:
> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> >>
> >> This patch is mostly the same as previous one, except it drops
> >> targeting __udivmoddi4() because it gave undefined reference link
> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
> >> needed for the target (it was a bug in my patch that called
> >> __udivmoddi4() even though aarch64 supported hardware div).
> >
> > This touches on the one high level question I had.  Namely what is the code
> > generation strategy if the hardware has a div, but not divmod.
> The divmod transform isn't enabled if target supports hardware div in the same
> or wider mode even if divmod libfunc is available for the given mode.
> >
> > ISTM in that case I think we want to use the div instruction and synthesize
> > mod from that result rather than relying on a software divmod.  So it looks
> > like you ought to be doing the right thing for that case now based on your
> > comment above.
> >>
> >>
> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
> >> will be available for a target if it doesn't have hardware div and
> >> divmod insn and doesn't have target-specific libfunc for DImode
> >> divmod ? To be conservative, the attached patch doesn't generate call
> >> to __udivmoddi4.
> >
> > I don't think that's a safe assumption.  Addition of the divmod routines
> > into libgcc is controlled by the target and have to be explicitly added
> > AFAICT.
> >
> > So on a target like the fr30 which has no div or mod insn and doesn't define
> > the right bits in libgcc, there is no divmod libcall available. (On these
> > targets there's a div libcall and a mod libcall, but not a combined one).
> Thanks. I had erroneously  assumed __udivimoddi4() was available for all targets
> because it was defined in libgcc2.c and generated call to it as "last
> resort" for unsigned
> DImode case if target didn't support hardware div and divmod insn and
> didn't have
> target-specific divmod libfunc for DImode.
> The reason why it generated undefined reference on aarch64-linux-gnu
> was because I
> didn't properly check in the patch if target supported hardware div,
> and ended up generating call to
> __udivmoddi4() even though aarch64 has hardware div. I rectified that
> before posting the
> patch.
> >
> > I don't even think we have a way of knowing in the compiler if the target
> > has enabled divmod support in libgcc.

Yeah, that's what bothers me with the current optab libfunc query
setup -- it isn't reliable.

> Well the arm and c6x backends register target-specific divmod libfunc via
> set_optab_libfunc(). I suppose that's sufficient to know if target has
> divmod enabled
> in libgcc ?
> >
> > ISTM that for now we have to limit to cases where we have a divmod
> > insn/libcall defined.
> Indeed. The transform is enabled only if the target has hardware divmod insn
> or divmod libfunc (in the libfunc case, no hardware div insn).
> Please see divmod_candidate_p() in the patch for cases when the
> transform gets enabled.

But after your patch the divmod libfunc is never available?

Richard.

> Thanks,
> Prathamesh
> >
> > jeff
> >
> 
>
Prathamesh Kulkarni Oct. 18, 2016, 8:29 a.m. UTC | #5
On 18 October 2016 at 13:55, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 18 October 2016 at 02:46, Jeff Law <law@redhat.com> wrote:
>> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> >>
>> >> This patch is mostly the same as previous one, except it drops
>> >> targeting __udivmoddi4() because it gave undefined reference link
>> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
>> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
>> >> needed for the target (it was a bug in my patch that called
>> >> __udivmoddi4() even though aarch64 supported hardware div).
>> >
>> > This touches on the one high level question I had.  Namely what is the code
>> > generation strategy if the hardware has a div, but not divmod.
>> The divmod transform isn't enabled if target supports hardware div in the same
>> or wider mode even if divmod libfunc is available for the given mode.
>> >
>> > ISTM in that case I think we want to use the div instruction and synthesize
>> > mod from that result rather than relying on a software divmod.  So it looks
>> > like you ought to be doing the right thing for that case now based on your
>> > comment above.
>> >>
>> >>
>> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> will be available for a target if it doesn't have hardware div and
>> >> divmod insn and doesn't have target-specific libfunc for DImode
>> >> divmod ? To be conservative, the attached patch doesn't generate call
>> >> to __udivmoddi4.
>> >
>> > I don't think that's a safe assumption.  Addition of the divmod routines
>> > into libgcc is controlled by the target and have to be explicitly added
>> > AFAICT.
>> >
>> > So on a target like the fr30 which has no div or mod insn and doesn't define
>> > the right bits in libgcc, there is no divmod libcall available. (On these
>> > targets there's a div libcall and a mod libcall, but not a combined one).
>> Thanks. I had erroneously  assumed __udivimoddi4() was available for all targets
>> because it was defined in libgcc2.c and generated call to it as "last
>> resort" for unsigned
>> DImode case if target didn't support hardware div and divmod insn and
>> didn't have
>> target-specific divmod libfunc for DImode.
>> The reason why it generated undefined reference on aarch64-linux-gnu
>> was because I
>> didn't properly check in the patch if target supported hardware div,
>> and ended up generating call to
>> __udivmoddi4() even though aarch64 has hardware div. I rectified that
>> before posting the
>> patch.
>> >
>> > I don't even think we have a way of knowing in the compiler if the target
>> > has enabled divmod support in libgcc.
>
> Yeah, that's what bothers me with the current optab libfunc query
> setup -- it isn't reliable.
>
>> Well the arm and c6x backends register target-specific divmod libfunc via
>> set_optab_libfunc(). I suppose that's sufficient to know if target has
>> divmod enabled
>> in libgcc ?
>> >
>> > ISTM that for now we have to limit to cases where we have a divmod
>> > insn/libcall defined.
>> Indeed. The transform is enabled only if the target has hardware divmod insn
>> or divmod libfunc (in the libfunc case, no hardware div insn).
>> Please see divmod_candidate_p() in the patch for cases when the
>> transform gets enabled.
>
> But after your patch the divmod libfunc is never available?
After the patch, the divmod libfunc is not available if the target
doesn't explicitly set it.
So by default optab_libfunc ([us]divmod_optab, mode) would return NULL
instead of bogus libfunc
constructed with gen_int_libfunc, but if the target has registered it
with set_optab_libfunc, then
the target-specific libfunc name would be returned.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > jeff
>> >
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Jeff Law Oct. 18, 2016, 5:08 p.m. UTC | #6
On 10/18/2016 02:25 AM, Richard Biener wrote:

>>> I don't even think we have a way of knowing in the compiler if the target
>>> has enabled divmod support in libgcc.
>
> Yeah, that's what bothers me with the current optab libfunc query
> setup -- it isn't reliable.
I wonder if we ought to just have them all available in libgcc.  If 
they're one-function-per-file in the .a, then we're not going to get 
significant codesize bloat in the resulting executables.

jeff
Jeff Law Oct. 18, 2016, 9:33 p.m. UTC | #7
On 10/17/2016 11:23 PM, Prathamesh Kulkarni wrote:
> The divmod transform isn't enabled if target supports hardware div in
> the same or wider mode even if divmod libfunc is available for the
> given mode.
Good.  That seems like the right thing to do.

> Thanks. I had erroneously  assumed __udivimoddi4() was available for
> all targets because it was defined in libgcc2.c and generated call to
> it as "last resort" for unsigned DImode case if target didn't support
> hardware div and divmod insn and didn't have target-specific divmod
> libfunc for DImode. The reason why it generated undefined reference
> on aarch64-linux-gnu was because I didn't properly check in the patch
> if target supported hardware div, and ended up generating call to
> __udivmoddi4() even though aarch64 has hardware div. I rectified
> that before posting the patch.
Understood.  From a design standpoint, it seems to me like the path 
where we emit a call to udivmod without knowing if its supported by 
libgcc is broken.  But if I understand correctly, that's not affected by 
your changes -- it's simply a historical poor decision.

>>
>> I don't even think we have a way of knowing in the compiler if the
>> target has enabled divmod support in libgcc.
> Well the arm and c6x backends register target-specific divmod libfunc
> via set_optab_libfunc(). I suppose that's sufficient to know if
> target has divmod enabled in libgcc ?
It's probably a pretty reasonable assumption that if the target has 
registered a libfunc, the the libfunc ought to be available.

>>
>> ISTM that for now we have to limit to cases where we have a divmod
>> insn/libcall defined.
> Indeed. The transform is enabled only if the target has hardware
> divmod insn or divmod libfunc (in the libfunc case, no hardware div
> insn). Please see divmod_candidate_p() in the patch for cases when
> the transform gets enabled.
Great.  Thanks.  Hoping to make some progress on the actual patch in the 
next couple days.

jeff
Prathamesh Kulkarni Oct. 19, 2016, 3:30 a.m. UTC | #8
On 19 October 2016 at 03:03, Jeff Law <law@redhat.com> wrote:
> On 10/17/2016 11:23 PM, Prathamesh Kulkarni wrote:
>>
>> The divmod transform isn't enabled if target supports hardware div in
>> the same or wider mode even if divmod libfunc is available for the
>> given mode.
>
> Good.  That seems like the right thing to do.
>
>> Thanks. I had erroneously  assumed __udivimoddi4() was available for
>> all targets because it was defined in libgcc2.c and generated call to
>> it as "last resort" for unsigned DImode case if target didn't support
>> hardware div and divmod insn and didn't have target-specific divmod
>> libfunc for DImode. The reason why it generated undefined reference
>> on aarch64-linux-gnu was because I didn't properly check in the patch
>> if target supported hardware div, and ended up generating call to
>> __udivmoddi4() even though aarch64 has hardware div. I rectified
>> that before posting the patch.
>
> Understood.  From a design standpoint, it seems to me like the path where we
> emit a call to udivmod without knowing if its supported by libgcc is broken.
> But if I understand correctly, that's not affected by your changes -- it's
> simply a historical poor decision.
Err no, that poor decision was entirely mine -;)  I had wrongly
assumed __udivmoddi4 to be always available
and got surprised when it gave undefined reference error on aarch64
and hence brought it up for discussion.
I removed those parts of the patch that generated call to
__udivmoddi4() before posting the patch upstream.
>
>>>
>>> I don't even think we have a way of knowing in the compiler if the
>>> target has enabled divmod support in libgcc.
>>
>> Well the arm and c6x backends register target-specific divmod libfunc
>> via set_optab_libfunc(). I suppose that's sufficient to know if
>> target has divmod enabled in libgcc ?
>
> It's probably a pretty reasonable assumption that if the target has
> registered a libfunc, the the libfunc ought to be available.
>
>>>
>>> ISTM that for now we have to limit to cases where we have a divmod
>>> insn/libcall defined.
>>
>> Indeed. The transform is enabled only if the target has hardware
>> divmod insn or divmod libfunc (in the libfunc case, no hardware div
>> insn). Please see divmod_candidate_p() in the patch for cases when
>> the transform gets enabled.
>
> Great.  Thanks.  Hoping to make some progress on the actual patch in the
> next couple days.
Thanks!

Regards,
Prathamesh
>
> jeff
Jeff Law Oct. 19, 2016, 7:57 p.m. UTC | #9
On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> Hi,
> After approval from Bernd Schmidt, I committed the patch to remove
> optab functions for
> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> for divmod patch.
>
> This patch is mostly the same as previous one, except it drops
> targeting __udivmoddi4() because
> it gave undefined reference link error for calling __udivmoddi4() on
> aarch64-linux-gnu.
> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> isn't needed for the target
> (it was a bug in my patch that called __udivmoddi4() even though
> aarch64 supported hardware div).
>
> However this makes me wonder if it's guaranteed that __udivmoddi4()
> will be available for a target if it doesn't have hardware div and
> divmod insn and doesn't have target-specific libfunc for
> DImode divmod ? To be conservative, the attached patch doesn't
> generate call to __udivmoddi4.
>
> Passes bootstrap+test on x86_64-unknown-linux.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> Verified that there are no regressions with SPEC2006 on
> x86_64-unknown-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
>
> divmod-v2-3-main.txt
>
>
> 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> 	    Kugan Vivekanandarajah  <kuganv@linaro.org>
> 	    Jim Wilson  <jim.wilson@linaro.org>
>
> 	    * target.def: New hook expand_divmod_libfunc.
> 	    * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
> 	    * doc/tm.texi: Regenerate.
> 	    * internal-fn.def: Add new entry for DIVMOD ifn.
> 	    * internal-fn.c (expand_DIVMOD): New.
> 	    * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
> 	    targhooks.h.
> 	    (widen_mul_stats): Add new field divmod_calls_inserted.
> 	    (target_supports_divmod_p): New.
> 	    (divmod_candidate_p): Likewise.
> 	    (convert_to_divmod): Likewise.
> 	    (pass_optimize_widening_mul::execute): Call
> 	    calculate_dominance_info(), renumber_gimple_stmt_uids() at
> 	    beginning of function. Call convert_to_divmod()
> 	    and record stats for divmod.
Starting with some high level design comments.  If these conflict with 
comments from others, let me know and we'll work through the issues.

I don't really like introducing code conditional on the target 
capabilities this early in the gimple optimization pipeline.

Would it be possible to always do the transformation to divmod in the 
gimple optimizers, regardless of the target capabilities.  Then in the 
gimple->RTL expanders make a final decision about divmod insn, libcall, 
or using div/mod insns?

That would move all the target dependencies out of the gimple optimizers 
and into the gimple->rtl expansion phase, which is the preferred place 
to start introducing this kind of target dependency.

With that background, I'm going to focus more on the identification of 
divmod opportunities than the expansion bits.


>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index a4a8e49..866c368 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
>  the hook implementation for how different fusion types are supported.
>  @end deftypefn
>
> +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx *@var{quot}, rtx *@var{rem})
> +Define this hook for enabling divmod transform if the port does not have
> +hardware divmod insn but defines target-specific divmod libfuncs.
> +@end deftypefn
> +
>  @node Sections
>  @section Dividing the Output into Sections (Texts, Data, @dots{})
>  @c the above section title is WAY too long.  maybe cut the part between
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 265f1be..c4c387b 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>
>  @hook TARGET_SCHED_FUSION_PRIORITY
>
> +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
> +
>  @node Sections
>  @section Dividing the Output into Sections (Texts, Data, @dots{})
>  @c the above section title is WAY too long.  maybe cut the part between
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 0b32d5f..42c6973 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall *call)
>    expand_ifn_atomic_compare_exchange (call);
>  }
>
> +/* Expand DIVMOD() using:
In general, we do not use () when referring to objects in comments.

> + a) optab handler for udivmod/sdivmod if it is available.
> + b) If optab_handler doesn't exist, generate call to
> +    target-specific divmod libfunc.  */
> +
> +static void
> +expand_DIVMOD (internal_fn, gcall *call_stmt)
In general, don't use upper case for function names.  Lower case please. 
  But I believe we should be pushing all the expansion stuff into the 
gimple->rtl expansion code, so I haven't looked at this closely yet on 
the expectation it'll likely move.



> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 0cea1a8..4f7bdb2 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
>    return true;
>  }
>
> +/* Return true if target has support for divmod.  */
> +
> +static bool
> +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode)
So this is the kind of code I expect to avoid in the gimple optimizers. 
Instead the decision about whether to use a divmod insn, divmod libfunc 
or div + synthesized mod, or separate div and mod insns should happen at 
gimple->rtl expansion time.


> +
> +/* This function looks for:
> +   t1 = a TRUNC_DIV_EXPR b;
> +   t2 = a TRUNC_MOD_EXPR b;
> +   and transforms it to the following sequence:
> +   complex_tmp = DIVMOD (a, b);
> +   t1 = REALPART_EXPR(a);
> +   t2 = IMAGPART_EXPR(b);
> +   For conditions enabling the transform see divmod_candidate_p().
> +
> +   The pass works in two phases:
> +   1) Walk through all immediate uses of stmt's operand and find a
> +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
> +      it to stmts vector.
> +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
> +      dominates other div/mod stmts with same operands) and update entries in
> +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
> +      IMAGPART_EXPR for mod).  */
> +
> +static bool
> +convert_to_divmod (gassign *stmt)
> +{
> +  if (!divmod_candidate_p (stmt))
> +    return false;
> +
> +  tree op1 = gimple_assign_rhs1 (stmt);
> +  tree op2 = gimple_assign_rhs2 (stmt);
> +
> +  imm_use_iterator use_iter;
> +  gimple *use_stmt;
> +  auto_vec<gimple *> stmts;
> +
> +  gimple *top_stmt = stmt;
> +  basic_block top_bb = gimple_bb (stmt);
> +
> +  /* Try to set top_stmt to "topmost" stmt
> +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> +    {
> +      if (is_gimple_assign (use_stmt)
> +	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> +	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
> +	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
> +	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to 
be TRUNC_MOD_EXPR and thus you're only interested in looking for a 
matching TRUNC_DIV_EXPR statement.

The only way I could see TRUNC_MOD_EXPR being useful here would be if 
there is a redundant TRUNC_MOD_EXPR in the IL, which would be a sign 
that some other optimizer hasn't done its job.  Have you seen this to be 
useful in practice?

> +	{
> +	  if (stmt_can_throw_internal (use_stmt))
> +	    continue;
> +
> +	  basic_block bb = gimple_bb (use_stmt);
> +
> +	  if (bb == top_bb)
> +	    {
> +	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
> +		top_stmt = use_stmt;
> +	    }
> +	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
> +	    {
> +	      top_bb = bb;
> +	      top_stmt = use_stmt;
> +	    }
Do you want to break out of the immediate use loop once you've found a 
TRUNC_DIV_EXPR statement with matching operands?

I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the 
TRUNC_MOD_EXPR, but that would mean that we've got redundant 
TRUNC_DIV_EXPR statements in the IL, which presumably doesn't happen if 
the rest of the optimizers are doing their job.



> +	}
> +    }
> +
> +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> +    return false;
Don't you just want
if (stmt_can_throw_internal (stmt))
   return false;

Before the loop over the immediate uses, thus avoiding the loop if we've 
got a TRUNC_MOD_EXPR that we can't optimize?



> +
> +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
> +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
> +
> +  stmts.safe_push (top_stmt);
> +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
> +
> +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
?!?  It's not clear why you'd need to do another immediate use loop here.

ISTM at this point you've found two statements, one a TRUNC_DIV the 
other a TRUNC_MOD with the same operands.  Given that, ISTM you just 
need to check the dominance relationship of the blocks containing those 
statements.

It really feels like you're going out of your way to handle cases where 
we have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm 
right, I'd rather see that time spent optimizing away those redundant 
expressions in FRE/PRE/DOM.

> +
> +  /* Create libcall to internal fn DIVMOD:
> +     divmod_tmp = DIVMOD (op1, op2).  */
> +
> +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
> +  tree res = make_temp_ssa_name (
> +		build_complex_type (TREE_TYPE (op1)),
> +		call_stmt, "divmod_tmp");
Please review the the GCC formatting guidelines.  This looks wrong.

tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
                                call_stmt, "divmod_tmp");

Seems like the right formatting to me.

> +
> +  /* Update all statements in stmts.
> +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
> +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
I'd just emit a copy from RES to the appropriate lhs operand just after 
the divmod and delete the now unnecessary TRUNC_DIV_EXPR and 
TRUNC_MOD_EXPR statements.

Copy propagation should clean that up just fine and it simplifies your 
implementation.

> +
> +  bool cfg_changed = false;
> +  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
> +    {
> +      tree new_rhs;
> +
> +      switch (gimple_assign_rhs_code (use_stmt))
> +	{
> +	  case TRUNC_DIV_EXPR:
> +	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> +	    break;
> +
> +	  case TRUNC_MOD_EXPR:
> +	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	}
> +
> +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
> +      update_stmt (use_stmt);
> +
> +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> +	cfg_changed = true;
Does this ever happen given how you filter out throwing statements earlier?


> @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
>  		    match_uaddsub_overflow (&gsi, stmt, code);
>  		  break;
>
> +		case TRUNC_MOD_EXPR:
> +		  cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
> +		  break;
> +
>  		default:;
>  		}
>  	    }
If I'm reading this correctly, ISTM that you only search for a divmod 
pair if you first find the mod insn.  That's probably reasonable.  We 
have to find both to have an optimization opportunity and I suspect we 
have more div than mod insns, so you're presumably minimizing the search 
space with this approach.  Can you confirm that was your intention here, 
or were you looking to do something else?


Jeff
Richard Biener Oct. 20, 2016, 9:32 a.m. UTC | #10
On Wed, 19 Oct 2016, Jeff Law wrote:

> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> > Hi,
> > After approval from Bernd Schmidt, I committed the patch to remove
> > optab functions for
> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> > for divmod patch.
> > 
> > This patch is mostly the same as previous one, except it drops
> > targeting __udivmoddi4() because
> > it gave undefined reference link error for calling __udivmoddi4() on
> > aarch64-linux-gnu.
> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> > isn't needed for the target
> > (it was a bug in my patch that called __udivmoddi4() even though
> > aarch64 supported hardware div).
> > 
> > However this makes me wonder if it's guaranteed that __udivmoddi4()
> > will be available for a target if it doesn't have hardware div and
> > divmod insn and doesn't have target-specific libfunc for
> > DImode divmod ? To be conservative, the attached patch doesn't
> > generate call to __udivmoddi4.
> > 
> > Passes bootstrap+test on x86_64-unknown-linux.
> > Cross-tested on arm*-*-*, aarch64*-*-*.
> > Verified that there are no regressions with SPEC2006 on
> > x86_64-unknown-linux-gnu.
> > OK to commit ?
> > 
> > Thanks,
> > Prathamesh
> > 
> > 
> > divmod-v2-3-main.txt
> > 
> > 
> > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> > 	    Kugan Vivekanandarajah  <kuganv@linaro.org>
> > 	    Jim Wilson  <jim.wilson@linaro.org>
> > 
> > 	    * target.def: New hook expand_divmod_libfunc.
> > 	    * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
> > 	    * doc/tm.texi: Regenerate.
> > 	    * internal-fn.def: Add new entry for DIVMOD ifn.
> > 	    * internal-fn.c (expand_DIVMOD): New.
> > 	    * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
> > 	    targhooks.h.
> > 	    (widen_mul_stats): Add new field divmod_calls_inserted.
> > 	    (target_supports_divmod_p): New.
> > 	    (divmod_candidate_p): Likewise.
> > 	    (convert_to_divmod): Likewise.
> > 	    (pass_optimize_widening_mul::execute): Call
> > 	    calculate_dominance_info(), renumber_gimple_stmt_uids() at
> > 	    beginning of function. Call convert_to_divmod()
> > 	    and record stats for divmod.
> Starting with some high level design comments.  If these conflict with
> comments from others, let me know and we'll work through the issues.
> 
> I don't really like introducing code conditional on the target capabilities
> this early in the gimple optimization pipeline.

It's basically done right before RTL expansion 
(pass_optimize_widening_mul).

> Would it be possible to always do the transformation to divmod in the gimple
> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
> expanders make a final decision about divmod insn, libcall, or using div/mod
> insns?

The issue is that it hoists one or both the division or the modulo and
if we don't do the transform we'd want to undo that code motion.

> That would move all the target dependencies out of the gimple optimizers and
> into the gimple->rtl expansion phase, which is the preferred place to start
> introducing this kind of target dependency.
> 
> With that background, I'm going to focus more on the identification of divmod
> opportunities than the expansion bits.
> 
> 
> > 
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index a4a8e49..866c368 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
> > please refer to
> >  the hook implementation for how different fusion types are supported.
> >  @end deftypefn
> > 
> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
> > *@var{quot}, rtx *@var{rem})
> > +Define this hook for enabling divmod transform if the port does not have
> > +hardware divmod insn but defines target-specific divmod libfuncs.
> > +@end deftypefn
> > +
> >  @node Sections
> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> >  @c the above section title is WAY too long.  maybe cut the part between
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 265f1be..c4c387b 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
> > 
> >  @hook TARGET_SCHED_FUSION_PRIORITY
> > 
> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
> > +
> >  @node Sections
> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> >  @c the above section title is WAY too long.  maybe cut the part between
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index 0b32d5f..42c6973 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
> > *call)
> >    expand_ifn_atomic_compare_exchange (call);
> >  }
> > 
> > +/* Expand DIVMOD() using:
> In general, we do not use () when referring to objects in comments.
> 
> > + a) optab handler for udivmod/sdivmod if it is available.
> > + b) If optab_handler doesn't exist, generate call to
> > +    target-specific divmod libfunc.  */
> > +
> > +static void
> > +expand_DIVMOD (internal_fn, gcall *call_stmt)
> In general, don't use upper case for function names.  Lower case please.  But
> I believe we should be pushing all the expansion stuff into the gimple->rtl
> expansion code, so I haven't looked at this closely yet on the expectation
> it'll likely move.
> 
> 
> 
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index 0cea1a8..4f7bdb2 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
> > gimple *stmt,
> >    return true;
> >  }
> > 
> > +/* Return true if target has support for divmod.  */
> > +
> > +static bool
> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode
> > mode)
> So this is the kind of code I expect to avoid in the gimple optimizers.
> Instead the decision about whether to use a divmod insn, divmod libfunc or div
> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
> expansion time.
> 
> 
> > +
> > +/* This function looks for:
> > +   t1 = a TRUNC_DIV_EXPR b;
> > +   t2 = a TRUNC_MOD_EXPR b;
> > +   and transforms it to the following sequence:
> > +   complex_tmp = DIVMOD (a, b);
> > +   t1 = REALPART_EXPR(a);
> > +   t2 = IMAGPART_EXPR(b);
> > +   For conditions enabling the transform see divmod_candidate_p().
> > +
> > +   The pass works in two phases:
> > +   1) Walk through all immediate uses of stmt's operand and find a
> > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
> > +      it to stmts vector.
> > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
> > that
> > +      dominates other div/mod stmts with same operands) and update entries
> > in
> > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
> > +      IMAGPART_EXPR for mod).  */
> > +
> > +static bool
> > +convert_to_divmod (gassign *stmt)
> > +{
> > +  if (!divmod_candidate_p (stmt))
> > +    return false;
> > +
> > +  tree op1 = gimple_assign_rhs1 (stmt);
> > +  tree op2 = gimple_assign_rhs2 (stmt);
> > +
> > +  imm_use_iterator use_iter;
> > +  gimple *use_stmt;
> > +  auto_vec<gimple *> stmts;
> > +
> > +  gimple *top_stmt = stmt;
> > +  basic_block top_bb = gimple_bb (stmt);
> > +
> > +  /* Try to set top_stmt to "topmost" stmt
> > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
> > */
> > +
> > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> > +    {
> > +      if (is_gimple_assign (use_stmt)
> > +	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> > +	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
> > +	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
> > +	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
> TRUNC_DIV_EXPR statement.
> 
> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
> optimizer hasn't done its job.  Have you seen this to be useful in practice?

Note that I've reviewed this parts already and we restructured things
in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
finding a dominating trunc _or_ div and interatively doing so
correctly if we have more than one pair.

> > +	{
> > +	  if (stmt_can_throw_internal (use_stmt))
> > +	    continue;
> > +
> > +	  basic_block bb = gimple_bb (use_stmt);
> > +
> > +	  if (bb == top_bb)
> > +	    {
> > +	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
> > +		top_stmt = use_stmt;
> > +	    }
> > +	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
> > +	    {
> > +	      top_bb = bb;
> > +	      top_stmt = use_stmt;
> > +	    }
> Do you want to break out of the immediate use loop once you've found a
> TRUNC_DIV_EXPR statement with matching operands?
> 
> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
> statements in the IL, which presumably doesn't happen if the rest of the
> optimizers are doing their job.
> 
> 
> 
> > +	}
> > +    }
> > +
> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> > +    return false;
> Don't you just want
> if (stmt_can_throw_internal (stmt))
>   return false;
> 
> Before the loop over the immediate uses, thus avoiding the loop if we've got a
> TRUNC_MOD_EXPR that we can't optimize?
> 
> 
> 
> > +
> > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
> > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
> > +
> > +  stmts.safe_push (top_stmt);
> > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
> > +
> > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
> ?!?  It's not clear why you'd need to do another immediate use loop here.
> 
> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
> TRUNC_MOD with the same operands.  Given that, ISTM you just need to check the
> dominance relationship of the blocks containing those statements.
> 
> It really feels like you're going out of your way to handle cases where we
> have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm right, I'd
> rather see that time spent optimizing away those redundant expressions in
> FRE/PRE/DOM.
> 
> > +
> > +  /* Create libcall to internal fn DIVMOD:
> > +     divmod_tmp = DIVMOD (op1, op2).  */
> > +
> > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
> > +  tree res = make_temp_ssa_name (
> > +		build_complex_type (TREE_TYPE (op1)),
> > +		call_stmt, "divmod_tmp");
> Please review the the GCC formatting guidelines.  This looks wrong.
> 
> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>                                call_stmt, "divmod_tmp");
> 
> Seems like the right formatting to me.
> 
> > +
> > +  /* Update all statements in stmts.
> > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
> > REALPART_EXPR<divmod_tmp>
> > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
> > IMAGPART_EXPR<divmod_tmp>.  */
> I'd just emit a copy from RES to the appropriate lhs operand just after the
> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> statements.

That sounds like a good idea.

> Copy propagation should clean that up just fine and it simplifies your
> implementation.

There should be no copy to propagate that way.

> > +
> > +  bool cfg_changed = false;
> > +  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
> > +    {
> > +      tree new_rhs;
> > +
> > +      switch (gimple_assign_rhs_code (use_stmt))
> > +	{
> > +	  case TRUNC_DIV_EXPR:
> > +	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> > +	    break;
> > +
> > +	  case TRUNC_MOD_EXPR:
> > +	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> > +	    break;
> > +
> > +	  default:
> > +	    gcc_unreachable ();
> > +	}
> > +
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> > +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
> > +      update_stmt (use_stmt);
> > +
> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> > +	cfg_changed = true;
> Does this ever happen given how you filter out throwing statements earlier?

We filter out throwing stmts that are not "top".  The top one we replace
with the divmod call and we can preserve EH for that (and thus handle
the case where the original div/mod at this place may throw).

> 
> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
> >  		    match_uaddsub_overflow (&gsi, stmt, code);
> >  		  break;
> > 
> > +		case TRUNC_MOD_EXPR:
> > +		  cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
> > +		  break;
> > +
> >  		default:;
> >  		}
> >  	    }
> If I'm reading this correctly, ISTM that you only search for a divmod pair if
> you first find the mod insn.  That's probably reasonable.  We have to find
> both to have an optimization opportunity and I suspect we have more div than
> mod insns, so you're presumably minimizing the search space with this
> approach.  Can you confirm that was your intention here, or were you looking
> to do something else?

Yes, that was the idea.

Richard.
Prathamesh Kulkarni Oct. 21, 2016, 10:34 a.m. UTC | #11
On 20 October 2016 at 15:02, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 19 Oct 2016, Jeff Law wrote:
>
>> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> > Hi,
>> > After approval from Bernd Schmidt, I committed the patch to remove
>> > optab functions for
>> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> > for divmod patch.
>> >
>> > This patch is mostly the same as previous one, except it drops
>> > targeting __udivmoddi4() because
>> > it gave undefined reference link error for calling __udivmoddi4() on
>> > aarch64-linux-gnu.
>> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> > isn't needed for the target
>> > (it was a bug in my patch that called __udivmoddi4() even though
>> > aarch64 supported hardware div).
>> >
>> > However this makes me wonder if it's guaranteed that __udivmoddi4()
>> > will be available for a target if it doesn't have hardware div and
>> > divmod insn and doesn't have target-specific libfunc for
>> > DImode divmod ? To be conservative, the attached patch doesn't
>> > generate call to __udivmoddi4.
>> >
>> > Passes bootstrap+test on x86_64-unknown-linux.
>> > Cross-tested on arm*-*-*, aarch64*-*-*.
>> > Verified that there are no regressions with SPEC2006 on
>> > x86_64-unknown-linux-gnu.
>> > OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> >
>> > divmod-v2-3-main.txt
>> >
>> >
>> > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
>> >         Kugan Vivekanandarajah  <kuganv@linaro.org>
>> >         Jim Wilson  <jim.wilson@linaro.org>
>> >
>> >         * target.def: New hook expand_divmod_libfunc.
>> >         * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
>> >         * doc/tm.texi: Regenerate.
>> >         * internal-fn.def: Add new entry for DIVMOD ifn.
>> >         * internal-fn.c (expand_DIVMOD): New.
>> >         * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
>> >         targhooks.h.
>> >         (widen_mul_stats): Add new field divmod_calls_inserted.
>> >         (target_supports_divmod_p): New.
>> >         (divmod_candidate_p): Likewise.
>> >         (convert_to_divmod): Likewise.
>> >         (pass_optimize_widening_mul::execute): Call
>> >         calculate_dominance_info(), renumber_gimple_stmt_uids() at
>> >         beginning of function. Call convert_to_divmod()
>> >         and record stats for divmod.
>> Starting with some high level design comments.  If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).
>
>> Would it be possible to always do the transformation to divmod in the gimple
>> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
>> expanders make a final decision about divmod insn, libcall, or using div/mod
>> insns?
>
> The issue is that it hoists one or both the division or the modulo and
> if we don't do the transform we'd want to undo that code motion.
>
>> That would move all the target dependencies out of the gimple optimizers and
>> into the gimple->rtl expansion phase, which is the preferred place to start
>> introducing this kind of target dependency.
>>
>> With that background, I'm going to focus more on the identification of divmod
>> opportunities than the expansion bits.
>>
>>
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index a4a8e49..866c368 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
>> > please refer to
>> >  the hook implementation for how different fusion types are supported.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
>> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
>> > *@var{quot}, rtx *@var{rem})
>> > +Define this hook for enabling divmod transform if the port does not have
>> > +hardware divmod insn but defines target-specific divmod libfuncs.
>> > +@end deftypefn
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> > index 265f1be..c4c387b 100644
>> > --- a/gcc/doc/tm.texi.in
>> > +++ b/gcc/doc/tm.texi.in
>> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>> >
>> >  @hook TARGET_SCHED_FUSION_PRIORITY
>> >
>> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> > index 0b32d5f..42c6973 100644
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
>> > *call)
>> >    expand_ifn_atomic_compare_exchange (call);
>> >  }
>> >
>> > +/* Expand DIVMOD() using:
>> In general, we do not use () when referring to objects in comments.
>>
>> > + a) optab handler for udivmod/sdivmod if it is available.
>> > + b) If optab_handler doesn't exist, generate call to
>> > +    target-specific divmod libfunc.  */
>> > +
>> > +static void
>> > +expand_DIVMOD (internal_fn, gcall *call_stmt)
>> In general, don't use upper case for function names.  Lower case please.  But
>> I believe we should be pushing all the expansion stuff into the gimple->rtl
>> expansion code, so I haven't looked at this closely yet on the expectation
>> it'll likely move.
>>
>>
>>
>> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>> > index 0cea1a8..4f7bdb2 100644
>> > --- a/gcc/tree-ssa-math-opts.c
>> > +++ b/gcc/tree-ssa-math-opts.c
>> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
>> > gimple *stmt,
>> >    return true;
>> >  }
>> >
>> > +/* Return true if target has support for divmod.  */
>> > +
>> > +static bool
>> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode
>> > mode)
>> So this is the kind of code I expect to avoid in the gimple optimizers.
>> Instead the decision about whether to use a divmod insn, divmod libfunc or div
>> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
>> expansion time.
>>
>>
>> > +
>> > +/* This function looks for:
>> > +   t1 = a TRUNC_DIV_EXPR b;
>> > +   t2 = a TRUNC_MOD_EXPR b;
>> > +   and transforms it to the following sequence:
>> > +   complex_tmp = DIVMOD (a, b);
>> > +   t1 = REALPART_EXPR(a);
>> > +   t2 = IMAGPART_EXPR(b);
>> > +   For conditions enabling the transform see divmod_candidate_p().
>> > +
>> > +   The pass works in two phases:
>> > +   1) Walk through all immediate uses of stmt's operand and find a
>> > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
>> > +      it to stmts vector.
>> > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
>> > that
>> > +      dominates other div/mod stmts with same operands) and update entries
>> > in
>> > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
>> > +      IMAGPART_EXPR for mod).  */
>> > +
>> > +static bool
>> > +convert_to_divmod (gassign *stmt)
>> > +{
>> > +  if (!divmod_candidate_p (stmt))
>> > +    return false;
>> > +
>> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> > +
>> > +  imm_use_iterator use_iter;
>> > +  gimple *use_stmt;
>> > +  auto_vec<gimple *> stmts;
>> > +
>> > +  gimple *top_stmt = stmt;
>> > +  basic_block top_bb = gimple_bb (stmt);
>> > +
>> > +  /* Try to set top_stmt to "topmost" stmt
>> > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
>> > */
>> > +
>> > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> > +    {
>> > +      if (is_gimple_assign (use_stmt)
>> > +     && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
>> > +         || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
>> > +     && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
>> > +     && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
>> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>> TRUNC_DIV_EXPR statement.
>>
>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>> optimizer hasn't done its job.  Have you seen this to be useful in practice?
>
> Note that I've reviewed this parts already and we restructured things
> in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> finding a dominating trunc _or_ div and interatively doing so
> correctly if we have more than one pair.
>
>> > +   {
>> > +     if (stmt_can_throw_internal (use_stmt))
>> > +       continue;
>> > +
>> > +     basic_block bb = gimple_bb (use_stmt);
>> > +
>> > +     if (bb == top_bb)
>> > +       {
>> > +         if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
>> > +           top_stmt = use_stmt;
>> > +       }
>> > +     else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
>> > +       {
>> > +         top_bb = bb;
>> > +         top_stmt = use_stmt;
>> > +       }
>> Do you want to break out of the immediate use loop once you've found a
>> TRUNC_DIV_EXPR statement with matching operands?
>>
>> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
>> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
>> statements in the IL, which presumably doesn't happen if the rest of the
>> optimizers are doing their job.
>>
>>
>>
>> > +   }
>> > +    }
>> > +
>> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> > +    return false;
>> Don't you just want
>> if (stmt_can_throw_internal (stmt))
>>   return false;
>>
>> Before the loop over the immediate uses, thus avoiding the loop if we've got a
>> TRUNC_MOD_EXPR that we can't optimize?
>>
>>
>>
>> > +
>> > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
>> > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
>> > +
>> > +  stmts.safe_push (top_stmt);
>> > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
>> > +
>> > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
>> ?!?  It's not clear why you'd need to do another immediate use loop here.
>>
>> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
>> TRUNC_MOD with the same operands.  Given that, ISTM you just need to check the
>> dominance relationship of the blocks containing those statements.
>>
>> It really feels like you're going out of your way to handle cases where we
>> have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm right, I'd
>> rather see that time spent optimizing away those redundant expressions in
>> FRE/PRE/DOM.
>>
>> > +
>> > +  /* Create libcall to internal fn DIVMOD:
>> > +     divmod_tmp = DIVMOD (op1, op2).  */
>> > +
>> > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
>> > +  tree res = make_temp_ssa_name (
>> > +           build_complex_type (TREE_TYPE (op1)),
>> > +           call_stmt, "divmod_tmp");
>> Please review the the GCC formatting guidelines.  This looks wrong.
>>
>> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>>                                call_stmt, "divmod_tmp");
>>
>> Seems like the right formatting to me.
>>
>> > +
>> > +  /* Update all statements in stmts.
>> > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>> > REALPART_EXPR<divmod_tmp>
>> > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>> > IMAGPART_EXPR<divmod_tmp>.  */
>> I'd just emit a copy from RES to the appropriate lhs operand just after the
>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> statements.
>
> That sounds like a good idea.
Um sorry, not sure if I understood this part.
For eg:
t1 = x / y;
t2 = x % y;

do we want to transform it to:
complex_tmp = DIVMOD (x, y);
div_tmp = REALPART_EXPR<complex_tmp>
mod_tmp = IMAGPART_EXPR<complex_tmp>

and then replace each use of lhs (t1, t2 etc.) by either div_tmp
or mod_tmp (depending if the stmt was trunc_div_expr or trunc_mod_expr) ?

Thanks,
Prathamesh
>
>> Copy propagation should clean that up just fine and it simplifies your
>> implementation.
>
> There should be no copy to propagate that way.
>
>> > +
>> > +  bool cfg_changed = false;
>> > +  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
>> > +    {
>> > +      tree new_rhs;
>> > +
>> > +      switch (gimple_assign_rhs_code (use_stmt))
>> > +   {
>> > +     case TRUNC_DIV_EXPR:
>> > +       new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
>> > +       break;
>> > +
>> > +     case TRUNC_MOD_EXPR:
>> > +       new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
>> > +       break;
>> > +
>> > +     default:
>> > +       gcc_unreachable ();
>> > +   }
>> > +
>> > +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> > +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
>> > +      update_stmt (use_stmt);
>> > +
>> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > +   cfg_changed = true;
>> Does this ever happen given how you filter out throwing statements earlier?
>
> We filter out throwing stmts that are not "top".  The top one we replace
> with the divmod call and we can preserve EH for that (and thus handle
> the case where the original div/mod at this place may throw).
>
>>
>> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
>> >                 match_uaddsub_overflow (&gsi, stmt, code);
>> >               break;
>> >
>> > +           case TRUNC_MOD_EXPR:
>> > +             cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
>> > +             break;
>> > +
>> >             default:;
>> >             }
>> >         }
>> If I'm reading this correctly, ISTM that you only search for a divmod pair if
>> you first find the mod insn.  That's probably reasonable.  We have to find
>> both to have an optimization opportunity and I suspect we have more div than
>> mod insns, so you're presumably minimizing the search space with this
>> approach.  Can you confirm that was your intention here, or were you looking
>> to do something else?
>
> Yes, that was the idea.
>
> Richard.
Jeff Law Oct. 21, 2016, 5:50 p.m. UTC | #12
On 10/20/2016 03:32 AM, Richard Biener wrote:
>> Starting with some high level design comments.  If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).
I didn't look at the placement of the pass as it hadn't changed. 
Looking at that now, I see it's fairly late in the pipeline, so I won't 
object to its placement.


>
>> Would it be possible to always do the transformation to divmod in the gimple
>> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
>> expanders make a final decision about divmod insn, libcall, or using div/mod
>> insns?
>
> The issue is that it hoists one or both the division or the modulo and
> if we don't do the transform we'd want to undo that code motion.
But don't we only hoist when we find a suitable div/mod pair to turn 
into a divmod?   Maybe I'm mis-understanding how the identification works.

[ snip ]

>> */
>>> +
>>> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>>> +    {
>>> +      if (is_gimple_assign (use_stmt)
>>> +	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
>>> +	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
>>> +	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
>>> +	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
>> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>> TRUNC_DIV_EXPR statement.
>>
>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>> optimizer hasn't done its job.  Have you seen this to be useful in practice?
>
> Note that I've reviewed this parts already and we restructured things
> in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> finding a dominating trunc _or_ div and interatively doing so
> correctly if we have more than one pair.
But doesn't having more than one pair essentially mean that the other 
optimziers have left redundant junk in the IL?  I'm clearly missing 
something here.


>>> +
>>> +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>>> +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
>>> +      update_stmt (use_stmt);
>>> +
>>> +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>>> +	cfg_changed = true;
>> Does this ever happen given how you filter out throwing statements earlier?
>
> We filter out throwing stmts that are not "top".  The top one we replace
> with the divmod call and we can preserve EH for that (and thus handle
> the case where the original div/mod at this place may throw).
OK.  It just struck me as a bit odd.  No worries if y'all have already 
worked through this.


Jeff
Jeff Law Oct. 21, 2016, 8:44 p.m. UTC | #13
On 10/21/2016 04:34 AM, Prathamesh Kulkarni wrote:
> On 20 October 2016 at 15:02, Richard Biener <rguenther@suse.de> wrote:
>> On Wed, 19 Oct 2016, Jeff Law wrote:
>>
>>> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>>>> Hi,
>>>> After approval from Bernd Schmidt, I committed the patch to remove
>>>> optab functions for
>>>> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>>>> for divmod patch.
>>>>
>>>> This patch is mostly the same as previous one, except it drops
>>>> targeting __udivmoddi4() because
>>>> it gave undefined reference link error for calling __udivmoddi4() on
>>>> aarch64-linux-gnu.
>>>> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>>>> isn't needed for the target
>>>> (it was a bug in my patch that called __udivmoddi4() even though
>>>> aarch64 supported hardware div).
>>>>
>>>> However this makes me wonder if it's guaranteed that __udivmoddi4()
>>>> will be available for a target if it doesn't have hardware div and
>>>> divmod insn and doesn't have target-specific libfunc for
>>>> DImode divmod ? To be conservative, the attached patch doesn't
>>>> generate call to __udivmoddi4.
>>>>
>>>> Passes bootstrap+test on x86_64-unknown-linux.
>>>> Cross-tested on arm*-*-*, aarch64*-*-*.
>>>> Verified that there are no regressions with SPEC2006 on
>>>> x86_64-unknown-linux-gnu.
>>>> OK to commit ?
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>>
>>>>
>>>> divmod-v2-3-main.txt
>>>>
>>>>
>>>> 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
>>>>         Kugan Vivekanandarajah  <kuganv@linaro.org>
>>>>         Jim Wilson  <jim.wilson@linaro.org>
>>>>
>>>>         * target.def: New hook expand_divmod_libfunc.
>>>>         * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
>>>>         * doc/tm.texi: Regenerate.
>>>>         * internal-fn.def: Add new entry for DIVMOD ifn.
>>>>         * internal-fn.c (expand_DIVMOD): New.
>>>>         * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
>>>>         targhooks.h.
>>>>         (widen_mul_stats): Add new field divmod_calls_inserted.
>>>>         (target_supports_divmod_p): New.
>>>>         (divmod_candidate_p): Likewise.
>>>>         (convert_to_divmod): Likewise.
>>>>         (pass_optimize_widening_mul::execute): Call
>>>>         calculate_dominance_info(), renumber_gimple_stmt_uids() at
>>>>         beginning of function. Call convert_to_divmod()
>>>>         and record stats for divmod.
>>> Starting with some high level design comments.  If these conflict with
>>> comments from others, let me know and we'll work through the issues.
>>>
>>> I don't really like introducing code conditional on the target capabilities
>>> this early in the gimple optimization pipeline.
>>
>> It's basically done right before RTL expansion
>> (pass_optimize_widening_mul).
>>
>>> Would it be possible to always do the transformation to divmod in the gimple
>>> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
>>> expanders make a final decision about divmod insn, libcall, or using div/mod
>>> insns?
>>
>> The issue is that it hoists one or both the division or the modulo and
>> if we don't do the transform we'd want to undo that code motion.
>>
>>> That would move all the target dependencies out of the gimple optimizers and
>>> into the gimple->rtl expansion phase, which is the preferred place to start
>>> introducing this kind of target dependency.
>>>
>>> With that background, I'm going to focus more on the identification of divmod
>>> opportunities than the expansion bits.
>>>
>>>
>>>>
>>>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>>>> index a4a8e49..866c368 100644
>>>> --- a/gcc/doc/tm.texi
>>>> +++ b/gcc/doc/tm.texi
>>>> @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
>>>> please refer to
>>>>  the hook implementation for how different fusion types are supported.
>>>>  @end deftypefn
>>>>
>>>> +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
>>>> @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
>>>> *@var{quot}, rtx *@var{rem})
>>>> +Define this hook for enabling divmod transform if the port does not have
>>>> +hardware divmod insn but defines target-specific divmod libfuncs.
>>>> +@end deftypefn
>>>> +
>>>>  @node Sections
>>>>  @section Dividing the Output into Sections (Texts, Data, @dots{})
>>>>  @c the above section title is WAY too long.  maybe cut the part between
>>>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>>>> index 265f1be..c4c387b 100644
>>>> --- a/gcc/doc/tm.texi.in
>>>> +++ b/gcc/doc/tm.texi.in
>>>> @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>>>>
>>>>  @hook TARGET_SCHED_FUSION_PRIORITY
>>>>
>>>> +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
>>>> +
>>>>  @node Sections
>>>>  @section Dividing the Output into Sections (Texts, Data, @dots{})
>>>>  @c the above section title is WAY too long.  maybe cut the part between
>>>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>>>> index 0b32d5f..42c6973 100644
>>>> --- a/gcc/internal-fn.c
>>>> +++ b/gcc/internal-fn.c
>>>> @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
>>>> *call)
>>>>    expand_ifn_atomic_compare_exchange (call);
>>>>  }
>>>>
>>>> +/* Expand DIVMOD() using:
>>> In general, we do not use () when referring to objects in comments.
>>>
>>>> + a) optab handler for udivmod/sdivmod if it is available.
>>>> + b) If optab_handler doesn't exist, generate call to
>>>> +    target-specific divmod libfunc.  */
>>>> +
>>>> +static void
>>>> +expand_DIVMOD (internal_fn, gcall *call_stmt)
>>> In general, don't use upper case for function names.  Lower case please.  But
>>> I believe we should be pushing all the expansion stuff into the gimple->rtl
>>> expansion code, so I haven't looked at this closely yet on the expectation
>>> it'll likely move.
>>>
>>>
>>>
>>>> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>>>> index 0cea1a8..4f7bdb2 100644
>>>> --- a/gcc/tree-ssa-math-opts.c
>>>> +++ b/gcc/tree-ssa-math-opts.c
>>>> @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
>>>> gimple *stmt,
>>>>    return true;
>>>>  }
>>>>
>>>> +/* Return true if target has support for divmod.  */
>>>> +
>>>> +static bool
>>>> +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode
>>>> mode)
>>> So this is the kind of code I expect to avoid in the gimple optimizers.
>>> Instead the decision about whether to use a divmod insn, divmod libfunc or div
>>> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
>>> expansion time.
>>>
>>>
>>>> +
>>>> +/* This function looks for:
>>>> +   t1 = a TRUNC_DIV_EXPR b;
>>>> +   t2 = a TRUNC_MOD_EXPR b;
>>>> +   and transforms it to the following sequence:
>>>> +   complex_tmp = DIVMOD (a, b);
>>>> +   t1 = REALPART_EXPR(a);
>>>> +   t2 = IMAGPART_EXPR(b);
>>>> +   For conditions enabling the transform see divmod_candidate_p().
>>>> +
>>>> +   The pass works in two phases:
>>>> +   1) Walk through all immediate uses of stmt's operand and find a
>>>> +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
>>>> +      it to stmts vector.
>>>> +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
>>>> that
>>>> +      dominates other div/mod stmts with same operands) and update entries
>>>> in
>>>> +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
>>>> +      IMAGPART_EXPR for mod).  */
>>>> +
>>>> +static bool
>>>> +convert_to_divmod (gassign *stmt)
>>>> +{
>>>> +  if (!divmod_candidate_p (stmt))
>>>> +    return false;
>>>> +
>>>> +  tree op1 = gimple_assign_rhs1 (stmt);
>>>> +  tree op2 = gimple_assign_rhs2 (stmt);
>>>> +
>>>> +  imm_use_iterator use_iter;
>>>> +  gimple *use_stmt;
>>>> +  auto_vec<gimple *> stmts;
>>>> +
>>>> +  gimple *top_stmt = stmt;
>>>> +  basic_block top_bb = gimple_bb (stmt);
>>>> +
>>>> +  /* Try to set top_stmt to "topmost" stmt
>>>> +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
>>>> */
>>>> +
>>>> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>>>> +    {
>>>> +      if (is_gimple_assign (use_stmt)
>>>> +     && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
>>>> +         || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
>>>> +     && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
>>>> +     && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
>>> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
>>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>>> TRUNC_DIV_EXPR statement.
>>>
>>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
>>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>>> optimizer hasn't done its job.  Have you seen this to be useful in practice?
>>
>> Note that I've reviewed this parts already and we restructured things
>> in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
>> finding a dominating trunc _or_ div and interatively doing so
>> correctly if we have more than one pair.
>>
>>>> +   {
>>>> +     if (stmt_can_throw_internal (use_stmt))
>>>> +       continue;
>>>> +
>>>> +     basic_block bb = gimple_bb (use_stmt);
>>>> +
>>>> +     if (bb == top_bb)
>>>> +       {
>>>> +         if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
>>>> +           top_stmt = use_stmt;
>>>> +       }
>>>> +     else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
>>>> +       {
>>>> +         top_bb = bb;
>>>> +         top_stmt = use_stmt;
>>>> +       }
>>> Do you want to break out of the immediate use loop once you've found a
>>> TRUNC_DIV_EXPR statement with matching operands?
>>>
>>> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
>>> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
>>> statements in the IL, which presumably doesn't happen if the rest of the
>>> optimizers are doing their job.
>>>
>>>
>>>
>>>> +   }
>>>> +    }
>>>> +
>>>> +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>>>> +    return false;
>>> Don't you just want
>>> if (stmt_can_throw_internal (stmt))
>>>   return false;
>>>
>>> Before the loop over the immediate uses, thus avoiding the loop if we've got a
>>> TRUNC_MOD_EXPR that we can't optimize?
>>>
>>>
>>>
>>>> +
>>>> +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
>>>> +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
>>>> +
>>>> +  stmts.safe_push (top_stmt);
>>>> +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
>>>> +
>>>> +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
>>> ?!?  It's not clear why you'd need to do another immediate use loop here.
>>>
>>> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
>>> TRUNC_MOD with the same operands.  Given that, ISTM you just need to check the
>>> dominance relationship of the blocks containing those statements.
>>>
>>> It really feels like you're going out of your way to handle cases where we
>>> have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm right, I'd
>>> rather see that time spent optimizing away those redundant expressions in
>>> FRE/PRE/DOM.
>>>
>>>> +
>>>> +  /* Create libcall to internal fn DIVMOD:
>>>> +     divmod_tmp = DIVMOD (op1, op2).  */
>>>> +
>>>> +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
>>>> +  tree res = make_temp_ssa_name (
>>>> +           build_complex_type (TREE_TYPE (op1)),
>>>> +           call_stmt, "divmod_tmp");
>>> Please review the the GCC formatting guidelines.  This looks wrong.
>>>
>>> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>>>                                call_stmt, "divmod_tmp");
>>>
>>> Seems like the right formatting to me.
>>>
>>>> +
>>>> +  /* Update all statements in stmts.
>>>> +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>>>> REALPART_EXPR<divmod_tmp>
>>>> +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>>>> IMAGPART_EXPR<divmod_tmp>.  */
>>> I'd just emit a copy from RES to the appropriate lhs operand just after the
>>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>>> statements.
>>
>> That sounds like a good idea.
> Um sorry, not sure if I understood this part.
> For eg:
> t1 = x / y;
> t2 = x % y;
>
> do we want to transform it to:
> complex_tmp = DIVMOD (x, y);
> div_tmp = REALPART_EXPR<complex_tmp>
> mod_tmp = IMAGPART_EXPR<complex_tmp>

complex_tmp = DIVMOD (x, y)
t1 = REALPART_EXPR (complex_tmp)
t2 = IMAGPART_EXPR (compex_tmp)

Then remove the
t1 = x/y
t2 = x%y

  statements
jeff
>
Richard Biener Oct. 24, 2016, 7:28 a.m. UTC | #14
On Fri, 21 Oct 2016, Jeff Law wrote:

> On 10/21/2016 04:34 AM, Prathamesh Kulkarni wrote:
> > On 20 October 2016 at 15:02, Richard Biener <rguenther@suse.de> wrote:
> > > On Wed, 19 Oct 2016, Jeff Law wrote:
> > > 
> > > > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> > > > > Hi,
> > > > > After approval from Bernd Schmidt, I committed the patch to remove
> > > > > optab functions for
> > > > > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> > > > > for divmod patch.
> > > > > 
> > > > > This patch is mostly the same as previous one, except it drops
> > > > > targeting __udivmoddi4() because
> > > > > it gave undefined reference link error for calling __udivmoddi4() on
> > > > > aarch64-linux-gnu.
> > > > > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> > > > > isn't needed for the target
> > > > > (it was a bug in my patch that called __udivmoddi4() even though
> > > > > aarch64 supported hardware div).
> > > > > 
> > > > > However this makes me wonder if it's guaranteed that __udivmoddi4()
> > > > > will be available for a target if it doesn't have hardware div and
> > > > > divmod insn and doesn't have target-specific libfunc for
> > > > > DImode divmod ? To be conservative, the attached patch doesn't
> > > > > generate call to __udivmoddi4.
> > > > > 
> > > > > Passes bootstrap+test on x86_64-unknown-linux.
> > > > > Cross-tested on arm*-*-*, aarch64*-*-*.
> > > > > Verified that there are no regressions with SPEC2006 on
> > > > > x86_64-unknown-linux-gnu.
> > > > > OK to commit ?
> > > > > 
> > > > > Thanks,
> > > > > Prathamesh
> > > > > 
> > > > > 
> > > > > divmod-v2-3-main.txt
> > > > > 
> > > > > 
> > > > > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> > > > >         Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > >         Jim Wilson  <jim.wilson@linaro.org>
> > > > > 
> > > > >         * target.def: New hook expand_divmod_libfunc.
> > > > >         * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
> > > > >         * doc/tm.texi: Regenerate.
> > > > >         * internal-fn.def: Add new entry for DIVMOD ifn.
> > > > >         * internal-fn.c (expand_DIVMOD): New.
> > > > >         * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
> > > > >         targhooks.h.
> > > > >         (widen_mul_stats): Add new field divmod_calls_inserted.
> > > > >         (target_supports_divmod_p): New.
> > > > >         (divmod_candidate_p): Likewise.
> > > > >         (convert_to_divmod): Likewise.
> > > > >         (pass_optimize_widening_mul::execute): Call
> > > > >         calculate_dominance_info(), renumber_gimple_stmt_uids() at
> > > > >         beginning of function. Call convert_to_divmod()
> > > > >         and record stats for divmod.
> > > > Starting with some high level design comments.  If these conflict with
> > > > comments from others, let me know and we'll work through the issues.
> > > > 
> > > > I don't really like introducing code conditional on the target
> > > > capabilities
> > > > this early in the gimple optimization pipeline.
> > > 
> > > It's basically done right before RTL expansion
> > > (pass_optimize_widening_mul).
> > > 
> > > > Would it be possible to always do the transformation to divmod in the
> > > > gimple
> > > > optimizers, regardless of the target capabilities.  Then in the
> > > > gimple->RTL
> > > > expanders make a final decision about divmod insn, libcall, or using
> > > > div/mod
> > > > insns?
> > > 
> > > The issue is that it hoists one or both the division or the modulo and
> > > if we don't do the transform we'd want to undo that code motion.
> > > 
> > > > That would move all the target dependencies out of the gimple optimizers
> > > > and
> > > > into the gimple->rtl expansion phase, which is the preferred place to
> > > > start
> > > > introducing this kind of target dependency.
> > > > 
> > > > With that background, I'm going to focus more on the identification of
> > > > divmod
> > > > opportunities than the expansion bits.
> > > > 
> > > > 
> > > > > 
> > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > index a4a8e49..866c368 100644
> > > > > --- a/gcc/doc/tm.texi
> > > > > +++ b/gcc/doc/tm.texi
> > > > > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64
> > > > > targets,
> > > > > please refer to
> > > > >  the hook implementation for how different fusion types are supported.
> > > > >  @end deftypefn
> > > > > 
> > > > > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
> > > > > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1},
> > > > > rtx
> > > > > *@var{quot}, rtx *@var{rem})
> > > > > +Define this hook for enabling divmod transform if the port does not
> > > > > have
> > > > > +hardware divmod insn but defines target-specific divmod libfuncs.
> > > > > +@end deftypefn
> > > > > +
> > > > >  @node Sections
> > > > >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> > > > >  @c the above section title is WAY too long.  maybe cut the part
> > > > > between
> > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > index 265f1be..c4c387b 100644
> > > > > --- a/gcc/doc/tm.texi.in
> > > > > +++ b/gcc/doc/tm.texi.in
> > > > > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
> > > > > 
> > > > >  @hook TARGET_SCHED_FUSION_PRIORITY
> > > > > 
> > > > > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
> > > > > +
> > > > >  @node Sections
> > > > >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> > > > >  @c the above section title is WAY too long.  maybe cut the part
> > > > > between
> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > > > index 0b32d5f..42c6973 100644
> > > > > --- a/gcc/internal-fn.c
> > > > > +++ b/gcc/internal-fn.c
> > > > > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn,
> > > > > gcall
> > > > > *call)
> > > > >    expand_ifn_atomic_compare_exchange (call);
> > > > >  }
> > > > > 
> > > > > +/* Expand DIVMOD() using:
> > > > In general, we do not use () when referring to objects in comments.
> > > > 
> > > > > + a) optab handler for udivmod/sdivmod if it is available.
> > > > > + b) If optab_handler doesn't exist, generate call to
> > > > > +    target-specific divmod libfunc.  */
> > > > > +
> > > > > +static void
> > > > > +expand_DIVMOD (internal_fn, gcall *call_stmt)
> > > > In general, don't use upper case for function names.  Lower case please.
> > > > But
> > > > I believe we should be pushing all the expansion stuff into the
> > > > gimple->rtl
> > > > expansion code, so I haven't looked at this closely yet on the
> > > > expectation
> > > > it'll likely move.
> > > > 
> > > > 
> > > > 
> > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > > > index 0cea1a8..4f7bdb2 100644
> > > > > --- a/gcc/tree-ssa-math-opts.c
> > > > > +++ b/gcc/tree-ssa-math-opts.c
> > > > > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator
> > > > > *gsi,
> > > > > gimple *stmt,
> > > > >    return true;
> > > > >  }
> > > > > 
> > > > > +/* Return true if target has support for divmod.  */
> > > > > +
> > > > > +static bool
> > > > > +target_supports_divmod_p (optab divmod_optab, optab div_optab,
> > > > > machine_mode
> > > > > mode)
> > > > So this is the kind of code I expect to avoid in the gimple optimizers.
> > > > Instead the decision about whether to use a divmod insn, divmod libfunc
> > > > or div
> > > > + synthesized mod, or separate div and mod insns should happen at
> > > > gimple->rtl
> > > > expansion time.
> > > > 
> > > > 
> > > > > +
> > > > > +/* This function looks for:
> > > > > +   t1 = a TRUNC_DIV_EXPR b;
> > > > > +   t2 = a TRUNC_MOD_EXPR b;
> > > > > +   and transforms it to the following sequence:
> > > > > +   complex_tmp = DIVMOD (a, b);
> > > > > +   t1 = REALPART_EXPR(a);
> > > > > +   t2 = IMAGPART_EXPR(b);
> > > > > +   For conditions enabling the transform see divmod_candidate_p().
> > > > > +
> > > > > +   The pass works in two phases:
> > > > > +   1) Walk through all immediate uses of stmt's operand and find a
> > > > > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is
> > > > > found add
> > > > > +      it to stmts vector.
> > > > > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic
> > > > > block
> > > > > that
> > > > > +      dominates other div/mod stmts with same operands) and update
> > > > > entries
> > > > > in
> > > > > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for
> > > > > div,
> > > > > +      IMAGPART_EXPR for mod).  */
> > > > > +
> > > > > +static bool
> > > > > +convert_to_divmod (gassign *stmt)
> > > > > +{
> > > > > +  if (!divmod_candidate_p (stmt))
> > > > > +    return false;
> > > > > +
> > > > > +  tree op1 = gimple_assign_rhs1 (stmt);
> > > > > +  tree op2 = gimple_assign_rhs2 (stmt);
> > > > > +
> > > > > +  imm_use_iterator use_iter;
> > > > > +  gimple *use_stmt;
> > > > > +  auto_vec<gimple *> stmts;
> > > > > +
> > > > > +  gimple *top_stmt = stmt;
> > > > > +  basic_block top_bb = gimple_bb (stmt);
> > > > > +
> > > > > +  /* Try to set top_stmt to "topmost" stmt
> > > > > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as
> > > > > stmt.
> > > > > */
> > > > > +
> > > > > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> > > > > +    {
> > > > > +      if (is_gimple_assign (use_stmt)
> > > > > +     && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> > > > > +         || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
> > > > > +     && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
> > > > > +     && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
> > > > So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to
> > > > be
> > > > TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
> > > > TRUNC_DIV_EXPR statement.
> > > > 
> > > > The only way I could see TRUNC_MOD_EXPR being useful here would be if
> > > > there is
> > > > a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some
> > > > other
> > > > optimizer hasn't done its job.  Have you seen this to be useful in
> > > > practice?
> > > 
> > > Note that I've reviewed this parts already and we restructured things
> > > in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> > > finding a dominating trunc _or_ div and interatively doing so
> > > correctly if we have more than one pair.
> > > 
> > > > > +   {
> > > > > +     if (stmt_can_throw_internal (use_stmt))
> > > > > +       continue;
> > > > > +
> > > > > +     basic_block bb = gimple_bb (use_stmt);
> > > > > +
> > > > > +     if (bb == top_bb)
> > > > > +       {
> > > > > +         if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
> > > > > +           top_stmt = use_stmt;
> > > > > +       }
> > > > > +     else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
> > > > > +       {
> > > > > +         top_bb = bb;
> > > > > +         top_stmt = use_stmt;
> > > > > +       }
> > > > Do you want to break out of the immediate use loop once you've found a
> > > > TRUNC_DIV_EXPR statement with matching operands?
> > > > 
> > > > I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
> > > > TRUNC_MOD_EXPR, but that would mean that we've got redundant
> > > > TRUNC_DIV_EXPR
> > > > statements in the IL, which presumably doesn't happen if the rest of the
> > > > optimizers are doing their job.
> > > > 
> > > > 
> > > > 
> > > > > +   }
> > > > > +    }
> > > > > +
> > > > > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> > > > > +    return false;
> > > > Don't you just want
> > > > if (stmt_can_throw_internal (stmt))
> > > >   return false;
> > > > 
> > > > Before the loop over the immediate uses, thus avoiding the loop if we've
> > > > got a
> > > > TRUNC_MOD_EXPR that we can't optimize?
> > > > 
> > > > 
> > > > 
> > > > > +
> > > > > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
> > > > > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
> > > > > +
> > > > > +  stmts.safe_push (top_stmt);
> > > > > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) ==
> > > > > TRUNC_DIV_EXPR);
> > > > > +
> > > > > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
> > > > ?!?  It's not clear why you'd need to do another immediate use loop
> > > > here.
> > > > 
> > > > ISTM at this point you've found two statements, one a TRUNC_DIV the
> > > > other a
> > > > TRUNC_MOD with the same operands.  Given that, ISTM you just need to
> > > > check the
> > > > dominance relationship of the blocks containing those statements.
> > > > 
> > > > It really feels like you're going out of your way to handle cases where
> > > > we
> > > > have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm
> > > > right, I'd
> > > > rather see that time spent optimizing away those redundant expressions
> > > > in
> > > > FRE/PRE/DOM.
> > > > 
> > > > > +
> > > > > +  /* Create libcall to internal fn DIVMOD:
> > > > > +     divmod_tmp = DIVMOD (op1, op2).  */
> > > > > +
> > > > > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1,
> > > > > op2);
> > > > > +  tree res = make_temp_ssa_name (
> > > > > +           build_complex_type (TREE_TYPE (op1)),
> > > > > +           call_stmt, "divmod_tmp");
> > > > Please review the the GCC formatting guidelines.  This looks wrong.
> > > > 
> > > > tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
> > > >                                call_stmt, "divmod_tmp");
> > > > 
> > > > Seems like the right formatting to me.
> > > > 
> > > > > +
> > > > > +  /* Update all statements in stmts.
> > > > > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
> > > > > REALPART_EXPR<divmod_tmp>
> > > > > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
> > > > > IMAGPART_EXPR<divmod_tmp>.  */
> > > > I'd just emit a copy from RES to the appropriate lhs operand just after
> > > > the
> > > > divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> > > > statements.
> > > 
> > > That sounds like a good idea.
> > Um sorry, not sure if I understood this part.
> > For eg:
> > t1 = x / y;
> > t2 = x % y;
> > 
> > do we want to transform it to:
> > complex_tmp = DIVMOD (x, y);
> > div_tmp = REALPART_EXPR<complex_tmp>
> > mod_tmp = IMAGPART_EXPR<complex_tmp>
> 
> complex_tmp = DIVMOD (x, y)
> t1 = REALPART_EXPR (complex_tmp)
> t2 = IMAGPART_EXPR (compex_tmp)
> 
> Then remove the
> t1 = x/y
> t2 = x%y
> 
>  statements

OTOH implementation-wise that's more complicated.

Richard.
Jeff Law Oct. 24, 2016, 2:53 p.m. UTC | #15
On 10/24/2016 01:28 AM, Richard Biener wrote:
[ big snip ]

>>>>>
>>>>>> +
>>>>>> +  /* Update all statements in stmts.
>>>>>> +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>>>>>> REALPART_EXPR<divmod_tmp>
>>>>>> +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>>>>>> IMAGPART_EXPR<divmod_tmp>.  */
>>>>> I'd just emit a copy from RES to the appropriate lhs operand just after
>>>>> the
>>>>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>>>>> statements.
>>>>
>>>> That sounds like a good idea.
>>> Um sorry, not sure if I understood this part.
>>> For eg:
>>> t1 = x / y;
>>> t2 = x % y;
>>>
>>> do we want to transform it to:
>>> complex_tmp = DIVMOD (x, y);
>>> div_tmp = REALPART_EXPR<complex_tmp>
>>> mod_tmp = IMAGPART_EXPR<complex_tmp>
>>
>> complex_tmp = DIVMOD (x, y)
>> t1 = REALPART_EXPR (complex_tmp)
>> t2 = IMAGPART_EXPR (compex_tmp)
>>
>> Then remove the
>> t1 = x/y
>> t2 = x%y
>>
>>  statements
>
> OTOH implementation-wise that's more complicated.
If so, then I wouldn't bother.  I only mention it because I've found 
that model is sometimes easier on the implementation side.  I don't 
consider it a big deal.

jeff
Prathamesh Kulkarni Oct. 24, 2016, 3:35 p.m. UTC | #16
On 24 October 2016 at 20:23, Jeff Law <law@redhat.com> wrote:
> On 10/24/2016 01:28 AM, Richard Biener wrote:
> [ big snip ]
>
>
>>>>>>
>>>>>>> +
>>>>>>> +  /* Update all statements in stmts.
>>>>>>> +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>>>>>>> REALPART_EXPR<divmod_tmp>
>>>>>>> +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>>>>>>> IMAGPART_EXPR<divmod_tmp>.  */
>>>>>>
>>>>>> I'd just emit a copy from RES to the appropriate lhs operand just
>>>>>> after
>>>>>> the
>>>>>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and
>>>>>> TRUNC_MOD_EXPR
>>>>>> statements.
>>>>>
>>>>>
>>>>> That sounds like a good idea.
>>>>
>>>> Um sorry, not sure if I understood this part.
>>>> For eg:
>>>> t1 = x / y;
>>>> t2 = x % y;
>>>>
>>>> do we want to transform it to:
>>>> complex_tmp = DIVMOD (x, y);
>>>> div_tmp = REALPART_EXPR<complex_tmp>
>>>> mod_tmp = IMAGPART_EXPR<complex_tmp>
>>>
>>>
>>> complex_tmp = DIVMOD (x, y)
>>> t1 = REALPART_EXPR (complex_tmp)
>>> t2 = IMAGPART_EXPR (compex_tmp)
>>>
>>> Then remove the
>>> t1 = x/y
>>> t2 = x%y
>>>
>>>  statements
>>
>>
>> OTOH implementation-wise that's more complicated.
>
> If so, then I wouldn't bother.  I only mention it because I've found that
> model is sometimes easier on the implementation side.  I don't consider it a
> big deal.
Richard and Jeff,
Unless you have any further suggestions on the patch, should I
consider it approved (modulo formatting fixes) ?
I can confirm that the optab_libfunc() issue is solved at least for
the divmod transform with the patch to remove
optab functions for sdivmod_optab and udivmod_optab.

Thanks,
Prathamesh
>
> jeff
Jeff Law Oct. 24, 2016, 6:36 p.m. UTC | #17
On 10/24/2016 09:35 AM, Prathamesh Kulkarni wrote:
> On 24 October 2016 at 20:23, Jeff Law <law@redhat.com> wrote:
>> On 10/24/2016 01:28 AM, Richard Biener wrote:
>> [ big snip ]
>>
>>
>>>>>>>
>>>>>>>> +
>>>>>>>> +  /* Update all statements in stmts.
>>>>>>>> +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>>>>>>>> REALPART_EXPR<divmod_tmp>
>>>>>>>> +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>>>>>>>> IMAGPART_EXPR<divmod_tmp>.  */
>>>>>>>
>>>>>>> I'd just emit a copy from RES to the appropriate lhs operand just
>>>>>>> after
>>>>>>> the
>>>>>>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and
>>>>>>> TRUNC_MOD_EXPR
>>>>>>> statements.
>>>>>>
>>>>>>
>>>>>> That sounds like a good idea.
>>>>>
>>>>> Um sorry, not sure if I understood this part.
>>>>> For eg:
>>>>> t1 = x / y;
>>>>> t2 = x % y;
>>>>>
>>>>> do we want to transform it to:
>>>>> complex_tmp = DIVMOD (x, y);
>>>>> div_tmp = REALPART_EXPR<complex_tmp>
>>>>> mod_tmp = IMAGPART_EXPR<complex_tmp>
>>>>
>>>>
>>>> complex_tmp = DIVMOD (x, y)
>>>> t1 = REALPART_EXPR (complex_tmp)
>>>> t2 = IMAGPART_EXPR (compex_tmp)
>>>>
>>>> Then remove the
>>>> t1 = x/y
>>>> t2 = x%y
>>>>
>>>>  statements
>>>
>>>
>>> OTOH implementation-wise that's more complicated.
>>
>> If so, then I wouldn't bother.  I only mention it because I've found that
>> model is sometimes easier on the implementation side.  I don't consider it a
>> big deal.
> Richard and Jeff,
> Unless you have any further suggestions on the patch, should I
> consider it approved (modulo formatting fixes) ?
> I can confirm that the optab_libfunc() issue is solved at least for
> the divmod transform with the patch to remove
> optab functions for sdivmod_optab and udivmod_optab.
I think Richi has the most state on this series, so I'll let him chime in.

jeff
Richard Biener Oct. 25, 2016, 8:13 a.m. UTC | #18
On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> Hi,
> After approval from Bernd Schmidt, I committed the patch to remove
> optab functions for
> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> for divmod patch.
>
> This patch is mostly the same as previous one, except it drops
> targeting __udivmoddi4() because
> it gave undefined reference link error for calling __udivmoddi4() on
> aarch64-linux-gnu.
> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> isn't needed for the target
> (it was a bug in my patch that called __udivmoddi4() even though
> aarch64 supported hardware div).
>
> However this makes me wonder if it's guaranteed that __udivmoddi4()
> will be available for a target if it doesn't have hardware div and
> divmod insn and doesn't have target-specific libfunc for
> DImode divmod ? To be conservative, the attached patch doesn't
> generate call to __udivmoddi4.
>
> Passes bootstrap+test on x86_64-unknown-linux.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> Verified that there are no regressions with SPEC2006 on
> x86_64-unknown-linux-gnu.
> OK to commit ?

I think the searching is still somewhat wrong - it's been some time
since my last look at the
patch so maybe I've said this already.  Please bail out early for
stmt_can_throw_internal (stmt),
otherwise the top stmt search might end up not working.  So

+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;

can go.

top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
for another
TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
Which means you want a div_seen and a mod_seen, or simply record the top_stmt
code and look for the opposite in the 2nd loop.

+      switch (gimple_assign_rhs_code (use_stmt))
+       {
+         case TRUNC_DIV_EXPR:
+           new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         case TRUNC_MOD_EXPR:
+           new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+           break;
+

why type of op1 and type of op2 in the other case?  Choose one for consistency.

+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+       cfg_changed = true;

as you are rejecting all internally throwing stmts this shouldn't be necessary.

The patch is ok with those changes.

Thanks,
Richard.


> Thanks,
> Prathamesh
Prathamesh Kulkarni Oct. 25, 2016, 10:34 a.m. UTC | #19
On 25 October 2016 at 13:43, Richard Biener <richard.guenther@gmail.com> wrote:
> On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>> Hi,
>> After approval from Bernd Schmidt, I committed the patch to remove
>> optab functions for
>> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> for divmod patch.
>>
>> This patch is mostly the same as previous one, except it drops
>> targeting __udivmoddi4() because
>> it gave undefined reference link error for calling __udivmoddi4() on
>> aarch64-linux-gnu.
>> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> isn't needed for the target
>> (it was a bug in my patch that called __udivmoddi4() even though
>> aarch64 supported hardware div).
>>
>> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> will be available for a target if it doesn't have hardware div and
>> divmod insn and doesn't have target-specific libfunc for
>> DImode divmod ? To be conservative, the attached patch doesn't
>> generate call to __udivmoddi4.
>>
>> Passes bootstrap+test on x86_64-unknown-linux.
>> Cross-tested on arm*-*-*, aarch64*-*-*.
>> Verified that there are no regressions with SPEC2006 on
>> x86_64-unknown-linux-gnu.
>> OK to commit ?
>
> I think the searching is still somewhat wrong - it's been some time
> since my last look at the
> patch so maybe I've said this already.  Please bail out early for
> stmt_can_throw_internal (stmt),
> otherwise the top stmt search might end up not working.  So
>
> +
> +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> +    return false;
>
> can go.
>
> top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
> for another
> TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
> Which means you want a div_seen and a mod_seen, or simply record the top_stmt
> code and look for the opposite in the 2nd loop.
Um sorry I don't quite understand how we could end up without a trunc_mod stmt ?
The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
checks if we have
come across at least a single trunc_div stmt (and we bail out if no
div is seen).

At 2nd loop I suppose we don't need mod_seen, because stmt is
guaranteed to be trunc_mod_expr.
In the 2nd loop the following condition will never trigger for stmt:
  if (stmt_can_throw_internal (use_stmt))
            continue;
since we checked before hand if stmt could throw and chose to bail out
in that case.

and the following condition would also not trigger for stmt:
if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
  {
    end_imm_use_stmt_traverse (&use_iter);
    return false;
  }
since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).

The case where top_stmt == stmt, we wouldn't reach the above
condition, since we have above it:
if (top_stmt == stmt)
  continue;

So IIUC, top_stmt and stmt would always get added to stmts vector.
Am I missing something ?

Thanks,
Prathamesh
>
> +      switch (gimple_assign_rhs_code (use_stmt))
> +       {
> +         case TRUNC_DIV_EXPR:
> +           new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> +           break;
> +
> +         case TRUNC_MOD_EXPR:
> +           new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> +           break;
> +
>
> why type of op1 and type of op2 in the other case?  Choose one for consistency.
>
> +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> +       cfg_changed = true;
>
> as you are rejecting all internally throwing stmts this shouldn't be necessary.
>
> The patch is ok with those changes.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> Prathamesh
Richard Biener Oct. 25, 2016, 10:47 a.m. UTC | #20
On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:

> On 25 October 2016 at 13:43, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> >> Hi,
> >> After approval from Bernd Schmidt, I committed the patch to remove
> >> optab functions for
> >> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> >> for divmod patch.
> >>
> >> This patch is mostly the same as previous one, except it drops
> >> targeting __udivmoddi4() because
> >> it gave undefined reference link error for calling __udivmoddi4() on
> >> aarch64-linux-gnu.
> >> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> >> isn't needed for the target
> >> (it was a bug in my patch that called __udivmoddi4() even though
> >> aarch64 supported hardware div).
> >>
> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
> >> will be available for a target if it doesn't have hardware div and
> >> divmod insn and doesn't have target-specific libfunc for
> >> DImode divmod ? To be conservative, the attached patch doesn't
> >> generate call to __udivmoddi4.
> >>
> >> Passes bootstrap+test on x86_64-unknown-linux.
> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> Verified that there are no regressions with SPEC2006 on
> >> x86_64-unknown-linux-gnu.
> >> OK to commit ?
> >
> > I think the searching is still somewhat wrong - it's been some time
> > since my last look at the
> > patch so maybe I've said this already.  Please bail out early for
> > stmt_can_throw_internal (stmt),
> > otherwise the top stmt search might end up not working.  So
> >
> > +
> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> > +    return false;
> >
> > can go.
> >
> > top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
> > for another
> > TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
> > Which means you want a div_seen and a mod_seen, or simply record the top_stmt
> > code and look for the opposite in the 2nd loop.
> Um sorry I don't quite understand how we could end up without a trunc_mod stmt ?
> The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
> checks if we have
> come across at least a single trunc_div stmt (and we bail out if no
> div is seen).
> 
> At 2nd loop I suppose we don't need mod_seen, because stmt is
> guaranteed to be trunc_mod_expr.
> In the 2nd loop the following condition will never trigger for stmt:
>   if (stmt_can_throw_internal (use_stmt))
>             continue;
> since we checked before hand if stmt could throw and chose to bail out
> in that case.
> 
> and the following condition would also not trigger for stmt:
> if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>   {
>     end_imm_use_stmt_traverse (&use_iter);
>     return false;
>   }
> since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).
> 
> The case where top_stmt == stmt, we wouldn't reach the above
> condition, since we have above it:
> if (top_stmt == stmt)
>   continue;
> 
> So IIUC, top_stmt and stmt would always get added to stmts vector.
> Am I missing something ?

Ah, indeed.  Maybe add a comment then, it wasn't really obvious ;)

Please still move the stmt_can_throw_internal (stmt) check up.

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > +      switch (gimple_assign_rhs_code (use_stmt))
> > +       {
> > +         case TRUNC_DIV_EXPR:
> > +           new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> > +           break;
> > +
> > +         case TRUNC_MOD_EXPR:
> > +           new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> > +           break;
> > +
> >
> > why type of op1 and type of op2 in the other case?  Choose one for consistency.
> >
> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> > +       cfg_changed = true;
> >
> > as you are rejecting all internally throwing stmts this shouldn't be necessary.
> >
> > The patch is ok with those changes.
> >
> > Thanks,
> > Richard.
> >
> >
> >> Thanks,
> >> Prathamesh
> 
>
Prathamesh Kulkarni Oct. 25, 2016, 12:14 p.m. UTC | #21
On 25 October 2016 at 16:17, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 25 October 2016 at 13:43, Richard Biener <richard.guenther@gmail.com> wrote:
>> > On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
>> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> Hi,
>> >> After approval from Bernd Schmidt, I committed the patch to remove
>> >> optab functions for
>> >> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> >> for divmod patch.
>> >>
>> >> This patch is mostly the same as previous one, except it drops
>> >> targeting __udivmoddi4() because
>> >> it gave undefined reference link error for calling __udivmoddi4() on
>> >> aarch64-linux-gnu.
>> >> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> >> isn't needed for the target
>> >> (it was a bug in my patch that called __udivmoddi4() even though
>> >> aarch64 supported hardware div).
>> >>
>> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> will be available for a target if it doesn't have hardware div and
>> >> divmod insn and doesn't have target-specific libfunc for
>> >> DImode divmod ? To be conservative, the attached patch doesn't
>> >> generate call to __udivmoddi4.
>> >>
>> >> Passes bootstrap+test on x86_64-unknown-linux.
>> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >> Verified that there are no regressions with SPEC2006 on
>> >> x86_64-unknown-linux-gnu.
>> >> OK to commit ?
>> >
>> > I think the searching is still somewhat wrong - it's been some time
>> > since my last look at the
>> > patch so maybe I've said this already.  Please bail out early for
>> > stmt_can_throw_internal (stmt),
>> > otherwise the top stmt search might end up not working.  So
>> >
>> > +
>> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> > +    return false;
>> >
>> > can go.
>> >
>> > top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
>> > for another
>> > TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
>> > Which means you want a div_seen and a mod_seen, or simply record the top_stmt
>> > code and look for the opposite in the 2nd loop.
>> Um sorry I don't quite understand how we could end up without a trunc_mod stmt ?
>> The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
>> checks if we have
>> come across at least a single trunc_div stmt (and we bail out if no
>> div is seen).
>>
>> At 2nd loop I suppose we don't need mod_seen, because stmt is
>> guaranteed to be trunc_mod_expr.
>> In the 2nd loop the following condition will never trigger for stmt:
>>   if (stmt_can_throw_internal (use_stmt))
>>             continue;
>> since we checked before hand if stmt could throw and chose to bail out
>> in that case.
>>
>> and the following condition would also not trigger for stmt:
>> if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>>   {
>>     end_imm_use_stmt_traverse (&use_iter);
>>     return false;
>>   }
>> since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).
>>
>> The case where top_stmt == stmt, we wouldn't reach the above
>> condition, since we have above it:
>> if (top_stmt == stmt)
>>   continue;
>>
>> So IIUC, top_stmt and stmt would always get added to stmts vector.
>> Am I missing something ?
>
> Ah, indeed.  Maybe add a comment then, it wasn't really obvious ;)
>
> Please still move the stmt_can_throw_internal (stmt) check up.
Sure, I will move that up and do the other suggested changes.

I was wondering if this condition in 2nd loop is too restrictive ?
if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
  {
    end_imm_use_stmt_traverse (&use_iter);
    return false;
  }

Should we rather "continue" in this case by not adding use_stmt to
stmts vector rather than dropping
the transform all-together if gimple_bb (use_stmt) is not dominated by
gimple_bb (top_stmt) ?

For instance if we have a test-case like:

if (cond)
{
  t1 = x / y;
  t2 = x % y;
}
else
  t3 = x % y;

and suppose stmt is "t2 = x % y", we would set top_stmt to "t1 = x / y";
In this case we would still want to do divmod transform in THEN block
even though "t3 = x % y" is not dominated by top_stmt ?

if (cond)
{
  divmod_tmp = DIVMOD (x, y);
  t1 = REALPART_EXPR (divmod_tmp);
  t2 = IMAGPART_EXPR (divmod_tmp);
}
else
  t3 = x % y;

We will always ensure that all the trunc_div, trunc_mod statements in
stmts vector will be dominated by top_stmt,
but I suppose they need not constitute all the trunc_div, trunc_mod
statements in the function.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > +      switch (gimple_assign_rhs_code (use_stmt))
>> > +       {
>> > +         case TRUNC_DIV_EXPR:
>> > +           new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
>> > +           break;
>> > +
>> > +         case TRUNC_MOD_EXPR:
>> > +           new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
>> > +           break;
>> > +
>> >
>> > why type of op1 and type of op2 in the other case?  Choose one for consistency.
>> >
>> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > +       cfg_changed = true;
>> >
>> > as you are rejecting all internally throwing stmts this shouldn't be necessary.
>> >
>> > The patch is ok with those changes.
>> >
>> > Thanks,
>> > Richard.
>> >
>> >
>> >> Thanks,
>> >> Prathamesh
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Oct. 25, 2016, 1:17 p.m. UTC | #22
On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:

> On 25 October 2016 at 16:17, Richard Biener <rguenther@suse.de> wrote:
> > On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 October 2016 at 13:43, Richard Biener <richard.guenther@gmail.com> wrote:
> >> > On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
> >> > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> Hi,
> >> >> After approval from Bernd Schmidt, I committed the patch to remove
> >> >> optab functions for
> >> >> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> >> >> for divmod patch.
> >> >>
> >> >> This patch is mostly the same as previous one, except it drops
> >> >> targeting __udivmoddi4() because
> >> >> it gave undefined reference link error for calling __udivmoddi4() on
> >> >> aarch64-linux-gnu.
> >> >> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> >> >> isn't needed for the target
> >> >> (it was a bug in my patch that called __udivmoddi4() even though
> >> >> aarch64 supported hardware div).
> >> >>
> >> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
> >> >> will be available for a target if it doesn't have hardware div and
> >> >> divmod insn and doesn't have target-specific libfunc for
> >> >> DImode divmod ? To be conservative, the attached patch doesn't
> >> >> generate call to __udivmoddi4.
> >> >>
> >> >> Passes bootstrap+test on x86_64-unknown-linux.
> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >> Verified that there are no regressions with SPEC2006 on
> >> >> x86_64-unknown-linux-gnu.
> >> >> OK to commit ?
> >> >
> >> > I think the searching is still somewhat wrong - it's been some time
> >> > since my last look at the
> >> > patch so maybe I've said this already.  Please bail out early for
> >> > stmt_can_throw_internal (stmt),
> >> > otherwise the top stmt search might end up not working.  So
> >> >
> >> > +
> >> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> >> > +    return false;
> >> >
> >> > can go.
> >> >
> >> > top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
> >> > for another
> >> > TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
> >> > Which means you want a div_seen and a mod_seen, or simply record the top_stmt
> >> > code and look for the opposite in the 2nd loop.
> >> Um sorry I don't quite understand how we could end up without a trunc_mod stmt ?
> >> The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
> >> checks if we have
> >> come across at least a single trunc_div stmt (and we bail out if no
> >> div is seen).
> >>
> >> At 2nd loop I suppose we don't need mod_seen, because stmt is
> >> guaranteed to be trunc_mod_expr.
> >> In the 2nd loop the following condition will never trigger for stmt:
> >>   if (stmt_can_throw_internal (use_stmt))
> >>             continue;
> >> since we checked before hand if stmt could throw and chose to bail out
> >> in that case.
> >>
> >> and the following condition would also not trigger for stmt:
> >> if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
> >>   {
> >>     end_imm_use_stmt_traverse (&use_iter);
> >>     return false;
> >>   }
> >> since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).
> >>
> >> The case where top_stmt == stmt, we wouldn't reach the above
> >> condition, since we have above it:
> >> if (top_stmt == stmt)
> >>   continue;
> >>
> >> So IIUC, top_stmt and stmt would always get added to stmts vector.
> >> Am I missing something ?
> >
> > Ah, indeed.  Maybe add a comment then, it wasn't really obvious ;)
> >
> > Please still move the stmt_can_throw_internal (stmt) check up.
> Sure, I will move that up and do the other suggested changes.
> 
> I was wondering if this condition in 2nd loop is too restrictive ?
> if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>   {
>     end_imm_use_stmt_traverse (&use_iter);
>     return false;
>   }
> 
> Should we rather "continue" in this case by not adding use_stmt to
> stmts vector rather than dropping
> the transform all-together if gimple_bb (use_stmt) is not dominated by
> gimple_bb (top_stmt) ?

Ah, yes - didn't spot that.

Richard.

> 
> For instance if we have a test-case like:
> 
> if (cond)
> {
>   t1 = x / y;
>   t2 = x % y;
> }
> else
>   t3 = x % y;
> 
> and suppose stmt is "t2 = x % y", we would set top_stmt to "t1 = x / y";
> In this case we would still want to do divmod transform in THEN block
> even though "t3 = x % y" is not dominated by top_stmt ?
> 
> if (cond)
> {
>   divmod_tmp = DIVMOD (x, y);
>   t1 = REALPART_EXPR (divmod_tmp);
>   t2 = IMAGPART_EXPR (divmod_tmp);
> }
> else
>   t3 = x % y;
> 
> We will always ensure that all the trunc_div, trunc_mod statements in
> stmts vector will be dominated by top_stmt,
> but I suppose they need not constitute all the trunc_div, trunc_mod
> statements in the function.
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> > +      switch (gimple_assign_rhs_code (use_stmt))
> >> > +       {
> >> > +         case TRUNC_DIV_EXPR:
> >> > +           new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> >> > +           break;
> >> > +
> >> > +         case TRUNC_MOD_EXPR:
> >> > +           new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> >> > +           break;
> >> > +
> >> >
> >> > why type of op1 and type of op2 in the other case?  Choose one for consistency.
> >> >
> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> > +       cfg_changed = true;
> >> >
> >> > as you are rejecting all internally throwing stmts this shouldn't be necessary.
> >> >
> >> > The patch is ok with those changes.
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >
> >> >> Thanks,
> >> >> Prathamesh
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
>
Martin Liška Jan. 2, 2017, 3:07 p.m. UTC | #23
On 10/16/2016 07:59 AM, Prathamesh Kulkarni wrote:
> +  /* Disable the transform if either is a constant, since division-by-constant
> +     may have specialized expansion.  */
> +  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
> +    return false;

Hello.

I've just played a bit with DIVMOD and realized that the DIVMOD optimization is
disable when either op1 or op2 are constant. Aforementioned hunk contains a bit strange
comment, where one is not sure that the condition is intentional?

Thanks for clarification.
Martin
Prathamesh Kulkarni Jan. 2, 2017, 7:24 p.m. UTC | #24
On 2 January 2017 at 20:37, Martin Liška <mliska@suse.cz> wrote:
> On 10/16/2016 07:59 AM, Prathamesh Kulkarni wrote:
>> +  /* Disable the transform if either is a constant, since division-by-constant
>> +     may have specialized expansion.  */
>> +  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
>> +    return false;
>
> Hello.
>
> I've just played a bit with DIVMOD and realized that the DIVMOD optimization is
> disable when either op1 or op2 are constant. Aforementioned hunk contains a bit strange
> comment, where one is not sure that the condition is intentional?
Well, since expand_divmod() has specialized expansions for
division/mod for few constants,
the divmod transform is disabled if one of it's operands is constant.
I suppose the check could be made more precise by tracking cases that
are handled specially by expand_divmod,
but I didn't have a look at that, and simply disabled the transform if
one of the operands is a constant.

Thanks,
Prathamesh
>
> Thanks for clarification.
> Martin
diff mbox

Patch

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a4a8e49..866c368 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7078,6 +7078,11 @@  This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook for enabling divmod transform if the port does not have
+hardware divmod insn but defines target-specific divmod libfuncs.
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 265f1be..c4c387b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4890,6 +4890,8 @@  them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0b32d5f..42c6973 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2207,6 +2207,53 @@  expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall *call)
   expand_ifn_atomic_compare_exchange (call);
 }
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate call to
+    target-specific divmod libfunc.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *call_stmt)
+{
+  tree lhs = gimple_call_lhs (call_stmt);
+  tree arg0 = gimple_call_arg (call_stmt, 0);
+  tree arg1 = gimple_call_arg (call_stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder, libfunc;
+
+  /* Check if optab_handler exists for divmod_optab for given mode.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+
+  /* Generate call to divmod libfunc if it exists.  */
+  else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
+    targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
+				   &quotient, &remainder);				    
+
+  else
+    gcc_unreachable ();
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+		       make_tree (TREE_TYPE (arg0), quotient),
+		       make_tree (TREE_TYPE (arg1), remainder)),
+	      target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index d4fbdb2..cd0e06c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -198,6 +198,9 @@  DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF | ECF_NOTHROW, NULL)
 /* To implement [[fallthrough]].  */
 DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Divmod function.  */
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index b6968f7..47ee3bc 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5036,6 +5036,15 @@  Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* See tree-ssa-math-opts.c:divmod_candidate_p for conditions
+   that gate the divod transform.  */
+DEFHOOK
+(expand_divmod_libfunc,
+ "Define this hook for enabling divmod transform if the port does not have\n\
+hardware divmod insn but defines target-specific divmod libfuncs.", 
+ void, (rtx libfunc, machine_mode mode, rtx op0, rtx op1, rtx *quot, rtx *rem),
+ NULL)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 0cea1a8..4f7bdb2 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
+#include "targhooks.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +187,9 @@  static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3793,6 +3799,226 @@  match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Return true if target has support for divmod.  */
+
+static bool
+target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode mode) 
+{
+  /* If target supports hardware divmod insn, use it for divmod.  */
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  /* Check if libfunc for divmod is available.  */
+  rtx libfunc = optab_libfunc (divmod_optab, mode);
+  if (libfunc != NULL_RTX)
+    {
+      /* If optab_handler exists for div_optab, perhaps in a wider mode,
+	 we don't want to use the libfunc even if it exists for given mode.  */ 
+      for (machine_mode div_mode = mode;
+	   div_mode != VOIDmode;
+	   div_mode = GET_MODE_WIDER_MODE (div_mode))
+	if (optab_handler (div_optab, div_mode) != CODE_FOR_nothing)
+	  return false;
+
+      return targetm.expand_divmod_libfunc != NULL;
+    }
+  
+  /* FIXME: We could generate call to __udivmoddi4() for unsigned DImode,
+     if target doesn't have hardwre div, but I am not sure if __udivmoddi4 is
+     available for all targets.  */
+  return false; 
+}
+
+/* Check if stmt is candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  enum machine_mode mode = TYPE_MODE (type);
+  optab divmod_optab, div_optab;
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
+    return false;
+
+  /* Exclude the case where TYPE_OVERFLOW_TRAPS (type) as that should
+     expand using the [su]divv optabs.  */
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+  
+  if (!target_supports_divmod_p (divmod_optab, div_optab, mode)) 
+    return false;
+
+  return true;
+}
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR(a);
+   t2 = IMAGPART_EXPR(b);
+   For conditions enabling the transform see divmod_candidate_p().
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+  auto_vec<gimple *> stmts; 
+
+  gimple *top_stmt = stmt; 
+  basic_block top_bb = gimple_bb (stmt);
+
+  /* Try to set top_stmt to "topmost" stmt
+     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.  */
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  basic_block bb = gimple_bb (use_stmt);
+
+	  if (bb == top_bb)
+	    {
+	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
+		top_stmt = use_stmt;
+	    }
+	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+	    {
+	      top_bb = bb;
+	      top_stmt = use_stmt;
+	    }
+	}
+    }
+
+  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
+    return false;
+
+  tree top_op1 = gimple_assign_rhs1 (top_stmt);
+  tree top_op2 = gimple_assign_rhs2 (top_stmt);
+
+  stmts.safe_push (top_stmt);
+  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
+
+  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */    
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
+    {
+      if (is_gimple_assign (use_stmt)
+	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+	  && operand_equal_p (top_op1, gimple_assign_rhs1 (use_stmt), 0)
+	  && operand_equal_p (top_op2, gimple_assign_rhs2 (use_stmt), 0))
+	{
+	  if (use_stmt == top_stmt)
+	    continue;
+
+	  if (stmt_can_throw_internal (use_stmt))
+	    continue;
+
+	  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
+	    {
+	      end_imm_use_stmt_traverse (&use_iter);
+	      return false;
+	    }
+
+	  stmts.safe_push (use_stmt);
+	  if (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR)
+	    div_seen = true;
+	}
+    }
+
+  if (!div_seen)
+    return false;
+
+  /* Create libcall to internal fn DIVMOD:
+     divmod_tmp = DIVMOD (op1, op2).  */
+
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+		build_complex_type (TREE_TYPE (op1)),
+		call_stmt, "divmod_tmp");
+  gimple_call_set_lhs (call_stmt, res);
+
+  /* Insert the call before top_stmt.  */
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++;		
+
+  /* Update all statements in stmts.
+     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = REALPART_EXPR<divmod_tmp>
+     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = IMAGPART_EXPR<divmod_tmp>.  */
+
+  bool cfg_changed = false;
+  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
+    {
+      tree new_rhs;
+
+      switch (gimple_assign_rhs_code (use_stmt))
+	{
+	  case TRUNC_DIV_EXPR:
+	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+	    break;
+
+	  case TRUNC_MOD_EXPR:
+	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	}
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+	cfg_changed = true;
+    }
+
+  return cfg_changed;
+}    
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3837,6 +4063,8 @@  pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3870,6 +4098,10 @@  pass_optimize_widening_mul::execute (function *fun)
 		    match_uaddsub_overflow (&gsi, stmt, code);
 		  break;
 
+		case TRUNC_MOD_EXPR:
+		  cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
+		  break;
+
 		default:;
 		}
 	    }
@@ -3916,6 +4148,8 @@  pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
 			    widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+			    widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }