diff mbox

[RFC,i386] : Enable post-reload compare elimination pass

Message ID CAFULd4ZLes9A0mFFYvnv-CDC+n9w7hFw67PsP7Q9UTwz2eGUVA@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak May 10, 2017, 7:05 p.m. UTC
On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>>> Attached patch enables post-reload compare elimination pass by
>>> providing expected patterns (duplicates of existing patterns with
>>> setters of reg and flags switched in the parallel) for flag setting
>>> arithmetic instructions.
>>>
>>> The merge triggers more than 3000 times during the gcc bootstrap,
>>> mostly in cases where intervening memory load or store prevents
>>> combine from merging the arithmetic insn and the following compare.
>>>
>>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>>> removing ~200 test/cmp insns. Not much, but I think the results still
>>> warrant the pass to be enabled.
>>
>> Isn't the right fix instead to change the compare-elim.c pass to either
>> accept both reg vs. flags orderings in parallel, or both depending
>> on some target hook, or change it to the order i386.md and most other
>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same
>> order?

Attached patch changes compare-elim.c order to what i386.md expects.

Thoughts?

Uros.

Comments

Uros Bizjak May 10, 2017, 7:57 p.m. UTC | #1
On Wed, May 10, 2017 at 9:05 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>>>> Attached patch enables post-reload compare elimination pass by
>>>> providing expected patterns (duplicates of existing patterns with
>>>> setters of reg and flags switched in the parallel) for flag setting
>>>> arithmetic instructions.
>>>>
>>>> The merge triggers more than 3000 times during the gcc bootstrap,
>>>> mostly in cases where intervening memory load or store prevents
>>>> combine from merging the arithmetic insn and the following compare.
>>>>
>>>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>>>> removing ~200 test/cmp insns. Not much, but I think the results still
>>>> warrant the pass to be enabled.
>>>
>>> Isn't the right fix instead to change the compare-elim.c pass to either
>>> accept both reg vs. flags orderings in parallel, or both depending
>>> on some target hook, or change it to the order i386.md and most other
>>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same
>>> order?
>
> Attached patch changes compare-elim.c order to what i386.md expects.

BTW: This patch now catches 417 cases (instead of 200+) in linux
build, including e.g.:

(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
                    (const_int 1 [0x1]))
                (const_int 0 [0])))
        (set (reg:DI 4 si)
            (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
                    (const_int 1 [0x1]))))
    ])

Uros.
Jakub Jelinek May 10, 2017, 8:27 p.m. UTC | #2
On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> BTW: This patch now catches 417 cases (instead of 200+) in linux
> build, including e.g.:
> 
> (parallel [
>         (set (reg:CCZ 17 flags)
>             (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>                     (const_int 1 [0x1]))
>                 (const_int 0 [0])))
>         (set (reg:DI 4 si)
>             (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>                     (const_int 1 [0x1]))))
>     ])

That looks nice.  So, I think we need analysis on what order which targets
use.  I have looked at mn10300.md, I see {add,sub}si3_flags patterns that
would need PARALLEL reordering for this compare-elim.c change and then
cmp_liw vs. liw_cmp patterns I have no clue what they do and whether
compare-elim.c would care about those or not (they have UNSPECs).  Jeff/Alex?

In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that
would need changing.

In visium.md I see flags_subst_{logic,arith} define_substs,
*{add,sub}<mode>3_insn_set_{carry,overflow}
{add,sub}si3_insn_set_{carry,overflow}, negsi2_insn_set_carry
and *neg<mode>2_insn_set_overflow that would need changing.

aarch64 is the only remaining compare-elim.c enabled target
(one that defines TARGET_FLAGS_REGNUM), and that one seems to use
the same parallel order as i386.md, so compare-elim.c most likely just
doesn't work there at all.

So all in all, sounds like we need to change at least 17 patterns on 3
not very widely used targets.

	Jakub
Jeff Law May 11, 2017, 5:17 p.m. UTC | #3
On 05/10/2017 02:27 PM, Jakub Jelinek wrote:
> On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
>> BTW: This patch now catches 417 cases (instead of 200+) in linux
>> build, including e.g.:
>>
>> (parallel [
>>          (set (reg:CCZ 17 flags)
>>              (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>>                      (const_int 1 [0x1]))
>>                  (const_int 0 [0])))
>>          (set (reg:DI 4 si)
>>              (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>>                      (const_int 1 [0x1]))))
>>      ])
> 
> That looks nice.  So, I think we need analysis on what order which targets
> use.  I have looked at mn10300.md, I see {add,sub}si3_flags patterns that
> would need PARALLEL reordering for this compare-elim.c change and then
> cmp_liw vs. liw_cmp patterns I have no clue what they do and whether
> compare-elim.c would care about those or not (they have UNSPECs).  Jeff/Alex?
No idea what the cmp_liw and liw_cmp patterns are -- they were added 
after my involvement with the mn103 ended.

I know it's been discussed before, but all that's needed here is to put 
the assignment to CC_REG first in patterns were CC_REG is assigned a 
useful value, right?  For patterns were CC_REG is clobbered, we leave 
them as-is, right?

Does this impact logicals or just arithmetic?  If it hits logicals, 
there's a few more patterns in the mn103 port and likely the rx port as 
well.  But the changes look highly mechanical.

Furthermore, it's trivial for me to build this port through newlib.  So 
I could build before the change, stash away the resulting libc.a, then 
build after the change and compare the contents of the resulting 
objects.  In theory they should be identical unless I'm missing something.

I would say don't let the mn103 stop this work.  If the work goes 
forward, I'll own cleaning up the mn103.  I don't mind assigning the 
cleanup of rx to Nick to be done after the generic changes are done.

That just leaves visium which Eric B. owns.  Note that I can test visium 
in the same way I can test mn103 and rx, so if someone does the visium 
work I can test it.

Jeff
Jeff Law May 11, 2017, 6:28 p.m. UTC | #4
On 05/10/2017 01:05 PM, Uros Bizjak wrote:
> On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>>>> Attached patch enables post-reload compare elimination pass by
>>>> providing expected patterns (duplicates of existing patterns with
>>>> setters of reg and flags switched in the parallel) for flag setting
>>>> arithmetic instructions.
>>>>
>>>> The merge triggers more than 3000 times during the gcc bootstrap,
>>>> mostly in cases where intervening memory load or store prevents
>>>> combine from merging the arithmetic insn and the following compare.
>>>>
>>>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>>>> removing ~200 test/cmp insns. Not much, but I think the results still
>>>> warrant the pass to be enabled.
>>>
>>> Isn't the right fix instead to change the compare-elim.c pass to either
>>> accept both reg vs. flags orderings in parallel, or both depending
>>> on some target hook, or change it to the order i386.md and most other
>>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same
>>> order?
> 
> Attached patch changes compare-elim.c order to what i386.md expects.
> 
> Thoughts?
Haven't looked at the patch itself.  But I do have the necessary bits to 
convert the mn103 port.  It was slightly more than just fixing the md 
file, but nothing significant or time consuming.  The net result is 100% 
identical code for newlib before your patch vs after your patch w/mn103 
converted.

Hell, it was easy enough, I'll take a cut at the rx port.

jeff
Hans-Peter Nilsson May 11, 2017, 9:29 p.m. UTC | #5
On Wed, 10 May 2017, Jakub Jelinek wrote:

> On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> > BTW: This patch now catches 417 cases (instead of 200+) in linux
> > build, including e.g.:
> >
> > (parallel [
> >         (set (reg:CCZ 17 flags)
> >             (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> >                     (const_int 1 [0x1]))
> >                 (const_int 0 [0])))
> >         (set (reg:DI 4 si)
> >             (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> >                     (const_int 1 [0x1]))))
> >     ])
>
> That looks nice.  So, I think we need analysis on what order which targets
> use.  I have looked at mn10300.md, I see {add,sub}si3_flags patterns that
> would need PARALLEL reordering for this compare-elim.c change and then
> cmp_liw vs. liw_cmp patterns I have no clue what they do and whether
> compare-elim.c would care about those or not (they have UNSPECs).  Jeff/Alex?
>
> In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that
> would need changing.

The canonical order of insns affecting condition-codes has been
discussed in the past too.

I was then arguing that the compare should go last, i.e.
 (parallel [(set (reg) (op...))
            (set (ccreg) (compare (op...) (const_int 0)))])
for simplicity of processing together with an alternative that
clobbers (i.e. same location in the parallel), as the canonical
order of clobbers in a parallel is that they always go last.
(IIRC from that distant past, transforming a cc0 target to
"ccreg" required at least three variants: the "naked" insn
(pre-reload?), the parallel with a clobber of ccreg, and the
actual use of ccreg; as above.)

Anyway, people seem to drift towards the ccreg-last variant for
some reason or another every time this is brought up; this time
it seems the single reason is that some existing pass expects or
generates this, using the order that causes the minimal fallout.
(Maybe it's the same pass...)  Not sure that's a good base for
decisions.

Whatever you do in the end, *please document the canonical
order* in the RTL documentation.

brgds, H-P
Jeff Law May 12, 2017, 6:11 a.m. UTC | #6
On 05/11/2017 03:29 PM, Hans-Peter Nilsson wrote:
> On Wed, 10 May 2017, Jakub Jelinek wrote:
> 
>> On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
>>> BTW: This patch now catches 417 cases (instead of 200+) in linux
>>> build, including e.g.:
>>>
>>> (parallel [
>>>          (set (reg:CCZ 17 flags)
>>>              (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>>>                      (const_int 1 [0x1]))
>>>                  (const_int 0 [0])))
>>>          (set (reg:DI 4 si)
>>>              (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
>>>                      (const_int 1 [0x1]))))
>>>      ])
>>
>> That looks nice.  So, I think we need analysis on what order which targets
>> use.  I have looked at mn10300.md, I see {add,sub}si3_flags patterns that
>> would need PARALLEL reordering for this compare-elim.c change and then
>> cmp_liw vs. liw_cmp patterns I have no clue what they do and whether
>> compare-elim.c would care about those or not (they have UNSPECs).  Jeff/Alex?
>>
>> In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that
>> would need changing.
> 
> The canonical order of insns affecting condition-codes has been
> discussed in the past too.
> 
> I was then arguing that the compare should go last, i.e.
>   (parallel [(set (reg) (op...))
>              (set (ccreg) (compare (op...) (const_int 0)))])
> for simplicity of processing together with an alternative that
> clobbers (i.e. same location in the parallel), as the canonical
> order of clobbers in a parallel is that they always go last.
> (IIRC from that distant past, transforming a cc0 target to
> "ccreg" required at least three variants: the "naked" insn
> (pre-reload?), the parallel with a clobber of ccreg, and the
> actual use of ccreg; as above.)
> 
> Anyway, people seem to drift towards the ccreg-last variant for
> some reason or another every time this is brought up; this time
> it seems the single reason is that some existing pass expects or
> generates this, using the order that causes the minimal fallout.
> (Maybe it's the same pass...)  Not sure that's a good base for
> decisions.
> 
> Whatever you do in the end, *please document the canonical
> order* in the RTL documentation.
I think what's driving the decision is more a desire not to muck with 
x86 and aarch64 and instead place the burden to change on the less 
popular ports.

Though I must admit i prefer the cc setting last since that mirrors how 
we typically do things with clobbers of the cc register.

But yes, we definitely should document the final canonical ordering.

jeff
Hans-Peter Nilsson May 12, 2017, 9:42 a.m. UTC | #7
(To-list pruned, my correction doesn't need attention.)

On Thu, 11 May 2017, Hans-Peter Nilsson wrote:
> On Wed, 10 May 2017, Jakub Jelinek wrote:
>
> > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> > > BTW: This patch now catches 417 cases (instead of 200+) in linux
> > > build, including e.g.:
> > >
> > > (parallel [
> > >         (set (reg:CCZ 17 flags)
> > >             (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > >                     (const_int 1 [0x1]))
> > >                 (const_int 0 [0])))
> > >         (set (reg:DI 4 si)
> > >             (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > >                     (const_int 1 [0x1]))))
> > >     ])

> Anyway, people seem to drift towards the ccreg-last variant

JFTR, I miswrote that; I meant "towards the variant with
ccreg-first" as in Uros' example kept above and as opposed to my
example.

brgds, H-P
Jakub Jelinek May 12, 2017, 10:31 a.m. UTC | #8
On Fri, May 12, 2017 at 05:42:59AM -0400, Hans-Peter Nilsson wrote:
> (To-list pruned, my correction doesn't need attention.)
> 
> On Thu, 11 May 2017, Hans-Peter Nilsson wrote:
> > On Wed, 10 May 2017, Jakub Jelinek wrote:
> >
> > > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote:
> > > > BTW: This patch now catches 417 cases (instead of 200+) in linux
> > > > build, including e.g.:
> > > >
> > > > (parallel [
> > > >         (set (reg:CCZ 17 flags)
> > > >             (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > > >                     (const_int 1 [0x1]))
> > > >                 (const_int 0 [0])))
> > > >         (set (reg:DI 4 si)
> > > >             (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93])
> > > >                     (const_int 1 [0x1]))))
> > > >     ])
> 
> > Anyway, people seem to drift towards the ccreg-last variant
> 
> JFTR, I miswrote that; I meant "towards the variant with
> ccreg-first" as in Uros' example kept above and as opposed to my
> example.

If I skim the current primary/secondary targets, then i386/x86_64, rs6000,
aarch64, arm, sparc, s390* all use the ccreg-first order, in mips I couldn't
find either of the orders.
Looking at other targets, in alpha, avr, bfin, c6x, cr16, cris, fr30, ft32,
h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze, mmix, moxie,
nds32, nios2, nvptx, pa, pdp11, riscv, rl78, sh, spu, stormy16, tilegx,
tilepro, v850, vax, xtensa I can't find any order,
arc, epiphany are ccreg-first,
frv is some weird mixture of ccreg-first (e.g. *combo_intop_compare2) and
ccreg-last (e.g. adddi3_lower),
mn10300, rx and visium are ccreg-last that we want to switch over now to
ccreg-first.
It was all from quickly skimming (mostly) the main config/<cpu>/<cpu>.md
looking for \(compare and/or \(reg.*CC and looking if there are patterns
that have the same arithmetics inside a compare or something similar vs.
the operation repeated on a set, so it is possible I've missed something.

But if the above is roughly true, then it is of course preferable to change
3 less used targets than 6 primary/secondary ones + 2 further ones.

	Jakub
Jeff Law May 12, 2017, 6:34 p.m. UTC | #9
On 05/10/2017 01:05 PM, Uros Bizjak wrote:
> On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>>>> Attached patch enables post-reload compare elimination pass by
>>>> providing expected patterns (duplicates of existing patterns with
>>>> setters of reg and flags switched in the parallel) for flag setting
>>>> arithmetic instructions.
>>>>
>>>> The merge triggers more than 3000 times during the gcc bootstrap,
>>>> mostly in cases where intervening memory load or store prevents
>>>> combine from merging the arithmetic insn and the following compare.
>>>>
>>>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>>>> removing ~200 test/cmp insns. Not much, but I think the results still
>>>> warrant the pass to be enabled.
>>>
>>> Isn't the right fix instead to change the compare-elim.c pass to either
>>> accept both reg vs. flags orderings in parallel, or both depending
>>> on some target hook, or change it to the order i386.md and most other
>>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same
>>> order?
> 
> Attached patch changes compare-elim.c order to what i386.md expects.
> 
> Thoughts?
I think with an appropriate change to the canonicalization rules in the 
manual this is fine.

I've got the visium, rx and mn103 patches handy to ensure they don't 
regress.  aarch64 doesn't seem to be affected either way but I didn't 
investigate why -- I expected it to improve with your change.

I'll write up a ChangeLog and commit the mn103, rx and visium changes 
after you commit the compare-elim & documentation bits.

jeff
Hans-Peter Nilsson May 17, 2017, 2:45 a.m. UTC | #10
On Fri, 12 May 2017, Jeff Law wrote:
> On 05/11/2017 03:29 PM, Hans-Peter Nilsson wrote:
> > The canonical order of insns affecting condition-codes has been
> > discussed in the past too.
> >
> > I was then arguing that the compare should go last, i.e.
> >   (parallel [(set (reg) (op...))
> >              (set (ccreg) (compare (op...) (const_int 0)))])
> > for simplicity of processing together with an alternative that
> > clobbers (i.e. same location in the parallel), as the canonical
> > order of clobbers in a parallel is that they always go last.
> > (IIRC from that distant past, transforming a cc0 target to
> > "ccreg" required at least three variants: the "naked" insn
> > (pre-reload?), the parallel with a clobber of ccreg, and the
> > actual use of ccreg; as above.)
> >
> > Anyway, people seem to drift towards the ccreg-last variant for

(miswrite: ccreg-*first*, as opposed to the above)

> > some reason or another every time this is brought up; this time
> > it seems the single reason is that some existing pass expects or
> > generates this, using the order that causes the minimal fallout.
> > (Maybe it's the same pass...)  Not sure that's a good base for
> > decisions.
> >
> > Whatever you do in the end, *please document the canonical
> > order* in the RTL documentation.
> I think what's driving the decision is more a desire not to muck with x86 and
> aarch64 and instead place the burden to change on the less popular ports.
>
> Though I must admit i prefer the cc setting last since that mirrors how we
> typically do things with clobbers of the cc register.

IOW.

> But yes, we definitely should document the final canonical ordering.

Is that about to also happen?

I foresee in another half-a-dozen years, and *this* iteration is
forgotten, someone bothered enough to argue eloquently coming
around, doing rtl-level-maintenance, maybe a new pass (ok maybe
not a *new RTL-pass* :) sees that order as wrong for the reason
listed above, and does the legwork to switch the order around.
It will be ok to change it again then, because the order just
happened this time because of minimal-edit-reasons, right?
Noone can argue that it was a thoughtful deliberate change that
we bothered to document, to stay consistent?  ;)

brgds, H-P
Uros Bizjak May 17, 2017, 6:33 a.m. UTC | #11
On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote:

>> But yes, we definitely should document the final canonical ordering.
>
> Is that about to also happen?
>
> I foresee in another half-a-dozen years, and *this* iteration is
> forgotten, someone bothered enough to argue eloquently coming
> around, doing rtl-level-maintenance, maybe a new pass (ok maybe
> not a *new RTL-pass* :) sees that order as wrong for the reason
> listed above, and does the legwork to switch the order around.
> It will be ok to change it again then, because the order just
> happened this time because of minimal-edit-reasons, right?
> Noone can argue that it was a thoughtful deliberate change that
> we bothered to document, to stay consistent?  ;)

The proposed doc patch is wiating for review at [1].

[1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html

Uros.
Hans-Peter Nilsson May 17, 2017, 12:54 p.m. UTC | #12
On Wed, 17 May 2017, Uros Bizjak wrote:
> On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> >> But yes, we definitely should document the final canonical ordering.
> >
> > Is that about to also happen?
> The proposed doc patch is wiating for review at [1].
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html

Thanks.  LGTM.

brgds, H-P
Jeff Law June 23, 2017, 4:46 a.m. UTC | #13
On 05/17/2017 12:33 AM, Uros Bizjak wrote:
> On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> 
>>> But yes, we definitely should document the final canonical ordering.
>>
>> Is that about to also happen?
>>
>> I foresee in another half-a-dozen years, and *this* iteration is
>> forgotten, someone bothered enough to argue eloquently coming
>> around, doing rtl-level-maintenance, maybe a new pass (ok maybe
>> not a *new RTL-pass* :) sees that order as wrong for the reason
>> listed above, and does the legwork to switch the order around.
>> It will be ok to change it again then, because the order just
>> happened this time because of minimal-edit-reasons, right?
>> Noone can argue that it was a thoughtful deliberate change that
>> we bothered to document, to stay consistent?  ;)
> 
> The proposed doc patch is wiating for review at [1].
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html
OK for the trunk.

Sorry for the delay,
jeff
diff mbox

Patch

Index: compare-elim.c
===================================================================
--- compare-elim.c	(revision 247850)
+++ compare-elim.c	(working copy)
@@ -45,9 +45,9 @@ 
    (3) If an insn of form (2) can usefully set the flags, there is
        another pattern of the form
 
-	[(set (reg) (operation)
-	 (set (reg:CCM) (compare:CCM (operation) (immediate)))]
-
+	[(set (reg:CCM) (compare:CCM (operation) (immediate)))
+	 (set (reg) (operation)]
+	 
        The mode CCM will be chosen as if by SELECT_CC_MODE.
 
    Note that unlike NOTICE_UPDATE_CC, we do not handle memory operands.
@@ -582,7 +582,7 @@ 
 static bool
 try_eliminate_compare (struct comparison *cmp)
 {
-  rtx x, flags, in_a, in_b, cmp_src;
+  rtx flags, in_a, in_b, cmp_src;
 
   /* We must have found an interesting "clobber" preceding the compare.  */
   if (cmp->prev_clobber == NULL)
@@ -628,7 +628,8 @@ 
      Validate that PREV_CLOBBER itself does in fact refer to IN_A.  Do
      recall that we've already validated the shape of PREV_CLOBBER.  */
   rtx_insn *insn = cmp->prev_clobber;
-  x = XVECEXP (PATTERN (insn), 0, 0);
+
+  rtx x = XVECEXP (PATTERN (insn), 0, 0);
   if (rtx_equal_p (SET_DEST (x), in_a))
     cmp_src = SET_SRC (x);
 
@@ -666,16 +667,30 @@ 
     flags = gen_rtx_REG (cmp->orig_mode, targetm.flags_regnum);
 
   /* Generate a new comparison for installation in the setter.  */
-  x = copy_rtx (cmp_src);
-  x = gen_rtx_COMPARE (GET_MODE (flags), x, in_b);
-  x = gen_rtx_SET (flags, x);
+  rtx y = copy_rtx (cmp_src);
+  y = gen_rtx_COMPARE (GET_MODE (flags), y, in_b);
+  y = gen_rtx_SET (flags, y);
 
+  /* Canonicalize instruction to:
+	[(set (reg:CCM) (compare:CCM (operation) (immediate)))
+	 (set (reg) (operation)]
+  */
+
+  rtvec v = rtvec_alloc (2);
+  RTVEC_ELT (v, 0) = y;
+  RTVEC_ELT (v, 1) = x;
+  
+  rtx pat = gen_rtx_PARALLEL (VOIDmode, v);
+  
   /* Succeed if the new instruction is valid.  Note that we may have started
      a change group within maybe_select_cc_mode, therefore we must continue. */
-  validate_change (insn, &XVECEXP (PATTERN (insn), 0, 1), x, true);
+  validate_change (insn, &PATTERN (insn), pat, true);
   if (!apply_change_group ())
     return false;
 
+  printf ("TRIGGERED\n");
+  debug_rtx (pat);
+
   /* Success.  Delete the compare insn...  */
   delete_insn (cmp->insn);
 
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 247850)
+++ config/i386/i386.c	(working copy)
@@ -52043,6 +52043,8 @@ 
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST ix86_address_cost
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM FLAGS_REG
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs
 #undef TARGET_CC_MODES_COMPATIBLE