diff mbox

[RFC,0/6] Flags outputs for asms

Message ID 554D195D.4010508@redhat.com
State New
Headers show

Commit Message

Richard Henderson May 8, 2015, 8:15 p.m. UTC
On 05/08/2015 08:54 AM, Richard Henderson wrote:
> I have a feeling I know why these didn't get merged.
> 
> The global optimizers aren't allowed to operate on hard registers lest they
> extend the lifetime of the hard register such that it creates an impossible
> situation for the register allocator.  Think what would happen if EAX were
> suddenly live across the entire function.
> 
> Similarly, combine is allowed to merge insns with hard registers if the insns
> are sequential.  But if the insns aren't sequential, we're lengthening the
> lifetime of the hard register.  Now, I thought this didn't apply to fixed
> registers like esp or flags, but perhaps not.
> 
> Note what happens if you swap the order of le and pf in the asm:
> 
>   asm("cmpl %3,%2" : "=@ccp" (pf), "=@ccle" (le) : "r" (x), "g" (y));
> 
> the order of the two setcc insns is reversed, and then the setle is in fact
> merged with the branch.
> 
> Anyway, I'll look into whether the branch around alpha can be optimized, but
> I'd be shocked if I'd be able to do anything about the branch around beta.
> True, there's nothing in between that will clobber the flags so it would be an
> excellent improvement, but combine doesn't work across basic blocks and
> changing that would be a major task.

My feeling was wrong.

(insn 9 11 10 2 (set (reg:QI 91 [ le ])
        (le:QI (reg:CCGC 17 flags)
            (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
     (nil))
(insn 10 9 12 2 (set (reg:QI 92 [ pf ])
        (eq:QI (reg:CCP 17 flags)
            (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
     (expr_list:REG_DEAD (reg:CCP 17 flags)
        (nil)))
(insn 12 10 13 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 91 [ le ])
            (const_int 0 [0]))) z.c:12 1 {*cmpqi_ccno_1}
     (expr_list:REG_DEAD (reg:QI 91 [ le ])
        (nil)))
(jump_insn 13 12 14 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 17)
            (pc))) z.c:12 606 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 7929 (nil)))

Combine won't allow a combination of insn 9->12->13 in one step one of the
inputs to insn 9, the flags, is clobbered in between insn 9 and insn 13.
Namely, by insn 12.  Combine has a special-case for allowing this when all
insns are consecutive, but not when there's something in the middle.

But it *does* try to match an intermediate pattern,

(set (reg:CCGC 17 flags)
    (compare:CCGC (reg:CCGC 17 flags)
        (const_int 0 [0])))

which can be considered a no-op move.  If I add the attached pattern, then the
combination happens in two steps -- 9->12, 12->13 -- and we get what we hoped:

(note 9 11 10 2 NOTE_INSN_DELETED)
(insn 10 9 12 2 (set (reg:QI 92 [ pf ])
        (eq:QI (reg:CCP 17 flags)
            (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
     (nil))
(note 12 10 13 2 NOTE_INSN_DELETED)
(jump_insn 13 12 14 2 (set (pc)
        (if_then_else (gt (reg:CCGC 17 flags)
                (const_int 0 [0]))
            (label_ref 17)
            (pc))) z.c:12 606 {*jcc_1}
     (int_list:REG_BR_PROB 7929 (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (nil)))
 -> 17)


Jeff or Segher, is it worth complicating the can_combine_p test near line 1958

      /* Make sure that the value that is to be substituted for the register
         does not use any registers whose values alter in between.  However,
         If the insns are adjacent, a use can't cross a set even though we
         think it might (this can happen for a sequence of insns each setting
         the same destination; last_set of that register might point to
         a NOTE).  If INSN has a REG_EQUIV note, the register is always
         equivalent to the memory so the substitution is valid even if there
         are intervening stores.  Also, don't move a volatile asm or
         UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
          && (((!MEM_P (src)
                || ! find_reg_note (insn, REG_EQUIV, src))
               && use_crosses_set_p (src, DF_INSN_LUID (insn)))
              || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
              || GET_CODE (src) == UNSPEC_VOLATILE))

to notice that the set is one of SUCC or SUCC2, and is thus included in the
insns being combined?  That does seem cleaner and more general than the hacky
i386 nop_cmp pattern, but would certainly require tons more testing...



r~

Comments

Segher Boessenkool May 8, 2015, 9:14 p.m. UTC | #1
Hi Richard,

On Fri, May 08, 2015 at 01:15:25PM -0700, Richard Henderson wrote:
> But it *does* try to match an intermediate pattern,
> 
> (set (reg:CCGC 17 flags)
>     (compare:CCGC (reg:CCGC 17 flags)
>         (const_int 0 [0])))
> 
> which can be considered a no-op move.

Maybe we should teach combine this is a no-op, then?  Then everything
should work as-is?  Combine knows about no-op moves (they don't cost,
and it deletes them itself).

> Jeff or Segher, is it worth complicating the can_combine_p test near line 1958
> 
>       /* Make sure that the value that is to be substituted for the register
>          does not use any registers whose values alter in between.  However,
>          If the insns are adjacent, a use can't cross a set even though we
>          think it might (this can happen for a sequence of insns each setting
>          the same destination; last_set of that register might point to
>          a NOTE).  If INSN has a REG_EQUIV note, the register is always
>          equivalent to the memory so the substitution is valid even if there
>          are intervening stores.  Also, don't move a volatile asm or
>          UNSPEC_VOLATILE across any other insns.  */
>       || (! all_adjacent
>           && (((!MEM_P (src)
>                 || ! find_reg_note (insn, REG_EQUIV, src))
>                && use_crosses_set_p (src, DF_INSN_LUID (insn)))
>               || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
>               || GET_CODE (src) == UNSPEC_VOLATILE))
> 
> to notice that the set is one of SUCC or SUCC2, and is thus included in the
> insns being combined?  That does seem cleaner and more general than the hacky
> i386 nop_cmp pattern, but would certainly require tons more testing...

"Cleaner"?  In this code?  Heh.

use_crosses_set_p often estimates pessimistically.  Is that what is
happening here?  Using real dataflow in combine would fix that (and many
other problems).  Not that that helps you right now ;-)

I'll build with your patches tomorrow and investigate.


Segher
Jeff Law May 8, 2015, 9:32 p.m. UTC | #2
On 05/08/2015 02:15 PM, Richard Henderson wrote:
>
> But it *does* try to match an intermediate pattern,
>
> (set (reg:CCGC 17 flags)
>      (compare:CCGC (reg:CCGC 17 flags)
>          (const_int 0 [0])))
>
> which can be considered a no-op move.  If I add the attached pattern, then the
> combination happens in two steps -- 9->12, 12->13 -- and we get what we hoped:
So what happens if that pattern is actually recognized as a nop-move by 
set_noop_p?  That would allow recog_for_combine to see it as a nop and 
"recognize" it as valid.



>
>
> Jeff or Segher, is it worth complicating the can_combine_p test near line 1958
>
>        /* Make sure that the value that is to be substituted for the register
>           does not use any registers whose values alter in between.  However,
>           If the insns are adjacent, a use can't cross a set even though we
>           think it might (this can happen for a sequence of insns each setting
>           the same destination; last_set of that register might point to
>           a NOTE).  If INSN has a REG_EQUIV note, the register is always
>           equivalent to the memory so the substitution is valid even if there
>           are intervening stores.  Also, don't move a volatile asm or
>           UNSPEC_VOLATILE across any other insns.  */
>        || (! all_adjacent
>            && (((!MEM_P (src)
>                  || ! find_reg_note (insn, REG_EQUIV, src))
>                 && use_crosses_set_p (src, DF_INSN_LUID (insn)))
>                || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
>                || GET_CODE (src) == UNSPEC_VOLATILE))
>
> to notice that the set is one of SUCC or SUCC2, and is thus included in the
> insns being combined?  That does seem cleaner and more general than the hacky
> i386 nop_cmp pattern, but would certainly require tons more testing...
See above -- feels like we'd be better off letting combine know about 
this other form of a nop move and hopefully if we do that, all the right 
things happen.

jeff
Richard Henderson May 8, 2015, 9:55 p.m. UTC | #3
On 05/08/2015 02:32 PM, Jeff Law wrote:
> On 05/08/2015 02:15 PM, Richard Henderson wrote:
>>
>> But it *does* try to match an intermediate pattern,
>>
>> (set (reg:CCGC 17 flags)
>>      (compare:CCGC (reg:CCGC 17 flags)
>>          (const_int 0 [0])))
>>
>> which can be considered a no-op move.  If I add the attached pattern, then the
>> combination happens in two steps -- 9->12, 12->13 -- and we get what we hoped:
> So what happens if that pattern is actually recognized as a nop-move by
> set_noop_p?  That would allow recog_for_combine to see it as a nop and
> "recognize" it as valid.


Interesting suggestion -- I hadn't thought of that.  It might be easier than
playing with use_crosses_set_p, and certainly better than the nop_cmp pattern.

I'll have a go at this later.


r~
Segher Boessenkool May 8, 2015, 10:10 p.m. UTC | #4
On Fri, May 08, 2015 at 03:32:58PM -0600, Jeff Law wrote:
> On 05/08/2015 02:15 PM, Richard Henderson wrote:
> >
> >But it *does* try to match an intermediate pattern,
> >
> >(set (reg:CCGC 17 flags)
> >     (compare:CCGC (reg:CCGC 17 flags)
> >         (const_int 0 [0])))
> >
> >which can be considered a no-op move.  If I add the attached pattern, then 
> >the
> >combination happens in two steps -- 9->12, 12->13 -- and we get what we 
> >hoped:
> So what happens if that pattern is actually recognized as a nop-move by 
> set_noop_p?  That would allow recog_for_combine to see it as a nop and 
> "recognize" it as valid.

It's not valid RTL though, so it is a bit more work than that.


Segher
Richard Henderson May 8, 2015, 10:10 p.m. UTC | #5
On 05/08/2015 02:14 PM, Segher Boessenkool wrote:
> "Cleaner"?  In this code?  Heh.

Heh.

> use_crosses_set_p often estimates pessimistically.  Is that what is
> happening here?  Using real dataflow in combine would fix that (and many
> other problems).  Not that that helps you right now ;-)

Yes, I think so.  Proper data flow would fix this.  But...

My thought is that the use_crosses_set_p could grow another parameter,
ignore_luid, so that if it matches DF_INSN_LUID (rsp->last_set) we don't fail
the test.

Then can_combine_p has to adjust its use like so:

  use_crosses_set_p (src, DF_INSN_LUID (insn),
		     succ ? DF_INSN_LUID (succ) : -1)

which at least handles the 3-insn combine case, if not the 4-insn combine case.

I'll try out both this and Law's set_noop_p suggestion soon.



r~
Segher Boessenkool May 11, 2015, 1:19 p.m. UTC | #6
On Fri, May 08, 2015 at 01:15:25PM -0700, Richard Henderson wrote:
> But it *does* try to match an intermediate pattern,
> 
> (set (reg:CCGC 17 flags)
>     (compare:CCGC (reg:CCGC 17 flags)
>         (const_int 0 [0])))

Patch posted at <http://gcc.gnu.org/ml/gcc-patches/2015-05/msg00918.html>.


Segher
diff mbox

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fc320f6..ffa5c46 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1320,6 +1320,22 @@ 
   [(set_attr "type" "icmp")
    (set_attr "mode" "QI")])
 
+;; This helps combine fold away a chain of setcc insns.
+
+(define_insn_and_split "*nop_cmp"
+  [(set (reg FLAGS_REG)
+	(compare (match_operand 0 "flags_reg_operand")
+		 (const_int 0)))]
+  "ix86_match_ccmode (insn, GET_MODE (operands[0]))"
+  "#"
+  ""
+  [(const_int 0)]
+{
+  /* No-op move.  Can't split to nothing; emit something.  */
+  emit_note (NOTE_INSN_DELETED);
+  DONE;
+})
+
 ;; These implement float point compares.
 ;; %%% See if we can get away with VOIDmode operands on the actual insns,
 ;; which would allow mix and match FP modes on the compares.  Which is what