Message ID | 4E2FFA77.4080300@gjlay.de |
---|---|
State | New |
Headers | show |
> +;; "*ashluqihiqi3.mem" > +;; "*ashlsqihiqi3.mem" > +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem" > + [(set (match_operand:QI 0 "memory_operand" "=m") > + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r")) > + (match_operand:QI 2 "register_operand" "r")) > + 0))] > + "!reload_completed" > + { gcc_unreachable(); } Surely this isn't necessary. Why would you ever be matching a memory output? > +(define_insn_and_split "*ashlhiqi3" > + [(set (match_operand:QI 0 "nonimmediate_operand" "=r") > + (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0") > + (match_operand:QI 2 "register_operand" "r")) 0))] > + "!reload_completed" > + { gcc_unreachable(); } Likewise. But the first pattern and the peep2 look good. r~
Richard Henderson wrote: >> +;; "*ashluqihiqi3.mem" >> +;; "*ashlsqihiqi3.mem" >> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem" >> + [(set (match_operand:QI 0 "memory_operand" "=m") >> + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r")) >> + (match_operand:QI 2 "register_operand" "r")) >> + 0))] >> + "!reload_completed" >> + { gcc_unreachable(); } > > Surely this isn't necessary. Why would you ever be matching a memory output? > >> +(define_insn_and_split "*ashlhiqi3" >> + [(set (match_operand:QI 0 "nonimmediate_operand" "=r") >> + (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0") >> + (match_operand:QI 2 "register_operand" "r")) 0))] >> + "!reload_completed" >> + { gcc_unreachable(); } > > Likewise. > > But the first pattern and the peep2 look good. > It's that what combine comes up with, and combine is not smart enough to find a split point between the mem and the subreg. I don't know enough of combine, maybe it's because can_create_pseudo_p is false during combine, combine has no spare reg. A combine-split won't help as it needs a pseudo/spare reg. As consequence there is better code if memory operand is allowed which is a typical use-case, e.g. setting some bits in a SFR. Johann
On 07/27/2011 10:00 AM, Georg-Johann Lay wrote: > Richard Henderson wrote: >>> +;; "*ashluqihiqi3.mem" >>> +;; "*ashlsqihiqi3.mem" >>> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem" >>> + [(set (match_operand:QI 0 "memory_operand" "=m") >>> + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r")) >>> + (match_operand:QI 2 "register_operand" "r")) >>> + 0))] >>> + "!reload_completed" >>> + { gcc_unreachable(); } >> >> Surely this isn't necessary. Why would you ever be matching a memory output? >> >>> +(define_insn_and_split "*ashlhiqi3" >>> + [(set (match_operand:QI 0 "nonimmediate_operand" "=r") >>> + (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0") >>> + (match_operand:QI 2 "register_operand" "r")) 0))] >>> + "!reload_completed" >>> + { gcc_unreachable(); } >> >> Likewise. >> >> But the first pattern and the peep2 look good. >> > > It's that what combine comes up with, and combine is not smart enough > to find a split point between the mem and the subreg. I don't know > enough of combine, maybe it's because can_create_pseudo_p is false > during combine, combine has no spare reg. A combine-split won't > help as it needs a pseudo/spare reg. Hmm. Perhaps. Have you a test case for this? r~
Richard Henderson wrote: > On 07/27/2011 10:00 AM, Georg-Johann Lay wrote: >> Richard Henderson wrote: >>>> +;; "*ashluqihiqi3.mem" >>>> +;; "*ashlsqihiqi3.mem" >>>> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem" >>>> + [(set (match_operand:QI 0 "memory_operand" "=m") >>>> + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r")) >>>> + (match_operand:QI 2 "register_operand" "r")) >>>> + 0))] >>>> + "!reload_completed" >>>> + { gcc_unreachable(); } >>> Surely this isn't necessary. Why would you ever be matching a memory output? >>> >>>> +(define_insn_and_split "*ashlhiqi3" >>>> + [(set (match_operand:QI 0 "nonimmediate_operand" "=r") >>>> + (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0") >>>> + (match_operand:QI 2 "register_operand" "r")) 0))] >>>> + "!reload_completed" >>>> + { gcc_unreachable(); } >>> Likewise. >>> >>> But the first pattern and the peep2 look good. >>> >> It's that what combine comes up with, and combine is not smart enough >> to find a split point between the mem and the subreg. I don't know >> enough of combine, maybe it's because can_create_pseudo_p is false >> during combine, combine has no spare reg. A combine-split won't >> help as it needs a pseudo/spare reg. > > Hmm. Perhaps. Have you a test case for this? > > r~ char y; void shift2 (char x, unsigned char s) { y = x << s; } Combiner tries: ... Trying 9 -> 10: Failed to match this instruction: (set (mem/c/i:QI (symbol_ref:HI ("y") <var_decl 0xb764c180 y>) [0 y+0 S1 A8]) (subreg:QI (ashift:HI (reg:HI 48 [ x ]) (reg/v:QI 47 [ s ])) 0)) Trying 7, 9 -> 10: Failed to match this instruction: (set (mem/c/i:QI (symbol_ref:HI ("y") <var_decl 0xb764c180 y>) [0 y+0 S1 A8]) (subreg:QI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ x ])) (reg/v:QI 47 [ s ])) 0)) Successfully matched this instruction: (set (reg:HI 50) (zero_extend:HI (reg/v:QI 46 [ x ]))) Failed to match this instruction: (set (mem/c/i:QI (symbol_ref:HI ("y") <var_decl 0xb764c180 y>) [0 y+0 S1 A8]) (subreg:QI (ashift:HI (reg:HI 50) (reg/v:QI 47 [ s ])) 0)) starting the processing of deferred insns ending the processing of deferred insns It sees that it can split out the zero_extend but it does not try to factor out the set, i.e. try to split like this where both instructions would match: (set (reg:QI foo) (subreg:QI (ashift:HI (reg:HI 50) (reg/v:QI 47 [ s ])) 0)) (set (mem/c/i:QI (symbol_ref:HI ("y") <var_decl 0xb764c180 y>) [0 y+0 S1 A8]) (reg:QI foo)) This is not specific the the test case, I see combine frequently to miss such opportunities even if just non-volatile memory is involved. If volatile memory is involved it's even worse because opportunities like load-modify-store, sign- and zero-extends or extracting/inserting are not detected, see PR49807. Johann
This patch is still pending review: http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02376.html Georg-Johann Lay a écrit: > > [...] > > So here is a cleaned up version that maps to QI shifts. > > Tested without regressions. > > Ok? > > Johann > > PR target/29560 > * config/avr/avr.md (any_extend): New code interator. > (extend_prefix): New code attribute. > (*ashl<extend_prefix>qihiqi3): New insn-and-splits. > (*ashl<extend_prefix>qihiqi3.mem): New insn-and-splits. > (*ashlhiqi3): New insn-and-split. > Add peephole2 to map ashlhi3 to ashlqi3 if high part of > shift target is unused. >
2011/8/10 Georg-Johann Lay <avr@gjlay.de>: > This patch is still pending review: > > http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02376.html > > Georg-Johann Lay a écrit: >> >> [...] >> >> So here is a cleaned up version that maps to QI shifts. >> >> Tested without regressions. >> >> Ok? Ok. Denis.
Index: config/avr/avr.md =================================================================== --- config/avr/avr.md (revision 176624) +++ config/avr/avr.md (working copy) @@ -132,6 +132,15 @@ (define_mode_iterator QIDI [(QI "") (HI (define_mode_iterator HIDI [(HI "") (SI "") (DI "")]) (define_mode_iterator HISI [(HI "") (SI "")]) +;; Define code iterators +(define_code_iterator any_extend [sign_extend zero_extend]) + +;; Define code attributes +(define_code_attr extend_prefix + [(sign_extend "s") + (zero_extend "u")]) + + ;;======================================================================== ;; The following is used by nonlocal_goto and setjmp. ;; The receiver pattern will create no instructions since internally @@ -1993,6 +2002,85 @@ (define_insn "ashlhi3" [(set_attr "length" "6,0,2,2,4,10,10") (set_attr "cc" "clobber,none,set_n,clobber,set_n,clobber,clobber")]) + +;; Insn like the following are generated when (implicitly) extending 8-bit shifts +;; like char1 = char2 << char3. Only the low-byte is needed in that situation. + +;; "*ashluqihiqi3" +;; "*ashlsqihiqi3" +(define_insn_and_split "*ashl<extend_prefix>qihiqi3" + [(set (match_operand:QI 0 "register_operand" "=r") + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "0")) + (match_operand:QI 2 "register_operand" "r")) + 0))] + "" + "#" + "" + [(set (match_dup 0) + (ashift:QI (match_dup 1) + (match_dup 2)))] + "") + +;; "*ashluqihiqi3.mem" +;; "*ashlsqihiqi3.mem" +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem" + [(set (match_operand:QI 0 "memory_operand" "=m") + (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r")) + (match_operand:QI 2 "register_operand" "r")) + 0))] + "!reload_completed" + { gcc_unreachable(); } + "&& 1" + [(set (match_dup 3) + (ashift:QI (match_dup 1) + (match_dup 2))) + (set (match_dup 0) + (match_dup 3))] + { + operands[3] = force_reg (QImode, operands[0]); + }) + +;; Similar. + +(define_insn_and_split "*ashlhiqi3" + [(set (match_operand:QI 0 "nonimmediate_operand" "=r") + (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0") + (match_operand:QI 2 "register_operand" "r")) 0))] + "!reload_completed" + { gcc_unreachable(); } + "&& 1" + [(set (match_dup 4) + (ashift:QI (match_dup 3) + (match_dup 2))) + (set (match_dup 0) + (match_dup 4))] + { + operands[3] = simplify_gen_subreg (QImode, operands[1], HImode, 0); + operands[4] = force_reg (QImode, operands[0]); + }) + +;; High part of 16-bit shift is unused after the instruction: +;; No need to compute it, map to 8-bit shift. + +(define_peephole2 + [(set (match_operand:HI 0 "register_operand" "") + (ashift:HI (match_dup 0) + (match_operand:QI 1 "register_operand" "")))] + "" + [(set (match_dup 2) + (ashift:QI (match_dup 2) + (match_dup 1))) + (clobber (match_dup 3))] + { + operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1); + + if (!peep2_reg_dead_p (1, operands[3])) + FAIL; + + operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 0); + }) + + (define_insn "ashlsi3" [(set (match_operand:SI 0 "register_operand" "=r,r,r,r,r,r,r") (ashift:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")