diff mbox

[AVR] : Fix PR29560 (map 16-bit shift to 8-bit)

Message ID 4E2FFA77.4080300@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay July 27, 2011, 11:45 a.m. UTC
Joseph S. Myers wrote:
> On Tue, 26 Jul 2011, Georg-Johann Lay wrote:
> 
>> I once ran into trouble because there seems to be no clear
>> separation between undefinedness in C and undefinedness in RTL
>>
>> Starting thread from here,
>>   http://gcc.gnu.org/ml/gcc-help/2010-06/msg00191.html
>>
>> the treads comes to this
>>   http://gcc.gnu.org/ml/gcc-help/2010-06/msg00198.html
>>
>> which includes a snip that shows that some RTL passes
>> optimize on assumptions of undefinedness of C.
>> I.e. undefinedness is propagated from C trough SSA until RTL.
> 
> That seems like a bug.  Flags such flag_wrapv relate to semantics of C and 
> GIMPLE, not to RTL; (abs) RTL should always be wrapping.  (For shifts, 
> SHIFT_COUNT_TRUNCATED describes the semantics of RTL.  For clz and ctz, 
> CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO describe the 
> semantics.  All these target macros only describe RTL, not C source or 
> GIMPLE.  PR 30484 discusses how one might fix INT_MIN / -1 and INT_MIN % 
> -1 for -fwrapv.)

Thanks for that clarification Joseph.

So here is a cleaned up version that maps to QI shifts.

Tested without regressions.

Ok?

Johann


	PR target/29560
	* config/avr/avr.md (any_extend): New code interator.
	(extend_prefix): New code attribute.
	(*ashl<extend_prefix>qihiqi3): New insn-and-splits.
	(*ashl<extend_prefix>qihiqi3.mem): New insn-and-splits.
	(*ashlhiqi3): New insn-and-split.
	Add peephole2 to map ashlhi3 to ashlqi3 if high part of
	shift target is unused.

Comments

Richard Henderson July 27, 2011, 4:48 p.m. UTC | #1
> +;; "*ashluqihiqi3.mem"
> +;; "*ashlsqihiqi3.mem"
> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem"
> +  [(set (match_operand:QI 0 "memory_operand" "=m")
> +        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r"))
> +                              (match_operand:QI 2 "register_operand" "r"))
> +                   0))]
> +  "!reload_completed"
> +  { gcc_unreachable(); }

Surely this isn't necessary.  Why would you ever be matching a memory output?

> +(define_insn_and_split "*ashlhiqi3"
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
> +        (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
> +                              (match_operand:QI 2 "register_operand" "r")) 0))]
> +  "!reload_completed"
> +  { gcc_unreachable(); }

Likewise.

But the first pattern and the peep2 look good.


r~
Georg-Johann Lay July 27, 2011, 5 p.m. UTC | #2
Richard Henderson wrote:
>> +;; "*ashluqihiqi3.mem"
>> +;; "*ashlsqihiqi3.mem"
>> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem"
>> +  [(set (match_operand:QI 0 "memory_operand" "=m")
>> +        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r"))
>> +                              (match_operand:QI 2 "register_operand" "r"))
>> +                   0))]
>> +  "!reload_completed"
>> +  { gcc_unreachable(); }
> 
> Surely this isn't necessary.  Why would you ever be matching a memory output?
> 
>> +(define_insn_and_split "*ashlhiqi3"
>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
>> +        (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
>> +                              (match_operand:QI 2 "register_operand" "r")) 0))]
>> +  "!reload_completed"
>> +  { gcc_unreachable(); }
> 
> Likewise.
> 
> But the first pattern and the peep2 look good.
> 

It's that what combine comes up with, and combine is not smart enough
to find a split point between the mem and the subreg.  I don't know
enough of combine, maybe it's because can_create_pseudo_p is false
during combine, combine has no spare reg.  A combine-split won't
help as it needs a pseudo/spare reg.

As consequence there is better code if memory operand is allowed
which is a typical use-case, e.g. setting some bits in a SFR.

Johann
Richard Henderson July 28, 2011, 8:12 p.m. UTC | #3
On 07/27/2011 10:00 AM, Georg-Johann Lay wrote:
> Richard Henderson wrote:
>>> +;; "*ashluqihiqi3.mem"
>>> +;; "*ashlsqihiqi3.mem"
>>> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem"
>>> +  [(set (match_operand:QI 0 "memory_operand" "=m")
>>> +        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r"))
>>> +                              (match_operand:QI 2 "register_operand" "r"))
>>> +                   0))]
>>> +  "!reload_completed"
>>> +  { gcc_unreachable(); }
>>
>> Surely this isn't necessary.  Why would you ever be matching a memory output?
>>
>>> +(define_insn_and_split "*ashlhiqi3"
>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
>>> +        (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
>>> +                              (match_operand:QI 2 "register_operand" "r")) 0))]
>>> +  "!reload_completed"
>>> +  { gcc_unreachable(); }
>>
>> Likewise.
>>
>> But the first pattern and the peep2 look good.
>>
> 
> It's that what combine comes up with, and combine is not smart enough
> to find a split point between the mem and the subreg.  I don't know
> enough of combine, maybe it's because can_create_pseudo_p is false
> during combine, combine has no spare reg.  A combine-split won't
> help as it needs a pseudo/spare reg.

Hmm.  Perhaps.  Have you a test case for this?



r~
Georg-Johann Lay July 29, 2011, 1:59 p.m. UTC | #4
Richard Henderson wrote:
> On 07/27/2011 10:00 AM, Georg-Johann Lay wrote:
>> Richard Henderson wrote:
>>>> +;; "*ashluqihiqi3.mem"
>>>> +;; "*ashlsqihiqi3.mem"
>>>> +(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem"
>>>> +  [(set (match_operand:QI 0 "memory_operand" "=m")
>>>> +        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r"))
>>>> +                              (match_operand:QI 2 "register_operand" "r"))
>>>> +                   0))]
>>>> +  "!reload_completed"
>>>> +  { gcc_unreachable(); }
>>> Surely this isn't necessary.  Why would you ever be matching a memory output?
>>>
>>>> +(define_insn_and_split "*ashlhiqi3"
>>>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
>>>> +        (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
>>>> +                              (match_operand:QI 2 "register_operand" "r")) 0))]
>>>> +  "!reload_completed"
>>>> +  { gcc_unreachable(); }
>>> Likewise.
>>>
>>> But the first pattern and the peep2 look good.
>>>
>> It's that what combine comes up with, and combine is not smart enough
>> to find a split point between the mem and the subreg.  I don't know
>> enough of combine, maybe it's because can_create_pseudo_p is false
>> during combine, combine has no spare reg.  A combine-split won't
>> help as it needs a pseudo/spare reg.
> 
> Hmm.  Perhaps.  Have you a test case for this?
> 
> r~

char y;

void shift2 (char x, unsigned char s)
{
    y = x << s;
}


Combiner tries:

...

Trying 9 -> 10:
Failed to match this instruction:
(set (mem/c/i:QI (symbol_ref:HI ("y")  <var_decl 0xb764c180 y>) [0 y+0 S1 A8])
    (subreg:QI (ashift:HI (reg:HI 48 [ x ])
            (reg/v:QI 47 [ s ])) 0))

Trying 7, 9 -> 10:
Failed to match this instruction:
(set (mem/c/i:QI (symbol_ref:HI ("y")  <var_decl 0xb764c180 y>) [0 y+0 S1 A8])
    (subreg:QI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ x ]))
            (reg/v:QI 47 [ s ])) 0))
Successfully matched this instruction:
(set (reg:HI 50)
    (zero_extend:HI (reg/v:QI 46 [ x ])))
Failed to match this instruction:
(set (mem/c/i:QI (symbol_ref:HI ("y")  <var_decl 0xb764c180 y>) [0 y+0 S1 A8])
    (subreg:QI (ashift:HI (reg:HI 50)
            (reg/v:QI 47 [ s ])) 0))
starting the processing of deferred insns
ending the processing of deferred insns

It sees that it can split out the zero_extend but it does not try to factor
out the set, i.e. try to split like this where both instructions would match:

(set (reg:QI foo)
    (subreg:QI (ashift:HI (reg:HI 50)
            (reg/v:QI 47 [ s ])) 0))

(set (mem/c/i:QI (symbol_ref:HI ("y")  <var_decl 0xb764c180 y>) [0 y+0 S1 A8])
     (reg:QI foo))


This is not specific the the test case, I see combine frequently to miss such
opportunities even if just non-volatile memory is involved.

If volatile memory is involved it's even worse because opportunities like
load-modify-store, sign- and zero-extends or extracting/inserting are not
detected, see PR49807.

Johann
Georg-Johann Lay Aug. 9, 2011, 8:47 p.m. UTC | #5
This patch is still pending review:

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02376.html

Georg-Johann Lay a écrit:
> 
> [...]
> 
> So here is a cleaned up version that maps to QI shifts.
> 
> Tested without regressions.
> 
> Ok?
> 
> Johann
> 
> 	PR target/29560
> 	* config/avr/avr.md (any_extend): New code interator.
> 	(extend_prefix): New code attribute.
> 	(*ashl<extend_prefix>qihiqi3): New insn-and-splits.
> 	(*ashl<extend_prefix>qihiqi3.mem): New insn-and-splits.
> 	(*ashlhiqi3): New insn-and-split.
> 	Add peephole2 to map ashlhi3 to ashlqi3 if high part of
> 	shift target is unused.
>
Denis Chertykov Aug. 10, 2011, 4:57 a.m. UTC | #6
2011/8/10 Georg-Johann Lay <avr@gjlay.de>:
> This patch is still pending review:
>
> http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02376.html
>
> Georg-Johann Lay a écrit:
>>
>> [...]
>>
>> So here is a cleaned up version that maps to QI shifts.
>>
>> Tested without regressions.
>>
>> Ok?

Ok.

Denis.
diff mbox

Patch

Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 176624)
+++ config/avr/avr.md	(working copy)
@@ -132,6 +132,15 @@  (define_mode_iterator QIDI [(QI "") (HI
 (define_mode_iterator HIDI [(HI "") (SI "") (DI "")])
 (define_mode_iterator HISI [(HI "") (SI "")])
 
+;; Define code iterators
+(define_code_iterator any_extend  [sign_extend zero_extend])
+
+;; Define code attributes
+(define_code_attr extend_prefix
+  [(sign_extend "s")
+   (zero_extend "u")])
+
+
 ;;========================================================================
 ;; The following is used by nonlocal_goto and setjmp.
 ;; The receiver pattern will create no instructions since internally
@@ -1993,6 +2002,85 @@  (define_insn "ashlhi3"
   [(set_attr "length" "6,0,2,2,4,10,10")
    (set_attr "cc" "clobber,none,set_n,clobber,set_n,clobber,clobber")])
 
+
+;; Insn like the following are generated when (implicitly) extending 8-bit shifts
+;; like char1 = char2 << char3.  Only the low-byte is needed in that situation.
+
+;; "*ashluqihiqi3"
+;; "*ashlsqihiqi3"
+(define_insn_and_split "*ashl<extend_prefix>qihiqi3"
+  [(set (match_operand:QI 0 "register_operand"                                     "=r")
+        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "0"))
+                              (match_operand:QI 2 "register_operand"                "r"))
+                   0))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0)
+        (ashift:QI (match_dup 1)
+                   (match_dup 2)))]
+  "")
+
+;; "*ashluqihiqi3.mem"
+;; "*ashlsqihiqi3.mem"
+(define_insn_and_split "*ashl<extend_prefix>qihiqi3.mem"
+  [(set (match_operand:QI 0 "memory_operand" "=m")
+        (subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 "register_operand" "r"))
+                              (match_operand:QI 2 "register_operand" "r"))
+                   0))]
+  "!reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (match_dup 3)
+        (ashift:QI (match_dup 1)
+                   (match_dup 2)))
+   (set (match_dup 0)
+        (match_dup 3))]
+  {
+    operands[3] = force_reg (QImode, operands[0]);
+  })
+
+;; Similar.
+
+(define_insn_and_split "*ashlhiqi3"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
+        (subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
+                              (match_operand:QI 2 "register_operand" "r")) 0))]
+  "!reload_completed"
+  { gcc_unreachable(); }
+  "&& 1"
+  [(set (match_dup 4)
+        (ashift:QI (match_dup 3)
+                   (match_dup 2)))
+   (set (match_dup 0)
+        (match_dup 4))]
+  {
+    operands[3] = simplify_gen_subreg (QImode, operands[1], HImode, 0);
+    operands[4] = force_reg (QImode, operands[0]);
+  })
+
+;; High part of 16-bit shift is unused after the instruction:
+;; No need to compute it, map to 8-bit shift.
+
+(define_peephole2
+  [(set (match_operand:HI 0 "register_operand" "")
+        (ashift:HI (match_dup 0)
+                   (match_operand:QI 1 "register_operand" "")))]
+  ""
+  [(set (match_dup 2)
+        (ashift:QI (match_dup 2)
+                   (match_dup 1)))
+   (clobber (match_dup 3))]
+  {
+    operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+
+    if (!peep2_reg_dead_p (1, operands[3]))
+      FAIL;
+
+    operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
+  })
+
+
 (define_insn "ashlsi3"
   [(set (match_operand:SI 0 "register_operand"           "=r,r,r,r,r,r,r")
 	(ashift:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")