Patchwork [AVR] : Fix PR27663

login
register
mail settings
Submitter Georg-Johann Lay
Date May 2, 2011, 2:30 p.m.
Message ID <4DBEC003.4050300@gjlay.de>
Download mbox | patch
Permalink /patch/93668/
State New
Headers show

Comments

Georg-Johann Lay - May 2, 2011, 2:30 p.m.
This is a fix for an optimization flaw when a long value is composed
from byte values.

For -fsplit-wide-types (which is still default for avr) the code is
worse than with -fno-split-wide-types. The code for the test case is
better in either situations, i.e. compared to code without the patch,
but it is still not optimal.

Fixing this by some combine patterns is the only thing the BE can do.
I did not write more complex patterns because things get too complex
with little performance gain.

Tested without regressions.

Johann

2011-05-02  Georg-Johann Lay  <avr@gjlay.de>

	PR target/27663
	* config/avr/predicates.md (const_8_16_24_operand): New predicate.
	* config/avr/avr.md ("*ior<mode>qi.byte0",
	"*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
Denis Chertykov - May 2, 2011, 3:17 p.m.
2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
> This is a fix for an optimization flaw when a long value is composed
> from byte values.
>
> For -fsplit-wide-types (which is still default for avr) the code is
> worse than with -fno-split-wide-types. The code for the test case is
> better in either situations, i.e. compared to code without the patch,
> but it is still not optimal.
>
> Fixing this by some combine patterns is the only thing the BE can do.
> I did not write more complex patterns because things get too complex
> with little performance gain.
>
> Tested without regressions.
>
> Johann
>
> 2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>
>        PR target/27663
>        * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>        * config/avr/avr.md ("*ior<mode>qi.byte0",
>        "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>

I'm sorry, but I dot'n like to have a both combiner related patches in
port because code improvement isn't much and your patterns are
difficult to understand and maintain.

May be somebody else have a different oppinion ?
I'm open to discussion.

Denis.
Georg-Johann Lay - May 6, 2011, 1:42 p.m.
Denis Chertykov schrieb:
> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
>> This is a fix for an optimization flaw when a long value is composed
>> from byte values.
>>
>> For -fsplit-wide-types (which is still default for avr) the code is
>> worse than with -fno-split-wide-types. The code for the test case is
>> better in either situations, i.e. compared to code without the patch,
>> but it is still not optimal.
>>
>> Fixing this by some combine patterns is the only thing the BE can do.
>> I did not write more complex patterns because things get too complex
>> with little performance gain.
>>
>> Tested without regressions.
>>
>> Johann
>>
>> 2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>>
>>        PR target/27663
>>        * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>>        * config/avr/avr.md ("*ior<mode>qi.byte0",
>>        "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>>
> 
> I'm sorry, but I dot'n like to have a both combiner related patches in
> port because code improvement isn't much and your patterns are
> difficult to understand and maintain.

You refer to this patch for PR42210?

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02099.html

> May be somebody else have a different oppinion ?
> I'm open to discussion.


The patterns in this patch are similar to "*addhi3_zero_extend",
"*addhi3_zero_extend1" that handle HI+QI resp. "*addhi3_zero_extend"
that handle SI+QI.

The difference is that they handle IOR instead of PLUS. It's true that
the user has to use some specific code (addition of QI to HI resp. SI
in the first case and ORing of QI to HI resp. SI in the second).

IMO insn combine is a very powerful pass and I do not see why the avr
BE should not take advantage of it to synthesize new instructions.
Note that other parts like "*sbi" or "*cbi" rely on insn combine, too.

If it's hard to understand what their intention is, I can add some
more comments.

As insn combine is capable of generating new instructions that are not
covered by standard patterns, it is only natural that they might be
more complicated than standard patterns. But almost everything in GCC
is complicated, even in the avr BE stuff like, e.g. handling of
rotate, is way much more complicated.

The new patterns are restricted to one single place in the backend.
If they are correct, they are supposed to work in the future without
steadily maintaining them.

I agree that it would be nice if the middleend detected the
expressions as, say, (set (zero_extract:QI (reg:SI ...))), but that's
not the case; not even on 32-bit targets with full insv/extzv support.

And as I already wrote, the -fsplit-wide-types is not a good choice on
avr (except for 64-bit stuff where subreg lowering leads to much
code), see

http://gcc.gnu.org/ml/gcc/2011-03/msg00261.html

So with -fno-split-wide-types and some more elaborate testcase you
will see that the new patterns are a clear improvement.

Johann

> 
> Denis.
>
Georg-Johann Lay - May 11, 2011, 6:15 p.m.
Denis Chertykov schrieb:
> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
> 
>>This is a fix for an optimization flaw when a long value is composed
>>from byte values.
>>
>>For -fsplit-wide-types (which is still default for avr) the code is
>>worse than with -fno-split-wide-types. The code for the test case is
>>better in either situations, i.e. compared to code without the patch,
>>but it is still not optimal.
>>
>>Fixing this by some combine patterns is the only thing the BE can do.
>>I did not write more complex patterns because things get too complex
>>with little performance gain.
>>
>>Tested without regressions.
>>
>>Johann
>>
>>2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>>
>>       PR target/27663
>>       * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>>       * config/avr/avr.md ("*ior<mode>qi.byte0",
>>       "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>>
> I'm sorry, but I dot'n like to have a both combiner related patches in
> port because code improvement isn't much and your patterns are
> difficult to understand and maintain.
> 
> May be somebody else have a different oppinion ?
> I'm open to discussion.
> 
> Denis.

Let me add that the patch is generic enough to also improve ORing HI 
against QI like described in

http://gcc.gnu.org/PR41076

which is not uncommon on avr.

Johann
Denis Chertykov - May 13, 2011, 7:58 a.m.
2011/5/11 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>>
>> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
>>
>>> This is a fix for an optimization flaw when a long value is composed
>>> from byte values.
>>>
>>> For -fsplit-wide-types (which is still default for avr) the code is
>>> worse than with -fno-split-wide-types. The code for the test case is
>>> better in either situations, i.e. compared to code without the patch,
>>> but it is still not optimal.
>>>
>>> Fixing this by some combine patterns is the only thing the BE can do.
>>> I did not write more complex patterns because things get too complex
>>> with little performance gain.
>>>
>>> Tested without regressions.
>>>
>>> Johann
>>>
>>> 2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>>>
>>>      PR target/27663
>>>      * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>>>      * config/avr/avr.md ("*ior<mode>qi.byte0",
>>>      "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>>>
>> I'm sorry, but I dot'n like to have a both combiner related patches in
>> port because code improvement isn't much and your patterns are
>> difficult to understand and maintain.
>>
>> May be somebody else have a different oppinion ?
>> I'm open to discussion.
>>
>> Denis.
>
> Let me add that the patch is generic enough to also improve ORing HI against
> QI like described in
>
> http://gcc.gnu.org/PR41076
>
> which is not uncommon on avr.

You right here.
Please, apply the patch.

Denis.
Georg-Johann Lay - May 16, 2011, 3:36 p.m.
Denis Chertykov schrieb:
> 2011/5/11 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov schrieb:
>>> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
>>>
>>>> This is a fix for an optimization flaw when a long value is composed
>>>> from byte values.
>>>>
>>>> For -fsplit-wide-types (which is still default for avr) the code is
>>>> worse than with -fno-split-wide-types. The code for the test case is
>>>> better in either situations, i.e. compared to code without the patch,
>>>> but it is still not optimal.
>>>>
>>>> Fixing this by some combine patterns is the only thing the BE can do.
>>>> I did not write more complex patterns because things get too complex
>>>> with little performance gain.
>>>>
>>>> Tested without regressions.
>>>>
>>>> Johann
>>>>
>>>> 2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>>>>
>>>>      PR target/27663
>>>>      * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>>>>      * config/avr/avr.md ("*ior<mode>qi.byte0",
>>>>      "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>>>>
>>> I'm sorry, but I dot'n like to have a both combiner related patches in
>>> port because code improvement isn't much and your patterns are
>>> difficult to understand and maintain.
>>>
>>> May be somebody else have a different oppinion ?
>>> I'm open to discussion.
>>>
>>> Denis.
>> Let me add that the patch is generic enough to also improve ORing HI against
>> QI like described in
>>
>> http://gcc.gnu.org/PR41076
>>
>> which is not uncommon on avr.
> 
> You right here.
> Please, apply the patch.

Applied the current version of the patch:

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173792

There are still situations that could be handled easily by avr like:

[1]: Two instructions
(set (reg:HI 50)
     (ior:HI (ashift:HI (zero_extend:HI (reg:QI 52))
                        (const_int 8))
             (zero_extend:HI (reg:QI 55))))

[2]: Two instructions
(set (reg:HI 50)
     (ior:HI (ashift:HI (reg:HI 51)
                        (const_int))
             (zero_extend:HI (reg:QI 55))))

[3]: One Instruction
(set (reg:HI 50)
     (ior:HI (ashift:HI (reg:HI 51)
                        (const_int 8))
             (reg:HI 54)))

If you agree, I make a patch to cover these cases, too. Just for HI,
without mode iterators so that they are easier to understand.

IMO it's preferred to do some pre-reload transforms instead of trying
to catch such situations in peepholes or let the optimization
opportunity pass by without making use of it.

Unfortunately, there is no standard form the middle end tries; it
differs depending on if a value is in memory or in a register and of
-fsplit-wide-types is on or not.

The patterns may seem complicated, but it's just what's going on...

Johann

> 
> Denis.
>
Denis Chertykov - May 16, 2011, 5:30 p.m.
2011/5/16 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/5/11 Georg-Johann Lay <avr@gjlay.de>:
>>> Denis Chertykov schrieb:
>>>> 2011/5/2 Georg-Johann Lay <avr@gjlay.de>:
>>>>
>>>>> This is a fix for an optimization flaw when a long value is composed
>>>>> from byte values.
>>>>>
>>>>> For -fsplit-wide-types (which is still default for avr) the code is
>>>>> worse than with -fno-split-wide-types. The code for the test case is
>>>>> better in either situations, i.e. compared to code without the patch,
>>>>> but it is still not optimal.
>>>>>
>>>>> Fixing this by some combine patterns is the only thing the BE can do.
>>>>> I did not write more complex patterns because things get too complex
>>>>> with little performance gain.
>>>>>
>>>>> Tested without regressions.
>>>>>
>>>>> Johann
>>>>>
>>>>> 2011-05-02  Georg-Johann Lay  <avr@gjlay.de>
>>>>>
>>>>>      PR target/27663
>>>>>      * config/avr/predicates.md (const_8_16_24_operand): New predicate.
>>>>>      * config/avr/avr.md ("*ior<mode>qi.byte0",
>>>>>      "*ior<mode>qi.byte1-3"): New define_insn_and_split patterns.
>>>>>
>>>> I'm sorry, but I dot'n like to have a both combiner related patches in
>>>> port because code improvement isn't much and your patterns are
>>>> difficult to understand and maintain.
>>>>
>>>> May be somebody else have a different oppinion ?
>>>> I'm open to discussion.
>>>>
>>>> Denis.
>>> Let me add that the patch is generic enough to also improve ORing HI against
>>> QI like described in
>>>
>>> http://gcc.gnu.org/PR41076
>>>
>>> which is not uncommon on avr.
>>
>> You right here.
>> Please, apply the patch.
>
> Applied the current version of the patch:
>
> http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173792
>
> There are still situations that could be handled easily by avr like:
>
> [1]: Two instructions
> (set (reg:HI 50)
>     (ior:HI (ashift:HI (zero_extend:HI (reg:QI 52))
>                        (const_int 8))
>             (zero_extend:HI (reg:QI 55))))
>
> [2]: Two instructions
> (set (reg:HI 50)
>     (ior:HI (ashift:HI (reg:HI 51)
>                        (const_int))
>             (zero_extend:HI (reg:QI 55))))
>
> [3]: One Instruction
> (set (reg:HI 50)
>     (ior:HI (ashift:HI (reg:HI 51)
>                        (const_int 8))
>             (reg:HI 54)))
>
> If you agree, I make a patch to cover these cases, too. Just for HI,
> without mode iterators so that they are easier to understand.
>
> IMO it's preferred to do some pre-reload transforms instead of trying
> to catch such situations in peepholes or let the optimization
> opportunity pass by without making use of it.
>
> Unfortunately, there is no standard form the middle end tries; it
> differs depending on if a value is in memory or in a register and of
> -fsplit-wide-types is on or not.
>
> The patterns may seem complicated, but it's just what's going on...

Please, provide the patch.

Denis.

Patch

Index: config/avr/predicates.md
===================================================================
--- config/avr/predicates.md	(Revision 172902)
+++ config/avr/predicates.md	(Arbeitskopie)
@@ -138,3 +138,10 @@  (define_predicate "call_insn_operand"
 (define_predicate "pseudo_register_operand"
   (and (match_code "reg")
        (match_test "!HARD_REGISTER_P (op)")))
+
+;; Return true if OP is a constant integer that is either
+;; 8 or 16 or 24.
+(define_predicate "const_8_16_24_operand"
+  (and (match_code "const_int")
+       (match_test "8 == INTVAL(op) || 16 == INTVAL(op) || 24 == INTVAL(op)")))
+
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 172902)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -3388,3 +3388,42 @@  (define_insn "fmulsu"
 	clr __zero_reg__"
   [(set_attr "length" "3")
    (set_attr "cc" "clobber")])
+
+
+;; Some combine patterns that try to fix bad code when a value is composed
+;; from byte parts like in PR27663.
+;; The patterns give some release but the code still is not optimal,
+;; in particular when subreg lowering (-fsplit-wide-types) is turned on.
+;; That switch obfuscates things here and in many other places.
+
+(define_insn_and_split "*ior<mode>qi.byte0"
+  [(set (match_operand:HISI 0 "register_operand"                 "=r")
+        (ior:HISI
+         (zero_extend:HISI (match_operand:QI 1 "register_operand" "r"))
+         (match_operand:HISI 2 "register_operand"                 "0")))]
+  ""
+  "#"
+  "reload_completed"
+  [(set (match_dup 3)
+        (ior:QI (match_dup 3)
+                (match_dup 1)))]
+  {
+    operands[3] = simplify_gen_subreg (QImode, operands[0], <MODE>mode, 0);
+  })
+
+(define_insn_and_split "*ior<mode>qi.byte1-3"
+  [(set (match_operand:HISI 0 "register_operand"                              "=r")
+        (ior:HISI
+         (ashift:HISI (zero_extend:HISI (match_operand:QI 1 "register_operand" "r"))
+                      (match_operand:QI 2 "const_8_16_24_operand"              "n"))
+         (match_operand:HISI 3 "register_operand"                              "0")))]
+  "INTVAL(operands[2]) < GET_MODE_BITSIZE (<MODE>mode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+        (ior:QI (match_dup 4)
+                (match_dup 1)))]
+  {
+    int byteno = INTVAL(operands[2]) / BITS_PER_UNIT;
+    operands[4] = simplify_gen_subreg (QImode, operands[0], <MODE>mode, byteno);
+  })