diff mbox

[AVR] : Fix PR46779

Message ID 4DF73490.2080709@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay June 14, 2011, 10:14 a.m. UTC
Denis Chertykov schrieb:
> 2011/6/13 Georg-Johann Lay <avr@gjlay.de>:
>> So you think is is pointless/discouraged to give a more realistic
>> description of AVR addressing be means of MODE_CODE_BASE_REG_CLASS (instead
>> of BASE_REG_CLASS) resp. REGNO_MODE_CODE_OK_FOR_BASE_P?
>>
>>> Look carefully at `out_movqi_r_mr'.
>>> There are even two fake addressing modes:
>>> 1. [Y + infinite-dslacement];
>>> 2. [X + (0...63)].
>> Yes, I know. The first is introduced by avr_legitimate_address_p and the
>> second appears to be artifact of LEGITIMIZE_RELOAD_ADDRESS.
>>
>> The changes are basically MODE_CODE_BASE_REG_CLASS (introduced in 4.2) and a
>> rewrite of avr_legitimate_address_p. The changes aim at a better addressing
>> for X and to minimize fake addresses.
>>
>>> I have spent a many hours (days, months) to debug GCC (especially avr port
>>> and reload) for right addressing modes.
>>> I have stopped on this code.
>>> AVR have a limited memory addressing and GCC can't handle it in native
>>> form.
>>> Because of that I have supported a fake adddressing modes.
>> I assume the code is from prior to 4.2 when REGNO_MODE_CODE_OK_FOR_BASE_P
>> and MODE_CODE_BASE_REG_CLASS had not been available so that supporting X
>> required some hacking.
>> All that would still be fine; however the new register allocator leads to
>> code that noone would accept. Accessing a structure through a pointer is not
>> uncommon, not even on AVR. So if Z is used for, say accessing flash, X
>> appears to be the best register.
>>
>> The shortcoming in GCC is that there is no way to give costs of addressing
>> (TARGET_ADDRESS_COST does different things).
>>
>> So take a look what avr-gcc compiles here:
>>  http://gcc.gnu.org/bugzilla/attachment.cgi?id=22242
>> I saw similar complains in forums on the web.
>>
>>> (Richard Henderson have a different opinion: GCC can, AVR port can't)
>> What does he mean with that?
>>
>>> IMHO that three limited pointer registers is not enough for C compiler.
>>> Even more with frame pointer it's only two and X is a very limited.
>> The current implementation has several oddities like
>>
>> * allowing SUBREGs in avr-legitimate_address_p
>> * changing BASE_REG_CLASS on the fly (by means of reload_completed)
>>
>> isn't that supposed to cause trouble?
> 
> You can try to remove all oddities and check results.
> Definitely something changed in GCC core since I wrote addressing code.
> 
> 
> Denis.

For your interest, here is a patch that shows the changes in
addressing mode.

Note that the

* LEGITIMIZE_RELOAD_ADDRESS is disabled. This is because I am
  unsure about how it should look like. The special cases for X
  are no more needed, and for Y and Z it might be good to have
  intermediate addresses with, say offset =0 mod 60, so that
  big offsets can be reached with addr + const, 0<= const < 60.

* patch already includes patch for pr46779 from
  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00810.html

* As I said above, I removed orphan DI insns.

So if you have a look into reload, it might also be interesting what
it does with this changes.

Johann

--

	* config/avr/avr.h (BASE_REG_CLASS): Remove.
	(REG_OK_FOR_BASE_NOSTRICT_P): Remove.
	(REG_OK_FOR_BASE_STRICT_P): Remove.
	(LEGITIMIZE_RELOAD_ADDRESS): Remove.
	(MODE_CODE_BASE_REG_CLASS): New Define.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): New Define.

	* config/avr/avr.c: (avr_legitimate_address_p): Rewrite to allow
	addresses that can actually be handled by hardware.
	(avr_regno_mode_code_ok_for_base_p): New global Function.
	(avr_mode_code_base_reg_class): New global Function.
	(avr_hard_regno_mode_ok): Allow QI in all GPRs.
	(avr_reg_ok_for_addr): New static function.
	(avr_regno_reg_class): Change return type from enum reg_class to
	reg_class_t.
	(reg_class_tab): Set base type to reg_class_t. Return smallest
	register class for each register.

	* config/avr/avr.md: ("*sbrx_branch<mode>"): Disallow DI in mode.
	("rotl<mode>3"): Ditto.
	("*movqi"): Remove constraint 'Q'.
	("*movsi"): Ditto.
	("*movsf"): Ditto.
	("*ashlqi3", "ashrqi3", "*lshrqi3"): Ditto.
	("ashlhi3", "ashrhi3", "lshrhi3"): Ditto.
	("ashlsi3", "ashrsi3", "lshrsi3"): Ditto.
	("*movhi_sp"): Remove insn.
	("zero_extendqidi2"): Remove insn_and_split.
	("zero_extendhidi2"): Remove insn_and_split.
	("zero_extendsidi2"): Remove insn_and_split.

	* config/avr/avr-protos.h
	(secondary_input_reload_class): Remove prototype.
	(avr_mode_code_base_reg_class): New prototype.
	(avr_regno_mode_code_ok_for_base_p): New prototype.
	(avr_legitimize_reload_address): New prototype.

Comments

Denis Chertykov June 14, 2011, 11:29 a.m. UTC | #1
2011/6/14 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/6/13 Georg-Johann Lay <avr@gjlay.de>:
>>> So you think is is pointless/discouraged to give a more realistic
>>> description of AVR addressing be means of MODE_CODE_BASE_REG_CLASS (instead
>>> of BASE_REG_CLASS) resp. REGNO_MODE_CODE_OK_FOR_BASE_P?
>>>
>>>> Look carefully at `out_movqi_r_mr'.
>>>> There are even two fake addressing modes:
>>>> 1. [Y + infinite-dslacement];
>>>> 2. [X + (0...63)].
>>> Yes, I know. The first is introduced by avr_legitimate_address_p and the
>>> second appears to be artifact of LEGITIMIZE_RELOAD_ADDRESS.
>>>
>>> The changes are basically MODE_CODE_BASE_REG_CLASS (introduced in 4.2) and a
>>> rewrite of avr_legitimate_address_p. The changes aim at a better addressing
>>> for X and to minimize fake addresses.
>>>
>>>> I have spent a many hours (days, months) to debug GCC (especially avr port
>>>> and reload) for right addressing modes.
>>>> I have stopped on this code.
>>>> AVR have a limited memory addressing and GCC can't handle it in native
>>>> form.
>>>> Because of that I have supported a fake adddressing modes.
>>> I assume the code is from prior to 4.2 when REGNO_MODE_CODE_OK_FOR_BASE_P
>>> and MODE_CODE_BASE_REG_CLASS had not been available so that supporting X
>>> required some hacking.
>>> All that would still be fine; however the new register allocator leads to
>>> code that noone would accept. Accessing a structure through a pointer is not
>>> uncommon, not even on AVR. So if Z is used for, say accessing flash, X
>>> appears to be the best register.
>>>
>>> The shortcoming in GCC is that there is no way to give costs of addressing
>>> (TARGET_ADDRESS_COST does different things).
>>>
>>> So take a look what avr-gcc compiles here:
>>>  http://gcc.gnu.org/bugzilla/attachment.cgi?id=22242
>>> I saw similar complains in forums on the web.
>>>
>>>> (Richard Henderson have a different opinion: GCC can, AVR port can't)
>>> What does he mean with that?
>>>
>>>> IMHO that three limited pointer registers is not enough for C compiler.
>>>> Even more with frame pointer it's only two and X is a very limited.
>>> The current implementation has several oddities like
>>>
>>> * allowing SUBREGs in avr-legitimate_address_p
>>> * changing BASE_REG_CLASS on the fly (by means of reload_completed)
>>>
>>> isn't that supposed to cause trouble?
>>
>> You can try to remove all oddities and check results.
>> Definitely something changed in GCC core since I wrote addressing code.
>>
>>
>> Denis.
>
> For your interest, here is a patch that shows the changes in
> addressing mode.

Generally, the patch seems as a "right thing". I like it.

How about a regression testing and code quality.

Denis.
Georg-Johann Lay June 14, 2011, 9:29 p.m. UTC | #2
Denis Chertykov schrieb:
> 2011/6/14 Georg-Johann Lay <avr@gjlay.de>:
> 
>>Denis Chertykov schrieb:
>>
>>>2011/6/13 Georg-Johann Lay <avr@gjlay.de>:
>>>
>>>>So you think is is pointless/discouraged to give a more realistic
>>>>description of AVR addressing be means of MODE_CODE_BASE_REG_CLASS (instead
>>>>of BASE_REG_CLASS) resp. REGNO_MODE_CODE_OK_FOR_BASE_P?
>>>>
>>>>
>>>>>Look carefully at `out_movqi_r_mr'.
>>>>>There are even two fake addressing modes:
>>>>>1. [Y + infinite-dslacement];
>>>>>2. [X + (0...63)].
>>>>
>>>>Yes, I know. The first is introduced by avr_legitimate_address_p and the
>>>>second appears to be artifact of LEGITIMIZE_RELOAD_ADDRESS.
>>>>
>>>>The changes are basically MODE_CODE_BASE_REG_CLASS (introduced in 4.2) and a
>>>>rewrite of avr_legitimate_address_p. The changes aim at a better addressing
>>>>for X and to minimize fake addresses.
>>>>
>>>>
>>>>>I have spent a many hours (days, months) to debug GCC (especially avr port
>>>>>and reload) for right addressing modes.
>>>>>I have stopped on this code.
>>>>>AVR have a limited memory addressing and GCC can't handle it in native
>>>>>form.
>>>>>Because of that I have supported a fake adddressing modes.
>>>>
>>>>I assume the code is from prior to 4.2 when REGNO_MODE_CODE_OK_FOR_BASE_P
>>>>and MODE_CODE_BASE_REG_CLASS had not been available so that supporting X
>>>>required some hacking.
>>>>All that would still be fine; however the new register allocator leads to
>>>>code that noone would accept. Accessing a structure through a pointer is not
>>>>uncommon, not even on AVR. So if Z is used for, say accessing flash, X
>>>>appears to be the best register.
>>>>
>>>>The shortcoming in GCC is that there is no way to give costs of addressing
>>>>(TARGET_ADDRESS_COST does different things).
>>>>
>>>>So take a look what avr-gcc compiles here:
>>>> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22242
>>>>I saw similar complains in forums on the web.
>>>>
>>>>
>>>>>(Richard Henderson have a different opinion: GCC can, AVR port can't)
>>>>
>>>>What does he mean with that?
>>>>
>>>>
>>>>>IMHO that three limited pointer registers is not enough for C compiler.
>>>>>Even more with frame pointer it's only two and X is a very limited.
>>>>
>>>>The current implementation has several oddities like
>>>>
>>>>* allowing SUBREGs in avr-legitimate_address_p
>>>>* changing BASE_REG_CLASS on the fly (by means of reload_completed)
>>>>
>>>>isn't that supposed to cause trouble?
>>>
>>>You can try to remove all oddities and check results.
>>>Definitely something changed in GCC core since I wrote addressing code.
>>>
>>>Denis.
>>
>>For your interest, here is a patch that shows the changes in
>>addressing mode.

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01029.html

> Generally, the patch seems as a "right thing". I like it.
> 
> How about a regression testing and code quality.
> 
> Denis.

I tested on some handcrafted examples and on the code attached to 
PR46278. The generated code looked very good and so I started regression 
testing but found at spill fail in
   gcc.c-torture/compile/950612-1.c

As I don't know how to fix a spill failure I stopped working on the 
patch. Perhaps it would help to have fake y+const addresses; I didn't try.

As far as I remember, reload emits inefficient code in some situations 
when it tries to fit address into available addressing mode by adding 
constant. I could fix that by new constraint alternative in addhi3 insn, 
something like "=*!r,r,n". But that are just details.

The major work left to be done are fixing spill fails and implementing 
appropriate LEGITIMIZE_RELOAD_ADDRESS which are beyond my skills.

BTW, fixing PR48459, Richard ran immediately into a spill failure during 
newlib build:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48459#c20

Johann
Richard Henderson June 14, 2011, 10:46 p.m. UTC | #3
On 06/14/2011 02:29 PM, Georg-Johann Lay wrote:
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01029.html

It does look like a step in the right direction.

> I tested on some handcrafted examples and on the code attached to
> PR46278. The generated code looked very good and so I started
> regression testing but found at spill fail in
>   gcc.c-torture/compile/950612-1.c

What compiler options?  I tried -O{1,2,s,3} and they all passed.

> The major work left to be done are fixing spill fails and
> implementing appropriate LEGITIMIZE_RELOAD_ADDRESS which are beyond
> my skills.

L_R_A is a tad tricky, but there are good examples to copy from.

But you shouldn't need that to fix spill failures.  L_R_A should
simply be able to provide slightly better sharing between addresses.
Not that I expect that there will be many instances of that, since
you'll quickly run out of registers in which to share anything.

> BTW, fixing PR48459, Richard ran immediately into a spill failure during newlib build:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48459#c20

FWIW, that spill failure vanishes with your patch.


r~
Richard Henderson June 15, 2011, 5:34 p.m. UTC | #4
On 06/14/2011 02:29 PM, Georg-Johann Lay wrote:
> I tested on some handcrafted examples and on the code attached to
> PR46278. The generated code looked very good and so I started
> regression testing but found at spill fail in
>   gcc.c-torture/compile/950612-1.c

I reproduced this today.

The Problem is that we really have run out of registers.  X and Z
are both in use, and we're attempting to spill them for caller-save.
There is no register in which to load fp+158 so that we can save
either X or Z.

You're going to have to have some support for fp+large somewhere.


r~
Georg-Johann Lay June 15, 2011, 8:40 p.m. UTC | #5
Richard Henderson schrieb:
> On 06/14/2011 02:29 PM, Georg-Johann Lay wrote:
> 
>>I tested on some handcrafted examples and on the code attached to
>>PR46278. The generated code looked very good and so I started
>>regression testing but found at spill fail in
>>  gcc.c-torture/compile/950612-1.c
> 
> I reproduced this today.
> 
> The Problem is that we really have run out of registers.  X and Z
> are both in use, and we're attempting to spill them for caller-save.
> There is no register in which to load fp+158 so that we can save
> either X or Z.

Thanks for the analysis, that makes thinks clearer.

But I still wonder what's the very problem. Any architecture has limited 
reg+const addressing and limited number of address registers.
I definetely saw architectures run out of registers and reload manages 
to access stack beyond reg+maxoff without any hacks and clear and 
straight forward backend.

Bigger machines definetely have bigger maxoff and more address regs, but 
the software that's intended to run on them is considerably more 
complicated. Yet gcc doesn't run into problems and reload can cope with 
that.

> You're going to have to have some support for fp+large somewhere.

So there must be something fundamentally different between avr and other 
ports?

Is there a minimum requirement for add<pmode>3 insn like "r,r,n" 
alternative so that it can add a const without reload?

Johann

> r~
Richard Henderson June 15, 2011, 9:58 p.m. UTC | #6
On 06/15/2011 01:40 PM, Georg-Johann Lay wrote:
> But I still wonder what's the very problem. Any architecture has
> limited reg+const addressing and limited number of address
> registers. I definetely saw architectures run out of registers and
> reload manages to access stack beyond reg+maxoff without any hacks
> and clear and straight forward backend.

Well, this is probably the only architecture that has *no*
non-fixed, call-saved base registers.

Indeed, I can work around this particular crash by either
hacking Z to be call-saved, or hacking the frame pointer to
not be required.  The former of course changes the abi, and
the second produces awful code due to too many copies from
the stack pointer.  So neither option is "preferred".



r~
Richard Henderson June 15, 2011, 11:07 p.m. UTC | #7
On 06/15/2011 02:58 PM, Richard Henderson wrote:
> Indeed, I can work around this particular crash by either
> hacking Z to be call-saved, or hacking the frame pointer to
> not be required.  The former of course changes the abi, and
> the second produces awful code due to too many copies from
> the stack pointer.  So neither option is "preferred".

Perhaps I spoke too soon re the frame pointer.  The old
code is even worse.

   text	   data	    bss	    dec	    hex	filename
  10032	     25	      0	  10057	   2749	bld-avr-orig/gcc/z.o
   5816	     25	      0	   5841	   16d1	bld-avr-new/gcc/z.o

That said, there's a crash building libgcc so there's
still work to do.


r~
Denis Chertykov June 16, 2011, 11:34 a.m. UTC | #8
2011/6/16 Richard Henderson <rth@redhat.com>:
> On 06/15/2011 02:58 PM, Richard Henderson wrote:
>> Indeed, I can work around this particular crash by either
>> hacking Z to be call-saved, or hacking the frame pointer to
>> not be required.  The former of course changes the abi, and
>> the second produces awful code due to too many copies from
>> the stack pointer.  So neither option is "preferred".
>
> Perhaps I spoke too soon re the frame pointer.  The old
> code is even worse.
>
>   text    data     bss     dec     hex filename
>  10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>   5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>
> That said, there's a crash building libgcc so there's
> still work to do.

@rth (while you are diving into AVR microworld ;-)
May be you can give a suggestion to change the AVR abi.
I have tuned the abi for code size almost 13 years ago.
The register pressure to r18-r31 are very high.

I have a set of experiments with omitting the frame poiner and I found that
better to support fake addressing modes (infinite displacement for frame
pointer).

Denis.
Denis Chertykov June 16, 2011, 11:46 a.m. UTC | #9
2011/6/16 Denis Chertykov <chertykov@gmail.com>:
> 2011/6/16 Richard Henderson <rth@redhat.com>:
>> On 06/15/2011 02:58 PM, Richard Henderson wrote:
>>> Indeed, I can work around this particular crash by either
>>> hacking Z to be call-saved, or hacking the frame pointer to
>>> not be required.  The former of course changes the abi, and
>>> the second produces awful code due to too many copies from
>>> the stack pointer.  So neither option is "preferred".
>>
>> Perhaps I spoke too soon re the frame pointer.  The old
>> code is even worse.
>>
>>   text    data     bss     dec     hex filename
>>  10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>   5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>
>> That said, there's a crash building libgcc so there's
>> still work to do.
>
> @rth (while you are diving into AVR microworld ;-)
> May be you can give a suggestion to change the AVR abi.
> I have tuned the abi for code size almost 13 years ago.
> The register pressure to r18-r31 are very high.
>
> I have a set of experiments with omitting the frame poiner and I found that
> better to support fake addressing modes (infinite displacement for frame
> pointer).

I forgot to said that suggestion from Bernd Schmidt about
HARD_FRAME_POINTER_REGNUM seems very useful:
> Maybe it would help for your port to define a separate
> FRAME_POINTER_REGNUM, able to hold an HImode value, which then gets
> eliminated to HARD_FRAME_POINTER_REGNUM? This mechanism is used on many
> other ports if you need examples.

It's not related to addressing modes but it's related to frame pointer bugs.

Denis.
Richard Henderson June 16, 2011, 2:25 p.m. UTC | #10
On 06/16/2011 04:34 AM, Denis Chertykov wrote:
> @rth (while you are diving into AVR microworld ;-)
> May be you can give a suggestion to change the AVR abi.
> I have tuned the abi for code size almost 13 years ago.
> The register pressure to r18-r31 are very high.

As far as I can see, the ABI is pretty good.  There's nothing
that I would say that should obviously be changed.

I might totally drop support for DImode.  Let "long long" map
to SImode.  If you want 64-bit data, you probably don't want
to select an 8-bit microcontroller.

That might take a bit of surgery on the way we currently build
libgcc though.

> I have a set of experiments with omitting the frame poiner and I found that
> better to support fake addressing modes (infinite displacement for frame
> pointer).

The biggest problem that I see, from the 950612-1.c test case
with the current handling of the "infinite displacement frame
pointer", is that the adjustments to the frame pointer are 
never exposed as separate instructions, so there's never a
chance to optimize them.

So once you build a stack frame larger than 64 bytes, things
get worse and worse.  You wind up with code like

        subi r28,lo8(-133)
        sbci r29,hi8(-133)
        ld r18,Y
        subi r28,lo8(133)
        sbci r29,hi8(133)
        subi r28,lo8(-134)
        sbci r29,hi8(-134)
        ld r19,Y
        subi r28,lo8(134)
        sbci r29,hi8(134)

where obviously the 4 middle instructions could be eliminated.

If we came out of reload (or shortly thereafter via split) with
these as separate patterns, we might be able to eliminate them
via an existing optimization pass.  OTOH, an existing pass might
refuse to operate on the frame pointer because the frame pointer
is usually Special.

One could write an avr-specific pass for this:
  Scan the code for references to the frame pointer.
  Record the offsets of the uses.
  Compute a sliding 64-byte window that satisfies the maximum
	number of uses within the region.
  Insert adjustments to the frame pointer.
  Arrange for the dwarf2out routines to scan the whole function.
	(If alloca has been used, the frame pointer anchors the
	CFA, and the unwind info will need adjusting throughout
	the function.  Otherwise gdb will fail entirely.)

That last point probably depends on Bernd Schmidt's work in
dwarf2out for shrink-wrapping...


r~
Bernd Schmidt June 16, 2011, 2:35 p.m. UTC | #11
On 06/16/2011 04:25 PM, Richard Henderson wrote:
> That last point probably depends on Bernd Schmidt's work in
> dwarf2out for shrink-wrapping...

Did you have a chance to look at the ia64 unwind code? I probably won't
be able to get back to it for a while yet since I'm working on C6X tuning...


Bernd
Richard Henderson June 16, 2011, 2:51 p.m. UTC | #12
On 06/16/2011 07:35 AM, Bernd Schmidt wrote:
> Did you have a chance to look at the ia64 unwind code? I probably won't
> be able to get back to it for a while yet since I'm working on C6X tuning...

I was looking at it before I moved house a month ago, and then
totally forgot about it until today.  I should get back on that...


r~
Denis Chertykov June 16, 2011, 6:09 p.m. UTC | #13
2011/6/16 Richard Henderson <rth@redhat.com>:
> On 06/16/2011 04:34 AM, Denis Chertykov wrote:
>> @rth (while you are diving into AVR microworld ;-)
>> May be you can give a suggestion to change the AVR abi.
>> I have tuned the abi for code size almost 13 years ago.
>> The register pressure to r18-r31 are very high.
>
> As far as I can see, the ABI is pretty good.  There's nothing
> that I would say that should obviously be changed.
>
> I might totally drop support for DImode.  Let "long long" map
> to SImode.  If you want 64-bit data, you probably don't want
> to select an 8-bit microcontroller.
>
> That might take a bit of surgery on the way we currently build
> libgcc though.
>
>> I have a set of experiments with omitting the frame poiner and I found that
>> better to support fake addressing modes (infinite displacement for frame
>> pointer).
>
> The biggest problem that I see, from the 950612-1.c test case
> with the current handling of the "infinite displacement frame
> pointer", is that the adjustments to the frame pointer are
> never exposed as separate instructions, so there's never a
> chance to optimize them.

Yes. I'm agree.

> So once you build a stack frame larger than 64 bytes, things
> get worse and worse.

May be something like this:
The port must not permit infinite displacements before reload.
  (all addressing modes must have a right form)
Fake addressing mode enabled only after reload_in_progress.
  (only small part of spilled/local variables will be addressed by
fake addressing mode.)

I don't like fake addressing mode and I don't want to support it at
all, but may be it's a best way to work around 'unable to find a
register to spill' problem. (the current AVR addressing code written
in this way)

>  You wind up with code like
>
>        subi r28,lo8(-133)
>        sbci r29,hi8(-133)
>        ld r18,Y
>        subi r28,lo8(133)
>        sbci r29,hi8(133)
>        subi r28,lo8(-134)
>        sbci r29,hi8(-134)
>        ld r19,Y
>        subi r28,lo8(134)
>        sbci r29,hi8(134)
>
> where obviously the 4 middle instructions could be eliminated.
>
> If we came out of reload (or shortly thereafter via split) with
> these as separate patterns, we might be able to eliminate them
> via an existing optimization pass.  OTOH, an existing pass might
> refuse to operate on the frame pointer because the frame pointer
> is usually Special.

IMHO the source of the problem is a REGISTER ALLOCATOR because it is
not handle a register elimination and strict insn constraints.

Denis.
Denis Chertykov June 23, 2011, 8:15 p.m. UTC | #14
2011/6/16 Richard Henderson <rth@redhat.com>:
> On 06/15/2011 02:58 PM, Richard Henderson wrote:
>> Indeed, I can work around this particular crash by either
>> hacking Z to be call-saved, or hacking the frame pointer to
>> not be required.  The former of course changes the abi, and
>> the second produces awful code due to too many copies from
>> the stack pointer.  So neither option is "preferred".
>
> Perhaps I spoke too soon re the frame pointer.  The old
> code is even worse.
>
>   text    data     bss     dec     hex filename
>  10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>   5816      25       0    5841    16d1 bld-avr-new/gcc/z.o

Richard, can you send me this z.c file ?
Right now I'm notice that new code is worse.

Denis.
Richard Henderson June 23, 2011, 9:55 p.m. UTC | #15
On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>   text    data     bss     dec     hex filename
>>  10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>   5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
> 
> Richard, can you send me this z.c file ?
> Right now I'm notice that new code is worse.

That's gcc.c-torture/compile/950612-1.c.


r~
Denis Chertykov June 26, 2011, 6:50 p.m. UTC | #16
2011/6/24 Richard Henderson <rth@redhat.com>:
> On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>>   text    data     bss     dec     hex filename
>>>  10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>>   5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>
>> Richard, can you send me this z.c file ?
>> Right now I'm notice that new code is worse.
>
> That's gcc.c-torture/compile/950612-1.c.
>

I have founded that postreload optimizations can't handle results of
new L_R_A code.
I think that it's can be handled by CSE (postreload).

This is a fragment from 950612-1.c:

---------- you can skip it I have a short version below -------
(insn 5186 23 5187 2 (set (reg:HI 30 r30)
        (const_int 128 [0x80])) c950612-1.c:20 9 {*movhi}
     (nil))

(insn 5187 5186 5189 2 (set (reg:HI 30 r30)
        (plus:HI (reg:HI 30 r30)
            (reg/f:HI 28 r28))) c950612-1.c:20 21 {*addhi3}
     (expr_list:REG_EQUIV (plus:HI (reg/f:HI 28 r28)
            (const_int 128 [0x80]))
        (nil)))

(insn 5189 5187 2451 2 (set (mem/c:HI (plus:HI (reg:HI 30 r30)
                (const_int 1 [0x1])) [8 %sfp+129 S2 A8])
        (reg:HI 18 r18)) c950612-1.c:20 9 {*movhi}
     (nil))

(note 2451 5189 2537 2 NOTE_INSN_DELETED)

(note 2537 2451 1651 2 NOTE_INSN_DELETED)

(note 1651 2537 3180 2 NOTE_INSN_DELETED)

(note 3180 1651 5191 2 NOTE_INSN_DELETED)

(insn 5191 3180 5192 2 (set (reg:HI 30 r30)
        (const_int 128 [0x80])) c950612-1.c:132 9 {*movhi}
     (nil))

(insn 5192 5191 5071 2 (set (reg:HI 30 r30)
        (plus:HI (reg:HI 30 r30)
            (reg/f:HI 28 r28))) c950612-1.c:132 21 {*addhi3}
     (expr_list:REG_EQUIV (plus:HI (reg/f:HI 28 r28)
            (const_int 128 [0x80]))
        (nil)))

(insn 5071 5192 5073 2 (set (mem/c:HI (plus:HI (reg:HI 30 r30)
                (const_int 3 [0x3])) [8 %sfp+131 S2 A8])
        (reg/v/f:HI 8 r8 [orig:211 pc ] [211])) c950612-1.c:132 9 {*movhi}
     (nil))
------------------------------------------------------

Insns 5186,5187 equal to 5191,5192 and 5191,5192 can be removed, but
reload_cse_regs_1 operate only on one insn.

5186 Z=128
5187 Z=Y+128       ; REG_EQUIV Z=Y+128
5189 HI:[Z+1]=HI:R18
...(deleted insns)
5191 Z=128
5192 Z=Y+128       ; REG_EQUIV Z=Y+128

(5191,5192) really a one three addressing add Z=Y+128.
Insns (5191,5192) exists because AVR havn't 3 addressing add.
Insn 5191 destroy REG_EQUIV (it's right), but reload_cse_regs_1 can't
optimize insns 5191,5192.

Right now I have only one idea:
1. create peephole2 for joining such insns (5191,5192);
2. inside machine dependent pass rerun postreload and may be gcse2;
3. split joined insns to originals.

Denis.
Georg-Johann Lay June 26, 2011, 7:59 p.m. UTC | #17
Denis Chertykov schrieb:
> 2011/6/24 Richard Henderson <rth@redhat.com>:
> 
>>On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>
>>>>  text    data     bss     dec     hex filename
>>>> 10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>>>  5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>>
>>>Richard, can you send me this z.c file ?
>>>Right now I'm notice that new code is worse.
>>
>>That's gcc.c-torture/compile/950612-1.c.
> 
> I have founded that postreload optimizations can't handle results of
> new L_R_A code.
> I think that it's can be handled by CSE (postreload).
> 
> This is a fragment from 950612-1.c:
> 
> ---------- you can skip it I have a short version below -------
> (insn 5186 23 5187 2 (set (reg:HI 30 r30)
>         (const_int 128 [0x80])) c950612-1.c:20 9 {*movhi}
>      (nil))
> 
> (insn 5187 5186 5189 2 (set (reg:HI 30 r30)
>         (plus:HI (reg:HI 30 r30)
>             (reg/f:HI 28 r28))) c950612-1.c:20 21 {*addhi3}
>      (expr_list:REG_EQUIV (plus:HI (reg/f:HI 28 r28)
>             (const_int 128 [0x80]))
>         (nil)))
> 
> (insn 5189 5187 2451 2 (set (mem/c:HI (plus:HI (reg:HI 30 r30)
>                 (const_int 1 [0x1])) [8 %sfp+129 S2 A8])
>         (reg:HI 18 r18)) c950612-1.c:20 9 {*movhi}
>      (nil))
> 
> (note 2451 5189 2537 2 NOTE_INSN_DELETED)
> 
> (note 2537 2451 1651 2 NOTE_INSN_DELETED)
> 
> (note 1651 2537 3180 2 NOTE_INSN_DELETED)
> 
> (note 3180 1651 5191 2 NOTE_INSN_DELETED)
> 
> (insn 5191 3180 5192 2 (set (reg:HI 30 r30)
>         (const_int 128 [0x80])) c950612-1.c:132 9 {*movhi}
>      (nil))
> 
> (insn 5192 5191 5071 2 (set (reg:HI 30 r30)
>         (plus:HI (reg:HI 30 r30)
>             (reg/f:HI 28 r28))) c950612-1.c:132 21 {*addhi3}
>      (expr_list:REG_EQUIV (plus:HI (reg/f:HI 28 r28)
>             (const_int 128 [0x80]))
>         (nil)))
> 
> (insn 5071 5192 5073 2 (set (mem/c:HI (plus:HI (reg:HI 30 r30)
>                 (const_int 3 [0x3])) [8 %sfp+131 S2 A8])
>         (reg/v/f:HI 8 r8 [orig:211 pc ] [211])) c950612-1.c:132 9 {*movhi}
>      (nil))
> ------------------------------------------------------
> 
> Insns 5186,5187 equal to 5191,5192 and 5191,5192 can be removed, but
> reload_cse_regs_1 operate only on one insn.
> 
> 5186 Z=128
> 5187 Z=Y+128       ; REG_EQUIV Z=Y+128
> 5189 HI:[Z+1]=HI:R18
> ...(deleted insns)
> 5191 Z=128
> 5192 Z=Y+128       ; REG_EQUIV Z=Y+128
> 
> (5191,5192) really a one three addressing add Z=Y+128.
> Insns (5191,5192) exists because AVR havn't 3 addressing add.
> Insn 5191 destroy REG_EQUIV (it's right), but reload_cse_regs_1 can't
> optimize insns 5191,5192.

Did you try to add constraint alternative to *addhi3?
Like "*!d,d,n" or even "*!r,r,n"

I saw some code improvement with that alternative.

Johann

> Right now I have only one idea:
> 1. create peephole2 for joining such insns (5191,5192);
> 2. inside machine dependent pass rerun postreload and may be gcse2;
> 3. split joined insns to originals.
> 
> Denis.
Denis Chertykov June 27, 2011, 5:07 a.m. UTC | #18
2011/6/26 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov schrieb:
>>
>> 2011/6/24 Richard Henderson <rth@redhat.com>:
>>
>>> On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>>
>>>>>  text    data     bss     dec     hex filename
>>>>> 10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>>>>  5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>>>
>>>> Richard, can you send me this z.c file ?
>>>> Right now I'm notice that new code is worse.
>>>
>>> That's gcc.c-torture/compile/950612-1.c.
>>
>> I have founded that postreload optimizations can't handle results of
>> new L_R_A code.
>> I think that it's can be handled by CSE (postreload).
> Did you try to add constraint alternative to *addhi3?
> Like "*!d,d,n" or even "*!r,r,n"
>
> I saw some code improvement with that alternative.

I'm trying:

(define_insn "*addhi3"
  [(set (match_operand:HI 0 "register_operand" "=r,!w,!w,d,r,r,!d")
 	(plus:HI
 	 (match_operand:HI 1 "register_operand" "%0,0,0,0,0,0,!r")
 	 (match_operand:HI 2 "nonmemory_operand" "r,I,J,i,P,N,!ri")))]
  ""
  "@
 	add %A0,%A2\;adc %B0,%B2
 	adiw %A0,%2
 	sbiw %A0,%n2
 	subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2))
 	sec\;adc %A0,__zero_reg__\;adc %B0,__zero_reg__
 	sec\;sbc %A0,__zero_reg__\;sbc %B0,__zero_reg__
        #"
  [(set_attr "length" "2,1,1,2,3,3,4")
   (set_attr "cc" "set_n,set_czn,set_czn,set_czn,set_n,set_n,set_n")])

;; Special split three addressing addhi3
;; to make postreload optimization possible
(define_split ; addhi3 !d,!r,!ri
  [(set (match_operand:HI 0 "d_register_operand" "")
	(plus:HI (match_operand:HI 1 "register_operand" "")
		 (match_operand:HI 2 "nonmemory_operand" "")))]
  "reload_completed"
  [(set (match_dup 0) (match_dup 2))
   (set (match_dup 0) (plus:HI (match_dup 0) (match_dup 1)))]
  "")

The main problem for me is that the new addressing mode produce a
worse code in many tests.
Although, results for gcc.c-torture/compile/950612-1.c is
significantly better with new addressing.

Denis.
Georg-Johann Lay June 27, 2011, 8:42 a.m. UTC | #19
Denis Chertykov wrote:
> 2011/6/26 Georg-Johann Lay <avr@gjlay.de>:
>> Denis Chertykov schrieb:
>>> 2011/6/24 Richard Henderson <rth@redhat.com>:
>>>
>>>> On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>>>
>>>>>>  text    data     bss     dec     hex filename
>>>>>> 10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>>>>>  5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>>>> Richard, can you send me this z.c file ?
>>>>> Right now I'm notice that new code is worse.
>>>> That's gcc.c-torture/compile/950612-1.c.
>>> I have founded that postreload optimizations can't handle results of
>>> new L_R_A code.
>>> I think that it's can be handled by CSE (postreload).
>> Did you try to add constraint alternative to *addhi3?
>> Like "*!d,d,n" or even "*!r,r,n"
>>
>> I saw some code improvement with that alternative.
> 
> I'm trying:
> 
> (define_insn "*addhi3"
>   [(set (match_operand:HI 0 "register_operand" "=r,!w,!w,d,r,r,!d")
>  	(plus:HI
>  	 (match_operand:HI 1 "register_operand" "%0,0,0,0,0,0,!r")
>  	 (match_operand:HI 2 "nonmemory_operand" "r,I,J,i,P,N,!ri")))]
>   ""
>   "@
>  	add %A0,%A2\;adc %B0,%B2
>  	adiw %A0,%2
>  	sbiw %A0,%n2
>  	subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2))
>  	sec\;adc %A0,__zero_reg__\;adc %B0,__zero_reg__
>  	sec\;sbc %A0,__zero_reg__\;sbc %B0,__zero_reg__
>         #"
>   [(set_attr "length" "2,1,1,2,3,3,4")
>    (set_attr "cc" "set_n,set_czn,set_czn,set_czn,set_n,set_n,set_n")])
> 

That split will split always:

> ;; Special split three addressing addhi3
> ;; to make postreload optimization possible
> (define_split ; addhi3 !d,!r,!ri
>   [(set (match_operand:HI 0 "d_register_operand" "")
> 	(plus:HI (match_operand:HI 1 "register_operand" "")
> 		 (match_operand:HI 2 "nonmemory_operand" "")))]
>   "reload_completed"
     && REGNO(operands[0]) != REGNO(operands[1])"
>   [(set (match_dup 0) (match_dup 2))
>    (set (match_dup 0) (plus:HI (match_dup 0) (match_dup 1)))]
>   "")
Maybe it can also restrict to const_int_operand in #2 and then it's
best to
    (set (match_dup 0)
         (match_dup 1))
    (set (match_dup 0)
         (plus:HI (match_dup 0)
                  (match_dup 2)))
> 
> The main problem for me is that the new addressing mode produce a
> worse code in many tests.

You have an example source?

> Although, results for gcc.c-torture/compile/950612-1.c is
> significantly better with new addressing.

That testcase is pathologic for AVR...

> 
> Denis.
> 

Johann
Denis Chertykov June 27, 2011, 10:07 a.m. UTC | #20
2011/6/27 Georg-Johann Lay <avr@gjlay.de>:
> Denis Chertykov wrote:
>> 2011/6/26 Georg-Johann Lay <avr@gjlay.de>:
>>> Denis Chertykov schrieb:
>>>> 2011/6/24 Richard Henderson <rth@redhat.com>:
>>>>
>>>>> On 06/23/2011 01:15 PM, Denis Chertykov wrote:
>>>>>
>>>>>>>  text    data     bss     dec     hex filename
>>>>>>> 10032      25       0   10057    2749 bld-avr-orig/gcc/z.o
>>>>>>>  5816      25       0    5841    16d1 bld-avr-new/gcc/z.o
>>>>>> Richard, can you send me this z.c file ?
>>>>>> Right now I'm notice that new code is worse.
>>>>> That's gcc.c-torture/compile/950612-1.c.
>>>> I have founded that postreload optimizations can't handle results of
>>>> new L_R_A code.
>>>> I think that it's can be handled by CSE (postreload).
>>> Did you try to add constraint alternative to *addhi3?
>>> Like "*!d,d,n" or even "*!r,r,n"
>>>
>>> I saw some code improvement with that alternative.
>>
>> I'm trying:
>>
>> (define_insn "*addhi3"
>>   [(set (match_operand:HI 0 "register_operand" "=r,!w,!w,d,r,r,!d")
>>       (plus:HI
>>        (match_operand:HI 1 "register_operand" "%0,0,0,0,0,0,!r")
>>        (match_operand:HI 2 "nonmemory_operand" "r,I,J,i,P,N,!ri")))]
>>   ""
>>   "@
>>       add %A0,%A2\;adc %B0,%B2
>>       adiw %A0,%2
>>       sbiw %A0,%n2
>>       subi %A0,lo8(-(%2))\;sbci %B0,hi8(-(%2))
>>       sec\;adc %A0,__zero_reg__\;adc %B0,__zero_reg__
>>       sec\;sbc %A0,__zero_reg__\;sbc %B0,__zero_reg__
>>         #"
>>   [(set_attr "length" "2,1,1,2,3,3,4")
>>    (set_attr "cc" "set_n,set_czn,set_czn,set_czn,set_n,set_n,set_n")])
>>
>
> That split will split always:
>
>> ;; Special split three addressing addhi3
>> ;; to make postreload optimization possible
>> (define_split ; addhi3 !d,!r,!ri
>>   [(set (match_operand:HI 0 "d_register_operand" "")
>>       (plus:HI (match_operand:HI 1 "register_operand" "")
>>                (match_operand:HI 2 "nonmemory_operand" "")))]
>>   "reload_completed"
>     && REGNO(operands[0]) != REGNO(operands[1])"
>>   [(set (match_dup 0) (match_dup 2))
>>    (set (match_dup 0) (plus:HI (match_dup 0) (match_dup 1)))]
>>   "")
> Maybe it can also restrict to const_int_operand in #2 and then it's
> best to
>    (set (match_dup 0)
>         (match_dup 1))
>    (set (match_dup 0)
>         (plus:HI (match_dup 0)
>                  (match_dup 2)))

Thanks for suggestions.

>> The main problem for me is that the new addressing mode produce a
>> worse code in many tests.
>
> You have an example source?

In attachment.

Denis.
diff mbox

Patch

Index: config/avr/avr-protos.h
===================================================================
--- config/avr/avr-protos.h	(Revision 175011)
+++ config/avr/avr-protos.h	(Arbeitskopie)
@@ -24,7 +24,7 @@ 
 
 extern int function_arg_regno_p (int r);
 extern void avr_cpu_cpp_builtins (struct cpp_reader * pfile);
-extern enum reg_class avr_regno_reg_class (int r);
+extern reg_class_t avr_regno_reg_class (unsigned int r);
 extern void asm_globalize_label (FILE *file, const char *name);
 extern void avr_asm_declare_function_name (FILE *, const char *, tree);
 extern void order_regs_for_local_alloc (void);
@@ -88,9 +88,6 @@  extern int extra_constraint_Q (rtx x);
 extern int adjust_insn_length (rtx insn, int len);
 extern const char *output_reload_inhi (rtx insn, rtx *operands, int *len);
 extern const char *output_reload_insisf (rtx insn, rtx *operands, int *len);
-extern enum reg_class secondary_input_reload_class (enum reg_class,
-						    enum machine_mode,
-						    rtx);
 extern void notice_update_cc (rtx body, rtx insn);
 extern void print_operand (FILE *file, rtx x, int code);
 extern void print_operand_address (FILE *file, rtx addr);
@@ -109,6 +106,8 @@  extern RTX_CODE avr_normalize_condition
 extern int compare_eq_p (rtx insn);
 extern void out_shift_with_cnt (const char *templ, rtx insn,
 				rtx operands[], int *len, int t_len);
+extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
+extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 #endif /* RTX_CODE */
 
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(Revision 175011)
+++ config/avr/avr.md	(Arbeitskopie)
@@ -246,8 +246,8 @@  (define_expand "movqi"
   ")
 
 (define_insn "*movqi"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,d,Qm,r,q,r,*r")
-	(match_operand:QI 1 "general_operand"       "rL,i,rL,Qm,r,q,i"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,d,m,r,q,r,*r")
+	(match_operand:QI 1 "general_operand"       "rL,i,rL,m,r,q,i"))]
   "(register_operand (operands[0],QImode)
     || register_operand (operands[1], QImode) || const0_rtx == operands[1])"
   "* return output_movqi (insn, operands, NULL);"
@@ -295,15 +295,6 @@  (define_expand "movhi"
     }
 }")
 
-(define_insn "*movhi_sp"
-  [(set (match_operand:HI 0 "register_operand" "=q,r")
-        (match_operand:HI 1 "register_operand"  "r,q"))]
-  "((stack_register_operand(operands[0], HImode) && register_operand (operands[1], HImode))
-    || (register_operand (operands[0], HImode) && stack_register_operand(operands[1], HImode)))"
-  "* return output_movhi (insn, operands, NULL);"
-  [(set_attr "length" "5,2")
-   (set_attr "cc" "none,none")])
-
 (define_insn "movhi_sp_r_irq_off"
   [(set (match_operand:HI 0 "stack_register_operand" "=q")
         (unspec_volatile:HI [(match_operand:HI 1 "register_operand"  "r")] 
@@ -425,8 +416,8 @@  (define_insn "*reload_insi"
 
 
 (define_insn "*movsi"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,Qm,!d,r")
-        (match_operand:SI 1 "general_operand"       "r,L,Qm,rL,i,i"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,!d,r")
+        (match_operand:SI 1 "general_operand"       "r,L,m,rL,i,i"))]
   "(register_operand (operands[0],SImode)
     || register_operand (operands[1],SImode) || const0_rtx == operands[1])"
   "* return output_movsisf (insn, operands, NULL);"
@@ -451,8 +442,8 @@  (define_expand "movsf"
 }")
 
 (define_insn "*movsf"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,r,Qm,!d,r")
-        (match_operand:SF 1 "general_operand"       "r,G,Qm,r,F,F"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,r,m,!d,r")
+        (match_operand:SF 1 "general_operand"       "r,G,m,r,F,F"))]
   "register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode)"
   "* return output_movsisf (insn, operands, NULL);"
@@ -1518,8 +1509,8 @@  (define_mode_attr rotx [(DI "&r,&r,X") (
 (define_mode_attr rotsmode [(DI "QI") (SI "HI") (HI "QI")])
 
 (define_expand "rotl<mode>3"
-  [(parallel [(set (match_operand:HIDI 0 "register_operand" "")
-		   (rotate:HIDI (match_operand:HIDI 1 "register_operand" "")
+  [(parallel [(set (match_operand:HISI 0 "register_operand" "")
+		   (rotate:HISI (match_operand:HISI 1 "register_operand" "")
 				(match_operand:VOID 2 "const_int_operand" "")))
 		(clobber (match_dup 3))])]
   ""
@@ -1618,7 +1609,7 @@  (define_split ; ashlqi3_const6
 (define_insn "*ashlqi3"
   [(set (match_operand:QI 0 "register_operand"           "=r,r,r,r,!d,r,r")
 	(ashift:QI (match_operand:QI 1 "register_operand" "0,0,0,0,0,0,0")
-		   (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,Qm")))]
+		   (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,m")))]
   ""
   "* return ashlqi3_out (insn, operands, NULL);"
   [(set_attr "length" "5,0,1,2,4,6,9")
@@ -1627,7 +1618,7 @@  (define_insn "*ashlqi3"
 (define_insn "ashlhi3"
   [(set (match_operand:HI 0 "register_operand"           "=r,r,r,r,r,r,r")
 	(ashift:HI (match_operand:HI 1 "register_operand" "0,0,0,r,0,0,0")
-		   (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		   (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return ashlhi3_out (insn, operands, NULL);"
   [(set_attr "length" "6,0,2,2,4,10,10")
@@ -1636,7 +1627,7 @@  (define_insn "ashlhi3"
 (define_insn "ashlsi3"
   [(set (match_operand:SI 0 "register_operand"           "=r,r,r,r,r,r,r")
 	(ashift:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")
-		   (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		   (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return ashlsi3_out (insn, operands, NULL);"
   [(set_attr "length" "8,0,4,4,8,10,12")
@@ -1725,7 +1716,7 @@  (define_insn "*ashlsi3_const"
 (define_insn "ashrqi3"
   [(set (match_operand:QI 0 "register_operand" "=r,r,r,r,r,r")
 	(ashiftrt:QI (match_operand:QI 1 "register_operand" "0,0,0,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,m")))]
   ""
   "* return ashrqi3_out (insn, operands, NULL);"
   [(set_attr "length" "5,0,1,2,5,9")
@@ -1734,7 +1725,7 @@  (define_insn "ashrqi3"
 (define_insn "ashrhi3"
   [(set (match_operand:HI 0 "register_operand"             "=r,r,r,r,r,r,r")
 	(ashiftrt:HI (match_operand:HI 1 "register_operand" "0,0,0,r,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return ashrhi3_out (insn, operands, NULL);"
   [(set_attr "length" "6,0,2,4,4,10,10")
@@ -1743,7 +1734,7 @@  (define_insn "ashrhi3"
 (define_insn "ashrsi3"
   [(set (match_operand:SI 0 "register_operand"             "=r,r,r,r,r,r,r")
 	(ashiftrt:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return ashrsi3_out (insn, operands, NULL);"
   [(set_attr "length" "8,0,4,6,8,10,12")
@@ -1833,7 +1824,7 @@  (define_split	; lshrqi3_const6
 (define_insn "*lshrqi3"
   [(set (match_operand:QI 0 "register_operand"             "=r,r,r,r,!d,r,r")
 	(lshiftrt:QI (match_operand:QI 1 "register_operand" "0,0,0,0,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,K,n,n,m")))]
   ""
   "* return lshrqi3_out (insn, operands, NULL);"
   [(set_attr "length" "5,0,1,2,4,6,9")
@@ -1842,7 +1833,7 @@  (define_insn "*lshrqi3"
 (define_insn "lshrhi3"
   [(set (match_operand:HI 0 "register_operand"             "=r,r,r,r,r,r,r")
 	(lshiftrt:HI (match_operand:HI 1 "register_operand" "0,0,0,r,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return lshrhi3_out (insn, operands, NULL);"
   [(set_attr "length" "6,0,2,2,4,10,10")
@@ -1851,7 +1842,7 @@  (define_insn "lshrhi3"
 (define_insn "lshrsi3"
   [(set (match_operand:SI 0 "register_operand"             "=r,r,r,r,r,r,r")
 	(lshiftrt:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")
-		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,Qm")))]
+		     (match_operand:QI 2 "general_operand"  "r,L,P,O,K,n,m")))]
   ""
   "* return lshrsi3_out (insn, operands, NULL);"
   [(set_attr "length" "8,0,4,4,8,10,12")
@@ -2124,54 +2115,6 @@  (define_insn_and_split "zero_extendhisi2
   operands[3] = simplify_gen_subreg (HImode, operands[0], SImode, high_off);
 })
 
-(define_insn_and_split "zero_extendqidi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-        (zero_extend:DI (match_operand:QI 1 "register_operand" "r")))]
-  ""
-  "#"
-  "reload_completed"
-  [(set (match_dup 2) (zero_extend:SI (match_dup 1)))
-   (set (match_dup 3) (const_int 0))]
-{
-  unsigned int low_off = subreg_lowpart_offset (SImode, DImode);
-  unsigned int high_off = subreg_highpart_offset (SImode, DImode);
-
-  operands[2] = simplify_gen_subreg (SImode, operands[0], DImode, low_off);
-  operands[3] = simplify_gen_subreg (SImode, operands[0], DImode, high_off);
-})
-
-(define_insn_and_split "zero_extendhidi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-        (zero_extend:DI (match_operand:HI 1 "register_operand" "r")))]
-  ""
-  "#"
-  "reload_completed"
-  [(set (match_dup 2) (zero_extend:SI (match_dup 1)))
-   (set (match_dup 3) (const_int 0))]
-{
-  unsigned int low_off = subreg_lowpart_offset (SImode, DImode);
-  unsigned int high_off = subreg_highpart_offset (SImode, DImode);
-
-  operands[2] = simplify_gen_subreg (SImode, operands[0], DImode, low_off);
-  operands[3] = simplify_gen_subreg (SImode, operands[0], DImode, high_off);
-})
-
-(define_insn_and_split "zero_extendsidi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-        (zero_extend:DI (match_operand:SI 1 "register_operand" "r")))]
-  ""
-  "#"
-  "reload_completed"
-  [(set (match_dup 2) (match_dup 1))
-   (set (match_dup 3) (const_int 0))]
-{
-  unsigned int low_off = subreg_lowpart_offset (SImode, DImode);
-  unsigned int high_off = subreg_highpart_offset (SImode, DImode);
-
-  operands[2] = simplify_gen_subreg (SImode, operands[0], DImode, low_off);
-  operands[3] = simplify_gen_subreg (SImode, operands[0], DImode, high_off);
-})
-
 ;;<=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=><=>
 ;; compare
 
@@ -2430,7 +2373,7 @@  (define_insn "*sbrx_branch<mode>"
   [(set (pc)
         (if_then_else
 	 (match_operator 0 "eqne_operator"
-			 [(zero_extract:QIDI
+			 [(zero_extract:QISI
 			   (match_operand:VOID 1 "register_operand" "r")
 			   (const_int 1)
 			   (match_operand 2 "const_int_operand" "n"))
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(Revision 175011)
+++ config/avr/avr.c	(Arbeitskopie)
@@ -343,18 +343,27 @@  avr_help (void)
 
 /*  return register class from register number.  */
 
-static const enum reg_class reg_class_tab[]={
-  GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,
-  GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,
-  GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,GENERAL_REGS,
-  GENERAL_REGS, /* r0 - r15 */
-  LD_REGS,LD_REGS,LD_REGS,LD_REGS,LD_REGS,LD_REGS,LD_REGS,
-  LD_REGS,                      /* r16 - 23 */
-  ADDW_REGS,ADDW_REGS,          /* r24,r25 */
-  POINTER_X_REGS,POINTER_X_REGS, /* r26,27 */
-  POINTER_Y_REGS,POINTER_Y_REGS, /* r28,r29 */
-  POINTER_Z_REGS,POINTER_Z_REGS, /* r30,r31 */
-  STACK_REG,STACK_REG           /* SPL,SPH */
+static const reg_class_t reg_class_tab[] =
+  {
+    /* r0 */
+    R0_REG,
+    /* r1 - r15 */
+    NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS,
+    NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS,
+    NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS, NO_LD_REGS,
+    /* r16 - r23 */
+    SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
+    SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS, SIMPLE_LD_REGS,
+    /* r24, r25 */
+    ADDW_REGS, ADDW_REGS,
+    /* r26, r27 */
+    POINTER_X_REGS, POINTER_X_REGS,
+    /* r28, r29 */
+    POINTER_Y_REGS, POINTER_Y_REGS,
+    /* r30, r31 */
+    POINTER_Z_REGS, POINTER_Z_REGS,
+    /* SPL, SPH */
+    STACK_REG, STACK_REG
 };
 
 /* Function to set up the backend function structure.  */
@@ -367,8 +376,8 @@  avr_init_machine_status (void)
 
 /* Return register class for register R.  */
 
-enum reg_class
-avr_regno_reg_class (int r)
+reg_class_t
+avr_regno_reg_class (unsigned int r)
 {
   if (r <= 33)
     return reg_class_tab[r];
@@ -1146,74 +1155,75 @@  avr_cannot_modify_jumps_p (void)
 }
 
 
-/* Return nonzero if X (an RTX) is a legitimate memory address on the target
-   machine for a memory operand of mode MODE.  */
+/* Helper function for `avr_legitimate_address_p'.  */
+
+static inline int
+avr_reg_ok_for_addr (rtx reg, int strict)
+{
+  return (REG_P (reg)
+          && (avr_regno_mode_code_ok_for_base_p (REGNO (reg), QImode, MEM, SCRATCH)
+              || (!strict && REGNO (reg) >= FIRST_PSEUDO_REGISTER)));
+}
+
+
+/* Implement `TARGET_LEGITIMATE_ADDRESS_P'.  */
 
 bool
 avr_legitimate_address_p (enum machine_mode mode, rtx x, bool strict)
 {
-  enum reg_class r = NO_REGS;
-  
-  if (TARGET_ALL_DEBUG)
-    {
-      fprintf (stderr, "mode: (%s) %s %s %s %s:",
-	       GET_MODE_NAME(mode),
-	       strict ? "(strict)": "",
-	       reload_completed ? "(reload_completed)": "",
-	       reload_in_progress ? "(reload_in_progress)": "",
-	       reg_renumber ? "(reg_renumber)" : "");
-      if (GET_CODE (x) == PLUS
-	  && REG_P (XEXP (x, 0))
-	  && GET_CODE (XEXP (x, 1)) == CONST_INT
-	  && INTVAL (XEXP (x, 1)) >= 0
-	  && INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode)
-	  && reg_renumber
-	  )
-	fprintf (stderr, "(r%d ---> r%d)", REGNO (XEXP (x, 0)),
-		 true_regnum (XEXP (x, 0)));
-      debug_rtx (x);
-    }
-  if (!strict && GET_CODE (x) == SUBREG)
-	x = SUBREG_REG (x);
-  if (REG_P (x) && (strict ? REG_OK_FOR_BASE_STRICT_P (x)
-                    : REG_OK_FOR_BASE_NOSTRICT_P (x)))
-    r = POINTER_REGS;
-  else if (CONSTANT_ADDRESS_P (x))
-    r = ALL_REGS;
-  else if (GET_CODE (x) == PLUS
-           && REG_P (XEXP (x, 0))
-	   && GET_CODE (XEXP (x, 1)) == CONST_INT
-	   && INTVAL (XEXP (x, 1)) >= 0)
-    {
-      int fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode);
-      if (fit)
-	{
-	  if (! strict
-	      || REGNO (XEXP (x,0)) == REG_X
-	      || REGNO (XEXP (x,0)) == REG_Y
-	      || REGNO (XEXP (x,0)) == REG_Z)
-	    r = BASE_POINTER_REGS;
-	  if (XEXP (x,0) == frame_pointer_rtx
-	      || XEXP (x,0) == arg_pointer_rtx)
-	    r = BASE_POINTER_REGS;
-	}
-      else if (frame_pointer_needed && XEXP (x,0) == frame_pointer_rtx)
-	r = POINTER_Y_REGS;
-    }
-  else if ((GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC)
-           && REG_P (XEXP (x, 0))
-           && (strict ? REG_OK_FOR_BASE_STRICT_P (XEXP (x, 0))
-               : REG_OK_FOR_BASE_NOSTRICT_P (XEXP (x, 0))))
-    {
-      r = POINTER_REGS;
-    }
-  if (TARGET_ALL_DEBUG)
+  bool ok = false;
+
+  switch (GET_CODE (x))
     {
-      fprintf (stderr, "   ret = %c\n", r + '0');
+    case REG:
+      ok = avr_reg_ok_for_addr (x, strict);
+      if (strict
+          && DImode == mode
+          && REG_X == REGNO (x))
+        {
+          ok = false;
+        }
+      break;
+
+    case POST_INC:
+    case PRE_DEC:
+      ok = avr_reg_ok_for_addr (XEXP (x, 0), strict);
+      break;
+
+    case SYMBOL_REF:
+    case CONST_INT:
+    case CONST:
+      ok = true;
+      break;
+      
+    case PLUS:
+      {
+        rtx op0 = XEXP (x, 0);
+        rtx op1 = XEXP (x, 1);
+            
+        if (REG_P (op0)
+            && CONST_INT_P (op1))
+          {
+            ok = (avr_reg_ok_for_addr (op0, strict)
+                  && INTVAL (op1) >= 0
+                  && INTVAL (op1) <= MAX_LD_OFFSET (mode));
+            
+            if (strict
+                && REG_X == REGNO (op0))
+              {
+                ok = false;
+              }
+          }
+        break;
+      }
+      
+    default:
+      break;
     }
-  return r == NO_REGS ? 0 : (int)r;
-}
 
+  return ok;
+}
+ 
 /* Attempts to replace X with a valid
    memory address for an operand of mode MODE  */
 
@@ -6278,27 +6288,74 @@  jump_over_one_insn_p (rtx insn, rtx dest
 int
 avr_hard_regno_mode_ok (int regno, enum machine_mode mode)
 {
-  /* Disallow QImode in stack pointer regs.  */
-  if ((regno == REG_SP || regno == (REG_SP + 1)) && mode == QImode)
-    return 0;
+  /* Any GENERAL_REGS register can hold 8-bit values.  */
+  /* FIXME:  8-bit values must not be disallowed for R28 or R29.
+     Disallowing QI et al. in these registers might lead to code like
+         (set (subreg:QI (reg:HI 28) n) ...)
+     which will result in wrong code because reload does not handle
+     SUBREGs of hard regsisters like this.  This could be fixed in reload.
+     However, it appears that fixing reload is not wanted by reload people.  */
+  
+  if (GET_MODE_SIZE (mode) == 1)
+     return 1;
+   
+   /* All modes larger than 8 bits should start in an even register.  */
+   
+  return regno % 2 == 0;
+}
 
-  /* The only thing that can go into registers r28:r29 is a Pmode.  */
-  if (regno == REG_Y && mode == Pmode)
-    return 1;
 
-  /* Otherwise disallow all regno/mode combinations that span r28:r29.  */
-  if (regno <= (REG_Y + 1) && (regno + GET_MODE_SIZE (mode)) >= (REG_Y + 1))
-    return 0;
+/* Worker function for `MODE_CODE_BASE_REG_CLASS'.  */
 
-  if (mode == QImode)
-    return 1;
+reg_class_t
+avr_mode_code_base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED,
+                              RTX_CODE outer_code, RTX_CODE index_code ATTRIBUTE_UNUSED)
+{
+  reg_class_t rclass = BASE_POINTER_REGS;
+  
+  switch (outer_code)
+    {
+    case MEM:
+    case POST_INC:
+    case PRE_DEC:
+      rclass = POINTER_REGS;
+      break;
+      
+    default:
+      break;
+    }
+  
+  return rclass;
+}
 
-  /* Modes larger than QImode occupy consecutive registers.  */
-  if (regno + GET_MODE_SIZE (mode) > FIRST_PSEUDO_REGISTER)
-    return 0;
 
-  /* All modes larger than QImode should start in an even register.  */
-  return !(regno & 1);
+/* Worker function for `REGNO_MODE_CODE_OK_FOR_BASE_P'.  */
+
+bool
+avr_regno_mode_code_ok_for_base_p (int regno, enum machine_mode mode ATTRIBUTE_UNUSED,
+                                   RTX_CODE outer_code, RTX_CODE index_code ATTRIBUTE_UNUSED)
+{
+  bool ok;
+  
+  switch (outer_code)
+    {
+    case PLUS:
+      ok = regno == REG_Z || regno == REG_Y;
+      break;
+      
+    case MEM: /* plain reg */
+    case POST_INC:
+    case PRE_DEC:
+      ok = regno == REG_Z || regno == REG_Y || regno == REG_X;
+      break;
+      
+    default:
+      ok = false;
+      break;
+      
+    }
+
+  return ok;
 }
 
 const char *
@@ -6424,13 +6481,23 @@  avr_hard_regno_scratch_ok (unsigned int
       && !df_regs_ever_live_p (regno))
     return false;
 
+  /* Don't allow hard registers that might be part of the frame pointer.
+     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
+     and don't care for a frame pointer that spans more than one register.  */
+
+  if ((!reload_completed || frame_pointer_needed)
+      && (regno == REG_Y || regno == REG_Y + 1))
+    {
+      return false;
+    }
+
   return true;
 }
 
 /* Return nonzero if register OLD_REG can be renamed to register NEW_REG.  */
 
 int
-avr_hard_regno_rename_ok (unsigned int old_reg ATTRIBUTE_UNUSED,
+avr_hard_regno_rename_ok (unsigned int old_reg,
 			  unsigned int new_reg)
 {
   /* Interrupt functions can only use registers that have already been
@@ -6441,6 +6508,17 @@  avr_hard_regno_rename_ok (unsigned int o
       && !df_regs_ever_live_p (new_reg))
     return 0;
 
+  /* Don't allow hard registers that might be part of the frame pointer.
+     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
+     and don't care for a frame pointer that spans more than one register.  */
+
+  if ((!reload_completed || frame_pointer_needed)
+      && (old_reg == REG_Y || old_reg == REG_Y + 1
+          || new_reg == REG_Y || new_reg == REG_Y + 1))
+    {
+      return 0;
+    }
+  
   return 1;
 }
 
Index: config/avr/avr.h
===================================================================
--- config/avr/avr.h	(Revision 175011)
+++ config/avr/avr.h	(Arbeitskopie)
@@ -294,21 +294,13 @@  enum reg_class {
 
 #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
 
-#define BASE_REG_CLASS (reload_completed ? BASE_POINTER_REGS : POINTER_REGS)
+#define MODE_CODE_BASE_REG_CLASS(mode, outer_code, index_code)  \
+  avr_mode_code_base_reg_class (mode, outer_code, index_code)
 
 #define INDEX_REG_CLASS NO_REGS
 
-#define REGNO_OK_FOR_BASE_P(r) (((r) < FIRST_PSEUDO_REGISTER		\
-				 && ((r) == REG_X			\
-				     || (r) == REG_Y			\
-				     || (r) == REG_Z			\
-				     || (r) == ARG_POINTER_REGNUM))	\
-				|| (reg_renumber			\
-				    && (reg_renumber[r] == REG_X	\
-					|| reg_renumber[r] == REG_Y	\
-					|| reg_renumber[r] == REG_Z	\
-					|| (reg_renumber[r]		\
-					    == ARG_POINTER_REGNUM))))
+#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, outer_code, index_code) \
+  avr_regno_mode_code_ok_for_base_p (num, mode, outer_code, index_code) 
 
 #define REGNO_OK_FOR_INDEX_P(NUM) 0
 
@@ -371,16 +363,11 @@  extern int avr_reg_order[];
 
 #define MAX_REGS_PER_ADDRESS 1
 
-#define REG_OK_FOR_BASE_NOSTRICT_P(X) \
-  (REGNO (X) >= FIRST_PSEUDO_REGISTER || REG_OK_FOR_BASE_STRICT_P(X))
-
-#define REG_OK_FOR_BASE_STRICT_P(X) REGNO_OK_FOR_BASE_P (REGNO (X))
-
 /* LEGITIMIZE_RELOAD_ADDRESS will allow register R26/27 to be used, where it
    is no worse than normal base pointers R28/29 and R30/31. For example:
    If base offset is greater than 63 bytes or for R++ or --R addressing.  */
    
-#define LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN)    \
+#define _LEGITIMIZE_RELOAD_ADDRESS(X, MODE, OPNUM, TYPE, IND_LEVELS, WIN)    \
 do {									    \
   if (1&&(GET_CODE (X) == POST_INC || GET_CODE (X) == PRE_DEC))	    \
     {									    \