[AVR] Light-weight DImode implementation.

Message ID 4ECAA70E.5050205@gjlay.de

Commit Message

Georg-Johann Lay Nov. 21, 2011, 7:31 p.m. UTC
This adds support for DImode insns that don't operate byte-wise, like NEG,
COMPARE, PLUS, MINUS, ASHIFT, LSHIFTRT, ASHIFTRT, ROTATE.

The crucial point is that there is no movdi; the reasoning is cited from the
new avr-dimode.md:

;; The purpose of this file is to provide a light-weight DImode
;; implementation for AVR.  The trouble with DImode is that tree -> RTL
;; lowering leads to really unpleasant code for operations that don't
;; work byte-wise like NEG, PLUS, MINUS, etc.  Defining optabs entries for
;; them won't help because the optab machinery assumes these operations
;; are cheap and does not check if a libgcc implementation is available.
;;
;; The DImode insns are all straight forward -- except movdi.  The approach
;; of this implementation is to provide DImode insns without the burden of
;; introducing movdi.
;;
;; The caveat is that if there are insns for some mode, there must also be a
;; respective move insn that describes reloads.  Therefore, this
;; implementation uses an accumulator-based model with two hard-coded,
;; accumulator-like registers
;;
;;    A[] = reg:DI 18
;;    B[] = reg:DI 10
;;
;; so that no DImode insn contains pseudos or needs reloading.

Comments are welcome on whether this is reasonable or nonsense ;-)

The implementation is only some 300 lines and works smoothly with the test
suite and with libgcc and avr-libc built with -m64, of course.

All insns/expanders in the new avr-dimode.md are triggered by -m64, which is
off by default.

The new option exists because I expected problems with the patch, but it has
run smoothly from the start, so the option can be dropped if the explanation
above holds.

The changes to the rest of the backend are just a handful of lines :-)

As said above, the test suite passes fine both with -m64 off and with -m64
turned on, and with libgcc/libc built with that switch.

Ok?

Johann

gcc/
	* config/avr/avr.opt (-m64): New option.
	* config/avr/avr-dimode.md: New file.
	* config/avr/avr.md: Include it.
	(adjust_len): Add plus64, compare64.
	(HIDI): Remove code iterator.
	(code_stdname): New code attribute.
	(rotx, rotsmode): Remove DI.
	(rotl<mode>3, *rotw<mode>, *rotb<mode>): Use HISI instead of HIDI
	as code iterator.
	* config/avr/avr-protos.h (avr_out_plus64, avr_out_compare64): New.
	* config/avr/avr.c (avr_out_compare): Handle DImode.
	(avr_out_compare64, avr_out_plus64): New functions.
	(avr_out_plus_1): Use simplify_unary_operation to negate xval.
	(adjust_insn_length): Handle ADJUST_LEN_COMPARE64, ADJUST_LEN_PLUS64.
	(avr_compare_pattern): Skip DImode comparisons.

libgcc/
	* config/avr/t-avr (LIB1ASMFUNCS): Add _adddi3, _adddi3_s8,
	_subdi3, _cmpdi2, _cmpdi2_s8, _rotldi3.
	* config/avr/lib1funcs.S (__adddi3, __adddi3_s8, __subdi3,
	__cmpdi2, __cmpdi2_s8, __rotldi3): New functions.

Comments

Richard Henderson Nov. 21, 2011, 10:39 p.m. UTC | #1
On 11/21/2011 11:31 AM, Georg-Johann Lay wrote:
> ;; The caveat is that if there are insns for some mode, there must also be a
> ;; respective move insn that describes reloads.  Therefore, this
> ;; implementation uses an accumulator-based model with two hard-coded,
> ;; accumulator-like registers
> ;;
> ;;    A[] = reg:DI 18
> ;;    B[] = reg:DI 10
> ;;
> ;; so that no DImode insn contains pseudos or needs reloading.

Well, rtl loop optimization will not work, but given that SSE optimizations ought to have been performed, that's probably acceptable.

It's definitely a hack, but perhaps you'll be able to get away with it.

I do wonder if you might even get smaller code if you force DImode quantities into the stack (just hack use_register_for_decl locally while testing; a new target hook if that pans out), and pass pointers to the variables instead.  At the moment you're having to use 8*3 insns inline to put the quantities in place and take them back out again.  With pointers this would seem to drop to 2*3.


r~
Georg-Johann Lay Nov. 22, 2011, 12:15 a.m. UTC | #2
Richard Henderson schrieb:
> On 11/21/2011 11:31 AM, Georg-Johann Lay wrote:
> 
>>;; The caveat is that if there are insns for some mode, there must also be a
>>;; respective move insn that describes reloads.  Therefore, this
>>;; implementation uses an accumulator-based model with two hard-coded,
>>;; accumulator-like registers
>>;;
>>;;    A[] = reg:DI 18
>>;;    B[] = reg:DI 10
>>;;
>>;; so that no DImode insn contains pseudos or needs reloading.
> 
> Well, rtl loop optimization will not work, but given that SSE

You mean "won't optimize" or "gives wrong code"?
What's SSE? I definitely need a GCC glossary.

> optimizations ought to have been performed, that's probably
> acceptable.
> 
> It's definitely a hack, but perhaps you'll be able to get away with
> it.

Yes, I'm aware it's a hack.  But the extremely bloated code -- see below -- 
is one of the reasons for avr-gcc's bad reputation, even though very 
few people are using 64-bit.

> I do wonder if you might even get smaller code if you force DImode
> quantities into the stack (just hack use_register_for_decl locally
> while testing; a new target hook if that pans out), and pass pointers
> to the variables instead.  At the moment you're having to use 8*3
> insns inline to put the quantities in place and take them back out
> again.  With pointers this would seem to drop to 2*3.

I already thought about using pointers; but remember that AVR only has 3 
pointer registers.  Moreover, I remember some post in gcc@ or gcc-help@ 
where someone asked how to write an addition or similar that works 
/only/ on memory, and the answer was "it's not possible", IIRC.

Anyway, if you compare the new code with /some/ move insns against the 
old code for, say,

long long add64 (long long a, long long b)
{
     return a + b;
}

that compiles with -Os to

add64:
	push r10	 ;  222	pushqi1/1	[length = 1]
	push r11	 ;  223	pushqi1/1	[length = 1]
	push r12	 ;  224	pushqi1/1	[length = 1]
	push r13	 ;  225	pushqi1/1	[length = 1]
	push r14	 ;  226	pushqi1/1	[length = 1]
	push r15	 ;  227	pushqi1/1	[length = 1]
	push r16	 ;  228	pushqi1/1	[length = 1]
	push r17	 ;  229	pushqi1/1	[length = 1]
/* prologue: function */
/* frame size = 0 */
/* stack size = 8 */
.L__stack_usage = 8
	add r10,r18	 ;  24	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  25	*movqi/2	[length = 1]
	cp r10,r18	 ;  26	*cmpqi/2	[length = 1]
	brlo .L2	 ;  27	branch	[length = 1]
	ldi r30,lo8(0)	 ;  28	*movqi/1	[length = 1]
.L2:
	add r11,r19	 ;  30	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  31	*movqi/2	[length = 1]
	cp r11,r19	 ;  32	*cmpqi/2	[length = 1]
	brlo .L3	 ;  33	branch	[length = 1]
	ldi r18,lo8(0)	 ;  34	*movqi/1	[length = 1]
.L3:
	mov r19,r30	 ;  216	*movqi/1	[length = 1]
	add r19,r11	 ;  36	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  37	*movqi/2	[length = 1]
	cp r19,r11	 ;  38	*cmpqi/2	[length = 1]
	brlo .L4	 ;  39	branch	[length = 1]
	ldi r30,lo8(0)	 ;  40	*movqi/1	[length = 1]
.L4:
	or r18,r30	 ;  42	iorqi3/1	[length = 1]
	add r12,r20	 ;  44	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  45	*movqi/2	[length = 1]
	cp r12,r20	 ;  46	*cmpqi/2	[length = 1]
	brlo .L5	 ;  47	branch	[length = 1]
	ldi r30,lo8(0)	 ;  48	*movqi/1	[length = 1]
.L5:
	mov r20,r18	 ;  217	*movqi/1	[length = 1]
	add r20,r12	 ;  50	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  51	*movqi/2	[length = 1]
	cp r20,r12	 ;  52	*cmpqi/2	[length = 1]
	brlo .L6	 ;  53	branch	[length = 1]
	ldi r18,lo8(0)	 ;  54	*movqi/1	[length = 1]
.L6:
	or r30,r18	 ;  56	iorqi3/1	[length = 1]
	add r13,r21	 ;  58	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  59	*movqi/2	[length = 1]
	cp r13,r21	 ;  60	*cmpqi/2	[length = 1]
	brlo .L7	 ;  61	branch	[length = 1]
	ldi r18,lo8(0)	 ;  62	*movqi/1	[length = 1]
.L7:
	mov r21,r30	 ;  218	*movqi/1	[length = 1]
	add r21,r13	 ;  64	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  65	*movqi/2	[length = 1]
	cp r21,r13	 ;  66	*cmpqi/2	[length = 1]
	brlo .L8	 ;  67	branch	[length = 1]
	ldi r30,lo8(0)	 ;  68	*movqi/1	[length = 1]
.L8:
	or r18,r30	 ;  70	iorqi3/1	[length = 1]
	add r14,r22	 ;  72	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  73	*movqi/2	[length = 1]
	cp r14,r22	 ;  74	*cmpqi/2	[length = 1]
	brlo .L9	 ;  75	branch	[length = 1]
	ldi r30,lo8(0)	 ;  76	*movqi/1	[length = 1]
.L9:
	mov r22,r18	 ;  219	*movqi/1	[length = 1]
	add r22,r14	 ;  78	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  79	*movqi/2	[length = 1]
	cp r22,r14	 ;  80	*cmpqi/2	[length = 1]
	brlo .L10	 ;  81	branch	[length = 1]
	ldi r18,lo8(0)	 ;  82	*movqi/1	[length = 1]
.L10:
	or r30,r18	 ;  84	iorqi3/1	[length = 1]
	add r15,r23	 ;  86	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  87	*movqi/2	[length = 1]
	cp r15,r23	 ;  88	*cmpqi/2	[length = 1]
	brlo .L11	 ;  89	branch	[length = 1]
	ldi r18,lo8(0)	 ;  90	*movqi/1	[length = 1]
.L11:
	mov r23,r30	 ;  220	*movqi/1	[length = 1]
	add r23,r15	 ;  92	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  93	*movqi/2	[length = 1]
	cp r23,r15	 ;  94	*cmpqi/2	[length = 1]
	brlo .L12	 ;  95	branch	[length = 1]
	ldi r30,lo8(0)	 ;  96	*movqi/1	[length = 1]
.L12:
	or r18,r30	 ;  98	iorqi3/1	[length = 1]
	add r16,r24	 ;  100	addqi3/1	[length = 1]
	ldi r30,lo8(1)	 ;  101	*movqi/2	[length = 1]
	cp r16,r24	 ;  102	*cmpqi/2	[length = 1]
	brlo .L13	 ;  103	branch	[length = 1]
	ldi r30,lo8(0)	 ;  104	*movqi/1	[length = 1]
.L13:
	mov r24,r18	 ;  221	*movqi/1	[length = 1]
	add r24,r16	 ;  106	addqi3/1	[length = 1]
	ldi r18,lo8(1)	 ;  107	*movqi/2	[length = 1]
	cp r24,r16	 ;  108	*cmpqi/2	[length = 1]
	brlo .L14	 ;  109	branch	[length = 1]
	ldi r18,lo8(0)	 ;  110	*movqi/1	[length = 1]
.L14:
	or r30,r18	 ;  112	iorqi3/1	[length = 1]
	add r25,r17	 ;  114	addqi3/1	[length = 1]
	mov r18,r10	 ;  138	*movqi/1	[length = 1]
	add r25,r30	 ;  145	addqi3/1	[length = 1]
/* epilogue start */
	pop r17	 ;  232	popqi	[length = 1]
	pop r16	 ;  233	popqi	[length = 1]
	pop r15	 ;  234	popqi	[length = 1]
	pop r14	 ;  235	popqi	[length = 1]
	pop r13	 ;  236	popqi	[length = 1]
	pop r12	 ;  237	popqi	[length = 1]
	pop r11	 ;  238	popqi	[length = 1]
	pop r10	 ;  239	popqi	[length = 1]
	ret	 ;  240	return_from_epilogue	[length = 1]

I'd say that the new code is way better -- even with 24 move 
instructions.  And if several DI operations appear in a row, some moves 
might vanish because only registers 18 and 10 are used.

And I'd even say that this approach is no worse than supplying movdi and 
letting IRA/reload do the work -- at least that's my impression from the 
code that I often see from IRA, like PR50775 for example.

Johann
Georg-Johann Lay Nov. 29, 2011, 6:11 p.m. UTC | #3
Richard Henderson wrote:
> On 11/21/2011 11:31 AM, Georg-Johann Lay wrote:
>> ;; The caveat is that if there are insns for some mode, there must also be a
>> ;; respective move insn that describes reloads.  Therefore, this
>> ;; implementation uses an accumulator-based model with two hard-coded,
>> ;; accumulator-like registers
>> ;;
>> ;;    A[] = reg:DI 18
>> ;;    B[] = reg:DI 10
>> ;;
>> ;; so that no DImode insn contains pseudos or needs reloading.
> 

> Well, rtl loop optimization will not work, but given that SSE optimizations
> ought to have been performed, that's probably acceptable.
> 
> It's definitely a hack, but perhaps you'll be able to get away with it.

If I understand you correctly, it is "unconventional but safe".

So the question is how to proceed with this patch.

The only question that remains is what the -m64 option should be like?

[ ] Omit it altogether
[ ] Leave it as is (off per default)
[ ] Set it on per default

As soon as the direction is clear, I'll post a follow-up patch to add the
missing bits like, e.g., documentation for the new switch.

Johann

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02136.html
Richard Henderson Nov. 29, 2011, 6:30 p.m. UTC | #4
On 11/29/2011 10:11 AM, Georg-Johann Lay wrote:
> The only question that remains is what the -m64 option should be like?
> 
> [ ] Omit it altogether
> [ ] Leave it as is (off per default)
> [ ] Set it on per default
> 
> As soon as the direction is clear, I'll post a follow-up patch to add the
> missing bits like, e.g., documentation for the new switch.

I'll leave the decision to Denis, but I'm for omitting it.


r~
Weddington, Eric Nov. 29, 2011, 8:50 p.m. UTC | #5
> -----Original Message-----
> From: Richard Henderson 
> Sent: Tuesday, November 29, 2011 11:30 AM
> To: Georg-Johann Lay
> Cc: gcc-patches@gcc.gnu.org; Denis Chertykov; Weddington, Eric;
Anatoly
> Sokolov
> Subject: Re: [Patch,AVR] Light-weight DImode implementation.
> 
> On 11/29/2011 10:11 AM, Georg-Johann Lay wrote:
> > The only question that remains is what the -m64 option should be
like?
> >
> > [ ] Omit it altogether
> > [ ] Leave it as is (off per default)
> > [ ] Set it on per default
> >
> > As soon as the direction is clear, I'll post a follow-up patch to
add
> the
> > missing bits like, e.g., documentation for the new switch.
> 
> I'll leave the decision to Denis, but I'm for omitting it.

I will also defer to Denis, but I'd rather avoid having another option,
if we can. Keep it simple for the users.

Eric Weddington
Denis Chertykov Nov. 30, 2011, 7:35 a.m. UTC | #6
2011/11/30 Weddington, Eric <Eric.Weddington@atmel.com>:
>
>
>> -----Original Message-----
>> From: Richard Henderson
>> Sent: Tuesday, November 29, 2011 11:30 AM
>> To: Georg-Johann Lay
>> Cc: gcc-patches@gcc.gnu.org; Denis Chertykov; Weddington, Eric;
> Anatoly
>> Sokolov
>> Subject: Re: [Patch,AVR] Light-weight DImode implementation.
>>
>> On 11/29/2011 10:11 AM, Georg-Johann Lay wrote:
>> > The only question that remains is what the -m64 option should be
> like?
>> >
>> > [ ] Omit it altogether
>> > [ ] Leave it as is (off per default)
>> > [ ] Set it on per default
>> >
>> > As soon as the direction is clear, I'll post a follow-up patch to
> add
>> the
>> > missing bits like, e.g., documentation for the new switch.
>>
>> I'll leave the decision to Denis, but I'm for omitting it.
>
> I will also defer to Denis, but I'd rather avoid having another option,
> if we can. Keep it simple for the users.
>

I agree with Richard. I'm for omitting it.

Denis.
Paolo Bonzini Nov. 30, 2011, 10:41 a.m. UTC | #7
On 11/22/2011 01:15 AM, Georg-Johann Lay wrote:
>
>      ldi r30,lo8(1)     ;  25    *movqi/2    [length = 1]
>      cp r10,r18     ;  26    *cmpqi/2    [length = 1]
>      brlo .L2     ;  27    branch    [length = 1]
>      ldi r30,lo8(0)     ;  28    *movqi/1    [length = 1]
> .L2:
>      add r11,r19     ;  30    addqi3/1    [length = 1]

(WARNING: there are a lot of "if"s in this message.  However, some of 
the ideas here might give better code in general).

I noticed that the AVR backend does not have cstore, movcc, or addcc. 
These are quite useful in general, and especially with the kind of code 
that expand_binop produces.  For example, all of these can be implemented 
in terms of adc or sbc:

     x = (a < b)            (cstore ltu, using ldi+adc)
     x = (a < b) ? -1 : 0   (movcc ltu, using ldi+sbc)
     x += (a < b) ? 1 : 0   (addcc ltu, using adc)
     x += (a < b) ? -1 : 0  (addcc ltu, using sbc)
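
As a source-level illustration (plain C, not code from the patch; the
function names are mine), the four idioms written out branchlessly:

```c
/* Illustrative C sketch of the four branchless idioms above.  On AVR,
   each would map to a compare followed by a short adc/sbc sequence. */
#include <assert.h>
#include <stdint.h>

uint8_t cstore_ltu (uint8_t a, uint8_t b) { return a < b; }
int8_t  movcc_ltu  (uint8_t a, uint8_t b) { return (a < b) ? -1 : 0; }
uint8_t addcc_inc  (uint8_t x, uint8_t a, uint8_t b)
{ return (uint8_t) (x + (a < b ?  1 : 0)); }
uint8_t addcc_dec  (uint8_t x, uint8_t a, uint8_t b)
{ return (uint8_t) (x + (a < b ? -1 : 0)); }
```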

In the code above, cstore could change the brlo/ldi pair to a single adc 
or sbc.  addcc could also change a ldi/cp/brlo/ldi/add to cp+adc. 
expand_binop does not try to use addcc currently, but it could do so 
instead of cstore+or in the computation of the carry.
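
A sketch of that generic lowering in C (illustrative only, not GCC code):
without an add-with-carry pattern, each carry is derived from an unsigned
comparison and the carry-outs are combined with an OR, which is where the
ldi/cp/brlo sequences above come from.

```c
/* Two-byte addition lowered without a carry flag, the way expand_binop
   does it: compute each partial sum, derive the carry via a comparison
   (cstore), and OR the two possible carry-outs of the high byte. */
#include <assert.h>
#include <stdint.h>

uint16_t add16_no_carry_flag (uint16_t a, uint16_t b)
{
    uint8_t a0 = a & 0xff, a1 = a >> 8;
    uint8_t b0 = b & 0xff, b1 = b >> 8;

    uint8_t s0 = (uint8_t) (a0 + b0);
    uint8_t c0 = s0 < a0;            /* cstore ltu: the brlo/ldi pair   */

    uint8_t s1 = (uint8_t) (a1 + b1);
    uint8_t c1 = s1 < a1;            /* carry out of the high-byte add  */
    uint8_t t  = (uint8_t) (s1 + c0);/* add the incoming carry ...      */
    uint8_t c2 = t < s1;             /* ... and its own carry-out       */

    (void) (c1 | c2);                /* combined carry-out, unused here */
    return (uint16_t) s0 | ((uint16_t) t << 8);
}
```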

Then if final can successfully combine the "add" and "cp" instructions, 
you'd get something like:

     add r10,r18             ; sum 0
     ldi r30,lo8(0)          ; carryout 0
     adc r30,__zero_reg__
     add r11,r19             ; sum 1
     ldi r18,lo8(0)          ; carryout 1
     adc r18,__zero_reg__
     mov r19,r30             ; sum carry 1
     add r19,r11
     adc r18,__zero_reg__    ; carryout 1
     add r12,r20             ; sum 2
     ldi r30,lo8(0)          ; carryout 2
     adc r30,__zero_reg__
     mov r20,r18             ; sum carry 2
     add r20,r12
     adc r30,__zero_reg__    ; carryout 2
     add r13,r21             ; sum 3
     ldi r18,lo8(0)          ; carryout 3
     adc r18,__zero_reg__
     mov r21,r30             ; sum carry 3
     add r21,r13
     adc r18,__zero_reg__    ; carryout 3
     add r14,r22             ; sum 4
     ldi r30,lo8(0)          ; carryout 4
     adc r30,__zero_reg__
     mov r22,r18             ; sum carry 4
     add r22,r14
     adc r30,__zero_reg__    ; carryout 4
     add r15,r23             ; sum 5
     ldi r18,lo8(0)          ; carryout 5
     adc r18,__zero_reg__
     mov r23,r30             ; sum carry 5
     add r23,r15
     adc r18,__zero_reg__    ; carryout 5
     add r16,r24             ; sum 6
     ldi r30,lo8(0)          ; carryout 6
     adc r30,__zero_reg__
     mov r24,r18             ; sum carry 6
     add r24,r16
     adc r30,__zero_reg__    ; carryout 6
     add r25,r17             ; sum 7
     mov r18,r10
     add r25,r30             ; sum carry 7

Still suboptimal, but already much better than what you get now.

expand_binop could also sum the carry first, and the operand second, 
which would avoid overlapping live ranges for the carry:

     add r18,r10             ; sum 0
     ldi r30,lo8(0)          ; carryout 0
     adc r30,__zero_reg__
     add r19,r30             ; sum carry 1
     ldi r30,lo8(0)          ; carryout 1
     adc r30,__zero_reg__
     add r19,r11             ; sum 1
     adc r30,__zero_reg__    ; carryout 1
     add r20,r30             ; sum carry 2
     ldi r30,lo8(0)          ; carryout 2
     adc r30,__zero_reg__
     add r20,r12             ; sum 2
     adc r30,__zero_reg__    ; carryout 2
     add r21,r30             ; sum carry 3
     ldi r30,lo8(0)          ; carryout 3
     adc r30,__zero_reg__
     add r21,r13             ; sum 3
     adc r30,__zero_reg__    ; carryout 3
     add r22,r30             ; sum carry 4
     ldi r30,lo8(0)          ; carryout 4
     adc r30,__zero_reg__
     add r22,r14             ; sum 4
     adc r30,__zero_reg__    ; carryout 4
     add r23,r30             ; sum carry 5
     ldi r30,lo8(0)          ; carryout 5
     adc r30,__zero_reg__
     add r23,r15             ; sum 5
     adc r30,__zero_reg__    ; carryout 5
     add r24,r30             ; sum carry 6
     ldi r30,lo8(0)          ; carryout 6
     adc r30,__zero_reg__
     add r24,r16             ; sum 6
     adc r30,__zero_reg__    ; carryout 6
     add r25,r30             ; sum carry 7
     add r25,r17             ; sum 7

The code now is very regular and r30 dies often, so that playing with 
peephole2 could give decent code.  This code:

     (cp rAA, rBB)                          ; eliminated by final
     ldi r30,lo8(0)          ; carryout 0
     adc r30,__zero_reg__
     add rXX,r30             ; sum carry 1
     (cp rXX, r30)                          ; eliminated by final
     ldi r30,lo8(0)          ; carryout 1
     adc r30,__zero_reg__
     add rXX,rZZ             ; sum 1
     (cp rXX, rZZ)                          ; eliminated by final
     adc r30,__zero_reg__    ; carryout 1
     add rYY,r30             ; sum carry 2

can be rewritten to

     (cp rAA, rBB)                          ; eliminated by final
     adc rXX,rZZ
     (cp rXX, rZZ)                          ; eliminated by final
     ldi r30,lo8(0)          ; carryout 1
     adc r30,__zero_reg__
     add rYY,r30             ; sum carry 2

Now, in theory the final four instructions would chain again into the 
same pattern (I'm not sure peephole2 does this properly because it scans 
backwards; if it doesn't, you have to code the peepholes from the 
bottom) until you have

     add r18,r10
     adc r19,r11
     adc r20,r12
     adc r21,r13
     adc r22,r14
     adc r23,r15
     adc r24,r16
     ldi r30,lo8(0)          ; carryout 6
     adc r30,__zero_reg__
     add r25,r30             ; sum carry 7
     add r25,r17             ; sum 7

and finally with a simpler peephole (cp/ldi+adc/add/add to cp/adc):

     add r18,r10
     adc r19,r11
     adc r20,r12
     adc r21,r13
     adc r22,r14
     adc r23,r15
     adc r24,r16
     adc r25,r17
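
As a sanity check (illustrative C, not part of the patch), the add/adc
chain above is exactly byte-wise addition with carry propagation, which
agrees with a native 64-bit add:

```c
/* Model the add/adc chain as byte-wise addition with an explicit
   carry flag; the result must match a native 64-bit addition. */
#include <assert.h>
#include <stdint.h>

uint64_t add64_bytewise (uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    unsigned carry = 0;                     /* models the AVR carry flag */

    for (int i = 0; i < 8; i++)
    {
        unsigned ab = (a >> (8 * i)) & 0xff;
        unsigned bb = (b >> (8 * i)) & 0xff;
        unsigned sum = ab + bb + carry;     /* add for byte 0, adc after */
        carry = sum >> 8;
        result |= (uint64_t) (sum & 0xff) << (8 * i);
    }
    return result;
}
```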

Paolo
Patch

Index: gcc/config/avr/avr-dimode.md
===================================================================
--- gcc/config/avr/avr-dimode.md	(revision 0)
+++ gcc/config/avr/avr-dimode.md	(revision 0)
@@ -0,0 +1,283 @@ 
+;;   Machine description for GNU compiler,
+;;   for Atmel AVR micro controllers.
+;;   Copyright (C) 1998 - 2011
+;;   Free Software Foundation, Inc.
+;;   Contributed by Georg Lay (avr@gjlay.de)
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; The purpose of this file is to provide a light-weight DImode
+;; implementation for AVR.  The trouble with DImode is that tree -> RTL
+;; lowering leads to really unpleasant code for operations that don't
+;; work byte-wise like NEG, PLUS, MINUS, etc.  Defining optabs entries for
+;; them won't help because the optab machinery assumes these operations
+;; are cheap and does not check if a libgcc implementation is available.
+;;
+;; The DImode insns are all straight forward -- except movdi.  The approach
+;; of this implementation is to provide DImode insns without the burden of
+;; introducing movdi.
+;; 
+;; The caveat is that if there are insns for some mode, there must also be a
+;; respective move insn that describes reloads.  Therefore, this
+;; implementation uses an accumulator-based model with two hard-coded,
+;; accumulator-like registers
+;;
+;;    A[] = reg:DI 18
+;;    B[] = reg:DI 10
+;;
+;; so that no DImode insn contains pseudos or needs reloading.
+
+(define_constants
+  [(ACC_A	18)
+   (ACC_B	10)])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Addition
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "adddi3"
+  [(parallel [(match_operand:DI 0 "general_operand" "")
+              (match_operand:DI 1 "general_operand" "")
+              (match_operand:DI 2 "general_operand" "")])]
+  "avr_have_dimode"
+  {
+    rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+    emit_move_insn (acc_a, operands[1]);
+
+    if (s8_operand (operands[2], VOIDmode))
+      {
+        emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
+        emit_insn (gen_adddi3_const8_insn ());
+      }        
+    else if (CONST_INT_P (operands[2])
+             || CONST_DOUBLE_P (operands[2]))
+      {
+        emit_insn (gen_adddi3_const_insn (operands[2]));
+      }
+    else
+      {
+        emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
+        emit_insn (gen_adddi3_insn ());
+      }
+
+    emit_move_insn (operands[0], acc_a);
+    DONE;
+  })
+
+(define_insn "adddi3_insn"
+  [(set (reg:DI ACC_A)
+        (plus:DI (reg:DI ACC_A)
+                 (reg:DI ACC_B)))]
+  "avr_have_dimode"
+  "%~call __adddi3"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "clobber")])
+
+(define_insn "adddi3_const8_insn"
+  [(set (reg:DI ACC_A)
+        (plus:DI (reg:DI ACC_A)
+                 (sign_extend:DI (reg:QI REG_X))))]
+  "avr_have_dimode"
+  "%~call __adddi3_s8"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "clobber")])
+
+(define_insn "adddi3_const_insn"
+  [(set (reg:DI ACC_A)
+        (plus:DI (reg:DI ACC_A)
+                 (match_operand:DI 0 "const_double_operand" "n")))]
+  "avr_have_dimode
+   && !s8_operand (operands[0], VOIDmode)"
+  {
+    return avr_out_plus64 (operands[0], NULL);
+  }
+  [(set_attr "adjust_len" "plus64")
+   (set_attr "cc" "clobber")])
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Subtraction
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "subdi3"
+  [(parallel [(match_operand:DI 0 "general_operand" "")
+              (match_operand:DI 1 "general_operand" "")
+              (match_operand:DI 2 "general_operand" "")])]
+  "avr_have_dimode"
+  {
+    rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+    emit_move_insn (acc_a, operands[1]);
+    emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
+    emit_insn (gen_subdi3_insn ());
+    emit_move_insn (operands[0], acc_a);
+    DONE;
+  })
+
+(define_insn "subdi3_insn"
+  [(set (reg:DI ACC_A)
+        (minus:DI (reg:DI ACC_A)
+                  (reg:DI ACC_B)))]
+  "avr_have_dimode"
+  "%~call __subdi3"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "set_czn")])
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Negation
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "negdi2"
+  [(parallel [(match_operand:DI 0 "general_operand" "")
+              (match_operand:DI 1 "general_operand" "")])]
+  "avr_have_dimode"
+  {
+    rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+    emit_move_insn (acc_a, operands[1]);
+    emit_insn (gen_negdi2_insn ());
+    emit_move_insn (operands[0], acc_a);
+    DONE;
+  })
+
+(define_insn "negdi2_insn"
+  [(set (reg:DI ACC_A)
+        (neg:DI (reg:DI ACC_A)))]
+  "avr_have_dimode"
+  "%~call __negdi2"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "clobber")])
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comparison
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "conditional_jump"
+  [(set (pc)
+        (if_then_else
+         (match_operator 0 "ordered_comparison_operator" [(cc0)
+                                                          (const_int 0)])
+         (label_ref (match_operand 1 "" ""))
+         (pc)))]
+  "avr_have_dimode")
+
+(define_expand "cbranchdi4"
+  [(parallel [(match_operand:DI 1 "register_operand" "")
+              (match_operand:DI 2 "nonmemory_operand" "")
+              (match_operator 0 "ordered_comparison_operator" [(cc0)
+                                                               (const_int 0)])
+              (label_ref (match_operand 3 "" ""))])]
+  "avr_have_dimode"
+  {
+    rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+    emit_move_insn (acc_a, operands[1]);
+
+    if (s8_operand (operands[2], VOIDmode))
+      {
+        emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
+        emit_insn (gen_compare_const8_di2 ());
+      }        
+    else if (CONST_INT_P (operands[2])
+             || CONST_DOUBLE_P (operands[2]))
+      {
+        emit_insn (gen_compare_const_di2 (operands[2]));
+      }
+    else
+      {
+        emit_move_insn (gen_rtx_REG (DImode, ACC_B), operands[2]);
+        emit_insn (gen_compare_di2 ());
+      }
+
+    emit_jump_insn (gen_conditional_jump (operands[0], operands[3]));
+    DONE;
+  })
+
+(define_insn "compare_di2"
+  [(set (cc0)
+        (compare (reg:DI ACC_A)
+                 (reg:DI ACC_B)))]
+  "avr_have_dimode"
+  "%~call __cmpdi2"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "compare")])
+
+(define_insn "compare_const8_di2"
+  [(set (cc0)
+        (compare (reg:DI ACC_A)
+                 (sign_extend:DI (reg:QI REG_X))))]
+  "avr_have_dimode"
+  "%~call __cmpdi2_s8"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "compare")])
+
+(define_insn "compare_const_di2"
+  [(set (cc0)
+        (compare (reg:DI ACC_A)
+                 (match_operand:DI 0 "const_double_operand" "n")))
+   (clobber (match_scratch:QI 1 "=&d"))]
+  "avr_have_dimode
+   && !s8_operand (operands[0], VOIDmode)"
+  {
+    return avr_out_compare64 (insn, operands, NULL);
+  }
+  [(set_attr "adjust_len" "compare64")
+   (set_attr "cc" "compare")])
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Shifts and Rotate
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_code_iterator di_shifts
+  [ashift ashiftrt lshiftrt rotate])
+
+;; Shift functions from libgcc are called without defining these insns,
+;; but with them we can describe their reduced register footprint.
+
+;; "ashldi3"
+;; "ashrdi3"
+;; "lshrdi3"
+;; "rotldi3"
+(define_expand "<code_stdname>di3"
+  [(parallel [(match_operand:DI 0 "general_operand" "")
+              (di_shifts:DI (match_operand:DI 1 "general_operand" "")
+                            (match_operand:QI 2 "general_operand" ""))])]
+  "avr_have_dimode"
+  {
+    rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+    emit_move_insn (acc_a, operands[1]);
+    emit_move_insn (gen_rtx_REG (QImode, 16), operands[2]);
+    emit_insn (gen_<code_stdname>di3_insn ());
+    emit_move_insn (operands[0], acc_a);
+    DONE;
+  })
+
+(define_insn "<code_stdname>di3_insn"
+  [(set (reg:DI ACC_A)
+        (di_shifts:DI (reg:DI ACC_A)
+                      (reg:QI 16)))]
+  "avr_have_dimode"
+  "%~call __<code_stdname>di3"
+  [(set_attr "adjust_len" "call")
+   (set_attr "cc" "clobber")])
Index: gcc/config/avr/avr.md
===================================================================
--- gcc/config/avr/avr.md	(revision 181592)
+++ gcc/config/avr/avr.md	(working copy)
@@ -131,8 +131,8 @@  (define_attr "length" ""
 ;; Otherwise do special processing depending on the attribute.
 
 (define_attr "adjust_len"
-  "out_bitop, out_plus, out_plus_noclobber, addto_sp,
-   tsthi, tstpsi, tstsi, compare, call,
+  "out_bitop, out_plus, out_plus_noclobber, plus64, addto_sp,
+   tsthi, tstpsi, tstsi, compare, compare64, call,
    mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32,
    xload, movmem,
    ashlqi, ashrqi, lshrqi,
@@ -206,7 +206,6 @@  (define_mode_iterator QIHI  [(QI "") (HI
 (define_mode_iterator QIHI2 [(QI "") (HI "")])
 (define_mode_iterator QISI [(QI "") (HI "") (PSI "") (SI "")])
 (define_mode_iterator QIDI [(QI "") (HI "") (PSI "") (SI "") (DI "")])
-(define_mode_iterator HIDI [(HI "") (PSI "") (SI "") (DI "")])
 (define_mode_iterator HISI [(HI "") (PSI "") (SI "")])
 
 ;; All supported move-modes
@@ -235,6 +234,12 @@  (define_code_attr mul_r_d
   [(zero_extend "r")
    (sign_extend "d")])
 
+;; Map RTX code to its standard insn name
+(define_code_attr code_stdname
+  [(ashift   "ashl")
+   (ashiftrt "ashr")
+   (lshiftrt "lshr")
+   (rotate   "rotl")])
 
 ;;========================================================================
 ;; The following is used by nonlocal_goto and setjmp.
@@ -2956,23 +2961,21 @@  (define_insn "*rotlqi3"
   [(set_attr "length" "2,4,4,1,3,5,3,0")
    (set_attr "cc" "set_n,set_n,clobber,none,set_n,set_n,clobber,none")])
 
-;; Split all rotates of HI,SI and DImode registers where rotation is by
+;; Split all rotates of HI,SI and PSImode registers where rotation is by
 ;; a whole number of bytes.  The split creates the appropriate moves and
-;; considers all overlap situations.  DImode is split before reload.
+;; considers all overlap situations.
 
 ;; HImode does not need scratch.  Use attribute for this constraint.
-;; Use QI scratch for DI mode as this is often split into byte sized operands.
 
-(define_mode_attr rotx [(DI "&r,&r,X") (SI "&r,&r,X") (PSI "&r,&r,X") (HI "X,X,X")])
-(define_mode_attr rotsmode [(DI "QI") (SI "HI") (PSI "QI") (HI "QI")])
+(define_mode_attr rotx [(SI "&r,&r,X") (PSI "&r,&r,X") (HI "X,X,X")])
+(define_mode_attr rotsmode [(SI "HI") (PSI "QI") (HI "QI")])
 
 ;; "rotlhi3"
 ;; "rotlpsi3"
 ;; "rotlsi3"
-;; "rotldi3"
 (define_expand "rotl<mode>3"
-  [(parallel [(set (match_operand:HIDI 0 "register_operand" "")
-                   (rotate:HIDI (match_operand:HIDI 1 "register_operand" "")
+  [(parallel [(set (match_operand:HISI 0 "register_operand" "")
+                   (rotate:HISI (match_operand:HISI 1 "register_operand" "")
                                 (match_operand:VOID 2 "const_int_operand" "")))
               (clobber (match_dup 3))])]
   ""
@@ -2991,9 +2994,8 @@  (define_expand "rotl<mode>3"
         else
           operands[3] = gen_rtx_SCRATCH (QImode);
       }
-    else if (<MODE>mode != DImode
-             && (offset == 1
-                 || offset == GET_MODE_BITSIZE (<MODE>mode) -1))
+    else if (offset == 1
+             || offset == GET_MODE_BITSIZE (<MODE>mode) -1)
       {
         /*; Support rotate left/right by 1  */
 
@@ -3069,18 +3071,17 @@  (define_insn "*rotlsi2.31"
 
 ;; "*rotwhi"
 ;; "*rotwsi"
-;; "*rotwdi"
 (define_insn_and_split "*rotw<mode>"
-  [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-        (rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-                     (match_operand 2 "const_int_operand" "n,n,n")))
+  [(set (match_operand:HISI 0 "register_operand"             "=r,r,#&r")
+        (rotate:HISI (match_operand:HISI 1 "register_operand" "0,r,r")
+                     (match_operand 2 "const_int_operand"     "n,n,n")))
    (clobber (match_scratch:<rotsmode> 3 "=<rotx>"))]
   "AVR_HAVE_MOVW
    && CONST_INT_P (operands[2])
    && GET_MODE_SIZE (<MODE>mode) % 2 == 0
    && 0 == INTVAL (operands[2]) % 16"
   "#"
-  "&& (reload_completed || <MODE>mode == DImode)"
+  "&& reload_completed"
   [(const_int 0)]
   {
     avr_rotate_bytes (operands);
@@ -3093,11 +3094,10 @@  (define_insn_and_split "*rotw<mode>"
 ;; "*rotbhi"
 ;; "*rotbpsi"
 ;; "*rotbsi"
-;; "*rotbdi"
 (define_insn_and_split "*rotb<mode>"
-  [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-        (rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-                     (match_operand 2 "const_int_operand" "n,n,n")))
+  [(set (match_operand:HISI 0 "register_operand"             "=r,r,#&r")
+        (rotate:HISI (match_operand:HISI 1 "register_operand" "0,r,r")
+                     (match_operand 2 "const_int_operand"     "n,n,n")))
    (clobber (match_scratch:QI 3 "=<rotx>"))]
   "CONST_INT_P (operands[2])
    && (8 == INTVAL (operands[2]) % 16
@@ -3105,7 +3105,7 @@  (define_insn_and_split "*rotb<mode>"
             || GET_MODE_SIZE (<MODE>mode) % 2 != 0)
            && 0 == INTVAL (operands[2]) % 16))"
   "#"
-  "&& (reload_completed || <MODE>mode == DImode)"
+  "&& reload_completed"
   [(const_int 0)]
   {
     avr_rotate_bytes (operands);
@@ -5779,3 +5779,5 @@  (define_insn_and_split "*extzv.qihi2"
     operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
     operands[4] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
   })
+
+(include "avr-dimode.md")
Index: gcc/config/avr/avr.opt
===================================================================
--- gcc/config/avr/avr.opt	(revision 181554)
+++ gcc/config/avr/avr.opt	(working copy)
@@ -77,3 +77,7 @@  When accessing RAM, use X as imposed by
 mbranch-cost=
 Target Report RejectNegative Joined UInteger Var(avr_branch_cost) Init(0)
 Set the cost of a branch instruction.  Default value is 0.
+
+m64
+Target Report Var(avr_have_dimode) Init(0)
+Experimental.
Index: gcc/config/avr/avr-protos.h
===================================================================
--- gcc/config/avr/avr-protos.h	(revision 181554)
+++ gcc/config/avr/avr-protos.h	(working copy)
@@ -56,6 +56,7 @@  extern const char *avr_out_tstsi (rtx, r
 extern const char *avr_out_tsthi (rtx, rtx*, int*);
 extern const char *avr_out_tstpsi (rtx, rtx*, int*);
 extern const char *avr_out_compare (rtx, rtx*, int*);
+extern const char *avr_out_compare64 (rtx, rtx*, int*);
 extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx, rtx*, int*);
 
@@ -89,6 +90,7 @@  extern const char *avr_out_sbxx_branch (
 extern const char* avr_out_bitop (rtx, rtx*, int*);
 extern const char* avr_out_plus (rtx*, int*, int*);
 extern const char* avr_out_plus_noclobber (rtx*, int*, int*);
+extern const char* avr_out_plus64 (rtx, int*);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx, rtx*, int*);
 extern const char* avr_out_movmem (rtx, rtx*, int*);
Index: gcc/config/avr/avr.c
===================================================================
--- gcc/config/avr/avr.c	(revision 181592)
+++ gcc/config/avr/avr.c	(working copy)
@@ -4031,14 +4031,17 @@  avr_out_compare (rtx insn, rtx *xop, int
   /* Value (0..0xff) held in clobber register xop[2] or -1 if unknown.  */
   int clobber_val = -1;
 
-  gcc_assert (REG_P (xreg)
-              && CONST_INT_P (xval));
+  gcc_assert (REG_P (xreg));
+  gcc_assert ((CONST_INT_P (xval) && n_bytes <= 4)
+              || (const_double_operand (xval, VOIDmode) && n_bytes == 8));
   
   if (plen)
     *plen = 0;
 
   /* Comparisons == +/-1 and != +/-1 can be done similar to camparing
-     against 0 by ORing the bytes.  This is one instruction shorter.  */
+     against 0 by ORing the bytes.  This is one instruction shorter.
+     Notice that DImode comparisons are always against reg:DI 18
+     and therefore don't use this.  */
 
   if (!test_hard_reg_class (LD_REGS, xreg)
       && compare_eq_p (insn)
@@ -4156,6 +4159,20 @@  avr_out_compare (rtx insn, rtx *xop, int
 }
 
 
+/* Prepare operands of compare_const_di2 to be used with avr_out_compare.  */
+
+const char*
+avr_out_compare64 (rtx insn, rtx *op, int *plen)
+{
+  rtx xop[3];
+
+  xop[0] = gen_rtx_REG (DImode, 18);
+  xop[1] = op[0];
+  xop[2] = op[1];
+
+  return avr_out_compare (insn, xop, plen);
+}
+
 /* Output test instruction for HImode.  */
 
 const char*
@@ -5795,7 +5812,7 @@  avr_out_plus_1 (rtx *xop, int *plen, enu
   *pcc = (MINUS == code) ? CC_SET_CZN : CC_CLOBBER;
 
   if (MINUS == code)
-    xval = gen_int_mode (-UINTVAL (xval), mode);
+    xval = simplify_unary_operation (NEG, mode, xval, mode);
 
   op[2] = xop[3];
 
@@ -5970,6 +5987,25 @@  avr_out_plus_noclobber (rtx *xop, int *p
   return avr_out_plus (op, plen, pcc);
 }
 
+
+/* Prepare operands of adddi3_const_insn to be used with avr_out_plus_1.  */
+
+const char*
+avr_out_plus64 (rtx addend, int *plen)
+{
+  int cc_dummy;
+  rtx op[4];
+
+  op[0] = gen_rtx_REG (DImode, 18);
+  op[1] = op[0];
+  op[2] = addend;
+  op[3] = NULL_RTX;
+
+  avr_out_plus_1 (op, plen, MINUS, &cc_dummy);
+
+  return "";
+}
+
 /* Output bit operation (IOR, AND, XOR) with register XOP[0] and compile
    time constant XOP[2]:
 
@@ -6355,6 +6391,7 @@  adjust_insn_length (rtx insn, int len)
     case ADJUST_LEN_OUT_BITOP: avr_out_bitop (insn, op, &len); break;
       
     case ADJUST_LEN_OUT_PLUS: avr_out_plus (op, &len, NULL); break;
+    case ADJUST_LEN_PLUS64: avr_out_plus64 (op[0], &len); break;
     case ADJUST_LEN_OUT_PLUS_NOCLOBBER:
       avr_out_plus_noclobber (op, &len, NULL); break;
 
@@ -6371,6 +6408,7 @@  adjust_insn_length (rtx insn, int len)
     case ADJUST_LEN_TSTPSI: avr_out_tstpsi (insn, op, &len); break;
     case ADJUST_LEN_TSTSI: avr_out_tstsi (insn, op, &len); break;
     case ADJUST_LEN_COMPARE: avr_out_compare (insn, op, &len); break;
+    case ADJUST_LEN_COMPARE64: avr_out_compare64 (insn, op, &len); break;
 
     case ADJUST_LEN_LSHRQI: lshrqi3_out (insn, op, &len); break;
     case ADJUST_LEN_LSHRHI: lshrhi3_out (insn, op, &len); break;
@@ -8327,7 +8365,9 @@  avr_compare_pattern (rtx insn)
   if (pattern
       && NONJUMP_INSN_P (insn)
       && SET_DEST (pattern) == cc0_rtx
-      && GET_CODE (SET_SRC (pattern)) == COMPARE)
+      && GET_CODE (SET_SRC (pattern)) == COMPARE
+      && DImode != GET_MODE (XEXP (SET_SRC (pattern), 0))
+      && DImode != GET_MODE (XEXP (SET_SRC (pattern), 1)))
     {
       return pattern;
     }
Index: libgcc/config/avr/lib1funcs.S
===================================================================
--- libgcc/config/avr/lib1funcs.S	(revision 181554)
+++ libgcc/config/avr/lib1funcs.S	(working copy)
@@ -1161,6 +1161,71 @@  ENDF __divdi3_moddi3
 
 #endif /* L_divdi3 */
 
+.section .text.libgcc, "ax", @progbits
+
+#define TT __tmp_reg__
+
+#if defined (L_adddi3)
+;; (set (reg:DI 18)
+;;      (plus:DI (reg:DI 18)
+;;               (reg:DI 10)))
+DEFUN __adddi3
+    ADD A0,B0  $  adc A1,B1  $  adc A2,B2  $  adc A3,B3
+    adc A4,B4  $  adc A5,B5  $  adc A6,B6  $  adc A7,B7
+    ret
+ENDF __adddi3
+#endif /* L_adddi3 */
+
+#if defined (L_adddi3_s8)
+;; (set (reg:DI 18)
+;;      (plus:DI (reg:DI 18)
+;;               (sign_extend:SI (reg:QI 26))))
+DEFUN __adddi3_s8
+    clr     TT
+    sbrc    r26, 7
+    com     TT
+    ADD A0,r26 $  adc A1,TT  $  adc A2,TT  $  adc A3,TT
+    adc A4,TT  $  adc A5,TT  $  adc A6,TT  $  adc A7,TT
+    ret
+ENDF __adddi3_s8
+#endif /* L_adddi3_s8 */
+
+#if defined (L_subdi3)
+;; (set (reg:DI 18)
+;;      (minus:DI (reg:DI 18)
+;;                (reg:DI 10)))
+DEFUN __subdi3
+    SUB A0,B0  $  sbc A1,B1  $  sbc A2,B2  $  sbc A3,B3
+    sbc A4,B4  $  sbc A5,B5  $  sbc A6,B6  $  sbc A7,B7
+    ret
+ENDF __subdi3
+#endif /* L_subdi3 */
+
+#if defined (L_cmpdi2)
+;; (set (cc0)
+;;      (compare (reg:DI 18)
+;;               (reg:DI 10)))
+DEFUN __cmpdi2
+    CP  A0,B0  $  cpc A1,B1  $  cpc A2,B2  $  cpc A3,B3
+    cpc A4,B4  $  cpc A5,B5  $  cpc A6,B6  $  cpc A7,B7
+    ret
+ENDF __cmpdi2
+#endif /* L_cmpdi2 */
+
+#if defined (L_cmpdi2_s8)
+;; (set (cc0)
+;;      (compare (reg:DI 18)
+;;               (sign_extend:SI (reg:QI 26))))
+DEFUN __cmpdi2_s8
+    clr     TT
+    sbrc    r26, 7
+    com     TT
+    CP  A0,r26 $  cpc A1,TT  $  cpc A2,TT  $  cpc A3,TT
+    cpc A4,TT  $  cpc A5,TT  $  cpc A6,TT  $  cpc A7,TT
+    ret
+ENDF __cmpdi2_s8
+#endif /* L_cmpdi2_s8 */
+
 #if defined (L_negdi2)
 DEFUN __negdi2
 
@@ -1174,6 +1239,8 @@  DEFUN __negdi2
 ENDF __negdi2
 #endif /* L_negdi2 */
 
+#undef TT
+
 #undef C7
 #undef C6
 #undef C5
@@ -2052,6 +2119,29 @@  DEFUN __ashldi3
 ENDF __ashldi3
 #endif /* defined (L_ashldi3) */
 
+#if defined (L_rotldi3)
+;; Rotate left
+;; r25:r18 = rotl64 (r25:r18, r17:r16)
+DEFUN __rotldi3
+    push r16
+    andi r16, 63
+    breq 2f
+1:  lsl  r18
+    rol  r19
+    rol  r20
+    rol  r21
+    rol  r22
+    rol  r23
+    rol  r24
+    rol  r25
+    adc  r18, __zero_reg__
+    dec  r16
+    brne 1b
+2:  pop  r16
+    ret
+ENDF __rotldi3
+#endif /* defined (L_rotldi3) */
+
 
 .section .text.libgcc.fmul, "ax", @progbits
 
Index: libgcc/config/avr/t-avr
===================================================================
--- libgcc/config/avr/t-avr	(revision 181554)
+++ libgcc/config/avr/t-avr	(working copy)
@@ -47,9 +47,9 @@  LIB1ASMFUNCS = \
 	_popcountqi2 \
 	_bswapsi2 \
 	_bswapdi2 \
-	_ashldi3 \
-	_ashrdi3 \
-	_lshrdi3 \
+	_ashldi3 _ashrdi3 _lshrdi3 _rotldi3 \
+	_adddi3 _adddi3_s8 _subdi3 \
+	_cmpdi2 _cmpdi2_s8 \
 	_fmul _fmuls _fmulsu
 
 LIB2FUNCS_EXCLUDE = \