Patchwork PATCH: PR target/44588: Very inefficient 8bit mod/div

Submitter H.J. Lu
Date June 22, 2010, 6:27 p.m.
Message ID <AANLkTinvsAQe57kp-irwq_eD3cmUlMmayXHsfnMZNaMN@mail.gmail.com>
Permalink /patch/56560/
State New

Comments

H.J. Lu - June 22, 2010, 6:27 p.m.
On Tue, Jun 22, 2010 at 11:05 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, 2010-06-21 at 12:33 -0700, H.J. Lu wrote:
>> Hi,
>>
>> This patch adds 8bit divmod pattern for x86. X86 8bit divide
>> instructions return result in AX with
>>
>> AL <- Quotient
>> AH <- Remainder
>>
>> This patch models it and properly extends quotient. Tested
>> on Intel64 with -m64 and -m32.  There are no regressions.
>> OK for trunk?
>>
>> BTW, there is only one divb used in subreg_get_info in
>> gcc compilers. The old code is
>>
>>         movzbl  mode_size(%r13), %edi
>>         movzbl  mode_size(%r14), %esi
>>         xorl    %edx, %edx
>>         movl    %edi, %eax
>>         divw    %si
>>         testw   %dx, %dx
>>         jne     .L1194
>>
>> The new one is
>>
>>         movzbl  mode_size(%r13), %edi
>>         movl    %edi, %eax
>>         divb    mode_size(%r14)
>>         movzbl  %ah, %eax
>>         testb   %al, %al
>>         jne     .L1194
>>
>
> Hm, something is not combined correctly, I'd say "testb %ah, %ah" is
> optimal in the second case.
>

Here is another update adjusted for mov pattern changes in i386.md.

8bit result is stored in

AL <- Quotient
AH <- Remainder

If we use AX for quotient in 8bit divmod pattern, we have to make
sure that AX is valid for quotient.  We have to extend AL with UNSPEC
since AH isn't part of the quotient.  Instead, I use AL for quotient and
use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
access can be optimized very nicely. If remainder is used, we may have
an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
compromise.
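
For illustration only, here is a minimal C model of the two sequences
quoted above, assuming unsigned 8bit operands and a nonzero divisor.
The function names are hypothetical and not part of the patch. The
model only shows that the remainder divw leaves in DX and the remainder
divb leaves in AH agree when both operands were zero-extended from 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: zero-extend both operands, clear DX, then divw.
   divw divides DX:AX by a 16-bit operand; the remainder lands in DX.  */
uint16_t
rem_via_divw (uint8_t x, uint8_t y)
{
  uint32_t dx_ax = (uint32_t) x;      /* xorl %edx, %edx; movl %edi, %eax */
  uint16_t si = (uint16_t) y;         /* movzbl ..., %esi */
  return (uint16_t) (dx_ax % si);     /* DX after divw %si */
}

/* New sequence: divb divides AX by an 8-bit operand and returns
   AL <- quotient, AH <- remainder.  */
uint8_t
rem_via_divb (uint8_t x, uint8_t y)
{
  uint16_t ax = (uint16_t) x;         /* movzbl: upper byte is zero */
  uint8_t al = (uint8_t) (ax / y);    /* quotient fits since ax <= 255 */
  uint8_t ah = (uint8_t) (ax % y);
  ax = (uint16_t) ((ah << 8) | al);   /* AX after divb */
  return (uint8_t) (ax >> 8);         /* movzbl %ah, %eax */
}

/* Exhaustive check over every operand pair with y != 0.  */
void
check_remainders_agree (void)
{
  for (int x = 0; x < 256; x++)
    for (int y = 1; y < 256; y++)
      assert (rem_via_divw ((uint8_t) x, (uint8_t) y)
              == rem_via_divb ((uint8_t) x, (uint8_t) y));
}
```

The model ignores the divide-error cases (zero divisor, quotient that
does not fit in AL); the latter cannot occur here because the dividend
is zero-extended from 8 bits.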
Uros Bizjak - June 22, 2010, 6:44 p.m.
On Tue, 2010-06-22 at 11:27 -0700, H.J. Lu wrote:

> >> This patch adds 8bit divmod pattern for x86. X86 8bit divide
> >> instructions return result in AX with
> >>
> >> AL <- Quotient
> >> AH <- Remainder
> >>
> >> This patch models it and properly extends quotient. Tested
> >> on Intel64 with -m64 and -m32.  There are no regressions.
> >> OK for trunk?
> >>
> >> BTW, there is only one divb used in subreg_get_info in
> >> gcc compilers. The old code is
> >>
> >>         movzbl  mode_size(%r13), %edi
> >>         movzbl  mode_size(%r14), %esi
> >>         xorl    %edx, %edx
> >>         movl    %edi, %eax
> >>         divw    %si
> >>         testw   %dx, %dx
> >>         jne     .L1194
> >>
> >> The new one is
> >>
> >>         movzbl  mode_size(%r13), %edi
> >>         movl    %edi, %eax
> >>         divb    mode_size(%r14)
> >>         movzbl  %ah, %eax
> >>         testb   %al, %al
> >>         jne     .L1194
> >>
> >
> > Hm, something is not combined correctly, I'd say "testb %ah, %ah" is
> > optimal in the second case.
> >
> 
> Here is another update adjusted for mov pattern changes in i386.md.
> 
> 8bit result is stored in
> 
> AL <- Quotient
> AH <- Remainder
> 
> If we use AX for quotient in 8bit divmod pattern, we have to make
> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
> since AH isn't part of the quotient.  Instead, I use AL for quotient and
> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
> access can be optimized very nicely. If remainder is used, we may have
> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
> compromise.

Why do we need to reinvent movqi_extzv_2?

I guess that <u>divqi3 has to be implemented as multiple-set divmod
pattern using strict_low_part subregs to exactly describe in which
subreg quotient and remainder go.

Uros.
H.J. Lu - June 22, 2010, 6:57 p.m.
On Tue, Jun 22, 2010 at 11:44 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> If we use AX for quotient in 8bit divmod pattern, we have to make
>> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
>> since AH isn't part of the quotient.  Instead, I use AL for quotient and
>> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
>> access can be optimized very nicely. If remainder is used, we may have
>> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
>> compromise.
>
> Why do we need to reinvent movqi_extzv_2?

Because UNSPEC_MOVQI_EXTZH extracts AH from AL, not from
AX, EAX, RAX.

> I guess that <u>divqi3 has to be implemented as multiple-set divmod
> pattern using strict_low_part subregs to exactly describe in which
> subreg quotient and remainder go.
>

I tried it and couldn't get it to work without adding a new
constraint for the upper 8bit registers.
Uros Bizjak - June 23, 2010, 6:11 a.m.
On Tue, Jun 22, 2010 at 8:57 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> If we use AX for quotient in 8bit divmod pattern, we have to make
>>> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
>>> since AH isn't part of the quotient.  Instead, I use AL for quotient and
>>> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
>>> access can be optimized very nicely. If remainder is used, we may have
>>> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
>>> compromise.
>>
>> Why do we need to reinvent movqi_extzv_2?
>
> Because UNSPEC_MOVQI_EXTZH extracts AH from AL, not from
> AX, EAX, RAX.

This is wrong, you can't extract AH out from AL...
>
>> I guess that <u>divqi3 has to be implemented as multiple-set divmod
>> pattern using strict_low_part subregs to exactly describe in which
>> subreg quotient and remainder go.
>>
>
> I tried it and couldn't get it to work without adding a new
> constraint for the upper 8bit registers.

"Q" with %h modifier ?

Uros.
H.J. Lu - June 23, 2010, 2:26 p.m.
On Tue, Jun 22, 2010 at 11:11 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Jun 22, 2010 at 8:57 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> If we use AX for quotient in 8bit divmod pattern, we have to make
>>>> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
>>>> since AH isn't part of the quotient.  Instead, I use AL for quotient and
>>>> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
>>>> access can be optimized very nicely. If remainder is used, we may have
>>>> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
>>>> compromise.
>>>
>>> Why do we need to reinvent movqi_extzv_2?
>>
>> Because UNSPEC_MOVQI_EXTZH extracts AH from AL, not from
>> AX, EAX, RAX.
>
> This is wrong, you can't extract AH out from AL...

UNSPEC_MOVQI_EXTZH  is only used on AL from
8bit div where AH has remainder. Should I give it a different
name, like UNSPEC_MOVQI_EXTZH_FROM_8BITDIV?

>>
>>> I guess that <u>divqi3 has to be implemented as multiple-set divmod
>>> pattern using strict_low_part subregs to exactly describe in which
>>> subreg quotient and remainder go.
>>>
>>
>> I tried it and couldn't get it to work without adding a new
>> constraint for the upper 8bit registers.
>
> "Q" with %h modifier ?
>

Although there are AX, AL, AH, ......, the rest of compiler only
knows one hard register, AX_REG, which is register 0. We can't
really access upper 8bit register,  like AH,  as a real register. We
can only access them via sign_extract and zero_extract.  As far
as the rest of gcc is concerned, there are no upper 8bit registers.
I don't think we can describe where 8bit div remainder is to the
rest of compiler.
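
As a rough illustration of the two access forms named above, the
following C sketch models zero_extract and sign_extract of the upper
byte of a 16-bit value standing in for AX. The helper names are
hypothetical, and the RTL semantics are only approximated (an 8-bit
field at bit position 8, with bit 0 as the least significant bit as on
x86).

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract (reg:HI ax) (const_int 8) (const_int 8)):
   read the 8-bit field starting at bit 8, i.e. AH, zero-extended.  */
unsigned int
zero_extract_ah (uint16_t ax)
{
  return (ax >> 8) & 0xff;
}

/* Model of (sign_extract (reg:HI ax) (const_int 8) (const_int 8)):
   the same field, but sign-extended.  */
int
sign_extract_ah (uint16_t ax)
{
  int field = (ax >> 8) & 0xff;
  return field >= 0x80 ? field - 0x100 : field;
}

/* AX = 0x8034 means AH = 0x80 and AL = 0x34.  */
void
check_extracts (void)
{
  assert (zero_extract_ah (0x8034) == 0x80);
  assert (sign_extract_ah (0x8034) == -128);
}
```

Both helpers take the whole 16-bit value as input, which mirrors the
point being made: the upper byte is reached through the containing
register, not named as a register of its own.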
Paolo Bonzini - June 23, 2010, 5:34 p.m.
On 06/23/2010 04:26 PM, H.J. Lu wrote:
> On Tue, Jun 22, 2010 at 11:11 PM, Uros Bizjak<ubizjak@gmail.com>  wrote:
>> On Tue, Jun 22, 2010 at 8:57 PM, H.J. Lu<hjl.tools@gmail.com>  wrote:
>>
>>>>> If we use AX for quotient in 8bit divmod pattern, we have to make
>>>>> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
>>>>> since AH isn't part of the quotient.  Instead, I use AL for quotient and
>>>>> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
>>>>> access can be optimized very nicely. If remainder is used, we may have
>>>>> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
>>>>> compromise.
>>>>
>>>> Why do we need to reinvent movqi_extzv_2?
>>>
>>> Because UNSPEC_MOVQI_EXTZH extracts AH from AL, not from
>>> AX, EAX, RAX.
>>
>> This is wrong, you can't extract AH out from AL...
>
> UNSPEC_MOVQI_EXTZH  is only used on AL from
> 8bit div where AH has remainder. Should I give it a different
> name, like UNSPEC_MOVQI_EXTZH_FROM_8BITDIV?
>
>>>
>>>> I guess that<u>divqi3 has to be implemented as multiple-set divmod
>>>> pattern using strict_low_part subregs to exactly describe in which
>>>> subreg quotient and remainder go.
>>>>
>>>
>>> I tried it and couldn't get it to work without adding a new
>>> constraint for the upper 8bit registers.
>>
>> "Q" with %h modifier ?
>>
>
> Although there are AX, AL, AH, ......, the rest of compiler only
> knows one hard register, AX_REG, which is register 0. We can't
> really access upper 8bit register,  like AH,  as a real register. We
> can only access them via sign_extract and zero_extract.  As far
> as the rest of gcc is concerned, there are no upper 8bit registers.
> I don't think we can describe where 8bit div remainder is to the
> rest of compiler.

Well, sure you can:

(set (reg:HI r)
    (ior:HI (ashift:HI
             (zero_extend:HI (umod:QI ...)) (const_int 8))
            (zero_extend:HI (udiv:QI ...))))

(set (reg:QI s) ;; remainder
    (zero_extract (reg:HI r) (const_int 8) (const_int 8)))

(set (reg:QI t) ;; quotient
    (subreg (reg:HI r) 0))

The question is only whether fwprop, combine and regalloc are smart 
enough to reason about it.
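
A small C sketch of how I read the three sets above, assuming the first
one is meant to shift the zero-extended remainder left by 8 and OR in
the zero-extended quotient. The function names are made up for
illustration, and the quotient and remainder are taken as inputs rather
than guessing at the elided operands.

```c
#include <assert.h>
#include <stdint.h>

/* First set: r = (remainder << 8) | quotient, both zero-extended.  */
uint16_t
pack_r (uint8_t rem, uint8_t quot)
{
  return (uint16_t) (((uint16_t) rem << 8) | quot);
}

/* Second set: s = zero_extract of 8 bits at position 8 of r.  */
uint8_t
extract_s (uint16_t r)
{
  return (uint8_t) ((r >> 8) & 0xff);
}

/* Third set: t = low part of r, as (subreg (reg:HI r) 0) would give
   on a little-endian target.  */
uint8_t
lowpart_t (uint16_t r)
{
  return (uint8_t) (r & 0xff);
}

/* 200 / 7 = 28 with remainder 4, so r should be 0x041c.  */
void
check_pack_and_extract (void)
{
  uint16_t r = pack_r (4, 28);
  assert (r == 0x041c);
  assert (extract_s (r) == 4);
  assert (lowpart_t (r) == 28);
}
```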

Paolo
H.J. Lu - June 23, 2010, 5:48 p.m.
On Wed, Jun 23, 2010 at 10:34 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 06/23/2010 04:26 PM, H.J. Lu wrote:
>>
>> On Tue, Jun 22, 2010 at 11:11 PM, Uros Bizjak<ubizjak@gmail.com>  wrote:
>>>
>>> On Tue, Jun 22, 2010 at 8:57 PM, H.J. Lu<hjl.tools@gmail.com>  wrote:
>>>
>>>>>> If we use AX for quotient in 8bit divmod pattern, we have to make
>>>>>> sure that AX is valid for quotient.  We have to extend AL with UNSPEC
>>>>>> since AH isn't part of the quotient.  Instead, I use AL for quotient
>>>>>> and
>>>>>> use UNSPEC_MOVQI_EXTZH to extract remainder from AL. Quotient
>>>>>> access can be optimized very nicely. If remainder is used, we may have
>>>>>> an extract move for UNSPEC_MOVQI_EXTZH. I think this is a reasonable
>>>>>> compromise.
>>>>>
>>>>> Why do we need to reinvent movqi_extzv_2?
>>>>
>>>> Because UNSPEC_MOVQI_EXTZH extracts AH from AL, not from
>>>> AX, EAX, RAX.
>>>
>>> This is wrong, you can't extract AH out from AL...
>>
>> UNSPEC_MOVQI_EXTZH  is only used on AL from
>> 8bit div where AH has remainder. Should I give it a different
>> name, like UNSPEC_MOVQI_EXTZH_FROM_8BITDIV?
>>
>>>>
>>>>> I guess that<u>divqi3 has to be implemented as multiple-set divmod
>>>>> pattern using strict_low_part subregs to exactly describe in which
>>>>> subreg quotient and remainder go.
>>>>>
>>>>
>>>> I tried it and couldn't get it to work without adding a new
>>>> constraint for the upper 8bit registers.
>>>
>>> "Q" with %h modifier ?
>>>
>>
>> Although there are AX, AL, AH, ......, the rest of compiler only
>> knows one hard register, AX_REG, which is register 0. We can't
>> really access upper 8bit register,  like AH,  as a real register. We
>> can only access them via sign_extract and zero_extract.  As far
>> as the rest of gcc is concerned, there are no upper 8bit registers.
>> I don't think we can describe where 8bit div remainder is to the
>> rest of compiler.
>
> Well, sure you can:
>
> (set (reg:HI r)
>   (ior:HI (ashift:HI
>            (zero_extend:HI (umod:QI ...)) (const_int 8))
>           (zero_extend:HI (udiv:QI ...))))
>
> (set (reg:QI s) ;; remainder
>   (zero_extract (reg:HI r) (const_int 8) (const_int 8)))
>
> (set (reg:QI t) ;; quotient
>   (subreg (reg:HI r) 0))
>
> The question is only whether fwprop, combine and regalloc are smart enough
> to reason about it.
>

That is what I meant by "We can only access them via sign_extract
and zero_extract."  The rest of gcc will see the same hard register
number for both lower and upper 8bit registers. They aren't prepared
to deal with it.
Paolo Bonzini - June 23, 2010, 6:13 p.m.
On 06/23/2010 07:48 PM, H.J. Lu wrote:
>> Well, sure you can:
>>
>> (set (reg:HI r)
>>    (ior:HI (ashift:HI
>>             (zero_extend:HI (umod:QI ...)) (const_int 8))
>>            (zero_extend:HI (udiv:QI ...))))
>>
>> (set (reg:QI s) ;; remainder
>>    (zero_extract (reg:HI r) (const_int 8) (const_int 8)))
>>
>> (set (reg:QI t) ;; quotient
>>    (subreg (reg:HI r) 0))
>>
>> The question is only whether fwprop, combine and regalloc are smart enough
>> to reason about it.
>>
>
> That is what I meant by "We can only access them via sign_extract
> and zero_extract."  The rest of gcc will see the same hard register
> number for both lower and upper 8bit registers. They aren't prepared
> to deal with it.

Do you mean that GCC will miscompile or pessimize a "mov %ah, %al" for 
the zero-extract in the second set above?  Otherwise, it seems like a 
false problem.

Paolo
H.J. Lu - June 23, 2010, 6:24 p.m.
On Wed, Jun 23, 2010 at 11:13 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 06/23/2010 07:48 PM, H.J. Lu wrote:
>>>
>>> Well, sure you can:
>>>
>>> (set (reg:HI r)
>>>   (ior:HI (ashift:HI
>>>            (zero_extend:HI (umod:QI ...)) (const_int 8))
>>>           (zero_extend:HI (udiv:QI ...))))
>>>
>>> (set (reg:QI s) ;; remainder
>>>   (zero_extract (reg:HI r) (const_int 8) (const_int 8)))
>>>
>>> (set (reg:QI t) ;; quotient
>>>   (subreg (reg:HI r) 0))
>>>
>>> The question is only whether fwprop, combine and regalloc are smart
>>> enough
>>> to reason about it.
>>>
>>
>> That is what I meant by "We can only access them via sign_extract
>> and zero_extract."  The rest of gcc will see the same hard register
>> number for both lower and upper 8bit registers. They aren't prepared
>> to deal with it.
>
> Do you mean that GCC will miscompile or pessimize a "mov %ah, %al" for the
> zero-extract in the second set above?  Otherwise, it seems like a false
> problem.

I tried

 [(set (match_operand:HI 0 "register_operand" "=a")
        (div:HI
          (match_operand:HI 1 "register_operand" "0")
          (match_operand:QI 2 "nonimmediate_operand" "qm")))
   (clobber (reg:CC FLAGS_REG))]

for 8bit div and it doesn't work since AH has remainder.
How can I write 8bit div pattern which works with RA and
other parts of gcc?
Paolo Bonzini - June 23, 2010, 7:16 p.m.
On 06/23/2010 08:24 PM, H.J. Lu wrote:
>   [(set (match_operand:HI 0 "register_operand" "=a")
>          (div:HI
>            (match_operand:HI 1 "register_operand" "0")
>            (match_operand:QI 2 "nonimmediate_operand" "qm")))
>     (clobber (reg:CC FLAGS_REG))]

Maybe this:

[(set (match_operand:QI 0 "register_operand" "=a")
    (subreg:QI
     (div:HI
      (match_operand:HI 1 "register_operand" "0")
      (match_operand:QI 2 "nonimmediate_operand" "qm")) 0))
  (clobber (reg:CC FLAGS_REG))]

Paolo
H.J. Lu - June 23, 2010, 7:35 p.m.
On Wed, Jun 23, 2010 at 12:16 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 06/23/2010 08:24 PM, H.J. Lu wrote:
>>
>>  [(set (match_operand:HI 0 "register_operand" "=a")
>>         (div:HI
>>           (match_operand:HI 1 "register_operand" "0")
>>           (match_operand:QI 2 "nonimmediate_operand" "qm")))
>>    (clobber (reg:CC FLAGS_REG))]
>
> Maybe this:
>
> [(set (match_operand:QI 0 "register_operand" "=a")
>   (subreg:QI
>    (div:HI
>     (match_operand:HI 1 "register_operand" "0")
>     (match_operand:QI 2 "nonimmediate_operand" "qm")) 0))
>  (clobber (reg:CC FLAGS_REG))]
>

It doesn't make a big difference since only lower 8bit
is set by it.
Paolo Bonzini - June 23, 2010, 7:36 p.m.
On 06/23/2010 09:35 PM, H.J. Lu wrote:
> On Wed, Jun 23, 2010 at 12:16 PM, Paolo Bonzini<bonzini@gnu.org>  wrote:
>> On 06/23/2010 08:24 PM, H.J. Lu wrote:
>>>
>>>   [(set (match_operand:HI 0 "register_operand" "=a")
>>>          (div:HI
>>>            (match_operand:HI 1 "register_operand" "0")
>>>            (match_operand:QI 2 "nonimmediate_operand" "qm")))
>>>     (clobber (reg:CC FLAGS_REG))]
>>
>> Maybe this:
>>
>> [(set (match_operand:QI 0 "register_operand" "=a")
>>    (subreg:QI
>>     (div:HI
>>      (match_operand:HI 1 "register_operand" "0")
>>      (match_operand:QI 2 "nonimmediate_operand" "qm")) 0))
>>   (clobber (reg:CC FLAGS_REG))]
>>
>
> It doesn't make a big difference since only lower 8bit
> is set by it.

For full divmod you have to shift mod and ior of course.  I understood 
that you were talking of *<u>divqi3.

Paolo
H.J. Lu - June 23, 2010, 7:50 p.m.
On Wed, Jun 23, 2010 at 12:36 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 06/23/2010 09:35 PM, H.J. Lu wrote:
>>
>> On Wed, Jun 23, 2010 at 12:16 PM, Paolo Bonzini<bonzini@gnu.org>  wrote:
>>>
>>> On 06/23/2010 08:24 PM, H.J. Lu wrote:
>>>>
>>>>  [(set (match_operand:HI 0 "register_operand" "=a")
>>>>         (div:HI
>>>>           (match_operand:HI 1 "register_operand" "0")
>>>>           (match_operand:QI 2 "nonimmediate_operand" "qm")))
>>>>    (clobber (reg:CC FLAGS_REG))]
>>>
>>> Maybe this:
>>>
>>> [(set (match_operand:QI 0 "register_operand" "=a")
>>>   (subreg:QI
>>>    (div:HI
>>>     (match_operand:HI 1 "register_operand" "0")
>>>     (match_operand:QI 2 "nonimmediate_operand" "qm")) 0))
>>>  (clobber (reg:CC FLAGS_REG))]
>>>
>>
>> It doesn't make a big difference since only lower 8bit
>> is set by it.
>
> For full divmod you have to shift mod and ior of course.  I understood that
> you were talking of *<u>divqi3.
>

I can't do shift/ior on QI output from *<u>divqi3. I have

;; Divide AX by r/m8, with result stored in
;; AL <- Quotient
;; AH <- Remainder
(define_insn "*<u>divqi3"
  [(set (match_operand:QI 0 "register_operand" "=a")
        (any_div:QI
          (match_operand:HI 1 "register_operand" "0")
          (match_operand:QI 2 "nonimmediate_operand" "qm")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_QIMODE_MATH"
  "<sgnprefix>div{b}\t%2"
  [(set_attr "type" "idiv")
   (set_attr "mode" "QI")])

and use

;; Used to extract remainder from AH by 8bit divmod.  Use unspec so
;; that we can extract it from AL.
(define_insn "*movqi_extzh"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=Qm,?R")
        (unspec:QI [(match_operand 1 "register_operand" "Q,Q")]
                   UNSPEC_MOVQI_EXTZH))]

to extract it from AL
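
To make the intended data flow concrete, here is a hedged C model of
the steps the expander in the patch below performs: extend the dividend
to 16 bits (sign or zero), do the 8bit divide, then read the quotient
from AL and the remainder from AH. The struct and function names are
hypothetical. The model assumes a nonzero divisor and, for the signed
case, excludes -128 / -1, where the hardware would raise a divide error.

```c
#include <assert.h>
#include <stdint.h>

/* The two bytes an 8bit divide leaves in AX.  */
struct ax_result
{
  uint8_t al;   /* quotient */
  uint8_t ah;   /* remainder */
};

/* Unsigned case: ZERO_EXTEND the dividend to HImode, then divb.  */
struct ax_result
model_udivmodqi4 (uint8_t x, uint8_t y)
{
  uint16_t ax = x;
  struct ax_result r;
  r.al = (uint8_t) (ax / y);
  r.ah = (uint8_t) (ax % y);
  return r;
}

/* Signed case: SIGN_EXTEND the dividend to HImode, then idivb.
   C division truncates toward zero, as idiv does.  */
struct ax_result
model_divmodqi4 (int8_t x, int8_t y)
{
  int16_t ax = x;
  struct ax_result r;
  r.al = (uint8_t) (ax / y);
  r.ah = (uint8_t) (ax % y);
  return r;
}

/* -7 / 2 = -3 (0xfd) with remainder -1 (0xff).  */
void
check_models (void)
{
  struct ax_result u = model_udivmodqi4 (7, 6);
  struct ax_result s = model_divmodqi4 (-7, 2);
  assert (u.al == 1 && u.ah == 1);
  assert (s.al == 0xfd && s.ah == 0xff);
}
```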

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ab90d73..f268e90 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -104,6 +104,7 @@ 
   UNSPEC_REP
   UNSPEC_LD_MPIC	; load_macho_picbase
   UNSPEC_TRUNC_NOOP
+  UNSPEC_MOVQI_EXTZH
 
   ;; For SSE/MMX support:
   UNSPEC_FIX_NOTRUNC
@@ -760,6 +761,11 @@ 
 
 ;; Used in signed and unsigned divisions.
 (define_code_iterator any_div [div udiv])
+(define_code_attr any_div_code [(div "DIV") (udiv "UDIV")])
+(define_code_attr extend_code
+  [(div "SIGN_EXTEND") (udiv "ZERO_EXTEND")])
+(define_code_attr extract_code
+  [(div "SIGN_EXTRACT") (udiv "ZERO_EXTRACT")])
 
 ;; Instruction prefix for signed and unsigned operations.
 (define_code_attr sgnprefix [(sign_extend "i") (zero_extend "")
@@ -2052,6 +2058,59 @@ 
        (const_string "orig")))
    (set_attr "mode" "SI,DI,DI,DI,SI,DI,DI,DI,DI,DI,DI,TI,TI,DI,DI,DI,DI,DI,DI")])
 
+;; Used to extract remainder from AH by 8bit divmod.  Use unspec so
+;; that we can extract it from AL.
+(define_insn "*movqi_extzh"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=Qm,?R")
+	(unspec:QI [(match_operand 1 "register_operand" "Q,Q")]
+		   UNSPEC_MOVQI_EXTZH))]
+  "!TARGET_64BIT"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_IMOVX:
+      return "movz{bl|x}\t{%h1, %k0|%k0, %h1}";
+    default:
+      return "mov{b}\t{%h1, %0|%0, %h1}";
+    }
+}
+  [(set (attr "type")
+     (if_then_else (and (match_operand:QI 0 "register_operand" "")
+			(ior (not (match_operand:QI 0 "q_regs_operand" ""))
+			     (ne (symbol_ref "TARGET_MOVX")
+				 (const_int 0))))
+	(const_string "imovx")
+	(const_string "imov")))
+   (set (attr "mode")
+     (if_then_else (eq_attr "type" "imovx")
+	(const_string "SI")
+	(const_string "QI")))])
+
+(define_insn "*movqi_extzh_rex64"
+  [(set (match_operand:QI 0 "register_operand" "=Q,?R")
+	(unspec:QI [(match_operand 1 "register_operand" "Q,Q")]
+		   UNSPEC_MOVQI_EXTZH))]
+  "TARGET_64BIT"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_IMOVX:
+      return "movz{bl|x}\t{%h1, %k0|%k0, %h1}";
+    default:
+      return "mov{b}\t{%h1, %0|%0, %h1}";
+    }
+}
+  [(set (attr "type")
+     (if_then_else (ior (not (match_operand:QI 0 "q_regs_operand" ""))
+			(ne (symbol_ref "TARGET_MOVX")
+			    (const_int 0)))
+	(const_string "imovx")
+	(const_string "imov")))
+   (set (attr "mode")
+     (if_then_else (eq_attr "type" "imovx")
+	(const_string "SI")
+	(const_string "QI")))])
+
 ;; Convert impossible stores of immediate to existing instructions.
 ;; First try to get scratch register and go through it.  In case this
 ;; fails, move by 32bit parts.
@@ -7305,17 +7364,6 @@ 
 
 ;; Divide instructions
 
-(define_insn "<u>divqi3"
-  [(set (match_operand:QI 0 "register_operand" "=a")
-	(any_div:QI
-	  (match_operand:HI 1 "register_operand" "0")
-	  (match_operand:QI 2 "nonimmediate_operand" "qm")))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_QIMODE_MATH"
-  "<sgnprefix>div{b}\t%2"
-  [(set_attr "type" "idiv")
-   (set_attr "mode" "QI")])
-
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "divxf3"
@@ -7352,6 +7400,58 @@ 
 
 ;; Divmod instructions.
 
+(define_expand "<u>divmodqi4"
+  [(parallel [(set (match_operand:QI 0 "register_operand" "")
+		   (any_div:QI
+		     (match_operand:QI 1 "register_operand" "")
+		     (match_operand:QI 2 "nonimmediate_operand" "")))
+	      (set (match_operand:QI 3 "register_operand" "")
+		   (mod:QI (match_dup 1) (match_dup 2)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "TARGET_QIMODE_MATH"
+{
+  rtx div, clobber, set;
+  rtx operand1;
+  
+  operand1 = gen_reg_rtx (HImode);
+  clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+
+  /* Properly extend operands[1] to HImode.  */
+  set = gen_rtx_SET (VOIDmode, operand1,
+		     gen_rtx_<extend_code> (HImode, operands[1]));
+  if (<extend_code> == SIGN_EXTEND)
+    emit_insn (set);
+  else
+    emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber)));
+
+  /* Generate 8bit divide.  Result is in AX.  */
+  div = gen_rtx_SET (VOIDmode, operands[0],
+		     gen_rtx_<any_div_code> (QImode, operand1,
+					     operands[2]));
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, div, clobber)));
+
+  /* Extract remainder from AH.  Use UNSPEC_MOVQI_EXTZH to extract it
+     from AL.  */
+  operand1 = gen_rtx_UNSPEC (QImode, gen_rtvec (1, operands[0]), 
+			     UNSPEC_MOVQI_EXTZH);
+  emit_move_insn (operands[3], operand1);
+  DONE;
+})
+
+;; Divide AX by r/m8, with result stored in
+;; AL <- Quotient
+;; AH <- Remainder
+(define_insn "*<u>divqi3"
+  [(set (match_operand:QI 0 "register_operand" "=a")
+	(any_div:QI
+	  (match_operand:HI 1 "register_operand" "0")
+	  (match_operand:QI 2 "nonimmediate_operand" "qm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_QIMODE_MATH"
+  "<sgnprefix>div{b}\t%2"
+  [(set_attr "type" "idiv")
+   (set_attr "mode" "QI")])
+
 (define_expand "divmod<mode>4"
   [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
 		   (div:SWIM248
--- /dev/null	2010-06-16 10:11:06.602750711 -0700
+++ gcc/gcc/testsuite/gcc.target/i386/umod-1.c	2010-06-21 12:01:25.705950180 -0700
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=atom" } */
+
+unsigned char
+foo (unsigned char x, unsigned char y)
+{
+  return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
+/* { dg-final { scan-assembler-not "divw" } } */
--- /dev/null	2010-06-16 10:11:06.602750711 -0700
+++ gcc/gcc/testsuite/gcc.target/i386/umod-2.c	2010-06-21 12:01:17.857932744 -0700
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=atom" } */
+
+extern unsigned char z;
+
+unsigned char
+foo (unsigned char x, unsigned char y)
+{
+  z = x/y;
+  return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
+/* { dg-final { scan-assembler-not "divw" } } */
--- /dev/null	2010-06-16 10:11:06.602750711 -0700
+++ gcc/gcc/testsuite/gcc.target/i386/umod-3.c	2010-06-21 12:02:36.809962702 -0700
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=atom" } */
+
+extern void abort (void);
+extern void exit (int);
+
+unsigned char cx = 7;
+
+int
+main ()
+{
+  unsigned char cy;
+  
+  cy = cx / 6; if (cy != 1) abort ();
+  cy = cx % 6; if (cy != 1) abort ();
+
+  exit(0);
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
+/* { dg-final { scan-assembler-not "divw" } } */