Patchwork RFC: PATCH: Add -m8bit-idiv for x86

login
register
mail settings
Submitter H.J. Lu
Date Sept. 13, 2010, 4:17 p.m.
Message ID <20100913161721.GA18471@intel.com>
Download mbox | patch
Permalink /patch/64611/
State New
Headers show

Comments

H.J. Lu - Sept. 13, 2010, 4:17 p.m.
Hi,

On some processors like Atom, 32bit integer divide is much slower than
8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
for x86. It turns 32bit integer divide into

if (dividend an divisor are in [0-255])
  use 8bit unsigned integer divide
else
  use 32bit integer divide

I didn't turn it on by default for Atom:

1. If inputs are outside of 8bit unsigned integers, extra compare and
branch will always be slow.
2. For

---
int funca( int a, int b )
{
  return (a/b) + (a%b);
}
--

This patch generates 2 idivbs since the optimization is done at RTL
expansion. Is there a way to delay this until later when 2 idivls are
optimized into 1 idivl and before IRA since this optimization needs
a scratch register.

I tried to split idivl into idivb and idivl. It didn't work due
to recursive split.

Thanks.


H.J.
---
gcc/

2010-09-13  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (ix86_expand_idivmod): New.

	* config/i386/i386.c (predict_jump): Add prototype.
	(override_options): Handle -m8bit-idiv.
	(ix86_expand_idivmod): New.

	* config/i386/i386.md (divmod<mode>4): Renamed to ...
	(divmod<mode>4_1): This.
	(udivmod<mode>4): Renamed to ...
	(udivmod<mode>4_1): This.
	(divmod<mode>4): New.
	(udivmod<mode>4): Likewise.

	* config/i386/i386.opt (m8bit-idiv): New.

	* doc/invoke.texi: Document -m8bit-idiv.

gcc/testsuite/

2010-09-13  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/divmod-1.c: New.
	* gcc.target/i386/divmod-2.c: Likewise.
	* gcc.target/i386/divmod-3.c: Likewise.
	* gcc.target/i386/udivmod-1.c: Likewise.
	* gcc.target/i386/udivmod-2.c: Likewise.
	* gcc.target/i386/udivmod-3.c: Likewise.
	* gcc.target/i386/umod-1.c: Likewise.
	* gcc.target/i386/umod-2.c: Likewise.
	* gcc.target/i386/umod-3.c: Likewise.
Uros Bizjak - Sept. 13, 2010, 6:58 p.m.
On Mon, Sep 13, 2010 at 6:17 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:

> On some processors like Atom, 32bit integer divide is much slower than
> 8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
> for x86. It turns 32bit integer divide into
>
> if (dividend an divisor are in [0-255])
>  use 8bit unsigned integer divide
> else
>  use 32bit integer divide
>
> I didn't turn it on by default for Atom:
>
> 1. If inputs are outside of 8bit unsigned integers, extra compare and
> branch will always be slow.
> 2. For
>
> ---
> int funca( int a, int b )
> {
>  return (a/b) + (a%b);
> }
> --
>
> This patch generates 2 idivbs since the optimization is done at RTL
> expansion. Is there a way to delay this until later when 2 idivls are
> optimized into 1 idivl and before IRA since this optimization needs
> a scratch register.

Splitter with && can_create_pseudo_p () split constraint will limit
splits to pre-regalloc passes, or ...

> I tried to split idivl into idivb and idivl. It didn't work due
> to recursive split.

... use peephole2 pass. It can allocate hard-reg scratch, if it finds
a free one.

Uros.
H.J. Lu - Sept. 14, 2010, 5:05 a.m.
On Mon, Sep 13, 2010 at 11:58 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Sep 13, 2010 at 6:17 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>
>> On some processors like Atom, 32bit integer divide is much slower than
>> 8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
>> for x86. It turns 32bit integer divide into
>>
>> if (dividend an divisor are in [0-255])
>>  use 8bit unsigned integer divide
>> else
>>  use 32bit integer divide
>>
>> I didn't turn it on by default for Atom:
>>
>> 1. If inputs are outside of 8bit unsigned integers, extra compare and
>> branch will always be slow.
>> 2. For
>>
>> ---
>> int funca( int a, int b )
>> {
>>  return (a/b) + (a%b);
>> }
>> --
>>
>> This patch generates 2 idivbs since the optimization is done at RTL
>> expansion. Is there a way to delay this until later when 2 idivls are
>> optimized into 1 idivl and before IRA since this optimization needs
>> a scratch register.
>
> Splitter with && can_create_pseudo_p () split constraint will limit
> splits to pre-regalloc passes, or ...

try_split doesn't allow any insn of the result matches the original pattern
to avoid infinite loop.

>> I tried to split idivl into idivb and idivl. It didn't work due
>> to recursive split.
>
> ... use peephole2 pass. It can allocate hard-reg scratch, if it finds
> a free one.

Unfortunately, peephole2 pass can't find a scratch register in
simple case.

Also I really want to split

        (parallel [
            (set (reg:SI 66)
                (div:SI (reg/v:SI 62 [ x ])
                    (mem/c/i:SI (plus:SI (reg/f:SI 16 argp)
                            (const_int 4 [0x4])) [2 y+0 S4 A32])))
            (set (reg:SI 67)
                (mod:SI (reg/v:SI 62 [ x ])
                    (mem/c/i:SI (plus:SI (reg/f:SI 16 argp)
                            (const_int 4 [0x4])) [2 y+0 S4 A32])))
            (clobber (reg:CC 17 flags))
        ])
Uros Bizjak - Sept. 14, 2010, 7:13 a.m.
On Tue, Sep 14, 2010 at 7:05 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> This patch generates 2 idivbs since the optimization is done at RTL
>>> expansion. Is there a way to delay this until later when 2 idivls are
>>> optimized into 1 idivl and before IRA since this optimization needs
>>> a scratch register.
>>
>> Splitter with && can_create_pseudo_p () split constraint will limit
>> splits to pre-regalloc passes, or ...
>
> try_split doesn't allow any insn of the result matches the original pattern
> to avoid infinite loop.

So, switch the places of div and mod RTXes in the parallel and provide
another divl_1 insn pattern that matches this new parallel.

Uros.
Andi Kleen - Sept. 14, 2010, 7:54 a.m.
"H.J. Lu" <hongjiu.lu@intel.com> writes:

> On some processors like Atom, 32bit integer divide is much slower than
> 8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
> for x86. It turns 32bit integer divide into

To be honest I'm not sure the option is really useful -- it's 
probably hard to figure out for a user that it makes sense
to set.

>
> if (dividend an divisor are in [0-255])
>   use 8bit unsigned integer divide
> else
>   use 32bit integer divide
>
> I didn't turn it on by default for Atom:
>
> 1. If inputs are outside of 8bit unsigned integers, extra compare and
> branch will always be slow.

Would be something for profile feedback to determine?

-Andi
H.J. Lu - Sept. 14, 2010, 10:30 p.m.
On Tue, Sep 14, 2010 at 12:54 AM, Andi Kleen <andi@firstfloor.org> wrote:
> "H.J. Lu" <hongjiu.lu@intel.com> writes:
>
>> On some processors like Atom, 32bit integer divide is much slower than
>> 8bit unsigned integer divide.  This patch adds an option, -m8bit-idiv,
>> for x86. It turns 32bit integer divide into
>
> To be honest I'm not sure the option is really useful -- it's
> probably hard to figure out for a user that it makes sense
> to set.

That is true.

>>
>> if (dividend an divisor are in [0-255])
>>   use 8bit unsigned integer divide
>> else
>>   use 32bit integer divide
>>
>> I didn't turn it on by default for Atom:
>>
>> 1. If inputs are outside of 8bit unsigned integers, extra compare and
>> branch will always be slow.
>
> Would be something for profile feedback to determine?
>

That is an excellent idea. I will take a look in a follow up patch.

Thanks,.
Paolo Bonzini - Sept. 15, 2010, 5:07 p.m.
On 09/14/2010 09:13 AM, Uros Bizjak wrote:
> On Tue, Sep 14, 2010 at 7:05 AM, H.J. Lu<hjl.tools@gmail.com>  wrote:
>
>>>> This patch generates 2 idivbs since the optimization is done at RTL
>>>> expansion. Is there a way to delay this until later when 2 idivls are
>>>> optimized into 1 idivl and before IRA since this optimization needs
>>>> a scratch register.
>>>
>>> Splitter with&&  can_create_pseudo_p () split constraint will limit
>>> splits to pre-regalloc passes, or ...
>>
>> try_split doesn't allow any insn of the result matches the original pattern
>> to avoid infinite loop.
>
> So, switch the places of div and mod RTXes in the parallel and provide
> another divl_1 insn pattern that matches this new parallel.

I think adding an unspec to the parallel would be less hacky than 
reversing the div/mod.

Paolo
H.J. Lu - Sept. 15, 2010, 5:16 p.m.
On Wed, Sep 15, 2010 at 10:07 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 09/14/2010 09:13 AM, Uros Bizjak wrote:
>>
>> On Tue, Sep 14, 2010 at 7:05 AM, H.J. Lu<hjl.tools@gmail.com>  wrote:
>>
>>>>> This patch generates 2 idivbs since the optimization is done at RTL
>>>>> expansion. Is there a way to delay this until later when 2 idivls are
>>>>> optimized into 1 idivl and before IRA since this optimization needs
>>>>> a scratch register.
>>>>
>>>> Splitter with&&  can_create_pseudo_p () split constraint will limit
>>>> splits to pre-regalloc passes, or ...
>>>
>>> try_split doesn't allow any insn of the result matches the original
>>> pattern
>>> to avoid infinite loop.
>>
>> So, switch the places of div and mod RTXes in the parallel and provide
>> another divl_1 insn pattern that matches this new parallel.
>
> I think adding an unspec to the parallel would be less hacky than reversing
> the div/mod.
>

There are places which check DIV/UDIV.  In i386.c, there are

    case DIV:
    case UDIV:
    case MOD:
    case UMOD:
      if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
        /* ??? SSE cost should be used here.  */
        *total = cost->fdiv;
      else if (X87_FLOAT_MODE_P (mode))
        *total = cost->fdiv;
      else if (FLOAT_MODE_P (mode))
        /* ??? SSE vector cost should be used here.  */
        *total = cost->fdiv;
      else
        *total = cost->divide[MODE_INDEX (mode)];
      return false;

I'd like to keep them as much as possible.
Paolo Bonzini - Sept. 16, 2010, 7:20 a.m.
On 09/15/2010 07:16 PM, H.J. Lu wrote:
>> >  I think adding an unspec to the parallel would be less hacky than reversing
>> >  the div/mod.
>> >
> There are places which check DIV/UDIV.

I said "adding", like

   (parallel [(set ... (div ...))
              (set ... (mod ...))
              (unspec [] DIV_ALREADY_SPLIT)])

Paolo
Uros Bizjak - Sept. 16, 2010, 7:34 a.m.
On Thu, Sep 16, 2010 at 9:20 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 09/15/2010 07:16 PM, H.J. Lu wrote:
>>>
>>> >  I think adding an unspec to the parallel would be less hacky than
>>> > reversing
>>> >  the div/mod.
>>> >
>>
>> There are places which check DIV/UDIV.
>
> I said "adding", like
>
>  (parallel [(set ... (div ...))
>             (set ... (mod ...))
>             (unspec [] DIV_ALREADY_SPLIT)])

I agree with Paolo.

BTW: You should write:

             (unspec [(const_int 0)] DIV_ALREADY_SPLIT)

otherwise sky will fall down.

Uros.

Patch

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 900b424..6b1f447 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,7 @@  extern void ix86_split_ashr (rtx *, rtx, enum machine_mode);
 extern void ix86_split_lshr (rtx *, rtx, enum machine_mode);
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
+extern void ix86_expand_idivmod (enum rtx_code, enum machine_mode, rtx[]);
 
 extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot);
 extern int ix86_attr_length_immediate_default (rtx, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1d79a18..21d9c0f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1981,6 +1981,7 @@  static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode,
 static void ix86_add_new_builtins (int);
 static rtx ix86_expand_vec_perm_builtin (tree);
 static tree ix86_canonical_va_list_type (tree);
+static void predict_jump (int);
 
 enum ix86_function_specific_strings
 {
@@ -3699,6 +3700,9 @@  override_options (bool main_args_p)
 #endif
    }
 
+  if (flag_8bit_idiv < 0)
+    flag_8bit_idiv = 0;
+
   /* Save the initial options in case the user does function specific options */
   if (main_args_p)
     target_option_default_node = target_option_current_node
@@ -14647,6 +14651,103 @@  ix86_expand_unary_operator (enum rtx_code code, enum machine_mode mode,
     emit_move_insn (operands[0], dst);
 }
 
+/* Expand 32bit divmod with 8bit unsigned divmod if dividend and
+   divisor are within the the range [0-255].  */
+
+void
+ix86_expand_idivmod (enum rtx_code code, enum machine_mode mode,
+		     rtx operands[])
+{
+  rtx end_label, qimode_label;
+  rtx insn, div, mod;
+  rtx tmp0, tmp1, tmp2;
+  rtx (*gen_divmod4) (rtx, rtx, rtx, rtx);
+
+  switch (mode)
+    {
+    case HImode:
+      gen_divmod4 = code == DIV ? gen_divmodhi4_1 : gen_udivmodhi4_1;
+      break;
+    case SImode:
+      gen_divmod4 = code == DIV ? gen_divmodsi4_1 : gen_udivmodsi4_1;
+      break;
+    case DImode:
+      gen_divmod4 = code == DIV ? gen_divmoddi4_1 : gen_udivmoddi4_1;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  div = (*gen_divmod4) (operands[0], operands[1],
+			operands[2], operands[3]);
+  if (mode != SImode
+      || !flag_8bit_idiv
+      || !TARGET_QIMODE_MATH
+      || optimize_insn_for_size_p ())
+    {
+      emit_insn (div);
+      return;
+    }
+
+  end_label = gen_label_rtx ();
+  qimode_label = gen_label_rtx ();
+
+  tmp0 = gen_reg_rtx (SImode);
+
+  /* Use 8bit unsigned divimod if dividend and divisor are within the
+     the range [0-255].  */
+  emit_move_insn (tmp0, operands[1]);
+  tmp0 = expand_simple_binop (mode, IOR, tmp0, operands[2],
+			      tmp0, 1, OPTAB_DIRECT);
+  emit_insn (gen_testsi_ccno_1 (tmp0, GEN_INT (-0x100)));
+  tmp0 = gen_rtx_REG (CCNOmode, FLAGS_REG);
+  tmp0 = gen_rtx_EQ (VOIDmode, tmp0, const0_rtx);
+  tmp0 = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp0,
+			       gen_rtx_LABEL_REF (VOIDmode, qimode_label),
+			       pc_rtx);
+  insn = emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, tmp0));
+  predict_jump (REG_BR_PROB_BASE * 50 / 100);
+  JUMP_LABEL (insn) = qimode_label;
+
+  /* Generate 32bit signed/unsigned divimod.  */
+  emit_insn (div);
+
+  /* Branch to the end.  */
+  emit_jump_insn (gen_jump (end_label));
+  emit_barrier ();
+
+  /* Generate 8bit unsigned divide.  */
+  emit_label (qimode_label);
+  tmp0 = simplify_gen_subreg (HImode, operands[0], SImode, 0);
+  tmp1 = simplify_gen_subreg (HImode, operands[1], SImode, 0);
+  tmp2 = simplify_gen_subreg (QImode, operands[2], SImode, 0);
+  emit_insn (gen_udivmodhiqi3 (tmp0, tmp1, tmp2));
+
+  if (code == DIV)
+    {
+      div = gen_rtx_DIV (SImode, operands[1], operands[2]);
+      mod = gen_rtx_MOD (SImode, operands[1], operands[2]);
+    }
+  else
+    {
+      div = gen_rtx_UDIV (SImode, operands[1], operands[2]);
+      mod = gen_rtx_UMOD (SImode, operands[1], operands[2]);
+    }
+
+  /* Extract remainder from AH.  */
+  tmp1 = gen_rtx_ZERO_EXTRACT (SImode, tmp0,
+			       GEN_INT (8), GEN_INT (8));
+  insn = emit_move_insn (operands[3], tmp1);
+  set_unique_reg_note (insn, REG_EQUAL, mod);
+
+  /* Zero extend quotient from AL.  */
+  tmp0 = gen_lowpart (QImode, tmp0);
+  insn = emit_insn (gen_zero_extendqisi2 (operands[0], tmp0));
+  set_unique_reg_note (insn, REG_EQUAL, div);
+
+  emit_label (end_label);
+}
+
 #define LEA_SEARCH_THRESHOLD 12
 
 /* Search backward for non-agu definition of register number REGNO1
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9780eef..1ddf0b4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7295,7 +7295,7 @@ 
   [(set_attr "type" "idiv")
    (set_attr "mode" "QI")])
 
-(define_expand "divmod<mode>4"
+(define_expand "divmod<mode>4_1"
   [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
 		   (div:SWIM248
 		     (match_operand:SWIM248 1 "register_operand" "")
@@ -7306,6 +7306,17 @@ 
   ""
   "")
 
+(define_expand "divmod<mode>4"
+  [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
+		   (div:SWIM248
+		     (match_operand:SWIM248 1 "register_operand" "")
+		     (match_operand:SWIM248 2 "nonimmediate_operand" "")))
+	      (set (match_operand:SWIM248 3 "register_operand" "")
+		   (mod:SWIM248 (match_dup 1) (match_dup 2)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_expand_idivmod (DIV, <MODE>mode, operands); DONE;")
+
 (define_insn_and_split "*divmod<mode>4"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
 	(div:SWIM248 (match_operand:SWIM248 2 "register_operand" "0")
@@ -7354,7 +7365,7 @@ 
   [(set_attr "type" "idiv")
    (set_attr "mode" "<MODE>")])
 
-(define_expand "udivmod<mode>4"
+(define_expand "udivmod<mode>4_1"
   [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
 		   (udiv:SWIM248
 		     (match_operand:SWIM248 1 "register_operand" "")
@@ -7365,6 +7376,17 @@ 
   ""
   "")
 
+(define_expand "udivmod<mode>4"
+  [(parallel [(set (match_operand:SWIM248 0 "register_operand" "")
+		   (udiv:SWIM248
+		     (match_operand:SWIM248 1 "register_operand" "")
+		     (match_operand:SWIM248 2 "nonimmediate_operand" "")))
+	      (set (match_operand:SWIM248 3 "register_operand" "")
+		   (umod:SWIM248 (match_dup 1) (match_dup 2)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_expand_idivmod (UDIV, <MODE>mode, operands); DONE;")
+
 (define_insn_and_split "*udivmod<mode>4"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
 	(udiv:SWIM248 (match_operand:SWIM248 2 "register_operand" "0")
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 5790e76..f7abe75 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -388,3 +388,7 @@  Support F16C built-in functions and code generation
 mfentry
 Target Report Var(flag_fentry) Init(-1)
 Emit profiling counter call at function entry before prologue.
+
+m8bit-idiv
+Target Report Var(flag_8bit_idiv) Init(-1) Save
+Expand 32bit integer divide into control flow with 8bit unsigned integer divide
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b354382..cb3a756 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -602,7 +602,7 @@  Objective-C and Objective-C++ Dialects}.
 -momit-leaf-frame-pointer  -mno-red-zone -mno-tls-direct-seg-refs @gol
 -mcmodel=@var{code-model} -mabi=@var{name} @gol
 -m32  -m64 -mlarge-data-threshold=@var{num} @gol
--msse2avx -mfentry}
+-msse2avx -mfentry -m8bit-idiv}
 
 @emph{IA-64 Options}
 @gccoptlist{-mbig-endian  -mlittle-endian  -mgnu-as  -mgnu-ld  -mno-pic @gol
@@ -12647,6 +12647,13 @@  If profiling is active @option{-pg} put the profiling
 counter call before prologue.
 Note: On x86 architectures the attribute @code{ms_hook_prologue}
 isn't possible at the moment for @option{-mfentry} and @option{-pg}.
+
+@item -m8bit-idiv
+@itemx -mno-8bit-idiv
+@opindex 8bit-idiv
+This option will enable GCC to expand 32bit integer divide into control
+flow with 8bit unsigned integer divide.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/divmod-1.c b/gcc/testsuite/gcc.target/i386/divmod-1.c
new file mode 100644
index 0000000..2769a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-1.c
@@ -0,0 +1,30 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+extern void abort (void);
+
+void
+__attribute__((noinline))
+test (int x, int y, int q, int r)
+{
+  if ((x / y) != q || (x % y) != r)
+    abort ();
+}
+
+int
+main ()
+{
+  test (7, 6, 1, 1);
+  test (-7, -6, 1, -1);
+  test (-7, 6, -1, -1);
+  test (7, -6, -1, 1);
+  test (255, 254, 1, 1);
+  test (256, 254, 1, 2);
+  test (256, 256, 1, 0);
+  test (254, 256, 0, 254);
+  test (254, 255, 0, 254);
+  test (254, 1, 254, 0);
+  test (255, 2, 127, 1);
+  test (1, 256, 0, 1);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/divmod-2.c b/gcc/testsuite/gcc.target/i386/divmod-2.c
new file mode 100644
index 0000000..dbe40b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-2.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+int
+foo (int x, int y)
+{
+   return x / y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/divmod-3.c b/gcc/testsuite/gcc.target/i386/divmod-3.c
new file mode 100644
index 0000000..4966d7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/divmod-3.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+int
+foo (int x, int y)
+{
+   return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-1.c b/gcc/testsuite/gcc.target/i386/udivmod-1.c
new file mode 100644
index 0000000..eebd843
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-1.c
@@ -0,0 +1,31 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+extern void abort (void);
+
+void
+__attribute__((noinline))
+test (unsigned int x, unsigned int y, unsigned int q, unsigned int r)
+{
+  if ((x / y) != q || (x % y) != r)
+    abort ();
+}
+
+int
+main ()
+{
+  test (7, 6, 1, 1);
+  test (255, 254, 1, 1);
+  test (256, 254, 1, 2);
+  test (256, 256, 1, 0);
+  test (254, 256, 0, 254);
+  test (254, 255, 0, 254);
+  test (254, 1, 254, 0);
+  test (255, 2, 127, 1);
+  test (1, 256, 0, 1);
+  test (0x80000000, 0x7fffffff, 1, 1);
+  test (0x7fffffff, 0x80000000, 0, 0x7fffffff);
+  test (0x80000000, 0x80000003, 0, 0x80000000);
+  test (0xfffffffd, 0xfffffffe, 0, 0xfffffffd);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-2.c b/gcc/testsuite/gcc.target/i386/udivmod-2.c
new file mode 100644
index 0000000..e43bbc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-2.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+unsigned int
+foo (unsigned int x, unsigned int y)
+{
+   return x / y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/udivmod-3.c b/gcc/testsuite/gcc.target/i386/udivmod-3.c
new file mode 100644
index 0000000..9aead9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udivmod-3.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -m8bit-idiv" } */
+
+unsigned int
+foo (unsigned int x, unsigned int y)
+{
+   return x % y;
+}
+
+/* { dg-final { scan-assembler-times "divb" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/umod-1.c b/gcc/testsuite/gcc.target/i386/umod-1.c
index 54edf13..a39e75b 100644
--- a/gcc/testsuite/gcc.target/i386/umod-1.c
+++ b/gcc/testsuite/gcc.target/i386/umod-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 unsigned char
 foo (unsigned char x, unsigned char y)
diff --git a/gcc/testsuite/gcc.target/i386/umod-2.c b/gcc/testsuite/gcc.target/i386/umod-2.c
index 6fe7384..1ef33bb 100644
--- a/gcc/testsuite/gcc.target/i386/umod-2.c
+++ b/gcc/testsuite/gcc.target/i386/umod-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 extern unsigned char z;
 
diff --git a/gcc/testsuite/gcc.target/i386/umod-3.c b/gcc/testsuite/gcc.target/i386/umod-3.c
index 7123bc9..3e5bc40 100644
--- a/gcc/testsuite/gcc.target/i386/umod-3.c
+++ b/gcc/testsuite/gcc.target/i386/umod-3.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune=atom" } */
+/* { dg-options "-O2 -m8bit-idiv" } */
 
 extern void abort (void);
 extern void exit (int);