[v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)

Message ID ce5777e2-694a-fdf5-5aa8-20e1a91e5e66@linux.ibm.com
State New

Commit Message

Xionghu Luo May 12, 2020, 6:48 a.m. UTC
Minor refinement of the iteration non-overflow check, plus a testcase, for stage 1.


This "subtract/extend/add" sequence has existed for a long time and still annoys us
(PR37451, part of PR61837) when converting from 32 bits to 64 bits, as the ctr
register is 64 bits wide on powerpc64.  Andrew Pinski had a patch, but it
caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

We can simplify "subtract/extend/add" to just the extend when the number of
loop iterations is known to be less than MODE_MAX-1 (do NOT simplify when
counter+0x1 overflows).
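
The arithmetic behind this fold can be checked in isolation.  The following is
a minimal sketch in plain C, not GCC code: count_unfolded, count_folded and
check_fold are hypothetical helpers that model the 32-bit to 64-bit case with
fixed-width integers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the unfolded sequence: subtract 1 in 32 bits, zero-extend to
   64 bits, then add 1 in 64 bits.  */
static uint64_t
count_unfolded (uint32_t n)
{
  uint32_t narrow = n - 1u;	/* wraps to 0xffffffff when n == 0 */
  return (uint64_t) narrow + 1u;
}

/* Model of the folded form: a plain zero-extension.  */
static uint64_t
count_folded (uint32_t n)
{
  return (uint64_t) n;
}

/* The two forms agree for every nonzero n and differ only when the
   32-bit subtraction wraps.  */
static void
check_fold (void)
{
  assert (count_unfolded (1) == count_folded (1));
  assert (count_unfolded (UINT32_MAX) == count_folded (UINT32_MAX));
  assert (count_unfolded (0) == (uint64_t) UINT32_MAX + 1);
  assert (count_folded (0) == 0);
}
```

Calling check_fold () exercises both the safe case and the wrapping case that
the iteration bound is meant to exclude.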

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

	2020-05-12  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext when not wrapping overflow.

gcc/testsuite/ChangeLog

	2020-05-12  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* gcc.target/powerpc/doloop-2.c: New test.
---
 gcc/loop-doloop.c                           | 46 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/doloop-2.c | 14 +++++++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c

Comments

Segher Boessenkool May 12, 2020, 2:58 p.m. UTC | #1
Hi!

On Tue, May 12, 2020 at 02:48:40PM +0800, luoxhu wrote:
> diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> new file mode 100644
> index 00000000000..dc8516bb0ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target powerpc*-*-* } } */

Just { dg-do compile } please, *everything* that runs this testsuite
is powerpc*-*-*; but compile is the default as well, so you can leave
that line completely out as well, if you want.

> +/* { dg-final { scan-assembler-not "-1" } } */

This will fail the test for the string "-1" anywhere in the file.  Like,
if it was called "doloop-1.c" it would fail, or "doloop-12345.c".  \m
and \M can help for that last case, but you probably want to make the
regex a bit more selective ;-)  (And, document what it doesn't want to
see, if it isn't really obvious?)
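
To illustrate the difference between a bare substring match and a more
selective one, here is a rough sketch in plain C.  It is not how DejaGnu
evaluates scan-assembler-not (that uses Tcl regular expressions); both helper
names are hypothetical, and the "selective" check is only an approximation
that looks for a line beginning with "addi " and ending in ",-1".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Naive check: true if "-1" occurs anywhere in the assembler text,
   including inside a .file directive.  */
static bool
naive_has_minus_one (const char *asm_text)
{
  return strstr (asm_text, "-1") != NULL;
}

/* Selective check: true only if some line, after leading blanks, starts
   with "addi " and ends with ",-1".  */
static bool
has_addi_minus_one (const char *asm_text)
{
  const char *line = asm_text;
  while (*line)
    {
      size_t len = strcspn (line, "\n");
      const char *p = line;
      while (p < line + len && (*p == ' ' || *p == '\t'))
	p++;
      size_t rest = (size_t) (line + len - p);
      if (rest >= 8
	  && strncmp (p, "addi ", 5) == 0
	  && strncmp (line + len - 3, ",-1", 3) == 0)
	return true;
      line += len;
      if (*line == '\n')
	line++;
    }
  return false;
}
```

With an input such as "\t.file\t\"doloop-1.c\"\n\tmtctr 9\n" the naive check
reports a match while the selective one does not.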


Segher
Richard Sandiford May 12, 2020, 6:24 p.m. UTC | #2
luoxhu <luoxhu@linux.ibm.com> writes:
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
> +
> +	 73: r145:SI=r123:DI#0-0x1
> +	 74: r144:DI=zero_extend (r145:SI)
> +	 75: r143:DI=r144:DI+0x1
> +	 ...
> +	 31: r135:CC=cmp (r123:DI,0)
> +	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
> +	 scratch;clobber scratch;}

Minor, but it might be worth stubbing out the clobbers, since they're
not really necessary to understand the comment:

	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}

> +
> +	 r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
> +	 loop iterations, if loop iterations expression doesn't overflow, then
> +	 (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
> +       */
> +      bool simplify_zext = false;

I think it'd be easier to follow if this was split out into
a subroutine, rather than having the simplify_zext variable.

> +      rtx extop0 = XEXP (count, 0);
> +      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)

This isn't valid: we can only do XEXP (count, 0) *after* checking
for a ZERO_EXTEND.  (It'd be good to test the patch with
--enable-checking=yes,extra,rtl , which hopefully would have
caught this.)
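
The ordering issue can be shown with a small tagged structure.  This is a
sketch in plain C with hypothetical types (struct expr, enum expr_code), not
GCC's rtx: operand 0 is only meaningful once the code has been checked, so the
check has to come first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum expr_code { EXPR_REG, EXPR_PLUS, EXPR_ZERO_EXTEND };

/* Hypothetical expression node; a REG has no operands (both NULL).  */
struct expr
{
  enum expr_code code;
  const struct expr *op[2];
};

/* Return true if X is a zero-extension of an addition.  The operand is
   read only after the code check has succeeded.  */
static bool
is_zero_extend_of_plus (const struct expr *x)
{
  if (x->code != EXPR_ZERO_EXTEND)
    return false;
  return x->op[0]->code == EXPR_PLUS;
}
```

Reading x->op[0]->code before testing x->code would dereference a null pointer
for a REG node in this model.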

> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +
> +	  int nonoverflow = 0;
> +	  unsigned int_mode
> +	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));

Heh.  I wondered at first how on earth this compiled.  It looked like
there was a missing "(...)" around the GET_MODE.  But of course,
GET_MODE adds its own parentheses, so it all works out. :-)

Please add the "(...)" anyway though.  We shouldn't rely on that.
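
The effect can be reproduced outside GCC.  Below is a small sketch in plain C;
ITEM_VALUE, struct item and twice are hypothetical stand-ins for an accessor
macro and a callee, chosen only to show why the unparenthesized form compiles.

```c
#include <assert.h>

struct item { int value; };

/* Hypothetical accessor macro whose expansion is fully parenthesized.  */
#define ITEM_VALUE(ITEM) ((ITEM)->value)

static int
twice (int v)
{
  return 2 * v;
}

/* Compiles only because the macro expansion supplies the parentheses:
   after preprocessing this is "twice ((it)->value)".  */
static int
call_without_parens (const struct item *it)
{
  return twice ITEM_VALUE (it);
}

/* The explicit form does not depend on how the macro is written.  */
static int
call_with_parens (const struct item *it)
{
  return twice (ITEM_VALUE (it));
}
```

If ITEM_VALUE were later changed to expand without outer parentheses, only
call_with_parens would keep compiling.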

"int_mode" seems a bit of a confusing name, since it's actually a precision
in bits rather than a mode.

> +	  unsigned HOST_WIDE_INT int_mode_max
> +	    = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
> +	  if (get_max_loop_iterations (loop, &iterations)
> +	      && wi::ltu_p (iterations, int_mode_max))

You could use GET_MODE_MASK instead of int_mode_max here.

For extra safety, it would be good to add a HWI_COMPUTABLE_P test,
to make sure that using HWIs is valid.
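
For reference, the value the patch computes by hand is an all-ones mask of the
given precision.  The sketch below, in plain C with a hypothetical helper name,
shows the same double-shift form; it is not the definition of GET_MODE_MASK.
Shifting in two steps avoids a shift by the full width of the type, which is
undefined in C when the precision equals that width.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with the low PRECISION bits set, for 1 <= PRECISION <= 64.
   The shift is split in two so that PRECISION == 64 never shifts by 64.  */
static uint64_t
precision_mask (unsigned int precision)
{
  assert (precision >= 1 && precision <= 64);
  return (((uint64_t) 1 << (precision - 1)) << 1) - 1;
}
```

For a 32-bit precision this yields 0xffffffff, the largest value the narrow
counter can hold.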

> +	    nonoverflow = 1;
> +
> +	  if (nonoverflow

Having the nonoverflow variable doesn't seem necessary.  We could
just fuse the two "if" conditions together.

> +	      && CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE_PRECISION (mode) == int_mode * 2

This GET_MODE_PRECISION condition also shouldn't be necessary.
If we can prove that the subtraction doesn't wrap, we can extend
to any wider mode, not just to double the width.
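
This can be checked exhaustively for a small narrow type.  The sketch below is
plain C with hypothetical helper names; it uses an 8-bit counter and compares
the unfolded and folded forms after extension to 16, 32 and 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if (zero_extend (n - 1)) + 1 equals zero_extend (n) for
   each of the three wider widths.  */
static bool
fold_holds_for_all_widths (uint8_t n)
{
  uint8_t narrow = (uint8_t) (n - 1);	/* wraps to 0xff when n == 0 */
  return (uint16_t) ((uint16_t) narrow + 1) == (uint16_t) n
	 && (uint32_t) narrow + 1u == (uint32_t) n
	 && (uint64_t) narrow + 1u == (uint64_t) n;
}

/* Count the 8-bit inputs for which the fold would be wrong.  */
static unsigned int
count_fold_failures (void)
{
  unsigned int failures = 0;
  for (unsigned int n = 0; n <= UINT8_MAX; n++)
    if (!fold_holds_for_all_widths ((uint8_t) n))
      failures++;
  return failures;
}
```

The only failing input is 0, where the narrow subtraction wraps; the target
width plays no role.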

> +	      && addop1 == GEN_INT (-1))

This can just be:

   addop1 == constm1_rtx

There's then no need for the CONST_SCALAR_INT_P check.
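
The pointer comparison relies on small integer constants being shared objects.
A minimal sketch of that idea in plain C follows; struct const_node,
small_consts, CONST_M1 and make_const are hypothetical names for illustration
and do not mirror GCC's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

struct const_node { long value; };

/* Hypothetical table of preallocated nodes for -1, 0 and 1.  */
static const struct const_node small_consts[3] = { { -1 }, { 0 }, { 1 } };

/* Named handle for the shared -1 node.  */
#define CONST_M1 (&small_consts[0])

/* Return the shared node for a small value.  Larger values return NULL
   in this sketch; a real implementation would allocate a node.  */
static const struct const_node *
make_const (long value)
{
  if (value >= -1 && value <= 1)
    return &small_consts[value + 1];
  return NULL;
}
```

Because every request for -1 returns the same node, comparing a pointer against
CONST_M1 identifies the constant without inspecting the node's kind or value.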

Thanks,
Richard
Xionghu Luo May 14, 2020, 2:48 a.m. UTC | #3
Thanks for all your great comments; I have addressed them all in the update below.
"--enable-checking=yes,extra,rtl" did catch the ICE, at the cost of a performance penalty.


This "subtract/extend/add" sequence has existed for a long time and still annoys us
(PR37451, part of PR61837) when converting from 32 bits to 64 bits, as the ctr
register is 64 bits wide on powerpc64.  Andrew Pinski had a patch, but it
caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

We can still simplify "subtract/zero_ext/add" to "zero_ext" when the number of
loop iterations is known to be less than MODE_MAX (simplify only when
counter+0x1 does NOT overflow).
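
As a rough model of that decision, the sketch below uses plain C with
fixed-width integers.  It is not the code in loop-doloop.c: doloop_count_model
and its parameters are hypothetical, with have_bound and max_niter standing in
for whatever bound on the iteration count is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the 64-bit counter start value for a loop whose 32-bit trip
   count is N.  Fold to a plain zero-extension only when a bound is known
   and it is below the 32-bit maximum; otherwise keep the original
   subtract/extend/add sequence.  */
static uint64_t
doloop_count_model (uint32_t n, bool have_bound, uint64_t max_niter)
{
  if (have_bound && max_niter < UINT32_MAX)
    return (uint64_t) n;			/* folded form */
  return (uint64_t) (uint32_t) (n - 1u) + 1u;	/* unfolded form */
}
```

When no bound is available the model falls back to the unfolded form, which is
the only one that gives 0x100000000 for n == 0.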

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* loop-doloop.c (doloop_simplify_count): New function.  Simplify
	(add -1; zero_ext; add +1) to zero_ext when not wrapping.
	(doloop_modify): Call doloop_simplify_count.

gcc/testsuite/ChangeLog

	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* gcc.target/powerpc/doloop-2.c: New test.
---
 gcc/loop-doloop.c                           | 38 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/doloop-2.c | 29 ++++++++++++++++
 2 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..02282d45bd5 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -397,6 +397,42 @@ add_test (rtx cond, edge *e, basic_block dest)
   return true;
 }
 
+/* Fold (add -1; zero_ext; add +1) operations to zero_ext if not wrapping. i.e:
+
+   73: r145:SI=r123:DI#0-0x1
+   74: r144:DI=zero_extend (r145:SI)
+   75: r143:DI=r144:DI+0x1
+   ...
+   31: r135:CC=cmp (r123:DI,0)
+   72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}
+
+   r123:DI#0-0x1 is param count derived from loop->niter_expr equal to number of
+   loop iterations, if loop iterations expression doesn't overflow, then
+   (zero_extend (r123:DI#0-1))+1 can be simplified to zero_extend.  */
+
+static rtx
+doloop_simplify_count (class loop *loop, scalar_int_mode mode, rtx count)
+{
+  widest_int iterations;
+  if (GET_CODE (count) == ZERO_EXTEND)
+    {
+      rtx extop0 = XEXP (count, 0);
+      if (GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  if (get_max_loop_iterations (loop, &iterations)
+	      && wi::ltu_p (iterations, GET_MODE_MASK (GET_MODE (addop0)))
+	      && addop1 == constm1_rtx)
+	    return simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+				       GET_MODE (addop0));
+	}
+    }
+
+  return simplify_gen_binary (PLUS, mode, count, const1_rtx);
+}
+
 /* Modify the loop to use the low-overhead looping insn where LOOP
    describes the loop, DESC describes the number of iterations of the
    loop, and DOLOOP_INSN is the low-overhead looping insn to emit at the
@@ -477,7 +513,7 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    count = doloop_simplify_count (loop, mode, count);
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
new file mode 100644
index 00000000000..3199fe56d35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-unroll-loops" } */
+
+unsigned int
+foo1 (unsigned int l, int *a)
+{
+  unsigned int i;
+  for(i = 0;i < l; i++)
+    a[i] = i;
+  return l;
+}
+
+int
+foo2 (int l, int *a)
+{
+  int i;
+  for(i = 0;i < l; i++)
+    a[i] = i;
+  return l;
+}
+
+/* The place where we were getting an extra -1 is when converting from 32bits
+   to 64bits as the ctr register is used as 64bits on powerpc64.  We should be
+   able to do this loop without "add -1/zero_ext/add 1" to the l to get the
+   number of iterations of this loop still doing a do-loop.  */
+
+/* { dg-final { scan-assembler-not {(?n)\maddi .*,.*,-1$} } } */
+/* { dg-final { scan-assembler-times "bdnz" 2 } } */
+/* { dg-final { scan-assembler-times "mtctr" 2 } } */
Richard Sandiford May 14, 2020, 11:15 a.m. UTC | #4
luoxhu <luoxhu@linux.ibm.com> writes:
> This "subtract/extend/add" sequence has existed for a long time and still annoys us
> (PR37451, part of PR61837) when converting from 32 bits to 64 bits, as the ctr
> register is 64 bits wide on powerpc64.  Andrew Pinski had a patch, but it
> caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782).
>
> Andrew:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> We can still simplify "subtract/zero_ext/add" to "zero_ext" when the number of
> loop iterations is known to be less than MODE_MAX (simplify only when
> counter+0x1 does NOT overflow).
>
> Bootstrapped and regression tested on Power8-LE.
>
> gcc/ChangeLog
>
> 	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, part of PR target/61837
> 	* loop-doloop.c (doloop_simplify_count): New function.  Simplify
> 	(add -1; zero_ext; add +1) to zero_ext when not wrapping.
> 	(doloop_modify): Call doloop_simplify_count.
>
> gcc/testsuite/ChangeLog
>
> 	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, part of PR target/61837
> 	* gcc.target/powerpc/doloop-2.c: New test.

OK, thanks.

Richard

Patch

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..16372382a22 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,51 @@  doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+	 73: r145:SI=r123:DI#0-0x1
+	 74: r144:DI=zero_extend (r145:SI)
+	 75: r143:DI=r144:DI+0x1
+	 ...
+	 31: r135:CC=cmp (r123:DI,0)
+	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+	 scratch;clobber scratch;}
+
+	 r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
+	 loop iterations, if loop iterations expression doesn't overflow, then
+	 (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
+       */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  int nonoverflow = 0;
+	  unsigned int_mode
+	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+	  unsigned HOST_WIDE_INT int_mode_max
+	    = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
+	  if (get_max_loop_iterations (loop, &iterations)
+	      && wi::ltu_p (iterations, int_mode_max))
+	    nonoverflow = 1;
+
+	  if (nonoverflow
+	      && CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE_PRECISION (mode) == int_mode * 2
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
new file mode 100644
index 00000000000..dc8516bb0ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2 -fno-unroll-loops" } */
+
+int f(int l, int *a)
+{
+    int i;
+      for(i = 0;i < l; i++)
+	    a[i] = i;
+        return l;
+}
+
+/* { dg-final { scan-assembler-not "-1" } } */
+/* { dg-final { scan-assembler "bdnz" } } */
+/* { dg-final { scan-assembler-times "mtctr" 1 } } */