Patchwork [x86] Use vector moves in memmove expanding

login
register
mail settings
Submitter Michael Zolotukhin
Date Sept. 9, 2013, 7:42 a.m.
Message ID <20130909074205.GB53568@msticlxl57.ims.intel.com>
Download mbox | patch
Permalink /patch/273515/
State New
Headers show

Comments

Michael Zolotukhin - Sept. 9, 2013, 7:42 a.m.
> OK,
> Would you mind adding a testcase?
Thanks, here is the patch with Eric's test.
OK to commit?

Changelog:
gcc:
2013-09-09  Michael Zolotukhin  <michael.v.zolotukhin@gmail.com>

	* config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation.

gcc/testsuite:
2013-09-09  Michael Zolotukhin  <michael.v.zolotukhin@gmail.com>

	* gcc.target/i386/memcpy-2.c: New test.


Thanks, Michael
> Honza
Uros Bizjak - Sept. 9, 2013, 7:45 a.m.
On Mon, Sep 9, 2013 at 9:42 AM, Michael V. Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
>> OK,
>> Would you mind adding a testcase?
> Thanks, here is the patch with Eric's test.
> OK to commit?
>
> Changelog:
> gcc:
> 2013-09-09  Michael Zolotukhin  <michael.v.zolotukhin@gmail.com>
>
>         * config/i386/i386.c (ix86_expand_movmem): Fix epilogue generation.
>
> gcc/testsuite:
> 2013-09-09  Michael Zolotukhin  <michael.v.zolotukhin@gmail.com>
>
>         * gcc.target/i386/memcpy-2.c: New test.

+/* { dg-do run } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2 -march=pentiumpro -minline-all-stringops -fno-common" } */

Please use -mtune instead of -march.

Otherwise OK.

Thanks,
Uros.
Jakub Jelinek - Sept. 9, 2013, 7:46 a.m.
On Mon, Sep 09, 2013 at 11:42:05AM +0400, Michael V. Zolotukhin wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/memcpy-2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-options "-O2 -march=pentiumpro -minline-all-stringops -fno-common" } */

I don't see anything i386 specific on the testcase, except the flags,
and don't see why you need -fno-common in there, there are no global vars.
So, I think it would be better to stick it into gcc.dg/torture/, drop
dg-require-* line and instead just add
/* { dg-additional-options "-march=pentiumpro" { target ia32 } } */
/* { dg-additional-options "-minline-all-stringops" { target { i?86-*-* x86_64-*-* } } } */
or so (and let it cycle through all the -O* options).

> +
> +static void __attribute__((noinline, noclone))
> +my_memcpy (char *dest, const char *src, int n)
> +{
> +  __builtin_memcpy (dest, src, n);
> +}
> +
> +int
> +main (void)
> +{
> +  char a1[4], a2[4];
> +  __builtin_memset (a1, 'a', 4);
> +  __builtin_memset (a2, 'b', 4);
> +  my_memcpy (a2, a1, 4);
> +  if (a2[0] != 'a')
> +    __builtin_abort ();
> +  return 0;
> +}
> +


	Jakub
Michael Zolotukhin - Sept. 9, 2013, 7:52 a.m.
> I don't see anything i386 specific on the testcase, except the flags,
> and don't see why you need -fno-common in there, there are no global vars.
> So, I think it would be better to stick it into gcc.dg/torture/, drop
> dg-require-* line and instead just add
> /* { dg-additional-options "-march=pentiumpro" { target ia32 } } */
> /* { dg-additional-options "-minline-all-stringops" { target { i?86-*-* x86_64-*-* } } } */
> or so (and let it cycle through all the -O* options).
Originally the test targeted a specific situation, happening on pentiumpro (and
thus ia32), because on pentium pro we want 64-bit alignment for 32-bit
rep-moves.  So, it reveals an issue when desired alignment is bigger than size
of move_mode.
I don't see if it could be helpful on other platforms, though if
you think it's worthwhile, I'll update the test as you suggested.

Michael
> 	Jakub
Jakub Jelinek - Sept. 9, 2013, 7:59 a.m.
On Mon, Sep 09, 2013 at 11:52:49AM +0400, Michael V. Zolotukhin wrote:
> > I don't see anything i386 specific on the testcase, except the flags,
> > and don't see why you need -fno-common in there, there are no global vars.
> > So, I think it would be better to stick it into gcc.dg/torture/, drop
> > dg-require-* line and instead just add
> > /* { dg-additional-options "-march=pentiumpro" { target ia32 } } */
> > /* { dg-additional-options "-minline-all-stringops" { target { i?86-*-* x86_64-*-* } } } */
> > or so (and let it cycle through all the -O* options).
> Originally the test targeted a specific situation, happening on pentiumpro (and
> thus ia32), because on pentium pro we want 64-bit alignment for 32-bit
> rep-moves.  So, it reveals an issue when desired alignment is bigger than size
> of move_mode.
> I don't see if it could be helpful on other platforms, though if
> you think it's worthwhile, I'll update the test as you suggested.

I think it is worthwhile, various targets have many different ways to expand
memcpy, admittedly i?86/x86_64 probably the biggest number of these, and
while right now you've encountered it on ia32 with certain options doesn't
mean that in a few years it couldn't hit some unrelated target, arm, sh,
sparc, whatever.

	Jakub

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2fa71a..1f07e6f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23329,7 +23329,7 @@  ix86_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp,
 
   if (count_exp != const0_rtx && epilogue_size_needed > 1)
     expand_movmem_epilogue (dst, src, destreg, srcreg, count_exp,
-			    size_needed);
+			    epilogue_size_needed);
   if (jump_around_label)
     emit_label (jump_around_label);
   return true;
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-2.c b/gcc/testsuite/gcc.target/i386/memcpy-2.c
new file mode 100644
index 0000000..c8dfbe3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/memcpy-2.c
@@ -0,0 +1,22 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2 -march=pentiumpro -minline-all-stringops -fno-common" } */
+
+static void __attribute__((noinline, noclone))
+my_memcpy (char *dest, const char *src, int n)
+{
+  __builtin_memcpy (dest, src, n);
+}
+
+int
+main (void)
+{
+  char a1[4], a2[4];
+  __builtin_memset (a1, 'a', 4);
+  __builtin_memset (a2, 'b', 4);
+  my_memcpy (a2, a1, 4);
+  if (a2[0] != 'a')
+    __builtin_abort ();
+  return 0;
+}
+