[libgfortran] Use memcpy in a few more places for eoshift

Message ID 0df7937d-8122-5a31-fdbc-bb09369e5283@netcologne.de
State New

Commit Message

Thomas Koenig July 3, 2017, 10:06 p.m. UTC
Hello world,

attached are a few more speedups for special eoshift cases.  This
time, nothing fancy, just use memcpy for copying in the
contiguous case.

I am still looking at eoshift2 (scalar shift, array boundary)
to see if it would be possible to duplicate the speed gains for
eoshift0 (scalar shift, scalar boundary), but it won't hurt
to do this first.  At least the shift along dimension 1
should be faster by about a factor of two.

I have also added a few test cases which test eoshift in all
the variants touched by this patch.

Regression-testing as I write this.  I don't expect anything bad
(because I tested all test cases containing *eoshift*).

OK for trunk if this passes?

Regards

	Thomas

2017-06-03  Thomas Koenig  <tkoenig@gcc.gnu.org>

         * intrinsics/eoshift2.c (eoshift2):  Use memcpy
         for innermost copy where possible.
         * m4/eoshift1.m4 (eoshift1): Likewise.
         * m4/eoshift3.m4 (eoshift3): Likewise.
         * generated/eoshift1_16.c: Regenerated.
         * generated/eoshift1_4.c: Regenerated.
         * generated/eoshift1_8.c: Regenerated.
         * generated/eoshift3_16.c: Regenerated.
         * generated/eoshift3_4.c: Regenerated.
         * generated/eoshift3_8.c: Regenerated.

2017-06-03  Thomas Koenig  <tkoenig@gcc.gnu.org>

         * gfortran.dg/eoshift_4.f90:  New test.
         * gfortran.dg/eoshift_5.f90:  New test.
         * gfortran.dg/eoshift_6.f90:  New test.

Comments

Thomas Koenig July 8, 2017, 11:57 a.m. UTC | #1
On 04.07.2017 at 00:06, Thomas Koenig wrote:

> attached are a few more speedups for special eoshift cases.  This
> time, nothing fancy, just use memcpy for copying in the
> contiguous case.

Ping?

Regards

	Thomas

Thomas Koenig July 9, 2017, 11:28 a.m. UTC | #2
On 08.07.2017 at 13:57, Thomas Koenig wrote:
> On 04.07.2017 at 00:06, Thomas Koenig wrote:
> 
>> attached are a few more speedups for special eoshift cases.  This
>> time, nothing fancy, just use memcpy for copying in the
>> contiguous case.
> 
> Ping?
> 
> Regards
> 
>      Thomas

Some benchmarks (source attached).


$ gfortran eo_bench_2.f90 && ./a.out
  dim =            1  t =   0.747093916
  dim =            2  t =    2.09117603
  dim =            3  t =    3.07099581
$ gfortran-7 -static-libgfortran eo_bench_2.f90 && ./a.out
  dim =            1  t =    1.24332905
  dim =            2  t =    2.09103727
  dim =            3  t =    3.05382776
$ gfortran eo_bench_3.f90 && ./a.out
  dim =            1  t =   0.734890938
  dim =            2  t =    2.40442204
  dim =            3  t =    3.12888288
$ gfortran-7 -static-libgfortran eo_bench_3.f90 && ./a.out
  dim =            1  t =    1.30460107
  dim =            2  t =    2.17445374
  dim =            3  t =    2.78331423
$ gfortran eo_bench_4.f90 && ./a.out
  dim =            1  t =   0.777376175
  dim =            2  t =    2.40524292
  dim =            3  t =    3.10695219
$ gfortran-7 -static-libgfortran eo_bench_4.f90 && ./a.out
  dim =            1  t =    1.39399910
  dim =            2  t =    2.16738701
  dim =            3  t =    3.09568548

So, we get a 65% to 78% speedup for a common use case (dim=1).

program main
  implicit none
  integer, parameter :: n=600
  real, dimension(n,n,n) :: a, c
  real, dimension(n,n) :: b
  real :: t1, t2
  integer :: dim
  
  call random_number(a)
  b = 0.
  do dim=1,3
     call cpu_time(t1)
     c = eoshift(a, -3, dim=dim, boundary=b)
     call cpu_time(t2)
     print *,"dim = ", dim, " t = ", t2-t1
  end do
end program main

Paul Richard Thomas July 9, 2017, 6:10 p.m. UTC | #3
Hi Thomas,

The patch is OK by me.

Thanks for working on speeding up these library functions. Does the
octave version, mentioned in the clf thread, translate easily into C?
I had to remind myself of how octave cell arrays function. It is
certainly a remarkably concise solution.

Cheers

Paul

On 8 July 2017 at 12:57, Thomas Koenig <tkoenig@netcologne.de> wrote:
> On 04.07.2017 at 00:06, Thomas Koenig wrote:
>
>> attached are a few more speedups for special eoshift cases.  This
>> time, nothing fancy, just use memcpy for copying in the
>> contiguous case.
>
>
> Ping?
>
> Regards
>
>         Thomas

Thomas Koenig July 9, 2017, 7:11 p.m. UTC | #4
Hi Paul,

> The patch is OK by me.

Thanks for the review.  Committed as rev. 250085.

> Thanks for working on speeding up these library functions. Does the
> octave version, mentioned in the clf thread, translate easily into C?
> I had to remind myself of how octave cell arrays function. It is
> certainly a remarkably concise solution.

I have to admit that I do not yet know how Octave does it.
A bit of cursory grepping in the source did not reveal any C (or C++)
code which uses something called "circshift", so I will have to do
some more looking.

Regards

	Thomas

Patch

Index: intrinsics/eoshift2.c
===================================================================
--- intrinsics/eoshift2.c	(Revision 249936)
+++ intrinsics/eoshift2.c	(Arbeitskopie)
@@ -181,12 +181,23 @@  eoshift2 (gfc_array_char *ret, const gfc_array_cha
           src = sptr;
           dest = &rptr[-shift * roffset];
         }
-      for (n = 0; n < len; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * len;
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
       if (shift >= 0)
         {
           n = shift;
Index: m4/eoshift1.m4
===================================================================
--- m4/eoshift1.m4	(Revision 249936)
+++ m4/eoshift1.m4	(Arbeitskopie)
@@ -184,12 +184,23 @@  eoshift1 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: m4/eoshift3.m4
===================================================================
--- m4/eoshift3.m4	(Revision 249936)
+++ m4/eoshift3.m4	(Arbeitskopie)
@@ -199,12 +199,24 @@  eoshift3 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
+
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift1_16.c
===================================================================
--- generated/eoshift1_16.c	(Revision 249936)
+++ generated/eoshift1_16.c	(Arbeitskopie)
@@ -183,12 +183,23 @@  eoshift1 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift1_4.c
===================================================================
--- generated/eoshift1_4.c	(Revision 249936)
+++ generated/eoshift1_4.c	(Arbeitskopie)
@@ -183,12 +183,23 @@  eoshift1 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift1_8.c
===================================================================
--- generated/eoshift1_8.c	(Revision 249936)
+++ generated/eoshift1_8.c	(Arbeitskopie)
@@ -183,12 +183,23 @@  eoshift1 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift3_16.c
===================================================================
--- generated/eoshift3_16.c	(Revision 249936)
+++ generated/eoshift3_16.c	(Arbeitskopie)
@@ -198,12 +198,24 @@  eoshift3 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
+
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift3_4.c
===================================================================
--- generated/eoshift3_4.c	(Revision 249936)
+++ generated/eoshift3_4.c	(Arbeitskopie)
@@ -198,12 +198,24 @@  eoshift3 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
+
       if (sh < 0)
         dest = rptr;
       n = delta;
Index: generated/eoshift3_8.c
===================================================================
--- generated/eoshift3_8.c	(Revision 249936)
+++ generated/eoshift3_8.c	(Arbeitskopie)
@@ -198,12 +198,24 @@  eoshift3 (gfc_array_char * const restrict ret,
           src = sptr;
           dest = &rptr[delta * roffset];
         }
-      for (n = 0; n < len - delta; n++)
-        {
-          memcpy (dest, src, size);
-          dest += roffset;
-          src += soffset;
-        }
+
+      /* If the elements are contiguous, perform a single block move.  */
+      if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+      else
+	{
+	  for (n = 0; n < len - delta; n++)
+	    {
+	      memcpy (dest, src, size);
+	      dest += roffset;
+	      src += soffset;
+	    }
+	}
+
       if (sh < 0)
         dest = rptr;
       n = delta;