Patchwork [fortran] optimize string comparison

Submitter Tobias Burnus
Date March 27, 2013, 9:20 a.m.
Message ID <5152B9FB.1090103@net-b.de>
Permalink /patch/231621/
State New

Comments

Tobias Burnus - March 27, 2013, 9:20 a.m.
(The email below was only sent to gcc-patches@; I now also CC fortran@ - 
sorry for the full quote)

Regarding the below patch: I think it does not work as-is for Unicode 
strings (UCS4, character(kind=4)), where each character is 4 bytes wide 
and a space does not consist of sequences of four ' '.
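To illustrate the point about kind=4 blanks, here is a minimal sketch. It assumes a UCS4 character is stored as a uint32_t; the typedef name char4_t and the helper count_space_bytes are made up for this example and are not libgfortran names. A blank is the single 32-bit value 0x20, so only one of its four bytes is ' ', and a byte-wise search for ' ' would also match inside unrelated characters such as U+2020.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t char4_t;   /* hypothetical stand-in for a kind=4 character */

/* Count how many of the four bytes of C are equal to the byte ' ' (0x20).
   The result does not depend on the host byte order.  */
static int
count_space_bytes (char4_t c)
{
  unsigned char bytes[sizeof c];
  int n = 0;

  memcpy (bytes, &c, sizeof c);
  for (size_t i = 0; i < sizeof c; i++)
    if (bytes[i] == ' ')
      n++;
  return n;
}
```

With these definitions, a kind=4 blank (0x20) contains one ' ' byte, whereas the value 0x20202020, which is what four consecutive ' ' bytes would form, is a different character altogether.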


Regarding Thomas' patch:* I would also think that memcmp should work for 
kind=4 characters; one then needs to multiply the length by the 
byte-size. (Actually, for kind==1, one could check the excess characters 
in the generated code via memchr as done in Ondřej's patch.)


However, looking at intrinsics/string_intrinsics{,_inc}.c, I see that we 
don't use MEMCMP for UCS4 either - but a hand-written function. I 
think that could also be replaced by the normal memcmp (or did I miss 
some fine print?). A possible patch would be to replace
   #define MEMCMP memcmp_char4
by
   #define MEMCMP(a,b,c) memcmp(a,b,4*(c))
and delete the memcmp_char4 function.
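One piece of fine print that may matter here: memcmp compares unsigned bytes in address order, so with 4-byte characters it is a reliable equality test, but its sign only matches code-point order when the most significant byte is stored first. The following is a minimal sketch of the two variants, assuming kind=4 characters are held as uint32_t. The names char4_t, compare_char4 and compare_bytes are invented for this example and are not the libgfortran definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t char4_t;   /* hypothetical stand-in for a kind=4 character */

/* Compare N characters by value; the result is <0, 0 or >0 like memcmp.  */
static int
compare_char4 (const char4_t *a, const char4_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Byte-wise variant in the spirit of MEMCMP(a,b,c) memcmp(a,b,4*(c)).  */
static int
compare_bytes (const char4_t *a, const char4_t *b, size_t n)
{
  return memcmp (a, b, n * sizeof (char4_t));
}
```

For a = {0x100} and b = {0xFF}, compare_char4 reports a > b on any host, while the sign of compare_bytes depends on the byte order: on a little-endian host the first differing byte is 0x00 versus 0xFF, so it reports a < b. Both variants agree on equality.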


Tobias

* http://gcc.gnu.org/ml/fortran/2013-03/msg00142.html


-------- Original Message --------
Subject: [Patch, fortran] optimize string comparison
Date: Wed, 27 Mar 2013 09:35:57 +0100
From: Ondřej Bílka <neleai@seznam.cz>
To: gcc-patches@gcc.gnu.org


Hi,
while looking at compare_string I found that it could be
optimized. This speeds up the case where the strings are equal but the
padding still has to be checked; checking it byte by byte is suboptimal.

Ondra

2013-03-27  Ondřej Bílka  <neleai@seznam.cz>

	* libgfortran/intrinsics/string_intrinsics_inc.c (compare_string):
	Optimize.

Ondrej Bilka - March 27, 2013, 6:59 p.m.
On Wed, Mar 27, 2013 at 10:20:59AM +0100, Tobias Burnus wrote:
> (The email below was only sent to gcc-patches@; I now also CC
> fortran@ - sorry for the full quote)
> 
> Regarding the below patch: I think it does not work as-is for
> Unicode strings (UCS4, character(kind=4)), where each character is 4
> bytes wide and a space does not consist of sequences of four ' '.
>
I did not know about that. We could use wmemchr when sizeof(wchar_t)==4.
Where should I put that?
> 
> Regarding Thomas' patch:* I would also think that memcmp should work
> for kind=4 characters; one then needs to multiply the length by the
> byte-size. (Actually, for kind==1, one could check the excess
> characters in the generated code via memchr as done in Ondřej's
> patch.)
> 
> 
> However, looking at intrinsics/string_intrinsics{,_inc}.c, I see
> that we don't use MEMCMP for UCS4 either - but a hand-written
> function. I think that could also be replaced by the normal memcmp
> (or did I miss some fine print?). A possible patch would be to
> replace
>   #define MEMCMP memcmp_char4
> by
>   #define MEMCMP(a,b,c) memcmp(a,b,4*(c))
> and delete the memcmp_char4 function.
> 
Or use wmemcmp.
> 
> Tobias
> 
> * http://gcc.gnu.org/ml/fortran/2013-03/msg00142.html
>
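A note on memchr and wmemchr for the padding check: both return the first element that is equal to the given value, whereas the padding check needs the first element that differs from a blank. A minimal sketch of that search for kind=4 data follows, assuming the characters are stored as uint32_t. The names char4_t, first_nonblank and compare_padding are hypothetical and only mirror the shape of the loop in compare_string.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t char4_t;   /* hypothetical stand-in for a kind=4 character */

/* Return a pointer to the first element of S[0..LEN) that is not a blank,
   or NULL if the whole range is padding.  */
static const char4_t *
first_nonblank (const char4_t *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (s[i] != (char4_t) ' ')
      return &s[i];
  return NULL;
}

/* Compare the excess characters S[0..LEN) against blank padding.  RES is
   +1 or -1 depending on which operand is the longer one, as in the loop
   that the patch replaces.  */
static int
compare_padding (const char4_t *s, size_t len, int res)
{
  const char4_t *p = first_nonblank (s, len);

  if (p == NULL)
    return 0;
  return *p > (char4_t) ' ' ? res : -res;
}
```

This keeps the semantics of the byte-by-byte loop; a faster version would have to speed up first_nonblank itself rather than substitute a search for ' '.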

Patch

diff --git a/libgfortran/intrinsics/string_intrinsics_inc.c b/libgfortran/intrinsics/string_intrinsics_inc.c
index a1f86b5..9eb0613 100644
--- a/libgfortran/intrinsics/string_intrinsics_inc.c
+++ b/libgfortran/intrinsics/string_intrinsics_inc.c
@@ -107,16 +107,15 @@  compare_string (gfc_charlen_type len1, const CHARTYPE *s1,
        res = 1;
      }

-  while (len--)
+	s = memchr (s, ' ', len);
+	if (!s)
+		return 0;
+  if (*s != ' ')
      {
-      if (*s != ' ')
-        {
-          if (*s > ' ')
-            return res;
-          else
-            return -res;
-        }
-      s++;
+      if (*s > ' ')
+        return res;
+      else
+        return -res;
     }

   return 0;