[fortran] optimize string comparison

Message ID	5152B9FB.1090103@net-b.de
State	New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:subject:references
	:in-reply-to:content-type:content-transfer-encoding; q=dns; s=
	default; b=QPJcTlTa4BwbfyMoPT6L8/fiOdab99jkOgKn/gYHuA4JqMCZxlxom
	ZXI2YkWuWMF+xMzSH5mxcxRoRtEshZRAiwdEc4BF/O4/oenQ9v4XbFGvKXF/lRcH
	fA5VcoDwk2j//XgczyqxsmO9U32X4+2jbnSFpdkW3YINRClGHPH0hU=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Message-ID: <5152B9FB.1090103@net-b.de>
Date: Wed, 27 Mar 2013 10:20:59 +0100
From: Tobias Burnus <burnus@net-b.de>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130307 Thunderbird/17.0.4
MIME-Version: 1.0
To: gfortran <fortran@gcc.gnu.org>, gcc patches <gcc-patches@gcc.gnu.org>,
	=?UTF-8?B?T25kxZllaiBCw61sa2E=?= <neleai@seznam.cz>
Subject: Fwd: [Patch, fortran] optimize string comparison
References: <20130327083557.GB6374@domone.kolej.mff.cuni.cz>
In-Reply-To: <20130327083557.GB6374@domone.kolej.mff.cuni.cz>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Commit Message

Tobias Burnus March 27, 2013, 9:20 a.m. UTC

(The email below was only sent to gcc-patches@; I now also CC fortran@ - 
sorry for the full quote)

Regarding the below patch: I think it does not work as-is for Unicode 
strings (UCS4, character(kind=4)), where each character is 4 bytes wide 
and a space does not consist of sequences of four ' '.
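
To illustrate the point, here is a small sketch. It is not libgfortran code: the helper name is made up, and it assumes 8-bit bytes, an ASCII ' ' and a 32-bit character type. The object representation of one UCS-4 space contains a single byte equal to ' ', the other three being zero, so a byte-wise search for ' ' does not line up with kind=4 characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes equal to ' ' in the object
   representation of a single UCS-4 space (U+0020).  */
static size_t
space_bytes_in_ucs4_space (void)
{
  const uint32_t space = 0x20;
  unsigned char bytes[sizeof space];
  size_t count = 0;

  memcpy (bytes, &space, sizeof space);
  for (size_t i = 0; i < sizeof bytes; i++)
    if (bytes[i] == ' ')
      count++;

  /* One byte is 0x20 and three are 0x00, whatever the byte order.  */
  return count;
}
```

A padding check for kind=4 would therefore have to compare whole 32-bit elements against 0x20 rather than individual bytes.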


Regarding Thomas' patch:* I would also think that memcmp should work for 
kind=4 characters; one then needs to multiply the length by the 
byte-size. (Actually, for kind==1, one could check the excess characters 
in the generated code via memchr as done in Ondřej's patch.)


However, looking at intrinsics/string_intrinsics{,_inc}.c, I see that we 
don't use MEMCMP for UCS4 either - but a hand-written function. I 
think that could also be replaced by the normal memcmp (or did I miss 
some fine print?). A possible patch would be to replace
   #define MEMCMP memcmp_char4
by
   #define MEMCMP(a,b,c) memcmp(a,b,4*(c))
and delete the memcmp_char4 function.
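
One possible piece of fine print, sketched under stated assumptions: memcmp orders by bytes, so on a little-endian host its sign can differ from an element-wise comparison of 32-bit values, while the equal/unequal answer stays the same. The function names below are illustrative stand-ins, not the libgfortran sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-wise comparison of two UCS-4 buffers of N characters
   (stand-in for a hand-written comparison loop).  */
static int
cmp_elements (const uint32_t *a, const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Byte-wise comparison of the same buffers: plain memcmp with the
   length scaled by the character width.  */
static int
cmp_bytes (const uint32_t *a, const uint32_t *b, size_t n)
{
  return memcmp (a, b, n * sizeof *a);
}

/* Returns 1 if the least significant byte of a 32-bit value is
   stored first.  */
static int
host_is_little_endian (void)
{
  const uint32_t one = 1;
  unsigned char first;

  memcpy (&first, &one, 1);
  return first == 1;
}
```

For example, with a = { 0x100 } and b = { 0xFF }, cmp_elements reports a > b. On a little-endian host the first bytes are 0x00 and 0xFF, so cmp_bytes reports a < b. If only equality is needed, the two agree; if the ordering matters (as for the Fortran relational operators), the byte-wise variant would need a big-endian host or a different approach.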


Tobias

* http://gcc.gnu.org/ml/fortran/2013-03/msg00142.html


-------- Original Message --------
Subject: [Patch, fortran] optimize string comparison
Date: Wed, 27 Mar 2013 09:35:57 +0100
From: Ondřej Bílka <neleai@seznam.cz>
To: gcc-patches@gcc.gnu.org


Hi,
while looking at compare_string I found that it could be
optimized. This speeds up the case where the strings are equal up to the
shorter length and the padding still has to be checked; checking it byte
by byte is suboptimal.

Ondra

2013-03-27  Ondřej Bílka  <neleai@seznam.cz>

	* libgfortran/intrinsics/string_intrinsics_inc.c (compare_string):
	Optimize.


Comments

Ondřej Bílka March 27, 2013, 6:59 p.m. UTC | #1

On Wed, Mar 27, 2013 at 10:20:59AM +0100, Tobias Burnus wrote:
> (The email below was only sent to gcc-patches@; I now also CC
> fortran@ - sorry for the full quote)
> 
> Regarding the below patch: I think it does not work as-is for
> Unicode strings (UCS4, character(kind=4)), where each character is 4
> bytes wide and a space does not consist of sequences of four ' '.
>
I did not know about that. We could use wmemchr when sizeof(wchar_t)==4.
Where should I put that?
> 
> Regarding Thomas' patch:* I would also think that memcmp should work
> for kind=4 characters; one then needs to multiply the length by the
> byte-size. (Actually, for kind==1, one could check the excess
> characters in the generated code via memchr as done in Ondřej's
> patch.)
> 
> 
> However, looking at intrinsics/string_intrinsics{,_inc}.c, I see
> that we don't use MEMCMP for UCS4 either - but a hand-written
> function. I think that could also be replaced by the normal memcmp
> (or did I miss some fine print?). A possible patch would be to
> replace
>   #define MEMCMP memcmp_char4
> by
>   #define MEMCMP(a,b,c) memcmp(a,b,4*(c))
> and delete the memcmp_char4 function.
> 
Or use wmemcmp.
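
A rough sketch of what that could look like. It is untested against libgfortran, the function name is made up, and it assumes that wchar_t is a 32-bit integer type compatible with the kind=4 character type; otherwise it falls back to a plain loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical comparison of two UCS-4 buffers of N characters.
   Uses wmemcmp when wchar_t has the same width as the character
   type, and an element-wise loop otherwise.  */
static int
compare_char4 (const uint32_t *a, const uint32_t *b, size_t n)
{
  if (sizeof (wchar_t) == sizeof (uint32_t))
    return wmemcmp ((const wchar_t *) a, (const wchar_t *) b, n);

  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}
```

Where wchar_t is signed, values above 0x7FFFFFFF would order differently from an unsigned comparison; valid Unicode code points stay well below that.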
> 
> Tobias
> 
> * http://gcc.gnu.org/ml/fortran/2013-03/msg00142.html
>

diff --git a/libgfortran/intrinsics/string_intrinsics_inc.c b/libgfortran/intrinsics/string_intrinsics_inc.c
index a1f86b5..9eb0613 100644
--- a/libgfortran/intrinsics/string_intrinsics_inc.c
+++ b/libgfortran/intrinsics/string_intrinsics_inc.c
@@ -107,16 +107,15 @@  compare_string (gfc_charlen_type len1, const CHARTYPE *s1,
        res = 1;
      }

-  while (len--)
+	s = memchr (s, ' ', len);
+	if (!s)
+		return 0;
+  if (*s != ' ')
      {
-      if (*s != ' ')
-        {
-          if (*s > ' ')
-            return res;
-          else
-            return -res;
-        }
-      s++;
+      if (*s > ' ')
+        return res;
+      else
+        return -res;
     }

   return 0;
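
For reference, the padding check performed by the loop being replaced can be sketched as standalone C for 1-byte characters. This is an illustration under assumptions, not the libgfortran code: the function names are made up, and the characters are taken to be unsigned char. Note that memchr (s, ' ', len) returns the first byte equal to ' ', whereas the loop stops at the first byte that differs from ' '. The chunked variant below therefore skips runs of spaces with memcmp against a block of spaces instead.

```c
#include <stddef.h>
#include <string.h>

/* Byte-by-byte check of the padding S of length LEN.  RES is the
   sign to return when the first non-space character sorts after
   ' '; its negation is returned when that character sorts before.  */
static int
compare_padding_bytewise (const unsigned char *s, size_t len, int res)
{
  for (size_t i = 0; i < len; i++)
    if (s[i] != ' ')
      return s[i] > ' ' ? res : -res;
  return 0;
}

/* Same result, but skips 16 bytes at a time while they are all
   spaces, then finishes the remainder byte by byte.  */
static int
compare_padding_chunked (const unsigned char *s, size_t len, int res)
{
  unsigned char spaces[16];

  memset (spaces, ' ', sizeof spaces);
  while (len >= sizeof spaces && memcmp (s, spaces, sizeof spaces) == 0)
    {
      s += sizeof spaces;
      len -= sizeof spaces;
    }
  return compare_padding_bytewise (s, len, res);
}
```

Whether the chunked form is actually faster than the simple loop depends on the typical padding length and on the memcmp implementation, so it would need measuring.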