diff mbox

[1/4] Improve generic strcspn performance

Message ID 1459178389-14133-2-git-send-email-adhemerval.zanella@linaro.org
State New
Headers show

Commit Message

Adhemerval Zanella Netto March 28, 2016, 3:19 p.m. UTC
From: Wilco Dijkstra <wdijkstr@arm.com>

Improve strcspn performance using a much faster algorithm.  It is kept simple
so it works well on most targets.  It is generally at least 10 times faster
than the existing implementation on bench-strcspn on a few AArch64
implementations, and for some tests 100 times as fast (repeatedly calling
strchr on a small string is extremely slow...).

In fact the string/bits/string2.h inlines make no longer sense, as GCC
already uses strlen if reject is an empty string, strchrnul is 5 times as
fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than
the strcspn main loop for large strings (though reject length 2-4 could be
special cased in the future to gain even more performance).

Tested on x86_64, i686, and aarch64.

	* string/strcspn.c (strcspn): Rewrite function.
	* string/bits/string2.h (strcspn): Use __builtin_strcspn.
---
 ChangeLog             |  6 ++++++
 string/bits/string2.h | 41 ++++++-----------------------------------
 string/strcspn.c      | 44 ++++++++++++++++++++++++++++++++++++--------
 3 files changed, 48 insertions(+), 43 deletions(-)

Comments

Wilco Dijkstra March 29, 2016, 1:02 p.m. UTC | #1
Adhemerval Zanella wrote:

> +  if (accept[0] == '\0')
> +    return 0;
> +  if (accept[1] == '\0')
> +    { 

GCC doesn't get the static branch prediction correct for the 2nd if,
so it would be useful to add __glibc_unlikely given that single-character
accepts are rare.

> +  s = (unsigned char *) ((size_t)(s) & ~3);
> +  unsigned int c0, c1, c2, c3; 
> +  do {
> +      s += 4;
> +      c0 = p[s[0]];
> +      c1 = p[s[1]];
> +      c2 = p[s[2]];
> +      c3 = p[s[3]];
> +  } while ((c0 && c1 && c2 && c3) == 1);

That should use '&' rather than '&&' and '!= 0' similar to how I did it in strcspn.
This will use 3 AND(S) instructions and a single branch.

> +
> +  size_t count = s - (unsigned char *) str;
> +  return (c0 && c1) == 0 ? count - !c0 + 1 : count - !c2 + 3;

Again, c0 & c1 is better and allows CSE with the while expression above.
Also -!c0 +1 is equivalent to c0, -!c2 + 3 is equivalent to c2 + 2 - this is simpler
and faster.

Otherwise it looks good, and thanks for doing this one too!

Cheers,
Wilco
Richard Henderson March 30, 2016, 5:47 p.m. UTC | #2
On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
> + /* The inline functions are not used from GLIBC 2.24 and forward, however
> +    they are required to provide the symbols through string-inlines.c
> +    (if inlining is not possible for compatibility reasons).  */
>  __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject);
>  __STRING_INLINE size_t
>  __strcspn_c1 (const char *__s, int __reject)

They could, however, be moved out of the installed header file and be given
compatibility symbol versions.


r~
Wilco Dijkstra March 30, 2016, 6 p.m. UTC | #3
Richard Henderson wrote:
> On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
> > + /* The inline functions are not used from GLIBC 2.24 and forward, however
> > +    they are required to provide the symbols through string-inlines.c
> > +    (if inlining is not possible for compatibility reasons).  */
> >  __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject);
> >  __STRING_INLINE size_t
> >  __strcspn_c1 (const char *__s, int __reject)
>
> They could, however, be moved out of the installed header file and be given
> compatibility symbol versions.

I have several patches out for review that move all the inlines from string2.h 
to string/string-inlines.c for backwards compatibility and discontinue the complex
inlining of some code from string2.h so one no longer needs to worry about
accidental ABI changes.

See http://patchwork.sourceware.org/patch/10936/ for the strcspn move (this builds on
http://patchwork.sourceware.org/patch/10933/ and http://patchwork.sourceware.org/patch/10934/).

What do you mean with "compatibility symbol versions"?

Wilco
Adhemerval Zanella Netto March 30, 2016, 6:24 p.m. UTC | #4
On 30-03-2016 15:00, Wilco Dijkstra wrote:
> 
> Richard Henderson wrote:
>> On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
>>> + /* The inline functions are not used from GLIBC 2.24 and forward, however
>>> +    they are required to provide the symbols through string-inlines.c
>>> +    (if inlining is not possible for compatibility reasons).  */
>>>  __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject);
>>>  __STRING_INLINE size_t
>>>  __strcspn_c1 (const char *__s, int __reject)
>>
>> They could, however, be moved out of the installed header file and be given
>> compatibility symbol versions.
> 
> I have several patches out for review that move all the inlines from string2.h 
> to string/string-inlines.c for backwards compatibility and discontinue the complex
> inlining of some code from string2.h so one no longer needs to worry about
> accidental ABI changes.
> 
> See http://patchwork.sourceware.org/patch/10936/ for the strcspn move (this builds on
> http://patchwork.sourceware.org/patch/10933/ and http://patchwork.sourceware.org/patch/10934/).
> 
> What do you mean with "compatibility symbol versions"?
> 
> Wilco 
> 

My understanding is since GLIBC will not emit the __str{spn,cspn,brk}
anymore the symbols can now be build as compatibility mode only.
I will change it for this patch series and check out the ones you
noted.
Richard Henderson March 30, 2016, 6:46 p.m. UTC | #5
On 03/30/2016 11:00 AM, Wilco Dijkstra wrote:
> 
> Richard Henderson wrote:
>> On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
>>> + /* The inline functions are not used from GLIBC 2.24 and forward, however
>>> +    they are required to provide the symbols through string-inlines.c
>>> +    (if inlining is not possible for compatibility reasons).  */
>>>  __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject);
>>>  __STRING_INLINE size_t
>>>  __strcspn_c1 (const char *__s, int __reject)
>>
>> They could, however, be moved out of the installed header file and be given
>> compatibility symbol versions.
> 
> I have several patches out for review that move all the inlines from string2.h 
> to string/string-inlines.c for backwards compatibility and discontinue the complex
> inlining of some code from string2.h so one no longer needs to worry about
> accidental ABI changes.
> 
> See http://patchwork.sourceware.org/patch/10936/ for the strcspn move (this builds on
> http://patchwork.sourceware.org/patch/10933/ and http://patchwork.sourceware.org/patch/10934/).
> 
> What do you mean with "compatibility symbol versions"?

It means that the symbols are not available with a default version, and so new
programs cannot link against the symbol.  However, old programs that are
already linked will continue to use the old symbol version.

See e.g. nptl/pt-fork.c, wherein "fork" and "__fork" were removed from the
default symbols for libpthread, and so now are exported only as compatibility
symbols.


r~
Richard Henderson March 31, 2016, 5 p.m. UTC | #6
On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
> +  s = (unsigned char *) ((size_t)s & ~3);

Nit: s/size_t/uintptr_t/.

It's the same type for all supported targets,
but the spelling says what you mean.


r~
Roland McGrath April 1, 2016, 8:44 p.m. UTC | #7
> On 03/28/2016 08:19 AM, Adhemerval Zanella wrote:
> > +  s = (unsigned char *) ((size_t)s & ~3);
> 
> Nit: s/size_t/uintptr_t/.
> 
> It's the same type for all supported targets,
> but the spelling says what you mean.

libc-internal.h has PTR_ALIGN_DOWN for this.
diff mbox

Patch

diff --git a/string/bits/string2.h b/string/bits/string2.h
index 8200ef1..1b87686 100644
--- a/string/bits/string2.h
+++ b/string/bits/string2.h
@@ -905,43 +905,14 @@  __stpcpy_small (char *__dest,
 
 /* Return the length of the initial segment of S which
    consists entirely of characters not in REJECT.  */
-#if !defined _HAVE_STRING_ARCH_strcspn || defined _FORCE_INLINES
-# ifndef _HAVE_STRING_ARCH_strcspn
-#  if __GNUC_PREREQ (3, 2)
-#   define strcspn(s, reject) \
-  __extension__								      \
-  ({ char __r0, __r1, __r2;						      \
-     (__builtin_constant_p (reject) && __string2_1bptr_p (reject)	      \
-      ? ((__builtin_constant_p (s) && __string2_1bptr_p (s))		      \
-	 ? __builtin_strcspn (s, reject)				      \
-	 : ((__r0 = ((const char *) (reject))[0], __r0 == '\0')		      \
-	    ? strlen (s)						      \
-	    : ((__r1 = ((const char *) (reject))[1], __r1 == '\0')	      \
-	       ? __strcspn_c1 (s, __r0)					      \
-	       : ((__r2 = ((const char *) (reject))[2], __r2 == '\0')	      \
-		  ? __strcspn_c2 (s, __r0, __r1)			      \
-		  : (((const char *) (reject))[3] == '\0'		      \
-		     ? __strcspn_c3 (s, __r0, __r1, __r2)		      \
-		     : __builtin_strcspn (s, reject))))))		      \
-      : __builtin_strcspn (s, reject)); })
-#  else
-#   define strcspn(s, reject) \
-  __extension__								      \
-  ({ char __r0, __r1, __r2;						      \
-     (__builtin_constant_p (reject) && __string2_1bptr_p (reject)	      \
-      ? ((__r0 = ((const char *) (reject))[0], __r0 == '\0')		      \
-	 ? strlen (s)							      \
-	 : ((__r1 = ((const char *) (reject))[1], __r1 == '\0')		      \
-	    ? __strcspn_c1 (s, __r0)					      \
-	    : ((__r2 = ((const char *) (reject))[2], __r2 == '\0')	      \
-	       ? __strcspn_c2 (s, __r0, __r1)				      \
-	       : (((const char *) (reject))[3] == '\0'			      \
-		  ? __strcspn_c3 (s, __r0, __r1, __r2)			      \
-		  : strcspn (s, reject)))))				      \
-      : strcspn (s, reject)); })
-#  endif
+#ifndef _HAVE_STRING_ARCH_strcspn
+# if __GNUC_PREREQ (3, 2)
+#  define strcspn(s, reject) __builtin_strcspn (s, reject)
 # endif
 
+ /* The inline functions are not used from GLIBC 2.24 and forward, however
+    they are required to provide the symbols through string-inlines.c
+    (if inlining is not possible for compatibility reasons).  */
 __STRING_INLINE size_t __strcspn_c1 (const char *__s, int __reject);
 __STRING_INLINE size_t
 __strcspn_c1 (const char *__s, int __reject)
diff --git a/string/strcspn.c b/string/strcspn.c
index 8888919..89ba4ca 100644
--- a/string/strcspn.c
+++ b/string/strcspn.c
@@ -26,16 +26,44 @@ 
 /* Return the length of the maximum initial segment of S
    which contains no characters from REJECT.  */
 size_t
-STRCSPN (const char *s, const char *reject)
+STRCSPN (const char *str, const char *reject)
 {
-  size_t count = 0;
+  if (reject[0] == '\0' || reject[1] == '\0')
+    return __strchrnul (str, reject [0]) - str;
 
-  while (*s != '\0')
-    if (strchr (reject, *s++) == NULL)
-      ++count;
-    else
-      return count;
+  /* Use multiple small memsets to enable inlining on most targets.  */
+  unsigned char table[256];
+  unsigned char *p = memset (table, 0, 64);
+  memset (p + 64, 0, 64);
+  memset (p + 128, 0, 64);
+  memset (p + 192, 0, 64);
 
-  return count;
+  unsigned char *s = (unsigned char*) reject;
+  unsigned char tmp;
+  do
+    p[tmp = *s++] = 1;
+  while (tmp);
+
+  s = (unsigned char*) str;
+  if (p[s[0]]) return 0;
+  if (p[s[1]]) return 1;
+  if (p[s[2]]) return 2;
+  if (p[s[3]]) return 3;
+
+  s = (unsigned char *) ((size_t)s & ~3);
+
+  unsigned int c0, c1, c2, c3;
+  do
+    {
+      s += 4;
+      c0 = p[s[0]];
+      c1 = p[s[1]];
+      c2 = p[s[2]];
+      c3 = p[s[3]];
+    }
+  while ((c0 | c1 | c2 | c3) == 0);
+
+  size_t count = s - (unsigned char *) str;
+  return (c0 | c1) != 0 ? count - c0 + 1 : count - c2 + 3;
 }
 libc_hidden_builtin_def (strcspn)