[v3,05/18] string: Improve generic strlen

Message ID 1515588482-15744-6-git-send-email-adhemerval.zanella@linaro.org
State New
Headers show
Series
  • Improve generic string routines
Related show

Commit Message

Adhemerval Zanella Jan. 10, 2018, 12:47 p.m.
From: Richard Henderson <rth@twiddle.net>

New algorithm have the following key differences:

  - Reads first word unaligned and use string-maskoff functions to
    remove unwanted data.  This strategy follow assemble optimized
    ones for powerpc, sparc, and SH.

  - Use of has_zero and index_first_zero parametrized functions.

Checked on x86_64-linux-gnu, i686-linux-gnu, sparc64-linux-gnu,
and sparcv9-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).

	[BZ #5806]
    	* string/strlen.c: Use them.
---
 string/strlen.c | 83 +++++++++++----------------------------------------------
 1 file changed, 15 insertions(+), 68 deletions(-)

Comments

Paul Eggert Jan. 11, 2018, 5:21 p.m. | #1
On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
> +  /* Align pointer to sizeof op_t.  */
> +  const uintptr_t s_int = (uintptr_t) str;
> +  const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t));

I see this sort of code used in multiple places (often with different 
implementations), and suggest packaging it up into an inline function. 
Also, you might consider implementing it this way, which is a bit 
simpler (it's the method used in your generic strcmp):

   op_t *
   word_containing (char const *p)
   {
      return (op_t *) (p - (uintptr_t) p % sizeof (op_t));
   }

This generates the same code with gcc -O2, and minimizes the usage of 
pointers as integers.

Similarly, in other code I suggest using char * instead of uintptr_t, as 
much as possible. This works just as well with ordinary addition, and 
will simplify GCC's job if we ever build with -fcheck-pointer-bounds.

> +  while (1)
>       {
> -      /* 64-bit version of the magic.  */
> -      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
> -      himagic = ((himagic << 16) << 16) | himagic;
> -      lomagic = ((lomagic << 16) << 16) | lomagic;
> +      if (has_zero (word))
> +	break;
> +      word = *++word_ptr;
>       }

This would be a bit simpler:

   while (! has_zero (word))
     word = *++word_ptr;
Adhemerval Zanella Jan. 12, 2018, 5:59 p.m. | #2
On 11/01/2018 15:21, Paul Eggert wrote:
> On 01/10/2018 04:47 AM, Adhemerval Zanella wrote:
>> +  /* Align pointer to sizeof op_t.  */
>> +  const uintptr_t s_int = (uintptr_t) str;
>> +  const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t));
> 
> I see this sort of code used in multiple places (often with different implementations), and suggest packaging it up into an inline function. Also, you might consider implementing it this way, which is a bit simpler (it's the method used in your generic strcmp):
> 
>   op_t *
>   word_containing (char const *p)
>   {
>      return (op_t *) (p - (uintptr_t) p % sizeof (op_t));
>   }
> 
> This generates the same code with gcc -O2, and minimizes the usage of pointers as integers.

Thanks, I have added this function suggestion to string-maskoff.h and used on
the generic implementations.

> 
> Similarly, in other code I suggest using char * instead of uintptr_t, as much as possible. This works just as well with ordinary addition, and will simplify GCC's job if we ever build with -fcheck-pointer-bounds.
> 
>> +  while (1)
>>       {
>> -      /* 64-bit version of the magic.  */
>> -      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
>> -      himagic = ((himagic << 16) << 16) | himagic;
>> -      lomagic = ((lomagic << 16) << 16) | lomagic;
>> +      if (has_zero (word))
>> +    break;
>> +      word = *++word_ptr;
>>       }
> 
> This would be a bit simpler:
> 
>   while (! has_zero (word))
>     word = *++word_ptr;
> 

Thanks, I have applied it locally as well.

Patch

diff --git a/string/strlen.c b/string/strlen.c
index 8ce1318..6bd0ed9 100644
--- a/string/strlen.c
+++ b/string/strlen.c
@@ -20,6 +20,11 @@ 
 
 #include <string.h>
 #include <stdlib.h>
+#include <stdint.h>
+#include <string-fza.h>
+#include <string-fzb.h>
+#include <string-fzi.h>
+#include <string-maskoff.h>
 
 #undef strlen
 
@@ -32,78 +37,20 @@ 
 size_t
 STRLEN (const char *str)
 {
-  const char *char_ptr;
-  const unsigned long int *longword_ptr;
-  unsigned long int longword, himagic, lomagic;
+  /* Align pointer to sizeof op_t.  */
+  const uintptr_t s_int = (uintptr_t) str;
+  const op_t *word_ptr = (const op_t*) (s_int & -sizeof (op_t));
 
-  /* Handle the first few characters by reading one character at a time.
-     Do this until CHAR_PTR is aligned on a longword boundary.  */
-  for (char_ptr = str; ((unsigned long int) char_ptr
-			& (sizeof (longword) - 1)) != 0;
-       ++char_ptr)
-    if (*char_ptr == '\0')
-      return char_ptr - str;
+  /* Read and MASK the first word. */
+  op_t word = *word_ptr | create_mask (s_int);
 
-  /* All these elucidatory comments refer to 4-byte longwords,
-     but the theory applies equally well to 8-byte longwords.  */
-
-  longword_ptr = (unsigned long int *) char_ptr;
-
-  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
-     the "holes."  Note that there is a hole just to the left of
-     each byte, with an extra at the end:
-
-     bits:  01111110 11111110 11111110 11111111
-     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
-
-     The 1-bits make sure that carries propagate to the next 0-bit.
-     The 0-bits provide holes for carries to fall into.  */
-  himagic = 0x80808080L;
-  lomagic = 0x01010101L;
-  if (sizeof (longword) > 4)
+  while (1)
     {
-      /* 64-bit version of the magic.  */
-      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
-      himagic = ((himagic << 16) << 16) | himagic;
-      lomagic = ((lomagic << 16) << 16) | lomagic;
+      if (has_zero (word))
+	break;
+      word = *++word_ptr;
     }
-  if (sizeof (longword) > 8)
-    abort ();
 
-  /* Instead of the traditional loop which tests each character,
-     we will test a longword at a time.  The tricky part is testing
-     if *any of the four* bytes in the longword in question are zero.  */
-  for (;;)
-    {
-      longword = *longword_ptr++;
-
-      if (((longword - lomagic) & ~longword & himagic) != 0)
-	{
-	  /* Which of the bytes was the zero?  If none of them were, it was
-	     a misfire; continue the search.  */
-
-	  const char *cp = (const char *) (longword_ptr - 1);
-
-	  if (cp[0] == 0)
-	    return cp - str;
-	  if (cp[1] == 0)
-	    return cp - str + 1;
-	  if (cp[2] == 0)
-	    return cp - str + 2;
-	  if (cp[3] == 0)
-	    return cp - str + 3;
-	  if (sizeof (longword) > 4)
-	    {
-	      if (cp[4] == 0)
-		return cp - str + 4;
-	      if (cp[5] == 0)
-		return cp - str + 5;
-	      if (cp[6] == 0)
-		return cp - str + 6;
-	      if (cp[7] == 0)
-		return cp - str + 7;
-	    }
-	}
-    }
+  return ((const char *) word_ptr) + index_first_zero (word) - str;
 }
 libc_hidden_builtin_def (strlen)