diff mbox series

string: Replace outdated comments in strlen().

Message ID 20220320010442.991728-1-bluepenguin@gmail.com
State New
Headers show
Series string: Replace outdated comments in strlen(). | expand

Commit Message

Ricardo Bittencourt March 20, 2022, 1:04 a.m. UTC
Copyright The GNU Toolchain Authors.

The comments on strlen() don't match what the actual code does. They
describe an older algorithm which is no longer in use. This change
replace the old comments with new ones describing the algorithm used.

I am a first time contributor, and I believe there is no need for
copyright assignment, since the file changed is not in the shared
source files list. 

This patch only changes comments, but for safety I have run the tests in 
my x64 ubuntu machine, with the following results:

Summary of test results:                                                        
   5051 PASS                                                                    
     80 UNSUPPORTED                                                             
     16 XFAIL                                                                   
      6 XPASS 

Signed-off-by: Ricardo Bittencourt <bluepenguin@gmail.com>
---
 string/strlen.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Comments

Noah Goldstein March 20, 2022, 4:41 a.m. UTC | #1
On Sat, Mar 19, 2022 at 8:05 PM Ricardo Bittencourt via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Copyright The GNU Toolchain Authors.
>
> The comments on strlen() don't match what the actual code does. They
> describe an older algorithm which is no longer in use. This change
> replace the old comments with new ones describing the algorithm used.
>
> I am a first time contributor, and I believe there is no need for
> copyright assignment, since the file changed is not in the shared
> source files list.
>
> This patch only changes comments, but for safety I have run the tests in
> my x64 ubuntu machine, with the following results:
>
> Summary of test results:
>    5051 PASS
>      80 UNSUPPORTED
>      16 XFAIL
>       6 XPASS
>
> Signed-off-by: Ricardo Bittencourt <bluepenguin@gmail.com>
> ---
>  string/strlen.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/string/strlen.c b/string/strlen.c
> index 0b8aefb812..54f3fb8167 100644
> --- a/string/strlen.c
> +++ b/string/strlen.c
> @@ -46,15 +46,10 @@ STRLEN (const char *str)
>
>    longword_ptr = (unsigned long int *) char_ptr;
>
> -  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
> -     the "holes."  Note that there is a hole just to the left of
> -     each byte, with an extra at the end:
> -
> -     bits:  01111110 11111110 11111110 11111111
> -     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
> -
> -     The 1-bits make sure that carries propagate to the next 0-bit.
> -     The 0-bits provide holes for carries to fall into.  */
> +  /* Computing (longword - lomagic) sets the high bit of any corresponding
> +     byte that is either zero or greater than 0x80.  The latter case can be
> +     filtered out by computing (~longword & himagic).  The final result
> +     will always be non-zero if one of the bytes of longword is zero.  */
>    himagic = 0x80808080L;
>    lomagic = 0x01010101L;
>    if (sizeof (longword) > 4)
> @@ -76,8 +71,7 @@ STRLEN (const char *str)
>
>        if (((longword - lomagic) & ~longword & himagic) != 0)
>         {
> -         /* Which of the bytes was the zero?  If none of them were, it was
> -            a misfire; continue the search.  */
> +         /* Which of the bytes was the zero?  */
>
>           const char *cp = (const char *) (longword_ptr - 1);
>
> --
> 2.25.1
>

LGTM. But wait to commit until monday so others can give feedback
if they want to.
Joseph Myers March 21, 2022, 6:18 p.m. UTC | #2
Some incidental comments (this is not a review of the patch and should not 
delay it going in):

* Do I understand correctly that this is a *different* comment issue from 
the one described in bug 5806 with comments in various string functions?  
(Maybe the issue described in comment#10 in that bug?)

* I'd still like to see the generic string function improvements discussed 
in that bug getting into glibc at some point.
Ricardo Bittencourt March 21, 2022, 6:50 p.m. UTC | #3
I checked the bug and it's the same comment issue (the comments
describe an earlier version of the algorithm). After sending this
patch I can update the others.

The generic string improvements in the bug are better than the current
implementation, is there something stopping us from just applying it?

--
Ricardo

On Mon, Mar 21, 2022 at 3:18 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> Some incidental comments (this is not a review of the patch and should not
> delay it going in):
>
> * Do I understand correctly that this is a *different* comment issue from
> the one described in bug 5806 with comments in various string functions?
> (Maybe the issue described in comment#10 in that bug?)
>
> * I'd still like to see the generic string function improvements discussed
> in that bug getting into glibc at some point.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
Ricardo Bittencourt April 9, 2022, 3:50 p.m. UTC | #4
Hi people, by any chance can someone commit this change? I asked for a
sourceware account to do it myself, but it's taking a while, so it
would be best if someone applies this patch meanwhile.

--
Ricardo

On Mon, Mar 21, 2022 at 3:50 PM Ricardo Bittencourt <ricbit@ricbit.com> wrote:
>
> I checked the bug and it's the same comment issue (the comments
> describe an earlier version of the algorithm). After sending this
> patch I can update the others.
>
> The generic string improvements in the bug are better than the current
> implementation, is there something stopping us from just applying it?
>
> --
> Ricardo
>
> On Mon, Mar 21, 2022 at 3:18 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > Some incidental comments (this is not a review of the patch and should not
> > delay it going in):
> >
> > * Do I understand correctly that this is a *different* comment issue from
> > the one described in bug 5806 with comments in various string functions?
> > (Maybe the issue described in comment#10 in that bug?)
> >
> > * I'd still like to see the generic string function improvements discussed
> > in that bug getting into glibc at some point.
> >
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
Noah Goldstein April 9, 2022, 4:46 p.m. UTC | #5
On Sat, Apr 9, 2022 at 10:50 AM Ricardo Bittencourt via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Hi people, by any chance can someone commit this change? I asked for a
> sourceware account to do it myself, but it's taking a while, so it
> would be best if someone applies this patch meanwhile.

Pushed.
>
> --
> Ricardo
>
> On Mon, Mar 21, 2022 at 3:50 PM Ricardo Bittencourt <ricbit@ricbit.com> wrote:
> >
> > I checked the bug and it's the same comment issue (the comments
> > describe an earlier version of the algorithm). After sending this
> > patch I can update the others.
> >
> > The generic string improvements in the bug are better than the current
> > implementation, is there something stopping us from just applying it?
> >
> > --
> > Ricardo
> >
> > On Mon, Mar 21, 2022 at 3:18 PM Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > Some incidental comments (this is not a review of the patch and should not
> > > delay it going in):
> > >
> > > * Do I understand correctly that this is a *different* comment issue from
> > > the one described in bug 5806 with comments in various string functions?
> > > (Maybe the issue described in comment#10 in that bug?)
> > >
> > > * I'd still like to see the generic string function improvements discussed
> > > in that bug getting into glibc at some point.
> > >
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
diff mbox series

Patch

diff --git a/string/strlen.c b/string/strlen.c
index 0b8aefb812..54f3fb8167 100644
--- a/string/strlen.c
+++ b/string/strlen.c
@@ -46,15 +46,10 @@  STRLEN (const char *str)
 
   longword_ptr = (unsigned long int *) char_ptr;
 
-  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
-     the "holes."  Note that there is a hole just to the left of
-     each byte, with an extra at the end:
-
-     bits:  01111110 11111110 11111110 11111111
-     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
-
-     The 1-bits make sure that carries propagate to the next 0-bit.
-     The 0-bits provide holes for carries to fall into.  */
+  /* Computing (longword - lomagic) sets the high bit of any corresponding
+     byte that is either zero or greater than 0x80.  The latter case can be
+     filtered out by computing (~longword & himagic).  The final result
+     will always be non-zero if one of the bytes of longword is zero.  */
   himagic = 0x80808080L;
   lomagic = 0x01010101L;
   if (sizeof (longword) > 4)
@@ -76,8 +71,7 @@  STRLEN (const char *str)
 
       if (((longword - lomagic) & ~longword & himagic) != 0)
 	{
-	  /* Which of the bytes was the zero?  If none of them were, it was
-	     a misfire; continue the search.  */
+	  /* Which of the bytes was the zero?  */
 
 	  const char *cp = (const char *) (longword_ptr - 1);