[v5,00/17] Improve generic string routines

Message ID 20220919195920.956393-1-adhemerval.zanella@linaro.org

Message

Adhemerval Zanella Netto Sept. 19, 2022, 7:59 p.m. UTC
It is done by:

  1. Parametrizing the internal routines (for instance, finding a zero
     byte in a word) so each architecture can override them without
     reimplementing the whole routine.

  2. Vectorizing more string implementations (for instance strcpy
     and strcmp).

  3. Changing some implementations (for instance strnlen) to use other
     routines that may already be optimized.  New ports can then focus
     on providing optimized implementations of only a handful of
     symbols (for instance memchr), and those improvements carry over
     to a larger set of routines.
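
As an illustration of point 3, here is a minimal sketch of a strnlen
built on top of memchr.  The name sketch_strnlen is hypothetical and this
is not the code from the patch; it only shows why a port that optimizes
memchr gets a faster strnlen for free.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, not the glibc symbol.  Return the length of S,
   but never look at more than MAXLEN bytes.  */
static size_t
sketch_strnlen (const char *s, size_t maxlen)
{
  /* memchr stops at the first NUL byte or after MAXLEN bytes,
     whichever comes first, so all the work is done by memchr.  */
  const char *end = memchr (s, '\0', maxlen);
  return end != NULL ? (size_t) (end - s) : maxlen;
}
```

With this shape, sketch_strnlen ("hello", 3) is capped at 3, and any
architecture-specific memchr is picked up without touching this code.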

As for the rest of #5806, I think we can handle it later; if the
performance of the generic implementations is close to the assembly
ones, I think it is better to just remove the old assembly
implementations.

I also checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multiarch (it covers both LE and BE
for 64 and 32 bits). I also checked the string routines on alpha, hppa,
and sh.

Changes since v4:
  * Removed __clz and __ctz in favor of count_leading_zeros and
    count_trailing_zeros from longlong.h.
  * Use repeat_bytes more often.
  * Added a comment on strcmp final_cmp on why index_first_zero_ne
    cannot be used.
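
For readers unfamiliar with the word-at-a-time helpers mentioned above,
the following is a rough sketch of the idea, not the code from the
series.  All sketch_* names are hypothetical, unsigned long stands in for
op_t, and the GCC/Clang builtins __builtin_ctzl and __builtin_clzl stand
in for the longlong.h macros, which are internal to glibc.

```c
#include <limits.h>

typedef unsigned long sketch_op_t;	/* Assumed stand-in for op_t.  */

/* Replicate byte C into every byte of a word: 0x01 -> 0x0101...01.  */
static sketch_op_t
sketch_repeat_bytes (unsigned char c)
{
  return ((sketch_op_t) -1 / 0xff) * c;
}

/* Return a word with 0x80 in each byte position where X has a zero
   byte and 0 elsewhere.  The addition cannot carry across bytes, so
   the result is exact for every byte.  */
static sketch_op_t
sketch_find_zero_all (sketch_op_t x)
{
  sketch_op_t m = sketch_repeat_bytes (0x7f);
  return ~(((x & m) + m) | x | m);
}

/* Index, in memory order, of the first zero byte in X.  X must contain
   at least one zero byte, otherwise the builtins are undefined.  */
static unsigned int
sketch_index_first_zero (sketch_op_t x)
{
  sketch_op_t mask = sketch_find_zero_all (x);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* The first byte in memory is the least significant one.  */
  return (unsigned int) __builtin_ctzl (mask) / CHAR_BIT;
#else
  /* The first byte in memory is the most significant one.  */
  return (unsigned int) __builtin_clzl (mask) / CHAR_BIT;
#endif
}
```

An architecture with a dedicated zero-detection instruction would
replace only a helper like sketch_find_zero_all and keep the callers
unchanged.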

Changes since v3:
  * Rebased against master.
  * Dropped strcpy optimization.
  * Refactor strcmp implementation.
  * Some minor changes in comments.

Changes since v2:
  * Move string-fz{a,b,i} to their own patch.
  * Add an inline implementation for __builtin_c{l,t}z to avoid using
    compiler provided symbols.
  * Add a new header, string-maskoff.h, to handle unaligned accesses
    in some implementations.
  * Fixed strcmp on LE machines.
  * Added an unaligned strcpy variant for architectures that define
    _STRING_ARCH_unaligned.
  * Add SH string-fzb.h (which uses the cmp/str instruction to find
    a zero byte in a word).
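
To make the unaligned-access handling concrete, here is a hedged sketch
of the general mask-off technique; it is not the contents of
string-maskoff.h.  The sketch_* names are hypothetical and unsigned long
again stands in for op_t.  The idea is to load the aligned word that
contains the start of the string and force the bytes that precede the
string to be nonzero, so a later zero-byte search cannot match them.
Reading the whole aligned word is something a libc can rely on but plain
ISO C does not promise, so the caller here must own the full word.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long sketch_op_t;	/* Assumed stand-in for op_t.  */

/* Set the OFFSET bytes that come first in memory to 0xff.  OFFSET must
   be smaller than sizeof (sketch_op_t).  */
static sketch_op_t
sketch_mask_before_start (sketch_op_t word, unsigned int offset)
{
  sketch_op_t ones = (sketch_op_t) -1;
  if (offset == 0)
    return word;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* Earlier bytes in memory are the low-order bytes.  */
  return word | ~(ones << (offset * CHAR_BIT));
#else
  /* Earlier bytes in memory are the high-order bytes.  */
  return word | ~(ones >> (offset * CHAR_BIT));
#endif
}

/* Load the aligned word containing S, with the bytes before S masked.
   The caller must guarantee that the whole aligned word is readable.  */
static sketch_op_t
sketch_load_first_word (const char *s)
{
  unsigned int offset = (unsigned int) ((uintptr_t) s % sizeof (sketch_op_t));
  sketch_op_t word;
  memcpy (&word, s - offset, sizeof word);
  return sketch_mask_before_start (word, offset);
}
```

After this first word, a word-at-a-time loop can continue with plain
aligned loads.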

Changes since v1:
  * Marked ChangeLog entries with [BZ #5806], as appropriate.
  * Reorganized the headers, so that armv6t2 and power6 need to override
    as little as possible to use their (integer) zero detection insns.
  * Hopefully fixed all of the coding style issues.
  * Adjusted the memrchr algorithm as discussed.
  * Replaced the #ifdef STRRCHR etc that are used by the multiarch
    files.
  * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
    ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
    aarch64, alpha (qemu) and hppa (qemu).

Adhemerval Zanella (10):
  Add string-maskoff.h generic header
  Add string vectorized find and detection functions
  string: Improve generic strlen
  string: Improve generic strnlen
  string: Improve generic strchr
  string: Improve generic strchrnul
  string: Improve generic strcmp
  string: Improve generic memchr
  string: Improve generic memrchr
  sh: Add string-fzb.h

Richard Henderson (7):
  Parameterize op_t from memcopy.h
  Parameterize OP_T_THRES from memcopy.h
  hppa: Add memcopy.h
  hppa: Add string-fzb.h and string-fzi.h
  alpha: Add string-fzb.h and string-fzi.h
  arm: Add string-fza.h
  powerpc: Add string-fza.h

 string/memchr.c                               | 168 ++++------------
 string/memcmp.c                               |   4 -
 string/memrchr.c                              | 189 +++---------------
 string/strchr.c                               | 172 +++-------------
 string/strchrnul.c                            | 156 +++------------
 string/strcmp.c                               | 119 +++++++++--
 string/strlen.c                               |  90 ++-------
 string/strnlen.c                              | 137 +------------
 sysdeps/alpha/string-fzb.h                    |  51 +++++
 sysdeps/alpha/string-fzi.h                    | 113 +++++++++++
 sysdeps/arm/armv6t2/string-fza.h              |  70 +++++++
 sysdeps/generic/memcopy.h                     |  10 +-
 sysdeps/generic/string-extbyte.h              |  37 ++++
 sysdeps/generic/string-fza.h                  | 106 ++++++++++
 sysdeps/generic/string-fzb.h                  |  49 +++++
 sysdeps/generic/string-fzi.h                  | 120 +++++++++++
 sysdeps/generic/string-maskoff.h              |  73 +++++++
 sysdeps/generic/string-opthr.h                |  25 +++
 sysdeps/generic/string-optype.h               |  31 +++
 sysdeps/hppa/memcopy.h                        |  42 ++++
 sysdeps/hppa/string-fzb.h                     |  69 +++++++
 sysdeps/hppa/string-fzi.h                     | 135 +++++++++++++
 sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
 sysdeps/i386/memcopy.h                        |   3 -
 sysdeps/i386/string-opthr.h                   |  25 +++
 sysdeps/m68k/memcopy.h                        |   3 -
 sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
 .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
 .../power4/multiarch/strchrnul-ppc32.c        |   4 -
 .../power4/multiarch/strnlen-ppc32.c          |  14 +-
 .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
 sysdeps/powerpc/string-fza.h                  |  70 +++++++
 sysdeps/s390/strchr-c.c                       |  11 +-
 sysdeps/s390/strchrnul-c.c                    |   2 -
 sysdeps/s390/strlen-c.c                       |  10 +-
 sysdeps/s390/strnlen-c.c                      |  14 +-
 sysdeps/sh/string-fzb.h                       |  53 +++++
 37 files changed, 1366 insertions(+), 851 deletions(-)
 create mode 100644 sysdeps/alpha/string-fzb.h
 create mode 100644 sysdeps/alpha/string-fzi.h
 create mode 100644 sysdeps/arm/armv6t2/string-fza.h
 create mode 100644 sysdeps/generic/string-extbyte.h
 create mode 100644 sysdeps/generic/string-fza.h
 create mode 100644 sysdeps/generic/string-fzb.h
 create mode 100644 sysdeps/generic/string-fzi.h
 create mode 100644 sysdeps/generic/string-maskoff.h
 create mode 100644 sysdeps/generic/string-opthr.h
 create mode 100644 sysdeps/generic/string-optype.h
 create mode 100644 sysdeps/hppa/memcopy.h
 create mode 100644 sysdeps/hppa/string-fzb.h
 create mode 100644 sysdeps/hppa/string-fzi.h
 create mode 100644 sysdeps/i386/string-opthr.h
 create mode 100644 sysdeps/powerpc/string-fza.h
 create mode 100644 sysdeps/sh/string-fzb.h

Comments

Xi Ruoyao Dec. 5, 2022, 5:07 p.m. UTC | #1
Hi,

Any status update on this series?

Adhemerval Zanella Netto Jan. 5, 2023, 9:56 p.m. UTC | #2
Unfortunately no one worked on reviewing it.  It would be good to have
it for 2.37, although I think it is too late.  However, since most
architectures do use arch-specific routines, I think the possible disruption
of using this patchset should be minimal.

On 05/12/22 14:07, Xi Ruoyao wrote:
> Hi,
> 
> Any status update on this series?
Noah Goldstein Jan. 5, 2023, 11:52 p.m. UTC | #3
On Thu, Jan 5, 2023 at 1:56 PM Adhemerval Zanella Netto via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Unfortunately no one worked on reviewing it.  It would be good to have
> it for 2.37, although I think it is too late.  However, since most
> architectures do use arch-specific routines, I think the possible disruption
> of using this patchset should be minimal.

I can start reviewing this. Not sure I can do all the arch headers but can
get up to 11/17.
Adhemerval Zanella Netto Jan. 6, 2023, 1:43 p.m. UTC | #4
On 05/01/23 20:52, Noah Goldstein wrote:
> On Thu, Jan 5, 2023 at 1:56 PM Adhemerval Zanella Netto via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> Unfortunately no one worked on reviewing it.  It would be good to have
>> it for 2.37, although I think it is too late.  However, since most
>> architectures do use arch-specific routines, I think the possible disruption
>> of using this patchset should be minimal.
> 
> I can start reviewing this. Not sure I can do all the arch headers but can
> get up to 11/17.

Thanks, I will try to follow up on the reviews to get this sorted out for next week.