Message ID | 20220919195920.956393-1-adhemerval.zanella@linaro.org |
---|---|
Series | Improve generic string routines |
Hi,

Any status update on this series?

On Mon, 2022-09-19 at 16:59 -0300, Adhemerval Zanella via Libc-alpha wrote:
> It is done by:
>
> 1. Parametrizing the internal routines (for instance, finding a zero
> in a word) so each architecture can reimplement them without the need
> to reimplement the whole routine.
>
> 2. Vectorizing more string implementations (for instance strcpy
> and strcmp).
>
> 3. Changing some implementations to use already-optimized ones where
> possible (for instance strnlen). This lets new ports focus on
> providing optimized implementations of only a handful of symbols
> (for instance memchr) and have those improvements used by a
> larger set of routines.
>
> The rest of BZ #5806 can be handled later, and if the performance of
> the generic implementation is close enough, I think it is better to
> just remove the old assembly implementations.
>
> I also checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
> and powerpc64-linux-gnu by removing the arch-specific assembly
> implementations and disabling multiarch (this covers both LE and BE
> for 64 and 32 bits). I also checked the string routines on alpha,
> hppa, and sh.
>
> Changes since v4:
> * Removed __clz and __ctz in favor of count_leading_zeros and
>   count_trailing_zeros from longlong.h.
> * Used repeat_bytes more often.
> * Added a comment on strcmp final_cmp explaining why
>   index_first_zero_ne cannot be used.
>
> Changes since v3:
> * Rebased against master.
> * Dropped strcpy optimization.
> * Refactored strcmp implementation.
> * Some minor changes in comments.
>
> Changes since v2:
> * Moved string-fz{a,b,i} to their own patch.
> * Added an inline implementation for __builtin_c{l,t}z to avoid using
>   compiler-provided symbols.
> * Added a new header, string-maskoff.h, to handle unaligned accesses
>   on some implementations.
> * Fixed strcmp on LE machines.
> * Added an unaligned strcpy variant for architectures that define
>   _STRING_ARCH_unaligned.
> * Added SH string-fzb.h (which uses the cmp/str instruction to find
>   a zero in a word).
>
> Changes since v1:
> * Marked ChangeLog entries with [BZ #5806], as appropriate.
> * Reorganized the headers, so that armv6t2 and power6 need to override
>   as little as possible to use their (integer) zero detection insns.
> * Hopefully fixed all of the coding style issues.
> * Adjusted the memrchr algorithm as discussed.
> * Replaced the #ifdef STRRCHR etc. that are used by the multiarch
>   files.
> * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
>   ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
>   aarch64, alpha (qemu) and hppa (qemu).
>
> Adhemerval Zanella (10):
>   Add string-maskoff.h generic header
>   Add string vectorized find and detection functions
>   string: Improve generic strlen
>   string: Improve generic strnlen
>   string: Improve generic strchr
>   string: Improve generic strchrnul
>   string: Improve generic strcmp
>   string: Improve generic memchr
>   string: Improve generic memrchr
>   sh: Add string-fzb.h
>
> Richard Henderson (7):
>   Parameterize op_t from memcopy.h
>   Parameterize OP_T_THRES from memcopy.h
>   hppa: Add memcopy.h
>   hppa: Add string-fzb.h and string-fzi.h
>   alpha: Add string-fzb.h and string-fzi.h
>   arm: Add string-fza.h
>   powerpc: Add string-fza.h
>
>  string/memchr.c | 168 ++++------------
>  string/memcmp.c | 4 -
>  string/memrchr.c | 189 +++---------------
>  string/strchr.c | 172 +++-------------
>  string/strchrnul.c | 156 +++------------
>  string/strcmp.c | 119 +++++++++--
>  string/strlen.c | 90 ++-------
>  string/strnlen.c | 137 +------------
>  sysdeps/alpha/string-fzb.h | 51 +++++
>  sysdeps/alpha/string-fzi.h | 113 +++++++++++
>  sysdeps/arm/armv6t2/string-fza.h | 70 +++++++
>  sysdeps/generic/memcopy.h | 10 +-
>  sysdeps/generic/string-extbyte.h | 37 ++++
>  sysdeps/generic/string-fza.h | 106 ++++++++++
>  sysdeps/generic/string-fzb.h | 49 +++++
>  sysdeps/generic/string-fzi.h | 120 +++++++++++
>  sysdeps/generic/string-maskoff.h | 73 +++++++
>  sysdeps/generic/string-opthr.h | 25 +++
>  sysdeps/generic/string-optype.h | 31 +++
>  sysdeps/hppa/memcopy.h | 42 ++++
>  sysdeps/hppa/string-fzb.h | 69 +++++++
>  sysdeps/hppa/string-fzi.h | 135 +++++++++++++
>  sysdeps/i386/i686/multiarch/strnlen-c.c | 14 +-
>  sysdeps/i386/memcopy.h | 3 -
>  sysdeps/i386/string-opthr.h | 25 +++
>  sysdeps/m68k/memcopy.h | 3 -
>  sysdeps/powerpc/powerpc32/power4/memcopy.h | 5 -
>  .../powerpc32/power4/multiarch/memchr-ppc32.c | 14 +-
>  .../power4/multiarch/strchrnul-ppc32.c | 4 -
>  .../power4/multiarch/strnlen-ppc32.c | 14 +-
>  .../powerpc64/multiarch/memchr-ppc64.c | 9 +-
>  sysdeps/powerpc/string-fza.h | 70 +++++++
>  sysdeps/s390/strchr-c.c | 11 +-
>  sysdeps/s390/strchrnul-c.c | 2 -
>  sysdeps/s390/strlen-c.c | 10 +-
>  sysdeps/s390/strnlen-c.c | 14 +-
>  sysdeps/sh/string-fzb.h | 53 +++++
>  37 files changed, 1366 insertions(+), 851 deletions(-)
>  create mode 100644 sysdeps/alpha/string-fzb.h
>  create mode 100644 sysdeps/alpha/string-fzi.h
>  create mode 100644 sysdeps/arm/armv6t2/string-fza.h
>  create mode 100644 sysdeps/generic/string-extbyte.h
>  create mode 100644 sysdeps/generic/string-fza.h
>  create mode 100644 sysdeps/generic/string-fzb.h
>  create mode 100644 sysdeps/generic/string-fzi.h
>  create mode 100644 sysdeps/generic/string-maskoff.h
>  create mode 100644 sysdeps/generic/string-opthr.h
>  create mode 100644 sysdeps/generic/string-optype.h
>  create mode 100644 sysdeps/hppa/memcopy.h
>  create mode 100644 sysdeps/hppa/string-fzb.h
>  create mode 100644 sysdeps/hppa/string-fzi.h
>  create mode 100644 sysdeps/i386/string-opthr.h
>  create mode 100644 sysdeps/powerpc/string-fza.h
>  create mode 100644 sysdeps/sh/string-fzb.h
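The first point of the cover letter (a parametrized "find a zero in a word" primitive that the generic routines are built on) and the third point (reusing one optimized routine inside another, as with strnlen and memchr) can be illustrated with the minimal C sketch below. It is not the code from the series: `op_t` is assumed to be `unsigned long`, `repeat_bytes` is only modeled on the helper of that name mentioned above, and `zero_bytes_mask`, `eq_bytes_mask`, `first_flagged_index`, `sketch_memchr` and `sketch_strnlen` are hypothetical names standing in for the roles of the string-fza.h/string-fzi.h helpers and the generic routines. GCC/Clang builtins and byte-order macros are assumed. Unlike glibc, which uses aligned word loads, the sketch loads words with memcpy and finishes the tail byte by byte, so it never reads outside `[s, s + n)`; as a result, `sketch_strnlen` assumes all `maxlen` bytes are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Word type for the word-at-a-time loops (assumption: unsigned long).  */
typedef unsigned long op_t;

/* Broadcast byte C into every byte of a word, e.g. 0x7f -> 0x7f7f...7f.
   Modeled on the series' repeat_bytes; this definition is assumed.  */
static inline op_t
repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Hypothetical helper: 0x80 in exactly those bytes of X that are zero.
   Per byte, (x & 0x7f) + 0x7f <= 0xfe, so no carry crosses a byte
   boundary and no byte is flagged falsely.  */
static inline op_t
zero_bytes_mask (op_t x)
{
  op_t m = repeat_bytes (0x7f);
  return ~(((x & m) + m) | x | m);
}

/* Hypothetical helper: flag the bytes of X that are equal to C.  */
static inline op_t
eq_bytes_mask (op_t x, unsigned char c)
{
  return zero_bytes_mask (x ^ repeat_bytes (c));
}

/* Hypothetical helper: memory-order index of the first flagged byte.  */
static inline size_t
first_flagged_index (op_t mask)
{
  assert (mask != 0);  /* __builtin_ctzl/clzl are undefined for 0.  */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_ctzl (mask) / 8;  /* Least significant byte is first.  */
#else
  return __builtin_clzl (mask) / 8;  /* Most significant byte is first.  */
#endif
}

/* Word-at-a-time memchr sketch: S need not be aligned, and only whole
   words inside [S, S + N) are loaded.  */
static void *
sketch_memchr (const void *s, int c_in, size_t n)
{
  const unsigned char *p = s;
  unsigned char c = (unsigned char) c_in;

  for (; n >= sizeof (op_t); p += sizeof (op_t), n -= sizeof (op_t))
    {
      op_t w;
      memcpy (&w, p, sizeof w);  /* Bytes keep their memory order.  */
      op_t mask = eq_bytes_mask (w, c);
      if (mask != 0)
        return (void *) (p + first_flagged_index (mask));
    }

  /* Fewer than sizeof (op_t) bytes remain: finish byte by byte.  */
  for (; n > 0; p++, n--)
    if (*p == c)
      return (void *) p;
  return NULL;
}

/* strnlen expressed through memchr, so a port that optimizes memchr
   also speeds up strnlen.  Assumes MAXLEN bytes are readable.  */
static size_t
sketch_strnlen (const char *s, size_t maxlen)
{
  const char *z = sketch_memchr (s, '\0', maxlen);
  return z != NULL ? (size_t) (z - s) : maxlen;
}
```

In this design an architecture with a dedicated instruction would only need to replace the small mask and index helpers, leaving the loops untouched. The cheaper `(x - 0x01...) & ~x & 0x80...` test is a common alternative for the "has a zero" check, but it can flag a byte sitting above a real zero (for example a 0x01 byte), so the exact mask is used here to keep the index lookup correct on both byte orders.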
Unfortunately, no one has reviewed it yet. It would be good to have it for 2.37, although I think it is too late. However, since most architectures use arch-specific routines, I think the possible disruption from this patchset should be minimal.

On 05/12/22 14:07, Xi Ruoyao wrote:
> Hi,
>
> Any status update on this series?
On Thu, Jan 5, 2023 at 1:56 PM Adhemerval Zanella Netto via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Unfortunately, no one has reviewed it yet. It would be good to have
> it for 2.37, although I think it is too late. However, since most
> architectures use arch-specific routines, I think the possible
> disruption from this patchset should be minimal.

I can start reviewing this. I'm not sure I can do all the arch headers,
but I can get up to 11/17.
On 05/01/23 20:52, Noah Goldstein wrote:
> On Thu, Jan 5, 2023 at 1:56 PM Adhemerval Zanella Netto via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> Unfortunately, no one has reviewed it yet. It would be good to have
>> it for 2.37, although I think it is too late. However, since most
>> architectures use arch-specific routines, I think the possible
>> disruption from this patchset should be minimal.
>
> I can start reviewing this. I'm not sure I can do all the arch headers,
> but I can get up to 11/17.

Thanks, I will try to follow up on the reviews to get this sorted out next week.