diff mbox

improve string find algorithm

Message ID 1481132816-31162-1-git-send-email-aditya.k7@samsung.com
State New
Headers show

Commit Message

Aditya Kumar Dec. 7, 2016, 5:46 p.m. UTC
Here is an improved version of basic_string::find. The idea is to
split the string find in two parts:
1. search for the first match by using traits_type::find (this gets converted to memchr for x86)
2. see if there is a match (this gets converted to memcmp for x86)

Passes bootstrap on x86-64.

The patch results in good improvements on a synthetic test case I wrote using the google-benchmark.
following are the results.


Branch: master without patch
$ ./bin/string.libcxx.out
Run on (24 X 1899.12 MHz CPU s)
2016-12-06 16:41:55
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                           Time           CPU Iterations
-----------------------------------------------------------------
BM_StringFindNoMatch/10             8 ns          8 ns   81880927
BM_StringFindNoMatch/64            52 ns         52 ns   13235018
BM_StringFindNoMatch/512          355 ns        355 ns    1962488
BM_StringFindNoMatch/4k          2769 ns       2772 ns     249090
BM_StringFindNoMatch/32k        22598 ns      22619 ns      30984
BM_StringFindNoMatch/128k       89745 ns      89830 ns       7996
BM_StringFindAllMatch/1             7 ns          7 ns  102893835
BM_StringFindAllMatch/8             9 ns          9 ns   75403364
BM_StringFindAllMatch/64           12 ns         12 ns   60766893
BM_StringFindAllMatch/512          31 ns         31 ns   23163999
BM_StringFindAllMatch/4k          141 ns        141 ns    4980386
BM_StringFindAllMatch/32k        1402 ns       1403 ns     483581
BM_StringFindAllMatch/128k       5604 ns       5609 ns     126123
BM_StringFindMatch1/1           44430 ns      44473 ns      15804
BM_StringFindMatch1/8           44315 ns      44357 ns      15741
BM_StringFindMatch1/64          44689 ns      44731 ns      15712
BM_StringFindMatch1/512         44247 ns      44290 ns      15724
BM_StringFindMatch1/4k          45010 ns      45053 ns      15678
BM_StringFindMatch1/32k         45717 ns      45761 ns      15278
BM_StringFindMatch2/1           44307 ns      44349 ns      15730
BM_StringFindMatch2/8           44631 ns      44674 ns      15721
BM_StringFindMatch2/64          44300 ns      44342 ns      15750
BM_StringFindMatch2/512         44239 ns      44281 ns      15713
BM_StringFindMatch2/4k          44886 ns      44928 ns      15787

Branch: master with patch
$ ./bin/string.libcxx.out
Run on (24 X 2892.28 MHz CPU s)
2016-12-06 18:51:38
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                           Time           CPU Iterations
-----------------------------------------------------------------
BM_StringFindNoMatch/10            11 ns         11 ns   63049677
BM_StringFindNoMatch/64            12 ns         12 ns   57259381
BM_StringFindNoMatch/512           27 ns         27 ns   25495432
BM_StringFindNoMatch/4k           130 ns        130 ns    5301301
BM_StringFindNoMatch/32k          858 ns        859 ns     824048
BM_StringFindNoMatch/128k        4091 ns       4095 ns     171493
BM_StringFindAllMatch/1            14 ns         14 ns   53023977
BM_StringFindAllMatch/8            14 ns         14 ns   51516536
BM_StringFindAllMatch/64           17 ns         17 ns   40992668
BM_StringFindAllMatch/512          37 ns         37 ns   18503267
BM_StringFindAllMatch/4k          153 ns        153 ns    4494458
BM_StringFindAllMatch/32k        1460 ns       1461 ns     483380
BM_StringFindAllMatch/128k       5801 ns       5806 ns     120680
BM_StringFindMatch1/1            2062 ns       2064 ns     333144
BM_StringFindMatch1/8            2057 ns       2059 ns     335496
BM_StringFindMatch1/64           2083 ns       2085 ns     341469
BM_StringFindMatch1/512          2134 ns       2136 ns     336880
BM_StringFindMatch1/4k           2309 ns       2312 ns     308745
BM_StringFindMatch1/32k          3413 ns       3417 ns     208206
BM_StringFindMatch2/1            2053 ns       2055 ns     341523
BM_StringFindMatch2/8            2061 ns       2063 ns     343999
BM_StringFindMatch2/64           2075 ns       2077 ns     338479
BM_StringFindMatch2/512          2102 ns       2104 ns     332276
BM_StringFindMatch2/4k           2286 ns       2288 ns     300416
BM_StringFindMatch2/32k          3385 ns       3388 ns     204158


ChangeLog:

2016-12-07  Aditya Kumar <hiraditya@msn.com>
       * include/bits/basic_string.tcc(find(const _CharT* __s, size_type
       __pos, size_type __n) const)): Improve the algorithm


---
 libstdc++-v3/include/bits/basic_string.tcc | 31 ++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

Comments

Jonathan Wakely Dec. 8, 2016, 11:38 a.m. UTC | #1
On 07/12/16 11:46 -0600, Aditya Kumar wrote:
>Here is an improved version of basic_string::find. The idea is to
>split the string find in two parts:
>1. search for the first match by using traits_type::find (this gets converted to memchr for x86)
>2. see if there is a match (this gets converted to memcmp for x86)

This is a good approach, and more portable than my own attempt using
the GNU memmem function.

I'll run some tests of my own...
Jonathan Wakely Jan. 6, 2017, 1:35 p.m. UTC | #2
On 07/12/16 11:46 -0600, Aditya Kumar wrote:
>Here is an improved version of basic_string::find. The idea is to
>split the string find in two parts:
>1. search for the first match by using traits_type::find (this gets converted to memchr for x86)
>2. see if there is a match (this gets converted to memcmp for x86)
>
>Passes bootstrap on x86-64.
>
>The patch results in good improvements on a synthetic test case I wrote using the google-benchmark.
>following are the results.
>
>
>Branch: master without patch
>$ ./bin/string.libcxx.out
>Run on (24 X 1899.12 MHz CPU s)
>2016-12-06 16:41:55
>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
>Benchmark                           Time           CPU Iterations
>-----------------------------------------------------------------
>BM_StringFindNoMatch/10             8 ns          8 ns   81880927
>BM_StringFindNoMatch/64            52 ns         52 ns   13235018
>BM_StringFindNoMatch/512          355 ns        355 ns    1962488
>BM_StringFindNoMatch/4k          2769 ns       2772 ns     249090
>BM_StringFindNoMatch/32k        22598 ns      22619 ns      30984
>BM_StringFindNoMatch/128k       89745 ns      89830 ns       7996
>BM_StringFindAllMatch/1             7 ns          7 ns  102893835
>BM_StringFindAllMatch/8             9 ns          9 ns   75403364
>BM_StringFindAllMatch/64           12 ns         12 ns   60766893
>BM_StringFindAllMatch/512          31 ns         31 ns   23163999
>BM_StringFindAllMatch/4k          141 ns        141 ns    4980386
>BM_StringFindAllMatch/32k        1402 ns       1403 ns     483581
>BM_StringFindAllMatch/128k       5604 ns       5609 ns     126123
>BM_StringFindMatch1/1           44430 ns      44473 ns      15804
>BM_StringFindMatch1/8           44315 ns      44357 ns      15741
>BM_StringFindMatch1/64          44689 ns      44731 ns      15712
>BM_StringFindMatch1/512         44247 ns      44290 ns      15724
>BM_StringFindMatch1/4k          45010 ns      45053 ns      15678
>BM_StringFindMatch1/32k         45717 ns      45761 ns      15278
>BM_StringFindMatch2/1           44307 ns      44349 ns      15730
>BM_StringFindMatch2/8           44631 ns      44674 ns      15721
>BM_StringFindMatch2/64          44300 ns      44342 ns      15750
>BM_StringFindMatch2/512         44239 ns      44281 ns      15713
>BM_StringFindMatch2/4k          44886 ns      44928 ns      15787
>
>Branch: master with patch
>$ ./bin/string.libcxx.out
>Run on (24 X 2892.28 MHz CPU s)
>2016-12-06 18:51:38
>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
>Benchmark                           Time           CPU Iterations
>-----------------------------------------------------------------
>BM_StringFindNoMatch/10            11 ns         11 ns   63049677
>BM_StringFindNoMatch/64            12 ns         12 ns   57259381
>BM_StringFindNoMatch/512           27 ns         27 ns   25495432
>BM_StringFindNoMatch/4k           130 ns        130 ns    5301301
>BM_StringFindNoMatch/32k          858 ns        859 ns     824048
>BM_StringFindNoMatch/128k        4091 ns       4095 ns     171493
>BM_StringFindAllMatch/1            14 ns         14 ns   53023977
>BM_StringFindAllMatch/8            14 ns         14 ns   51516536
>BM_StringFindAllMatch/64           17 ns         17 ns   40992668
>BM_StringFindAllMatch/512          37 ns         37 ns   18503267
>BM_StringFindAllMatch/4k          153 ns        153 ns    4494458
>BM_StringFindAllMatch/32k        1460 ns       1461 ns     483380
>BM_StringFindAllMatch/128k       5801 ns       5806 ns     120680
>BM_StringFindMatch1/1            2062 ns       2064 ns     333144
>BM_StringFindMatch1/8            2057 ns       2059 ns     335496
>BM_StringFindMatch1/64           2083 ns       2085 ns     341469
>BM_StringFindMatch1/512          2134 ns       2136 ns     336880
>BM_StringFindMatch1/4k           2309 ns       2312 ns     308745
>BM_StringFindMatch1/32k          3413 ns       3417 ns     208206
>BM_StringFindMatch2/1            2053 ns       2055 ns     341523
>BM_StringFindMatch2/8            2061 ns       2063 ns     343999
>BM_StringFindMatch2/64           2075 ns       2077 ns     338479
>BM_StringFindMatch2/512          2102 ns       2104 ns     332276
>BM_StringFindMatch2/4k           2286 ns       2288 ns     300416
>BM_StringFindMatch2/32k          3385 ns       3388 ns     204158
>
>
>ChangeLog:
>
>2016-12-07  Aditya Kumar <hiraditya@msn.com>
>       * include/bits/basic_string.tcc(find(const _CharT* __s, size_type
>       __pos, size_type __n) const)): Improve the algorithm
>
>
>---
> libstdc++-v3/include/bits/basic_string.tcc | 31 ++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
>index df1e8dd..7942ee6 100644
>--- a/libstdc++-v3/include/bits/basic_string.tcc
>+++ b/libstdc++-v3/include/bits/basic_string.tcc
>@@ -1194,14 +1194,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       if (__n == 0)
> 	return __pos <= __size ? __pos : npos;
>
>-      if (__n <= __size)
>-	{
>-	  for (; __pos <= __size - __n; ++__pos)
>-	    if (traits_type::eq(__data[__pos], __s[0])
>-		&& traits_type::compare(__data + __pos + 1,
>-					__s + 1, __n - 1) == 0)
>-	      return __pos;
>-	}
>+      if (__n > __size)
>+	return npos;
>+
>+      const _CharT __elem0 = __s[0];
>+      const _CharT* __first1 = __data;
>+      const _CharT* __first2 = __s;
>+      const _CharT* __last1 = __data + __size;
>+      ptrdiff_t __len2 = __n -  __pos;
>+
>+      while (true) {
>+        ptrdiff_t __len1 = __last1 - __first1;
>+	if (__len1 < __len2)
>+	  return npos;
>+
>+	// Find the first match.
>+	__first1 = traits_type::find(__first1, __len1 - __len2 + 1, __elem0);
>+	if (__first1 == 0)
>+	  return npos;
>+	// Compare the full string when first match is found.
>+        if (traits_type::compare(__first1, __first2, __len2) == 0)
>+	  return __first1 - __data;

Don't you need to increment __first1 here? Otherwise we go into an
infinite loop when the first character is found but the rest of the
string doesn't match.

i.e. this never terminates:

#include <string>
int main() {
  std::string("aa").find("ab");
}

I'm surprised we don't have any tests for this case, but apparently we
don't, as your patch passes all our tests.
Jonathan Wakely Jan. 6, 2017, 8:32 p.m. UTC | #3
On 06/01/17 13:35 +0000, Jonathan Wakely wrote:
>I'm surprised we don't have any tests for this case, but apparently we
>don't, as your patch passes all our tests.

My bad, we do have tests that FAIL with this patch, but only after a
'make clean' (otherwise the explicit instantiation definitions in the
library don't get rebuilt after applying the patch, so the tests use
the old code).

FAIL: 21_strings/basic_string/operations/find/char/1.cc execution test
FAIL: 21_strings/basic_string/operations/find/wchar_t/1.cc execution test

These fail on assert(s.find(s, 1) == std::string::npos) which gives
the wrong answer as explained in my previous mail.

The infinite loop caused by this patch isn't caught by any existing
tests, so I'll add a check for that.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index df1e8dd..7942ee6 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -1194,14 +1194,29 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (__n == 0)
 	return __pos <= __size ? __pos : npos;
 
-      if (__n <= __size)
-	{
-	  for (; __pos <= __size - __n; ++__pos)
-	    if (traits_type::eq(__data[__pos], __s[0])
-		&& traits_type::compare(__data + __pos + 1,
-					__s + 1, __n - 1) == 0)
-	      return __pos;
-	}
+      if (__n > __size)
+	return npos;
+
+      const _CharT __elem0 = __s[0];
+      const _CharT* __first1 = __data;
+      const _CharT* __first2 = __s;
+      const _CharT* __last1 = __data + __size;
+      ptrdiff_t __len2 = __n -  __pos;
+
+      while (true) {
+        ptrdiff_t __len1 = __last1 - __first1;
+	if (__len1 < __len2)
+	  return npos;
+
+	// Find the first match.
+	__first1 = traits_type::find(__first1, __len1 - __len2 + 1, __elem0);
+	if (__first1 == 0)
+	  return npos;
+	// Compare the full string when first match is found.
+        if (traits_type::compare(__first1, __first2, __len2) == 0)
+	  return __first1 - __data;
+      }
+
       return npos;
     }