diff mbox

[PR,61720] Clear regex BFS match queue after every iteration

Message ID CAG4ZjN=RtUJqbPcOxjgNOvgGWuOJGQ85iXXew+4esV1sFM7HhQ@mail.gmail.com
State New
Headers show

Commit Message

Tim Shen July 9, 2014, 1:53 a.m. UTC
On Mon, Jul 7, 2014 at 1:24 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> Bah... too bad we have to resort to this. Then please summarize the
> discussion at the beginning of the file: what the file is *exactly* for (an
> accidental contributor should be able to understand if and what should go in
> it) and why it does exist in the first place. Also, please remember the test
> variables.

I think it is possible (but not for sure) to reduce compile time by
rewriting _Compiler. I did some simple compile time profiling on regex
and made that conclusion but I don't know why.

Comments

Paolo Carlini July 9, 2014, 7:48 a.m. UTC | #1
Hi,

On 07/09/2014 03:53 AM, Tim Shen wrote:
> +// This file is for general testcases in regex.
> +// We use a single file for multiple testcases because it takes too long to
> +// compile regex for each testcase in a single file.
> +// Normal testcases in other files should be moved into this file in the future.
Thus the plan would be *all* the regex testcases in a single file!?! If 
I understand correctly, that certainly doesn't make sense. In any case, 
I don't want this detail to slow down fixing the bug, thus please commit 
fix + normal testcase mainline and branch and then we'll take our time 
discussing the testing issue (I'm also meeting Jon face to face in a few 
days...) By the way, are we sure that the various regex testcases are 
enabled to run in parallel in the testsuite?

Thanks,
Paolo.
Jonathan Wakely July 9, 2014, 8 a.m. UTC | #2
On 08/07/14 18:53 -0700, Tim Shen wrote:
>+// libstdc++/61720
>+void
>+PR61720()
>+{
>+  bool test __attribute__((unused)) = true;
>+
>+  string test = R"("test\")";

Does this compile?
You've redeclared 'test'
Jonathan Wakely July 9, 2014, 8:08 a.m. UTC | #3
On 09/07/14 09:48 +0200, Paolo Carlini wrote:
>Hi,
>
>On 07/09/2014 03:53 AM, Tim Shen wrote:
>>+// This file is for general testcases in regex.
>>+// We use a single file for multiple testcases because it takes too long to
>>+// compile regex for each testcase in a single file.
>>+// Normal testcases in other files should be moved into this file in the future.
>Thus the plan would be *all* the regex testcases in a single file!?! 
>If I understand correctly, that certainly doesn't make sense. In any 

Jakub pointed out that there are lots of useful tests for regexes in
the glibc testsuite. I think it might make sense to have a file of
tests cases that contains separate fields such as the input string,
the regex, the flags, and the expected result. Then a test case could
loop over the contents of that file and run every test.

I don't think it makes sense to add all new tests to a single file,
but if we import hundreds of test cases at once we probably don't want
to add hundereds of separate files each testing a single input string
against a single regex.

>case, I don't want this detail to slow down fixing the bug, thus 
>please commit fix + normal testcase mainline and branch and then we'll 
>take our time discussing the testing issue (I'm also meeting Jon face 
>to face in a few days...) By the way, are we sure that the various 
>regex testcases are enabled to run in parallel in the testsuite?

All the 28_regex tests run as part of the same target:

          normal7) \
            dirs="`cd $$srcdir; echo 26_*/* 28_*/[c-z]*`";; \

I've tried moving the 26_numeric tests to a different target but it
doesn't make much difference, the regex tests still take longer than
any other target. We could try splitting up the 28_regex tests across
two targets, but the fact is that it takes 2-5 seconds to compile any
test using std::regex, so the more tests we add the slower the
testsuite gets.
Paolo Carlini July 9, 2014, 8:10 a.m. UTC | #4
Hi,

On 07/09/2014 10:08 AM, Jonathan Wakely wrote:
> All the 28_regex tests run as part of the same target:
>
>          normal7) \
>            dirs="`cd $$srcdir; echo 26_*/* 28_*/[c-z]*`";; \
>
> I've tried moving the 26_numeric tests to a different target but it
> doesn't make much difference, the regex tests still take longer than
> any other target. We could try splitting up the 28_regex tests across
> two targets, but the fact is that it takes 2-5 seconds to compile any
> test using std::regex, so the more tests we add the slower the
> testsuite gets.
I'm not an expert in this area, but my guess is indeed that it would be 
beneficial to split up 28_regex across multiple targets, why not more 
than 2, I would try 4 for example.

In general, I agree it would certainly make sense grouping the tests, 
but not all in a single file! Anyway, let's fix this bug first.

Paolo.
Jonathan Wakely July 9, 2014, 8:23 a.m. UTC | #5
On 09/07/14 10:10 +0200, Paolo Carlini wrote:
>Hi,
>
>On 07/09/2014 10:08 AM, Jonathan Wakely wrote:
>>All the 28_regex tests run as part of the same target:
>>
>>         normal7) \
>>           dirs="`cd $$srcdir; echo 26_*/* 28_*/[c-z]*`";; \
>>
>>I've tried moving the 26_numeric tests to a different target but it
>>doesn't make much difference, the regex tests still take longer than
>>any other target. We could try splitting up the 28_regex tests across
>>two targets, but the fact is that it takes 2-5 seconds to compile any
>>test using std::regex, so the more tests we add the slower the
>>testsuite gets.
>I'm not an expert in this area, but my guess is indeed that it would 
>be beneficial to split up 28_regex across multiple targets, why not 
>more than 2, I would try 4 for example.
>
>In general, I agree it would certainly make sense grouping the tests, 
>but not all in a single file! Anyway, let's fix this bug first.

I can't actually reproduce the bug with trunk.
Jakub Jelinek July 9, 2014, 8:26 a.m. UTC | #6
On Wed, Jul 09, 2014 at 09:08:40AM +0100, Jonathan Wakely wrote:
> >Thus the plan would be *all* the regex testcases in a single file!?! If I
> >understand correctly, that certainly doesn't make sense. In any
> 
> Jakub pointed out that there are lots of useful tests for regexes in
> the glibc testsuite. I think it might make sense to have a file of
> tests cases that contains separate fields such as the input string,
> the regex, the flags, and the expected result. Then a test case could
> loop over the contents of that file and run every test.

Most of those glibc tests are actually text file driven, where you have
a small testcase driver which opens the text file, parses it and invokes
regex on the pattern in one column, text to search in another one, with some
flags etc. in yet other columns.  4 different file formats I think, each
with hundreds of individual tests.  The reason for different file formats
is where those come from, there are e.g. boost regex tests, rxspencer regex
tests, pcre tests and then supposedly glibc's own old format.

	Jakub
Mike Stump July 9, 2014, 5:33 p.m. UTC | #7
On Jul 9, 2014, at 12:48 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> 
> On 07/09/2014 03:53 AM, Tim Shen wrote:
>> +// This file is for general testcases in regex.
>> +// We use a single file for multiple testcases because it takes too long to
>> +// compile regex for each testcase in a single file.
>> +// Normal testcases in other files should be moved into this file in the future.
> Thus the plan would be *all* the regex testcases in a single file!?!

So, there is a balance here.  The way I view it, imagine you had a 1024 CPU system, do you increase the testing time?  If other parts of the test suite take 4 minutes, then you can take up to 4 minutes and be covered by that other long arm.  You don’t want to break them up to 2.5 seconds a testcase for the theoretic beauty of 1 regexp per test case, when you can run 1000 of them in one run, using 2.48 seconds overhead plus 0.02 seconds per case.  The other limit is how big the resulting program is.  Too large, and it won’t test on a small memory system.  Under a meg, and most people don’t care.  As it climbs above that it starts mattering more.

So I endorse batching them up to a limit, more than 5 minutes, and it is certainly way too big.  Under 20 seconds, and I don’t see the problem.  I’d likely stop batching at the 30 seconds mark, which is 1/10 of the usual test case limit of 300 seconds.  Some people think running it up to 280 seconds is good for a single test case, this is wrong.  On a slow system, they blow the limit, and on a multi user system that is not completely unloaded, they blow the limit.
Jonathan Wakely July 9, 2014, 6:33 p.m. UTC | #8
On 9 July 2014 18:33, Mike Stump wrote:
> The other limit is how big the resulting program is.  Too large, and it won’t test on a small memory system.  Under a meg, and most people don’t care.  As it climbs above that it starts mattering more.

If the test is driven by a data file containing the cases to test then
the executable going to be pretty small, as it reads a single case
form the file and runs it, then loops. So executable size shouldn't be
an issue here.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 38b8ff2..3c68668 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -137,6 +137,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
       if (__match_mode == _Match_mode::_Exact)
 	__ret = _M_has_sol;
+      _M_states._M_match_queue.clear();
       return __ret;
     }
 
diff --git a/libstdc++-v3/testsuite/28_regex/general_testcases.cc b/libstdc++-v3/testsuite/28_regex/general_testcases.cc
new file mode 100644
index 0000000..c7372ab
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/general_testcases.cc
@@ -0,0 +1,53 @@ 
+// { dg-options "-std=gnu++11" }
+
+//
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// This file is for general testcases in regex.
+// We use a single file for multiple testcases because it takes too long to
+// compile regex for each testcase in a single file.
+// Normal testcases in other files should be moved into this file in the future.
+
+#include <regex>
+#include <testsuite_hooks.h>
+#include <testsuite_regex.h>
+
+using namespace __gnu_test;
+using namespace std;
+
+// libstdc++/61720
+void
+PR61720()
+{
+  bool test __attribute__((unused)) = true;
+
+  string test = R"("test\")";
+  VERIFY(!regex_search_debug(test, regex(R"("([^"]|\\")*[^\\]")")));
+  VERIFY(!regex_match_debug(test, regex(R"("([^"]|\\")*[^\\]")")));
+  VERIFY(!regex_search_debug(test, regex(R"("([^"]|\\")*[^\\]")",
+					 regex_constants::extended)));
+  VERIFY(!regex_match_debug(test, regex(R"("([^"]|\\")*[^\\]")",
+					regex_constants::extended)));
+}
+
+int
+main()
+{
+  PR61720();
+  return 0;
+}