diff mbox

Fix regex multiple consecutive quantifiers bug.

Message ID CAPrifDmyRQeeereCPY+KNsuCzyovcxwpLv4HZw1_tR6kQ-yNNg@mail.gmail.com
State New
Headers show

Commit Message

Tim Shen Jan. 19, 2014, 9:59 p.m. UTC
On Sun, Jan 19, 2014 at 5:01 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> Ok. Please remove 2013 as copyright year for the tescase. I also think you
> can avoid the auxiliary includes.

Oops, fixed, including fixing incorrect file name :)

Tested and committed.

Comments

Tim Shen Jan. 20, 2014, 8:14 a.m. UTC | #1
On Sun, Jan 19, 2014 at 4:59 PM, Tim Shen <timshen91@gmail.com> wrote:
> Tested and committed.

It's quite interesting that after this change in the patch:
-   this->_M_quantifier();
+   while (this->_M_quantifier());

...even the regex input is nothing to do with quantifiers at all (say
regex re(" ")), g++ -O3 generates slower code than -O2:

~ # g++ -O2 perf.cc && time ./a.out
./a.out  0.46s user 0.00s system 99% cpu 0.461 total
~ # g++ -O3 perf.cc && time ./a.out
./a.out  0.56s user 0.00s system 99% cpu 0.569 total

perf.cc is almost the same as testsuite/performance/28_regex/split.cc.

Following the man page, I found that g++ claims that the difference
between -O3 and -O2 are:
~ # /usr/bin/g++ -c -Q -O3 --help=optimizers > /tmp/O3-opts
~ # /usr/bin/g++ -c -Q -O2 --help=optimizers > /tmp/O2-opts
~ # diff /tmp/O2-opts /tmp/O3-opts | grep enabled
>   -fgcse-after-reload                 [enabled]
>   -finline-functions                  [enabled]
>   -fipa-cp-clone                      [enabled]
>   -fpredictive-commoning              [enabled]
>   -ftree-loop-distribute-patterns     [enabled]
>   -ftree-loop-vectorize               [enabled]
>   -ftree-partial-pre                  [enabled]
>   -ftree-slp-vectorize                [enabled]
>   -funswitch-loops                    [enabled]

However, -O2 with those flags give me a postive result:
~ # g++ -O2 perf.cc -fgcse-after-reload -finline-functions
-fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns
-ftree-loop-vectorize -ftree-partial-pre -ftree-slp-vectorize
-funswitch-loops && time ./a.out
./a.out  0.45s user 0.01s system 99% cpu 0.460 total

By the way, my "g++" is alas to "g++ -g -Wall -std=c++11", and I'm
sure -g doesn't matter.

I don't know much about it. Is there any explanations as interesting
as the question? ;)

Thank you!
Marc Glisse Jan. 20, 2014, 8:31 a.m. UTC | #2
On Mon, 20 Jan 2014, Tim Shen wrote:

> ...even the regex input is nothing to do with quantifiers at all (say
> regex re(" ")), g++ -O3 generates slower code than -O2:
>
> ~ # g++ -O2 perf.cc && time ./a.out
> ./a.out  0.46s user 0.00s system 99% cpu 0.461 total
> ~ # g++ -O3 perf.cc && time ./a.out
> ./a.out  0.56s user 0.00s system 99% cpu 0.569 total
>
> perf.cc is almost the same as testsuite/performance/28_regex/split.cc.
>
> Following the man page, I found that g++ claims that the difference
> between -O3 and -O2 are:

It is a FAQ that you can't get the effects of -Oy with -Ox and a bunch of 
-f flags. Some things depend directly on the optimization level. Note that 
you can also try the reverse: start from -O3 and use -fno-* flags.

I would first add -march=native to make sure this isn't the result of 
optimizing for a different platform.

I would suggest you use -fdump-tree-optimized and look at the generated 
files at -O2 and -O3 and see (they are in a vaguely C-like dialect) if you 
can find a difference that might explain the result. If so, then you can 
use -fdump-tree-all and find out where exactly gcc is going wrong. If not, 
there is -da, but the files will be much harder to read.

In any case, reducing the testcase can only make it easier to understand 
the issue.
Paolo Carlini Jan. 20, 2014, 8:41 a.m. UTC | #3
Hi,

On 01/20/2014 09:31 AM, Marc Glisse wrote:
> It is a FAQ that you can't get the effects of -Oy with -Ox and a bunch 
> of -f flags. Some things depend directly on the optimization level.
For example, inlining of free functions. By the way, I think it is a 
*very* common experience that -O3 doesn't do better than -O2. Thus I'm 
not particularly surprised or worried ;)

Paolo.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h
index 216f8fb..fe2e5f1 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -83,7 +83,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       bool
       _M_assertion();
 
-      void
+      bool
       _M_quantifier();
 
       bool
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc
index 621e43f..128dac1 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -135,7 +135,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return true;
       if (this->_M_atom())
 	{
-	  this->_M_quantifier();
+	  while (this->_M_quantifier());
 	  return true;
 	}
       return false;
@@ -173,7 +173,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   template<typename _TraitsT>
-    void
+    bool
     _Compiler<_TraitsT>::
     _M_quantifier()
     {
@@ -276,6 +276,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    }
 	  _M_stack.push(__e);
 	}
+      else
+	return false;
+      return true;
     }
 
 #define __INSERT_REGEX_MATCHER(__func, args...)\
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/multiple_quantifiers.cc b/libstdc++-v3/testsuite/28_regex/basic_regex/multiple_quantifiers.cc
new file mode 100644
index 0000000..5670cbb
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/multiple_quantifiers.cc
@@ -0,0 +1,33 @@ 
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// 28.8 basic_regex
+// Tests multiple consecutive quantifiers
+
+#include <regex>
+
+using namespace std;
+
+int
+main()
+{
+  regex re1("a++");
+  regex re2("(a+)+");
+  return 0;
+}