[1/3] Use manual regex algorithm switching

Message ID	CAPrifDk3eVrCDzk=_5MgExooF+BrpVWCi8AWaZef6YNhenQj8w@mail.gmail.com
State	New
Headers

Return-Path: <gcc-patches-return-366025-incoming=patchwork.ozlabs.org@gcc.gnu.org>
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:in-reply-to:references:date:message-id:subject
	:from:to:cc:content-type; q=dns; s=default; b=oPLb45ckFfXJJeQb5x
	M1gy/LAl32Misge4fdF8weHpl4AtJACLkN7XSq3uKoQiYrdgOyg4GRU6X7B7LvNx
	oRbIy2oQYmfaqab5Blt7tajeGV1H6q0qU8ndn7EM++yizBxwXFkR8rXAykmPMctU
	nn/Xl6ZlbUj+jGyytlxDXNHco=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
In-Reply-To: <20140425235620.GP928@redhat.com>
References: <CAPrifDm8MwcHY8XSnW_z6Z_hLVS+Lw4huqWEF3_eNT=fw3mkRg@mail.gmail.com>
	<20140425211417.GM928@redhat.com>
	<CAPrifD=y+bxDS+V0yiKbc-GvKC5+Vr1H7ofxuLHF=BOvtdXMGA@mail.gmail.com>
	<20140425235620.GP928@redhat.com>
Date: Fri, 25 Apr 2014 20:50:47 -0400
Message-ID: <CAPrifDk3eVrCDzk=_5MgExooF+BrpVWCi8AWaZef6YNhenQj8w@mail.gmail.com>
Subject: Re: [Patch 1/3] Use manual regex algorithm switching
From: Tim Shen <timshen91@gmail.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Content-Type: multipart/mixed; boundary=047d7b41cc860cb2c604f7e77940

Comments

Jonathan Wakely April 26, 2014, 2:50 p.m. UTC | #1

On 25/04/14 20:50 -0400, Tim Shen wrote:
>
>    	* include/bits/regex.tcc (__regex_algo_impl<>): Remove
>    	_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT and use
>    	_GLIBCXX_REGEX_USE_THOMPSON_NFA instead.
>    	* include/bits/regex_automaton.h: Remove quantifier counting variable.
>    	* include/bits/regex_automaton.tcc (_State_base::_M_dot):
>    	Adjust debug NFA dump.

This patch is OK, thanks very much.

Tim Shen April 27, 2014, 11:49 p.m. UTC | #2

On Sat, Apr 26, 2014 at 10:50 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> This patch is OK, thanks very much.

Committed. Thanks!

diff --git a/libstdc++-v3/include/bits/regex.tcc b/libstdc++-v3/include/bits/regex.tcc
index 5fa1f01..0d737a0 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -28,12 +28,12 @@ 
  *  Do not attempt to use it directly. @headername{regex}
  */
 
-// See below __regex_algo_impl to get what this is talking about. The default
-// value 1 indicated a conservative optimization without giving up worst case
-// performance.
-#ifndef _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT
-#define _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT 1
-#endif
+// A non-standard switch to let the user pick the matching algorithm.
+// If _GLIBCXX_REGEX_USE_THOMPSON_NFA is defined, the Thompson NFA
+// algorithm will be used. This algorithm is not enabled by default,
+// and cannot be used if the regex contains back-references, but has better
+// (polynomial instead of exponential) worst case performance.
+// See __regex_algo_impl below.
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -66,24 +66,15 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       for (auto& __it : __res)
 	__it.matched = false;
 
-      // This function decide which executor to use under given circumstances.
-      // The _S_auto policy now is the following: if a NFA has no
-      // back-references and has more than _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT
-      // quantifiers (*, +, ?), the BFS executor will be used, other wise
-      // DFS executor. This is because DFS executor has a exponential upper
-      // bound, but better best-case performace. Meanwhile, BFS executor can
-      // effectively prevent from exponential-long time matching (which must
-      // contains many quantifiers), but it's slower in average.
-      //
-      // For simple regex, BFS executor could be 2 or more times slower than
-      // DFS executor.
-      //
-      // Of course, BFS executor cannot handle back-references.
+      // __policy is used by testsuites so that they can use Thompson NFA
+      // without defining a macro. Users should define
+      // _GLIBCXX_REGEX_USE_THOMPSON_NFA if they need to use this approach.
       bool __ret;
       if (!__re._M_automaton->_M_has_backref
-	  && (__policy == _RegexExecutorPolicy::_S_alternate
-	      || __re._M_automaton->_M_quant_count
-		> _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT))
+#ifndef _GLIBCXX_REGEX_USE_THOMPSON_NFA
+	  && __policy == _RegexExecutorPolicy::_S_alternate
+#endif
+	  )
 	{
 	  _Executor<_BiIter, _Alloc, _TraitsT, false>
 	    __executor(__s, __e, __m, __re, __flags);
diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index a442cfe..64ecd6d 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -74,8 +74,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       size_t _M_backref_index;  // for _S_opcode_backref
       struct
       {
-	// for _S_opcode_alternative.
-	_StateIdT  _M_quant_index;
 	// for _S_opcode_alternative or _S_opcode_subexpr_lookahead
 	_StateIdT  _M_alt;
 	// for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or
@@ -120,7 +118,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     explicit
     _NFA_base(_FlagT __f)
     : _M_flags(__f), _M_start_state(0), _M_subexpr_count(0),
-    _M_quant_count(0), _M_has_backref(false)
+    _M_has_backref(false)
     { }
 
     _NFA_base(_NFA_base&&) = default;
@@ -145,7 +143,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _FlagT                    _M_flags;
     _StateIdT                 _M_start_state;
     _SizeT                    _M_subexpr_count;
-    _SizeT                    _M_quant_count;
     bool                      _M_has_backref;
   };
 
@@ -175,7 +172,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_StateT __tmp(_S_opcode_alternative);
 	// It labels every quantifier to make greedy comparison easier in BFS
 	// approach.
-	__tmp._M_quant_index = this->_M_quant_count++;
 	__tmp._M_next = __next;
 	__tmp._M_alt = __alt;
 	__tmp._M_neg = __neg;
diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
index 1476ae2..38787fa 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -74,9 +74,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       case _S_opcode_alternative:
 	__ostr << __id << " [label=\"" << __id << "\\nALT\"];\n"
 	       << __id << " -> " << _M_next
-	       << " [label=\"epsilon\", tailport=\"s\"];\n"
+	       << " [label=\"next\", tailport=\"s\"];\n"
 	       << __id << " -> " << _M_alt
-	       << " [label=\"epsilon\", tailport=\"n\"];\n";
+	       << " [label=\"alt\", tailport=\"n\"];\n";
 	break;
       case _S_opcode_backref:
 	__ostr << __id << " [label=\"" << __id << "\\nBACKREF "

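For readers who want to try the switch this patch introduces, the following is a minimal sketch rather than part of the patch. It assumes a libstdc++ version that contains this change; on a library that does not recognize `_GLIBCXX_REGEX_USE_THOMPSON_NFA`, the definition is simply ignored and the default executor is used. The helper names `full_match` and `demo` are hypothetical and exist only for illustration.

```cpp
// The macro must be defined before the first inclusion of <regex>,
// because regex.tcc tests it with #ifndef while the header is parsed.
#define _GLIBCXX_REGEX_USE_THOMPSON_NFA 1
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` and test whether it matches
// all of `text`. The pattern must not contain back-references if the
// Thompson NFA executor is to be selected.
bool full_match(const std::string& pattern, const std::string& text)
{
  const std::regex re(pattern);
  return std::regex_match(text, re);
}

// Hypothetical check: a pattern with overlapping alternatives under a
// quantifier, which forces a backtracking matcher to explore many
// paths before it can report a failed match.
void demo()
{
  assert(full_match("(a|aa)*b", "aaab"));
  assert(!full_match("(a|aa)*b", std::string(20, 'a')));
}
```

The results are the same with either executor; only the worst-case running time differs. Passing `-D_GLIBCXX_REGEX_USE_THOMPSON_NFA` on the compiler command line has the same effect as the `#define` and avoids any dependence on include order.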