diff mbox

[3/N] std::regex refactoring - _Executor DFS / BFS

Message ID 20140428144659.GY928@redhat.com
State New
Headers show

Commit Message

Jonathan Wakely April 28, 2014, 2:46 p.m. UTC
This change splits _Executor::_M_main() into two overloaded
_M_main_dispatch() functions, choosing which to run based on the
__dfs_mode template parameter.

I think this gives a (very) small improvement in compilation time when
using regexes.

Splitting _M_main() allows the _M_match_queue and _M_visited members
(which are only used in BFS mode) to be replaced with an instantiation
of the new _States class template. _States<__dfs> only contains the
start state (so that it's not empty and doesn't waste space) but
_States<__bfs> contains the start state and also the match queue and
visited list.

As the visited list never changes size after construction I changed
its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
avoids an indirection, although now that that member is only present
when actually required it could just be vector<bool>. An array of bool
is simpler to access, but takes more heap memory. vector<bool> uses
less space but the compiler has to do all the masking to address
individual bits.

I also changed this range-based for loop in _M_main (now in
_M_main_dispatch(__bfs)) to make __task a reference, avoiding one
copy, and to move from __task.second, avoiding a second copy:

        auto __old_queue = std::move(_M_states._M_match_queue);
        for (auto& __task : __old_queue)
          {
            _M_cur_results = std::move(__task.second);
            _M_dfs<__match_mode>(__task.first);
          }

Thoughts?

Comments

Tim Shen April 28, 2014, 3:40 p.m. UTC | #1
On Mon, Apr 28, 2014 at 10:46 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> This change splits _Executor::_M_main() into two overloaded
> _M_main_dispatch() functions, choosing which to run based on the
> __dfs_mode template parameter.
>
> I think this gives a (very) small improvement in compilation time when
> using regexes.
>
> Splitting _M_main() allows the _M_match_queue and _M_visited members
> (which are only used in BFS mode) to be replaced with an instantiation
> of the new _States class template. _States<__dfs> only contains the
> start state (so that it's not empty and doesn't waste space) but
> _States<__bfs> contains the start state and also the match queue and
> visited list.

This is great! I keep worrying about the ugly all-in-one _Executor
class; I've tried to specialize two versions of it, but that
introduced much duplicated code. Your abstract is nice!

> As the visited list never changes size after construction I changed
> its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
> avoids an indirection, although now that that member is only present
> when actually required it could just be vector<bool>. An array of bool
> is simpler to access, but takes more heap memory. vector<bool> uses
> less space but the compiler has to do all the masking to address
> individual bits.

What about bitset? Anyway we should get rid of vector<bool>.

> I also changed this range-based for loop in _M_main (now in
> _M_main_dispatch(__bfs)) to make __task a reference, avoiding one
> copy, and to move from __task.second, avoiding a second copy:
>
>        auto __old_queue = std::move(_M_states._M_match_queue);
>        for (auto& __task : __old_queue)
>          {
>            _M_cur_results = std::move(__task.second);
>            _M_dfs<__match_mode>(__task.first);
>          }

I didn't know why I've written the code that copys a lot. Thanks.
Jonathan Wakely April 28, 2014, 4:51 p.m. UTC | #2
On 28/04/14 11:40 -0400, Tim Shen wrote:
>On Mon, Apr 28, 2014 at 10:46 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> This change splits _Executor::_M_main() into two overloaded
>> _M_main_dispatch() functions, choosing which to run based on the
>> __dfs_mode template parameter.
>>
>> I think this gives a (very) small improvement in compilation time when
>> using regexes.
>>
>> Splitting _M_main() allows the _M_match_queue and _M_visited members
>> (which are only used in BFS mode) to be replaced with an instantiation
>> of the new _States class template. _States<__dfs> only contains the
>> start state (so that it's not empty and doesn't waste space) but
>> _States<__bfs> contains the start state and also the match queue and
>> visited list.
>
>This is great! I keep worrying about the ugly all-in-one _Executor
>class; I've tried to specialize two versions of it, but that
>introduced much duplicated code. Your abstract is nice!

Thanks. I'll clean up that patch and commit it soon too.

The next thing I plan to look at, which I haven't done yet, is to see
if passing the __match_mode template parameter as a runtime function
parameter makes any difference to the way the code is structuted. Do
you have any thoughts in that, before I waste time doing something
that won't work?

>> As the visited list never changes size after construction I changed
>> its type from unique_ptr<vector<bool>> to unique_ptr<bool[]>, which
>> avoids an indirection, although now that that member is only present
>> when actually required it could just be vector<bool>. An array of bool
>> is simpler to access, but takes more heap memory. vector<bool> uses
>> less space but the compiler has to do all the masking to address
>> individual bits.
>
>What about bitset? Anyway we should get rid of vector<bool>.

The size of a bitset is fixed at compile-time. If we could use
dynarray<bool> that might be nice, but we can't :-)
Tim Shen April 28, 2014, 7:02 p.m. UTC | #3
On Mon, Apr 28, 2014 at 12:51 PM, Jonathan Wakely <jwakely@redhat.com> wrote:
> The next thing I plan to look at, which I haven't done yet, is to see
> if passing the __match_mode template parameter as a runtime function
> parameter makes any difference to the way the code is structuted. Do
> you have any thoughts in that, before I waste time doing something
> that won't work?

The thing behind the template parameter is like this:

0) We have DFS and BFS executor;
1) To be DRY, I write one class for those two approaches. We have to
use some option variable (__match_mode) in a function (say
_Executor::_M_dfs) to distingush one approach from another.
2) Keep checking the flag at runtime hurts efficiency, so a template
flag is used.

However, it turns out that it's not clear that if __match_mode is
frequently asked (in _Executor::_M_main and the _S_opcode_accept
branch in _Executor::_M_dfs).

If we want to change it to a runtime flag, it should be a class
member. Otherwise we have to pass it as a function parameter all the
time, and it may waste an instruction and one byte per recursive call.
It surely make the code cleaner.

Am I premature optimizing?
Jonathan Wakely April 28, 2014, 7:10 p.m. UTC | #4
On 28/04/14 15:02 -0400, Tim Shen wrote:
>If we want to change it to a runtime flag, it should be a class
>member. Otherwise we have to pass it as a function parameter all the
>time, and it may waste an instruction and one byte per recursive call.
>It surely make the code cleaner.

Data members are accessed through the 'this' pointer, which can
require an indirection and be harder to optimise than a function
parameter.

>Am I premature optimizing?

I don't know yet - we need to measure.

My concern is that making it a template parameter causes twice as much
code to be instantiated, which makes executables bigger and can cause
more I-cache misses.
Tim Shen April 28, 2014, 7:24 p.m. UTC | #5
On Mon, Apr 28, 2014 at 3:10 PM, Jonathan Wakely <jwakely@redhat.com> wrote:
> Data members are accessed through the 'this' pointer, which can
> require an indirection and be harder to optimise than a function
> parameter.

It doesn't matter if this variable is not frequently checked. But
let's just use the function parameter and expecting compiler's
optimization.

> My concern is that making it a template parameter causes twice as much
> code to be instantiated, which makes executables bigger and can cause
> more I-cache misses.

Worth a try. Will you make the change or will I? It seems to be
simpler doing than talking.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index c110b88..064e3df 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -42,8 +42,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
    */
 
   /**
-   * @brief Takes a regex and an input string in and
-   * do the matching.
+   * @brief Takes a regex and an input string and does the matching.
    *
    * The %_Executor class has two modes: DFS mode and BFS mode, controlled
    * by the template parameter %__dfs_mode.
@@ -52,6 +51,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   bool __dfs_mode>
     class _Executor
     {
+      using __search_mode = integral_constant<bool, __dfs_mode>;
+      using __dfs = true_type;
+      using __bfs = false_type;
+
     public:
       typedef typename iterator_traits<_BiIter>::value_type _CharT;
       typedef basic_regex<_CharT, _TraitsT>                 _RegexT;
@@ -71,16 +74,13 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_re(__re),
       _M_nfa(*__re._M_automaton),
       _M_results(__results),
-      _M_match_queue(__dfs_mode ? nullptr
-		     : new vector<pair<_StateIdT, _ResultsVec>>()),
       _M_rep_count(_M_nfa.size()),
-      _M_visited(__dfs_mode ? nullptr : new vector<bool>(_M_nfa.size())),
+      _M_states(_M_nfa._M_start(), _M_nfa.size()),
       _M_flags((__flags & regex_constants::match_prev_avail)
 	       ? (__flags
 		  & ~regex_constants::match_not_bol
 		  & ~regex_constants::match_not_bow)
-	       : __flags),
-      _M_start_state(_M_nfa._M_start())
+	       : __flags)
       { }
 
       // Set matched when string exactly match the pattern.
@@ -113,7 +113,16 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       template<bool __match_mode>
 	bool
-	_M_main();
+	_M_main()
+	{ return _M_main_dispatch<__match_mode>(__search_mode{}); }
+
+      template<bool __match_mode>
+	bool
+	_M_main_dispatch(__dfs);
+
+      template<bool __match_mode>
+	bool
+	_M_main_dispatch(__bfs);
 
       bool
       _M_is_word(_CharT __ch) const
@@ -144,6 +153,53 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       bool
       _M_lookahead(_State<_TraitsT> __state);
 
+       // Holds additional information used in BFS-mode.
+      template<typename _SearchMode, typename _ResultsVec>
+	struct _State_info;
+
+      template<typename _ResultsVec>
+	struct _State_info<__bfs, _ResultsVec>
+	{
+	  explicit
+	  _State_info(_StateIdT __start, size_t __n)
+	  : _M_start(__start), _M_visited_states(new bool[__n]())
+	  { }
+
+	  bool _M_visited(_StateIdT __i)
+	  {
+	    if (_M_visited_states[__i])
+	      return true;
+	    _M_visited_states[__i] = true;
+	    return false;
+	  }
+
+	  void _M_queue(_StateIdT __i, const _ResultsVec& __res)
+	  { _M_match_queue.emplace_back(__i, __res); }
+
+	  // Saves states that need to be considered for the next character.
+	  vector<pair<_StateIdT, _ResultsVec>>	_M_match_queue;
+	  // Indicates which states are already visited.
+	  unique_ptr<bool[]>			_M_visited_states;
+	  // To record current solution.
+	  _StateIdT _M_start;
+	};
+
+      template<typename _ResultsVec>
+	struct _State_info<__dfs, _ResultsVec>
+	{
+	  explicit
+	  _State_info(_StateIdT __start, size_t) : _M_start(__start)
+	  { }
+
+	  // Dummy implementations for DFS mode.
+	  bool _M_visited(_StateIdT) const { return false; }
+	  void _M_queue(_StateIdT, const _ResultsVec&) { }
+
+	  // To record current solution.
+	  _StateIdT _M_start;
+	};
+
+
     public:
       _ResultsVec                                           _M_cur_results;
       _BiIter                                               _M_current;
@@ -152,15 +208,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       const _RegexT&                                        _M_re;
       const _NFAT&                                          _M_nfa;
       _ResultsVec&                                          _M_results;
-      // Used in BFS, saving states that need to be considered for the next
-      // character.
-      unique_ptr<vector<pair<_StateIdT, _ResultsVec>>>      _M_match_queue;
-      // Used in BFS, indicating that which state is already visited.
       vector<pair<_BiIter, int>>                            _M_rep_count;
-      unique_ptr<vector<bool>>                              _M_visited;
+      _State_info<__search_mode, _ResultsVec>		    _M_states;
       _FlagT                                                _M_flags;
-      // To record current solution.
-      _StateIdT                                             _M_start_state;
       // Do we have a solution so far?
       bool                                                  _M_has_sol;
     };
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 7f89933..3e14bf6 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -35,7 +35,7 @@  namespace __detail
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
-    bool __dfs_mode>
+	   bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_search()
     {
@@ -53,8 +53,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return false;
     }
 
-  // This function operates in different modes, DFS mode or BFS mode, indicated
-  // by template parameter __dfs_mode. See _M_main for details.
+  // The _M_main function operates in different modes, DFS mode or BFS mode,
+  // indicated by template parameter __dfs_mode, and dispatches to one of the
+  // _M_main_dispatch overloads.
   //
   // ------------------------------------------------------------
   //
@@ -75,6 +76,18 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Time complexity: \Omega(match_length), O(2^(_M_nfa.size()))
   // Space complexity: \theta(match_results.size() + match_length)
   //
+  template<typename _BiIter, typename _Alloc, typename _TraitsT,
+	   bool __dfs_mode>
+  template<bool __match_mode>
+    bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
+    _M_main_dispatch(__dfs)
+    {
+      _M_has_sol = false;
+      _M_cur_results = _M_results;
+      _M_dfs<__match_mode>(_M_states._M_start);
+      return _M_has_sol;
+    }
+
   // ------------------------------------------------------------
   //
   // BFS mode:
@@ -84,11 +97,11 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   //
   // It first computes epsilon closure (states that can be achieved without
   // consuming characters) for every state that's still matching,
-  // using the same DFS algorithm, but doesn't re-enter states (find a true in
-  // _M_visited), nor follows _S_opcode_match.
+  // using the same DFS algorithm, but doesn't re-enter states (using
+  // _M_states._M_visited to check), nor follow _S_opcode_match.
   //
-  // Then apply DFS using every _S_opcode_match (in _M_match_queue) as the start
-  // state.
+  // Then apply DFS using every _S_opcode_match (in _M_states._M_match_queue)
+  // as the start state.
   //
   // It significantly reduces potential duplicate states, so has a better
   // upper bound; but it requires more overhead.
@@ -98,49 +111,39 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Space complexity: \Omega(_M_nfa.size() + match_results.size())
   //                   O(_M_nfa.size() * match_results.size())
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
-    bool __dfs_mode>
+	   bool __dfs_mode>
   template<bool __match_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-    _M_main()
+    _M_main_dispatch(__bfs)
     {
-      if (__dfs_mode)
+      _M_states._M_queue(_M_states._M_start, _M_results);
+      bool __ret = false;
+      while (1)
 	{
 	  _M_has_sol = false;
-	  _M_cur_results = _M_results;
-	  _M_dfs<__match_mode>(_M_start_state);
-	  return _M_has_sol;
-	}
-      else
-	{
-	  _M_match_queue->push_back(make_pair(_M_start_state, _M_results));
-	  bool __ret = false;
-	  while (1)
+	  if (_M_states._M_match_queue.empty())
+	    break;
+	  std::fill_n(_M_states._M_visited_states.get(), _M_nfa.size(), false);
+	  auto __old_queue = std::move(_M_states._M_match_queue);
+	  for (auto& __task : __old_queue)
 	    {
-	      _M_has_sol = false;
-	      if (_M_match_queue->empty())
-		break;
-	      _M_visited->assign(_M_visited->size(), false);
-	      auto _M_old_queue = std::move(*_M_match_queue);
-	      for (auto __task : _M_old_queue)
-		{
-		  _M_cur_results = __task.second;
-		  _M_dfs<__match_mode>(__task.first);
-		}
-	      if (!__match_mode)
-		__ret |= _M_has_sol;
-	      if (_M_current == _M_end)
-		break;
-	      ++_M_current;
+	      _M_cur_results = std::move(__task.second);
+	      _M_dfs<__match_mode>(__task.first);
 	    }
-	  if (__match_mode)
-	    __ret = _M_has_sol;
-	  return __ret;
+	  if (!__match_mode)
+	    __ret |= _M_has_sol;
+	  if (_M_current == _M_end)
+	    break;
+	  ++_M_current;
 	}
+      if (__match_mode)
+	__ret = _M_has_sol;
+      return __ret;
     }
 
   // Return whether now match the given sub-NFA.
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
-    bool __dfs_mode>
+	   bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_lookahead(_State<_TraitsT> __state)
     {
@@ -150,7 +153,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 							    __what,
 							    _M_re,
 							    _M_flags));
-      __sub->_M_start_state = __state._M_alt;
+      __sub->_M_states._M_start = __state._M_alt;
       if (__sub->_M_search_from_first())
 	{
 	  for (size_t __i = 0; __i < __what.size(); __i++)
@@ -195,17 +198,13 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     };
 
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
-    bool __dfs_mode>
+	   bool __dfs_mode>
   template<bool __match_mode>
     void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_dfs(_StateIdT __i)
     {
-      if (!__dfs_mode)
-	{
-	  if ((*_M_visited)[__i])
-	    return;
-	  (*_M_visited)[__i] = true;
-	}
+      if (_M_states._M_visited(__i))
+	return;
 
       const auto& __state = _M_nfa[__i];
       // Every change on _M_cur_results and _M_current will be rolled back after
@@ -302,8 +301,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    }
 	  else
 	    if (__state._M_matches(*_M_current))
-	      _M_match_queue->push_back(make_pair(__state._M_next,
-						  _M_cur_results));
+	      _M_states._M_queue(__state._M_next, _M_cur_results);
 	  break;
 	// First fetch the matched result from _M_cur_results as __submatch;
 	// then compare it with
@@ -375,7 +373,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Return whether now is at some word boundary.
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
-    bool __dfs_mode>
+	   bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_word_boundary(_State<_TraitsT> __state) const
     {