diff mbox

[libstdc++/63920] Fix regex_constants::match_not_null behavior

Message ID CAG4ZjNm2Dj3UbcP1L1fp7BTku2e2zDyDwU1SH4hYSktz5XhbJA@mail.gmail.com
State New
Headers show

Commit Message

Tim Shen Nov. 18, 2014, 7:12 p.m. UTC
Bootstrapped and tested.

Thanks!

Comments

Paolo Carlini Nov. 18, 2014, 7:19 p.m. UTC | #1
Hi,

On 11/18/2014 08:12 PM, Tim Shen wrote:
> Bootstrapped and tested.
Jonathan lately is following your work much better than me, but naively 
seems weird that _M_begin is non-const and _M_end is const, a different 
type anyway.

Paolo.
Tim Shen Nov. 19, 2014, 7:27 a.m. UTC | #2
On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> Jonathan lately is following your work much better than me, but naively
> seems weird that _M_begin is non-const and _M_end is const, a different type
> anyway.

Hmm. The current regex_search algorithm is implemented as try match
starting from _M_begin; if it doesn't match, start over from
_M_begin+1, ...

As we can tell, _M_begin is never changed, and it's const.

The problem is when the executer reaches the accept state (which
indicates a match) we use _M_current == _M_begin to verify if it's an
empty match. It is possible that, when we are not in the first
iteration, say, in the second iteration actually, _M_current is
initialized with _M_begin+1. It turns out even _M_current has never
been increased (no chars are eaten, aka empty match), _M_current !=
_M_begin is still true.

This fix is making each regex_search iteration more thorough, with
increased _M_begin, as if it's a new regex _M_search_from_first.

I've carefully (admittedly, after sending this patch) inspect
everywhere when _M_begin is used. It turns out _M_begin is under
well-defined (the initial position of _M_current when current
iteration starts) invariants (see _Executor<>::_M_at_begin), indicated
by the use of regex_constants::match_prev_avail. This flag actually
implies that __begin iterator passed into regex_search is not always
"the physical boundary" who matches "^". Boost (and we) conforms this
behavior:

    std::regex_search("asdf", std::regex("^asdf"),
std::regex_constants::match_prev_avail)

returns false.

It's more elegant to move _Executor::_M_search out of its class and
make _M_begin still const, but _Executor costs too much to initialize.
Paolo Carlini Nov. 19, 2014, 4:16 p.m. UTC | #3
Hi Tim,

On 11/19/2014 08:27 AM, Tim Shen wrote:
> On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
> <paolo.carlini@oracle.com> wrote:
>> Jonathan lately is following your work much better than me, but naively
>> seems weird that _M_begin is non-const and _M_end is const, a different type
>> anyway.
> Hmm. The current regex_search algorithm is implemented as try match
> starting from _M_begin; if it doesn't match, start over from
> _M_begin+1, ...
>
> As we can tell, _M_begin is never changed, and it's const.
>
> The problem is when the executer reaches the accept state (which
> indicates a match) we use _M_current == _M_begin to verify if it's an
> empty match. It is possible that, when we are not in the first
> iteration, say, in the second iteration actually, _M_current is
> initialized with _M_begin+1. It turns out even _M_current has never
> been increased (no chars are eaten, aka empty match), _M_current !=
> _M_begin is still true.
>
> This fix is making each regex_search iteration more thorough, with
> increased _M_begin, as if it's a new regex _M_search_from_first.
>
> I've carefully (admittedly, after sending this patch) inspect
> everywhere when _M_begin is used. It turns out _M_begin is under
> well-defined (the initial position of _M_current when current
> iteration starts) invariants (see _Executor<>::_M_at_begin), indicated
> by the use of regex_constants::match_prev_avail. This flag actually
> implies that __begin iterator passed into regex_search is not always
> "the physical boundary" who matches "^". Boost (and we) conforms this
> behavior:
>
>      std::regex_search("asdf", std::regex("^asdf"),
> std::regex_constants::match_prev_avail)
>
> returns false.
>
> It's more elegant to move _Executor::_M_search out of its class and
> make _M_begin still const, but _Executor costs too much to initialize.
Good. To be clear, not having carefully analyzed whatsoever, my point 
was more about changing _M_end too, to non-const, than about not 
touching _M_begin. Would that make sense?

Thanks,
Paolo.
Tim Shen Nov. 19, 2014, 6:42 p.m. UTC | #4
On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> Good. To be clear, not having carefully analyzed whatsoever, my point was
> more about changing _M_end too, to non-const, than about not touching
> _M_begin. Would that make sense?

Currently we never mutate _M_end. I *believe* that we still won't in
the future, since _M_end is not as volatile as _M_begin.
Daniel Krügler Nov. 19, 2014, 6:43 p.m. UTC | #5
2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>:
> On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>> Good. To be clear, not having carefully analyzed whatsoever, my point was
>> more about changing _M_end too, to non-const, than about not touching
>> _M_begin. Would that make sense?
>
> Currently we never mutate _M_end. I *believe* that we still won't in
> the future, since _M_end is not as volatile as _M_begin.

I agree with Tim here, why shouldn't the const member and the
non-const member coexist?

- Daniel
Paolo Carlini Nov. 19, 2014, 6:54 p.m. UTC | #6
Hi,

On 11/19/2014 07:43 PM, Daniel Krügler wrote:
> 2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>:
>> On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>>> Good. To be clear, not having carefully analyzed whatsoever, my point was
>>> more about changing _M_end too, to non-const, than about not touching
>>> _M_begin. Would that make sense?
>> Currently we never mutate _M_end. I *believe* that we still won't in
>> the future, since _M_end is not as volatile as _M_begin.
> I agree with Tim here, why shouldn't the const member and the
> non-const member coexist?
I was just aiming for consistency, from a very, very, general point of 
view. Jon will review the substance of the patch, anyway.

Thanks!
Paolo.
Tim Shen Nov. 19, 2014, 7:32 p.m. UTC | #7
On Wed, Nov 19, 2014 at 10:54 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> I was just aiming for consistency, from a very, very, general point of view.

Not a problem. Thank you actually, since it's obviously good for us :)
Jonathan Wakely Nov. 24, 2014, 11:17 a.m. UTC | #8
On 18/11/14 11:12 -0800, Tim Shen wrote:
>Bootstrapped and tested.
>
>Thanks!
>
>
>-- 
>Regards,
>Tim Shen

>commit f17155183b9ae1283d04f3bbdb61d05d9279ebe4
>Author: timshen <timshen@google.com>
>Date:   Tue Nov 18 00:07:28 2014 -0800
>
>    	PR libstdc++/63920
>    	* include/bits/regex_executor.h: Make _M_begin non const.
>    	* include/bits/regex_executor.tcc (_Executor<>::_M_search): Increase
>    	_M_begin in search algorithm, so that _M_begin is treated as
>    	"current start position" for each search iteration.
>    	* testsuite/28_regex/algorithms/regex_search/ecma/flags.cc: New
>    	testcase.

OK for trunk - thanks.
Tim Shen Nov. 25, 2014, 8:04 a.m. UTC | #9
On Mon, Nov 24, 2014 at 3:17 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> OK for trunk - thanks.

Committed. :)

Thanks!
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index b26992c..2d7796f 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -205,7 +205,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     public:
       _ResultsVec                                           _M_cur_results;
       _BiIter                                               _M_current;
-      const _BiIter                                         _M_begin;
+      _BiIter                                               _M_begin;
       const _BiIter                                         _M_end;
       const _RegexT&                                        _M_re;
       const _NFAT&                                          _M_nfa;
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 38d4781..a973667 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -39,17 +39,17 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_search()
     {
+      if (_M_search_from_first())
+	return true;
       if (_M_flags & regex_constants::match_continuous)
-	return _M_search_from_first();
-      auto __cur = _M_begin;
-      do
+	return false;
+      _M_flags |= regex_constants::match_prev_avail;
+      while (_M_begin != _M_end)
 	{
-	  _M_current = __cur;
-	  if (_M_main(_Match_mode::_Prefix))
+	  ++_M_begin;
+	  if (_M_search_from_first())
 	    return true;
 	}
-      // Continue when __cur == _M_end
-      while (__cur++ != _M_end);
       return false;
     }
 
diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
index 45ad0f6..6b038ee 100644
--- a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
@@ -65,6 +65,8 @@  test01()
 			     regex_constants::match_prev_avail));
   VERIFY( regex_search_debug("ba"+1, regex("\\Ba"),
 			     regex_constants::match_prev_avail));
+  // PR libstdc++/63920
+  VERIFY(!regex_search_debug("a", regex("b*"), regex_constants::match_not_null));
 }
 
 int