Message ID | CAG4ZjNm2Dj3UbcP1L1fp7BTku2e2zDyDwU1SH4hYSktz5XhbJA@mail.gmail.com |
---|---|
State | New |
Headers | show |
Hi,
On 11/18/2014 08:12 PM, Tim Shen wrote:
> Bootstrapped and tested.
Jonathan lately is following your work much better than me, but naively
seems weird that _M_begin is non-const and _M_end is const, a different
type anyway.
Paolo.
On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote: > Jonathan lately is following your work much better than me, but naively > seems weird that _M_begin is non-const and _M_end is const, a different type > anyway. Hmm. The current regex_search algorithm is implemented as try match starting from _M_begin; if it doesn't match, start over from _M_begin+1, ... As we can tell, _M_begin is never changed, and it's const. The problem is when the executer reaches the accept state (which indicates a match) we use _M_current == _M_begin to verify if it's an empty match. It is possible that, when we are not in the first iteration, say, in the second iteration actually, _M_current is initialized with _M_begin+1. It turns out even _M_current has never been increased (no chars are eaten, aka empty match), _M_current != _M_begin is still true. This fix is making each regex_search iteration more thorough, with increased _M_begin, as if it's a new regex _M_search_from_first. I've carefully (admittedly, after sending this patch) inspect everywhere when _M_begin is used. It turns out _M_begin is under well-defined (the initial position of _M_current when current iteration starts) invariants (see _Executor<>::_M_at_begin), indicated by the use of regex_constants::match_prev_avail. This flag actually implies that __begin iterator passed into regex_search is not always "the physical boundary" who matches "^". Boost (and we) conforms this behavior: std::regex_search("asdf", std::regex("^asdf"), std::regex_constants::match_prev_avail) returns false. It's more elegant to move _Executor::_M_search out of its class and make _M_begin still const, but _Executor costs too much to initialize.
Hi Tim, On 11/19/2014 08:27 AM, Tim Shen wrote: > On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini > <paolo.carlini@oracle.com> wrote: >> Jonathan lately is following your work much better than me, but naively >> seems weird that _M_begin is non-const and _M_end is const, a different type >> anyway. > Hmm. The current regex_search algorithm is implemented as try match > starting from _M_begin; if it doesn't match, start over from > _M_begin+1, ... > > As we can tell, _M_begin is never changed, and it's const. > > The problem is when the executer reaches the accept state (which > indicates a match) we use _M_current == _M_begin to verify if it's an > empty match. It is possible that, when we are not in the first > iteration, say, in the second iteration actually, _M_current is > initialized with _M_begin+1. It turns out even _M_current has never > been increased (no chars are eaten, aka empty match), _M_current != > _M_begin is still true. > > This fix is making each regex_search iteration more thorough, with > increased _M_begin, as if it's a new regex _M_search_from_first. > > I've carefully (admittedly, after sending this patch) inspect > everywhere when _M_begin is used. It turns out _M_begin is under > well-defined (the initial position of _M_current when current > iteration starts) invariants (see _Executor<>::_M_at_begin), indicated > by the use of regex_constants::match_prev_avail. This flag actually > implies that __begin iterator passed into regex_search is not always > "the physical boundary" who matches "^". Boost (and we) conforms this > behavior: > > std::regex_search("asdf", std::regex("^asdf"), > std::regex_constants::match_prev_avail) > > returns false. > > It's more elegant to move _Executor::_M_search out of its class and > make _M_begin still const, but _Executor costs too much to initialize. Good. To be clear, not having carefully analyzed whatsoever, my point was more about changing _M_end too, to non-const, than about not touching _M_begin. Would that make sense? Thanks, Paolo.
On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote: > Good. To be clear, not having carefully analyzed whatsoever, my point was > more about changing _M_end too, to non-const, than about not touching > _M_begin. Would that make sense? Currently we never mutate _M_end. I *believe* that we still won't in the future, since _M_end is not as volatile as _M_begin.
2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>: > On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote: >> Good. To be clear, not having carefully analyzed whatsoever, my point was >> more about changing _M_end too, to non-const, than about not touching >> _M_begin. Would that make sense? > > Currently we never mutate _M_end. I *believe* that we still won't in > the future, since _M_end is not as volatile as _M_begin. I agree with Tim here, why shouldn't the const member and the non-const member coexist? - Daniel
Hi, On 11/19/2014 07:43 PM, Daniel Krügler wrote: > 2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>: >> On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote: >>> Good. To be clear, not having carefully analyzed whatsoever, my point was >>> more about changing _M_end too, to non-const, than about not touching >>> _M_begin. Would that make sense? >> Currently we never mutate _M_end. I *believe* that we still won't in >> the future, since _M_end is not as volatile as _M_begin. > I agree with Tim here, why shouldn't the const member and the > non-const member coexist? I was just aiming for consistency, from a very, very, general point of view. Jon will review the substance of the patch, anyway. Thanks! Paolo.
On Wed, Nov 19, 2014 at 10:54 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> I was just aiming for consistency, from a very, very, general point of view.
Not a problem. Thank you actually, since it's obviously good for us :)
On 18/11/14 11:12 -0800, Tim Shen wrote: >Bootstrapped and tested. > >Thanks! > > >-- >Regards, >Tim Shen >commit f17155183b9ae1283d04f3bbdb61d05d9279ebe4 >Author: timshen <timshen@google.com> >Date: Tue Nov 18 00:07:28 2014 -0800 > > PR libstdc++/63920 > * include/bits/regex_executor.h: Make _M_begin non const. > * include/bits/regex_executor.tcc (_Executor<>::_M_search): Increase > _M_begin in search algorithm, so that _M_begin is treated as > "current start position" for each search iteration. > * testsuite/28_regex/algorithms/regex_search/ecma/flags.cc: New > testcase. OK for trunk - thanks.
On Mon, Nov 24, 2014 at 3:17 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> OK for trunk - thanks.
Committed. :)
Thanks!
diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h index b26992c..2d7796f 100644 --- a/libstdc++-v3/include/bits/regex_executor.h +++ b/libstdc++-v3/include/bits/regex_executor.h @@ -205,7 +205,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: _ResultsVec _M_cur_results; _BiIter _M_current; - const _BiIter _M_begin; + _BiIter _M_begin; const _BiIter _M_end; const _RegexT& _M_re; const _NFAT& _M_nfa; diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc index 38d4781..a973667 100644 --- a/libstdc++-v3/include/bits/regex_executor.tcc +++ b/libstdc++-v3/include/bits/regex_executor.tcc @@ -39,17 +39,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>:: _M_search() { + if (_M_search_from_first()) + return true; if (_M_flags & regex_constants::match_continuous) - return _M_search_from_first(); - auto __cur = _M_begin; - do + return false; + _M_flags |= regex_constants::match_prev_avail; + while (_M_begin != _M_end) { - _M_current = __cur; - if (_M_main(_Match_mode::_Prefix)) + ++_M_begin; + if (_M_search_from_first()) return true; } - // Continue when __cur == _M_end - while (__cur++ != _M_end); return false; } diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc index 45ad0f6..6b038ee 100644 --- a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc +++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc @@ -65,6 +65,8 @@ test01() regex_constants::match_prev_avail)); VERIFY( regex_search_debug("ba"+1, regex("\\Ba"), regex_constants::match_prev_avail)); + // PR libstdc++/63920 + VERIFY(!regex_search_debug("a", regex("b*"), regex_constants::match_not_null)); } int