diff mbox series

[libstdc++/71500] make back reference work with icase

Message ID CAG4ZjN=teuigpysiMSm+tE+3S4dm+WM=7n7A80gXBDFpdqfhBQ@mail.gmail.com
State New
Headers show
Series [libstdc++/71500] make back reference work with icase | expand

Commit Message

Li, Pan2 via Gcc-patches Sept. 4, 2017, 10:31 a.m. UTC
This fixes the follow-up comments in 71500.

Back-reference matching is different from other matching, as the
content the back-reference refers to is at "run-time", aka during
regex_match(), not regex() compilation.

For compilation we do have an abstraction layer to catch all
comparison customizations, namely _M_translator in regex_compiler.h.
Until this patch, we don't have an abstraction for "run-time"
matching. I believe that back-reference is the only place that needs
run-time matching, so I just build a _Backref_matcher in
regex_executot.tcc.

Tested on x86_64-linux-gnu.

Thanks!

Comments

Jonathan Wakely Sept. 8, 2017, 3:52 p.m. UTC | #1
On 04/09/17 03:31 -0700, Tim Shen via libstdc++ wrote:
>This fixes the follow-up comments in 71500.
>
>Back-reference matching is different from other matching, as the
>content the back-reference refers to is at "run-time", aka during
>regex_match(), not regex() compilation.
>
>For compilation we do have an abstraction layer to catch all
>comparison customizations, namely _M_translator in regex_compiler.h.
>Until this patch, we don't have an abstraction for "run-time"
>matching. I believe that back-reference is the only place that needs
>run-time matching, so I just build a _Backref_matcher in
>regex_executot.tcc.

Looks like a nice solution. OK for trunk, thanks.

I think this looks safe to backport too, but let's leave it on trunk
for a while first.
Jonathan Wakely Sept. 15, 2017, 3:39 p.m. UTC | #2
On 04/09/17 03:31 -0700, Tim Shen via libstdc++ wrote:
>This fixes the follow-up comments in 71500.
>
>Back-reference matching is different from other matching, as the
>content the back-reference refers to is at "run-time", aka during
>regex_match(), not regex() compilation.
>
>For compilation we do have an abstraction layer to catch all
>comparison customizations, namely _M_translator in regex_compiler.h.
>Until this patch, we don't have an abstraction for "run-time"
>matching. I believe that back-reference is the only place that needs
>run-time matching, so I just build a _Backref_matcher in
>regex_executot.tcc.
>
>Tested on x86_64-linux-gnu.
>
>Thanks!
>
>-- 
>Regards,
>Tim Shen

>commit a97b7fecd319e031ffc489a956b8cf3dc63eeb26
>Author: Tim Shen <timshen@google.com>
>Date:   Mon Sep 4 03:19:35 2017 -0700
>
>            PR libstdc++/71500
>            * include/bits/regex_executor.tcc: Support icase in
>            regex_tratis<...> for back reference matches.
>            * testsuite/28_regex/regression.cc: Test case.
>
>diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
>index 226e05856e1..f6149fecf9d 100644
>--- a/libstdc++-v3/include/bits/regex_executor.tcc
>+++ b/libstdc++-v3/include/bits/regex_executor.tcc
>@@ -335,6 +335,54 @@ namespace __detail
> 	  _M_states._M_queue(__state._M_next, _M_cur_results);
>     }
> 
>+  template<typename _BiIter, typename _TraitsT>
>+    struct _Backref_matcher
>+    {
>+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
>+      : _M_traits(__traits) { }
>+
>+      bool
>+      _M_apply(_BiIter __expected_begin,
>+	       _BiIter __expected_end, _BiIter __actual_begin,
>+	       _BiIter __actual_end)
>+      {
>+	return _M_traits.transform(__expected_begin, __expected_end)
>+	    == _M_traits.transform(__actual_begin, __actual_end);
>+      }
>+
>+      const _TraitsT& _M_traits;
>+    };
>+
>+  template<typename _BiIter, typename _CharT>
>+    struct _Backref_matcher<_BiIter, std::regex_traits<_CharT>>
>+    {
>+      using _TraitsT = std::regex_traits<_CharT>;
>+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
>+      : _M_icase(__icase), _M_traits(__traits) { }
>+
>+      bool
>+      _M_apply(_BiIter __expected_begin,
>+	       _BiIter __expected_end, _BiIter __actual_begin,
>+	       _BiIter __actual_end)
>+      {
>+	if (!_M_icase)
>+	  return std::equal(__expected_begin, __expected_end,
>+			    __actual_begin, __actual_end);

This is only valid in C++14 and higher, because the 4-argument version
of std::equal isn't present in C++11.

>+	typedef std::ctype<_CharT> __ctype_type;
>+	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
>+	return std::equal(__expected_begin, __expected_end,
>+			  __actual_begin, __actual_end,

Same here.

>+			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
>+			  {
>+			    return __fctyp.tolower(__lhs)
>+				== __fctyp.tolower(__rhs);
>+			  });

We need to rewrite this to check the lengths are equal first, and then
call the 3-argument version of std::equal.

Alternatively, we could move the implementation of the C++14
std::equal overloads to __equal and make that available for C++11.
I'll try that.
Jonathan Wakely Sept. 18, 2017, 5:26 p.m. UTC | #3
On 15/09/17 16:39 +0100, Jonathan Wakely wrote:
>On 04/09/17 03:31 -0700, Tim Shen via libstdc++ wrote:
>>This fixes the follow-up comments in 71500.
>>
>>Back-reference matching is different from other matching, as the
>>content the back-reference refers to is at "run-time", aka during
>>regex_match(), not regex() compilation.
>>
>>For compilation we do have an abstraction layer to catch all
>>comparison customizations, namely _M_translator in regex_compiler.h.
>>Until this patch, we don't have an abstraction for "run-time"
>>matching. I believe that back-reference is the only place that needs
>>run-time matching, so I just build a _Backref_matcher in
>>regex_executot.tcc.
>>
>>Tested on x86_64-linux-gnu.
>>
>>Thanks!
>>
>>-- 
>>Regards,
>>Tim Shen
>
>>commit a97b7fecd319e031ffc489a956b8cf3dc63eeb26
>>Author: Tim Shen <timshen@google.com>
>>Date:   Mon Sep 4 03:19:35 2017 -0700
>>
>>           PR libstdc++/71500
>>           * include/bits/regex_executor.tcc: Support icase in
>>           regex_tratis<...> for back reference matches.
>>           * testsuite/28_regex/regression.cc: Test case.
>>
>>diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
>>index 226e05856e1..f6149fecf9d 100644
>>--- a/libstdc++-v3/include/bits/regex_executor.tcc
>>+++ b/libstdc++-v3/include/bits/regex_executor.tcc
>>@@ -335,6 +335,54 @@ namespace __detail
>>	  _M_states._M_queue(__state._M_next, _M_cur_results);
>>    }
>>
>>+  template<typename _BiIter, typename _TraitsT>
>>+    struct _Backref_matcher
>>+    {
>>+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
>>+      : _M_traits(__traits) { }
>>+
>>+      bool
>>+      _M_apply(_BiIter __expected_begin,
>>+	       _BiIter __expected_end, _BiIter __actual_begin,
>>+	       _BiIter __actual_end)
>>+      {
>>+	return _M_traits.transform(__expected_begin, __expected_end)
>>+	    == _M_traits.transform(__actual_begin, __actual_end);
>>+      }
>>+
>>+      const _TraitsT& _M_traits;
>>+    };
>>+
>>+  template<typename _BiIter, typename _CharT>
>>+    struct _Backref_matcher<_BiIter, std::regex_traits<_CharT>>
>>+    {
>>+      using _TraitsT = std::regex_traits<_CharT>;
>>+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
>>+      : _M_icase(__icase), _M_traits(__traits) { }
>>+
>>+      bool
>>+      _M_apply(_BiIter __expected_begin,
>>+	       _BiIter __expected_end, _BiIter __actual_begin,
>>+	       _BiIter __actual_end)
>>+      {
>>+	if (!_M_icase)
>>+	  return std::equal(__expected_begin, __expected_end,
>>+			    __actual_begin, __actual_end);
>
>This is only valid in C++14 and higher, because the 4-argument version
>of std::equal isn't present in C++11.
>
>>+	typedef std::ctype<_CharT> __ctype_type;
>>+	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
>>+	return std::equal(__expected_begin, __expected_end,
>>+			  __actual_begin, __actual_end,
>
>Same here.
>
>>+			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
>>+			  {
>>+			    return __fctyp.tolower(__lhs)
>>+				== __fctyp.tolower(__rhs);
>>+			  });
>
>We need to rewrite this to check the lengths are equal first, and then
>call the 3-argument version of std::equal.
>
>Alternatively, we could move the implementation of the C++14
>std::equal overloads to __equal and make that available for C++11.
>I'll try that.

Here's a proof of concept patch for that. It's a bit ugly.
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index f6149fecf9d..4b185cc9d1e 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -366,17 +366,21 @@ namespace __detail
 	       _BiIter __actual_end)
       {
 	if (!_M_icase)
-	  return std::equal(__expected_begin, __expected_end,
-			    __actual_begin, __actual_end);
+	  return std::__equal4(__expected_begin, __expected_end,
+			       __actual_begin, __actual_end,
+			       std::__iterator_category(__expected_begin),
+			       std::__iterator_category(__actual_begin));
 	typedef std::ctype<_CharT> __ctype_type;
 	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
-	return std::equal(__expected_begin, __expected_end,
-			  __actual_begin, __actual_end,
-			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
-			  {
-			    return __fctyp.tolower(__lhs)
-				== __fctyp.tolower(__rhs);
-			  });
+	return std::__equal4_p(__expected_begin, __expected_end,
+			       __actual_begin, __actual_end,
+			       [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			       {
+				 return __fctyp.tolower(__lhs)
+				   == __fctyp.tolower(__rhs);
+			       },
+			       std::__iterator_category(__expected_begin),
+			       std::__iterator_category(__actual_begin));
       }
 
       bool _M_icase;
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index f68ecb22b82..b7848b3de99 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1082,6 +1082,59 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       return true;
     }
 
+#if __cplusplus >= 201103L
+  template<typename _II1, typename _II2>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	     random_access_iterator_tag, random_access_iterator_tag)
+    {
+      auto __d1 = std::distance(__first1, __last1);
+      auto __d2 = std::distance(__first2, __last2);
+      if (__d1 != __d2)
+	return false;
+      return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
+    }
+
+  template<typename _II1, typename _II2, typename _Cat1, typename _Cat2>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	     _Cat1, _Cat2)
+    {
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!(*__first1 == *__first2))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+
+  template<typename _II1, typename _II2, typename _BinaryPredicate>
+    inline bool
+    __equal4_p(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	       _BinaryPredicate __binary_pred, random_access_iterator_tag,
+	       random_access_iterator_tag)
+    {
+      auto __d1 = std::distance(__first1, __last1);
+      auto __d2 = std::distance(__first2, __last2);
+      if (__d1 != __d2)
+	return false;
+      return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
+				   __binary_pred);
+    }
+
+  template<typename _II1, typename _II2, typename _BinaryPredicate,
+	   typename _Cat1, typename _Cat2>
+    inline bool
+    __equal4_p(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	       _BinaryPredicate __binary_pred, _Cat1, _Cat2)
+    {
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!bool(__binary_pred(*__first1, *__first2)))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+#endif // C++11
+
 #if __cplusplus > 201103L
 
 #define __cpp_lib_robust_nonmodifying_seq_ops 201304
@@ -1112,24 +1165,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!(*__first1 == *__first2))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4(__first1, __last1, __first2, __last2,
+				      __iterator_category(__first1),
+				      __iterator_category(__first2));
     }
 
   /**
@@ -1159,27 +1197,12 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_IIter1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_IIter2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
-				       __binary_pred);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!bool(__binary_pred(*__first1, *__first2)))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4_p(__first1, __last1, __first2, __last2,
+					__binary_pred,
+					__iterator_category(__first1),
+					__iterator_category(__first2));
     }
-#endif
+#endif // C++14
 
   /**
    *  @brief Performs @b dictionary comparison on ranges.
Li, Pan2 via Gcc-patches Sept. 18, 2017, 5:58 p.m. UTC | #4
On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> We need to rewrite this to check the lengths are equal first, and then
>> call the 3-argument version of std::equal.
>>
>> Alternatively, we could move the implementation of the C++14
>> std::equal overloads to __equal and make that available for C++11.
>> I'll try that.
>
>
> Here's a proof of concept patch for that. It's a bit ugly.

Instead of having iterator tags in the interface, we can probe the
random-access-ness inside __equal4/__equal4_p, can't we? It's similar
to the existing "if (_RAIters()) { ... }".

I'd expect the patches to be renaming the current implementations and
adding wrappers, instead of adding new implementations.
Jonathan Wakely Sept. 18, 2017, 11:01 p.m. UTC | #5
On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:
>On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>>> We need to rewrite this to check the lengths are equal first, and then
>>> call the 3-argument version of std::equal.
>>>
>>> Alternatively, we could move the implementation of the C++14
>>> std::equal overloads to __equal and make that available for C++11.
>>> I'll try that.
>>
>>
>> Here's a proof of concept patch for that. It's a bit ugly.
>
>Instead of having iterator tags in the interface, we can probe the
>random-access-ness inside __equal4/__equal4_p, can't we? It's similar
>to the existing "if (_RAIters()) { ... }".
>
>I'd expect the patches to be renaming the current implementations and
>adding wrappers, instead of adding new implementations.

Well I decided to split the existing functions up and use tag
dispatching, which is conceptually cleaner anyway. But as the
RandomAccessIterator version doesn't need any operations that aren't
valid for other categories, it's not strictly necessary. The tag
dispatching version should generate slightly smaller code for
unoptimized builds, but that's not very important.

Here's the patch doing it as you suggest. We can't call the new
functions __equal because t hat name is already taken by a helper
struct, hence __equal4.

Do you prefer this version?
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index f6149fecf9d..2ceba35e7b8 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -366,17 +366,17 @@ namespace __detail
 	       _BiIter __actual_end)
       {
 	if (!_M_icase)
-	  return std::equal(__expected_begin, __expected_end,
-			    __actual_begin, __actual_end);
+	  return std::__equal4(__expected_begin, __expected_end,
+			       __actual_begin, __actual_end);
 	typedef std::ctype<_CharT> __ctype_type;
 	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
-	return std::equal(__expected_begin, __expected_end,
-			  __actual_begin, __actual_end,
-			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
-			  {
-			    return __fctyp.tolower(__lhs)
-				== __fctyp.tolower(__rhs);
-			  });
+	return std::__equal4(__expected_begin, __expected_end,
+			     __actual_begin, __actual_end,
+			     [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			     {
+			       return __fctyp.tolower(__lhs)
+				 == __fctyp.tolower(__rhs);
+			     });
       }
 
       bool _M_icase;
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index f68ecb22b82..ff5e94d9ae8 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1082,6 +1082,58 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       return true;
     }
 
+#if __cplusplus >= 201103L
+  template<typename _II1, typename _II2>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2)
+    {
+      using _RATag = random_access_iterator_tag;
+      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
+      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
+      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
+      if (_RAIters())
+	{
+	  auto __d1 = std::distance(__first1, __last1);
+	  auto __d2 = std::distance(__first2, __last2);
+	  if (__d1 != __d2)
+	    return false;
+	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
+	}
+
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!(*__first1 == *__first2))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+
+  template<typename _II1, typename _II2, typename _BinaryPredicate>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	     _BinaryPredicate __binary_pred)
+    {
+      using _RATag = random_access_iterator_tag;
+      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
+      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
+      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
+      if (_RAIters())
+	{
+	  auto __d1 = std::distance(__first1, __last1);
+	  auto __d2 = std::distance(__first2, __last2);
+	  if (__d1 != __d2)
+	    return false;
+	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
+				       __binary_pred);
+	}
+
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!bool(__binary_pred(*__first1, *__first2)))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+#endif // C++11
+
 #if __cplusplus > 201103L
 
 #define __cpp_lib_robust_nonmodifying_seq_ops 201304
@@ -1112,24 +1164,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!(*__first1 == *__first2))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4(__first1, __last1, __first2, __last2);
     }
 
   /**
@@ -1159,27 +1194,10 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_IIter1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_IIter2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
-				       __binary_pred);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!bool(__binary_pred(*__first1, *__first2)))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4(__first1, __last1, __first2, __last2,
+				      __binary_pred);
     }
-#endif
+#endif // C++14
 
   /**
    *  @brief Performs @b dictionary comparison on ranges.
Li, Pan2 via Gcc-patches Sept. 18, 2017, 11:54 p.m. UTC | #6
On Mon, Sep 18, 2017 at 4:01 PM, Jonathan Wakely <jwakely@redhat.com> wrote:
> On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:
>>
>> On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely <jwakely@redhat.com>
>> wrote:
>>>>
>>>> We need to rewrite this to check the lengths are equal first, and then
>>>> call the 3-argument version of std::equal.
>>>>
>>>> Alternatively, we could move the implementation of the C++14
>>>> std::equal overloads to __equal and make that available for C++11.
>>>> I'll try that.
>>>
>>>
>>>
>>> Here's a proof of concept patch for that. It's a bit ugly.
>>
>>
>> Instead of having iterator tags in the interface, we can probe the
>> random-access-ness inside __equal4/__equal4_p, can't we? It's similar
>> to the existing "if (_RAIters()) { ... }".
>>
>> I'd expect the patches to be renaming the current implementations and
>> adding wrappers, instead of adding new implementations.
>
>
> Well I decided to split the existing functions up and use tag
> dispatching, which is conceptually cleaner anyway. But as the
> RandomAccessIterator version doesn't need any operations that aren't
> valid for other categories, it's not strictly necessary. The tag
> dispatching version should generate slightly smaller code for
> unoptimized builds, but that's not very important.

Unoptimized builds don't inline small functions, therefore the first
patch generate two weak symbols, instead of one by the second patch.
It's unclear to me how would number of symbols penalize the
performance/binary size.

>
> Here's the patch doing it as you suggest. We can't call the new
> functions __equal because t hat name is already taken by a helper
> struct, hence __equal4.
>
> Do you prefer this version?

Yes, I prefer this version for readability reasons:
1) subjectively, less scattered code; and
2) ideally I want `if constexpr (...)`), the if version is closer.

I agree that it's not a big difference. I just wanted to point out the
small difference. I'm fine with either version.

Thanks for the prototyping!
Jonathan Wakely Sept. 19, 2017, 2:38 p.m. UTC | #7
On 18/09/17 16:54 -0700, Tim Shen wrote:
>On Mon, Sep 18, 2017 at 4:01 PM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:
>>>
>>> On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely <jwakely@redhat.com>
>>> wrote:
>>>>>
>>>>> We need to rewrite this to check the lengths are equal first, and then
>>>>> call the 3-argument version of std::equal.
>>>>>
>>>>> Alternatively, we could move the implementation of the C++14
>>>>> std::equal overloads to __equal and make that available for C++11.
>>>>> I'll try that.
>>>>
>>>>
>>>>
>>>> Here's a proof of concept patch for that. It's a bit ugly.
>>>
>>>
>>> Instead of having iterator tags in the interface, we can probe the
>>> random-access-ness inside __equal4/__equal4_p, can't we? It's similar
>>> to the existing "if (_RAIters()) { ... }".
>>>
>>> I'd expect the patches to be renaming the current implementations and
>>> adding wrappers, instead of adding new implementations.
>>
>>
>> Well I decided to split the existing functions up and use tag
>> dispatching, which is conceptually cleaner anyway. But as the
>> RandomAccessIterator version doesn't need any operations that aren't
>> valid for other categories, it's not strictly necessary. The tag
>> dispatching version should generate slightly smaller code for
>> unoptimized builds, but that's not very important.
>
>Unoptimized builds don't inline small functions, therefore the first
>patch generate two weak symbols, instead of one by the second patch.

Two small functions that only do the necessary work, rather than one
large function that has a branch for RAIters even when it can never be
taken.

>It's unclear to me how would number of symbols penalize the
>performance/binary size.

People who care about performance or binary size should be optimizing,
and in that case the RAIters branch will be known at compile-time and
the dead code should get removed, and the wrapper functions inlined.

>> Here's the patch doing it as you suggest. We can't call the new
>> functions __equal because t hat name is already taken by a helper
>> struct, hence __equal4.
>>
>> Do you prefer this version?
>
>Yes, I prefer this version for readability reasons:
>1) subjectively, less scattered code; and
>2) ideally I want `if constexpr (...)`), the if version is closer.

Yes, we could add _GLIBCXX17_CONSTEXPR there, but I'm not sure it's
worth doing.

3) The calls to __equal4 in _Backref_matcher are simpler.

>I agree that it's not a big difference. I just wanted to point out the
>small difference. I'm fine with either version.

I'll commit the second version.

>Thanks for the prototyping!
>
>
>-- 
>Regards,
>Tim Shen
Jonathan Wakely Sept. 19, 2017, 2:41 p.m. UTC | #8
The failures that need to be fixed can be seen at 
https://gcc.gnu.org/ml/gcc-testresults/2017-09/msg01633.html
Jonathan Wakely Sept. 19, 2017, 5:07 p.m. UTC | #9
On 19/09/17 15:38 +0100, Jonathan Wakely wrote:
>On 18/09/17 16:54 -0700, Tim Shen wrote:
>>On Mon, Sep 18, 2017 at 4:01 PM, Jonathan Wakely <jwakely@redhat.com> wrote:
>>>On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:
>>>>
>>>>On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely <jwakely@redhat.com>
>>>>wrote:
>>>>>>
>>>>>>We need to rewrite this to check the lengths are equal first, and then
>>>>>>call the 3-argument version of std::equal.
>>>>>>
>>>>>>Alternatively, we could move the implementation of the C++14
>>>>>>std::equal overloads to __equal and make that available for C++11.
>>>>>>I'll try that.
>>>>>
>>>>>
>>>>>
>>>>>Here's a proof of concept patch for that. It's a bit ugly.
>>>>
>>>>
>>>>Instead of having iterator tags in the interface, we can probe the
>>>>random-access-ness inside __equal4/__equal4_p, can't we? It's similar
>>>>to the existing "if (_RAIters()) { ... }".
>>>>
>>>>I'd expect the patches to be renaming the current implementations and
>>>>adding wrappers, instead of adding new implementations.
>>>
>>>
>>>Well I decided to split the existing functions up and use tag
>>>dispatching, which is conceptually cleaner anyway. But as the
>>>RandomAccessIterator version doesn't need any operations that aren't
>>>valid for other categories, it's not strictly necessary. The tag
>>>dispatching version should generate slightly smaller code for
>>>unoptimized builds, but that's not very important.
>>
>>Unoptimized builds don't inline small functions, therefore the first
>>patch generate two weak symbols, instead of one by the second patch.
>
>Two small functions that only do the necessary work, rather than one
>large function that has a branch for RAIters even when it can never be
>taken.
>
>>It's unclear to me how would number of symbols penalize the
>>performance/binary size.
>
>People who care about performance or binary size should be optimizing,
>and in that case the RAIters branch will be known at compile-time and
>the dead code should get removed, and the wrapper functions inlined.
>
>>>Here's the patch doing it as you suggest. We can't call the new
>>>functions __equal because t hat name is already taken by a helper
>>>struct, hence __equal4.
>>>
>>>Do you prefer this version?
>>
>>Yes, I prefer this version for readability reasons:
>>1) subjectively, less scattered code; and
>>2) ideally I want `if constexpr (...)`), the if version is closer.
>
>Yes, we could add _GLIBCXX17_CONSTEXPR there, but I'm not sure it's
>worth doing.
>
>3) The calls to __equal4 in _Backref_matcher are simpler.
>
>>I agree that it's not a big difference. I just wanted to point out the
>>small difference. I'm fine with either version.
>
>I'll commit the second version.

Here's what I've committed, with a minimal test to catch this
happening in future.

I'll re-run the full set of test variations.
commit 371c5de025c0fc95420d96bf96f3da84e3725c9d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Sep 19 17:36:52 2017 +0100

    PR libstdc++/71500 restore C++11 compatibility in <regex>
    
            PR libstdc++/71500
            * include/bits/regex_executor.tcc
            (_Backref_matcher<BidIt, regex_traits<C>>::_M_apply): Use
            std::__equal4 instead of C++14 4-iterator overloads of std::equal.
            * include/bits/stl_algobase.h (__equal4): New functions implementing
            4-iterator overloads of std::equal for use in C++11.
            (equal(It1, It1, It2, It2), equal(It1, It1, It2, It2, BinaryPred)):
            Move function bodies to new __equal4 functions.
            * testsuite/28_regex/simple_c++11.cc: New.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index f6149fecf9d..2ceba35e7b8 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -366,17 +366,17 @@ namespace __detail
 	       _BiIter __actual_end)
       {
 	if (!_M_icase)
-	  return std::equal(__expected_begin, __expected_end,
-			    __actual_begin, __actual_end);
+	  return std::__equal4(__expected_begin, __expected_end,
+			       __actual_begin, __actual_end);
 	typedef std::ctype<_CharT> __ctype_type;
 	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
-	return std::equal(__expected_begin, __expected_end,
-			  __actual_begin, __actual_end,
-			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
-			  {
-			    return __fctyp.tolower(__lhs)
-				== __fctyp.tolower(__rhs);
-			  });
+	return std::__equal4(__expected_begin, __expected_end,
+			     __actual_begin, __actual_end,
+			     [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			     {
+			       return __fctyp.tolower(__lhs)
+				 == __fctyp.tolower(__rhs);
+			     });
       }
 
       bool _M_icase;
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index f68ecb22b82..a80934c4faa 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1082,6 +1082,60 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       return true;
     }
 
+#if __cplusplus >= 201103L
+  // 4-iterator version of std::equal<It1, It2> for use in C++11.
+  template<typename _II1, typename _II2>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2)
+    {
+      using _RATag = random_access_iterator_tag;
+      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
+      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
+      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
+      if (_RAIters())
+	{
+	  auto __d1 = std::distance(__first1, __last1);
+	  auto __d2 = std::distance(__first2, __last2);
+	  if (__d1 != __d2)
+	    return false;
+	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
+	}
+
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!(*__first1 == *__first2))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+
+  // 4-iterator version of std::equal<It1, It2, BinaryPred> for use in C++11.
+  template<typename _II1, typename _II2, typename _BinaryPredicate>
+    inline bool
+    __equal4(_II1 __first1, _II1 __last1, _II2 __first2, _II2 __last2,
+	     _BinaryPredicate __binary_pred)
+    {
+      using _RATag = random_access_iterator_tag;
+      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
+      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
+      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
+      if (_RAIters())
+	{
+	  auto __d1 = std::distance(__first1, __last1);
+	  auto __d2 = std::distance(__first2, __last2);
+	  if (__d1 != __d2)
+	    return false;
+	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
+				       __binary_pred);
+	}
+
+      for (; __first1 != __last1 && __first2 != __last2;
+	  ++__first1, (void)++__first2)
+	if (!bool(__binary_pred(*__first1, *__first2)))
+	  return false;
+      return __first1 == __last1 && __first2 == __last2;
+    }
+#endif // C++11
+
 #if __cplusplus > 201103L
 
 #define __cpp_lib_robust_nonmodifying_seq_ops 201304
@@ -1112,24 +1166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_II1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_II2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!(*__first1 == *__first2))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4(__first1, __last1, __first2, __last2);
     }
 
   /**
@@ -1159,27 +1196,10 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
       __glibcxx_requires_valid_range(__first1, __last1);
       __glibcxx_requires_valid_range(__first2, __last2);
 
-      using _RATag = random_access_iterator_tag;
-      using _Cat1 = typename iterator_traits<_IIter1>::iterator_category;
-      using _Cat2 = typename iterator_traits<_IIter2>::iterator_category;
-      using _RAIters = __and_<is_same<_Cat1, _RATag>, is_same<_Cat2, _RATag>>;
-      if (_RAIters())
-	{
-	  auto __d1 = std::distance(__first1, __last1);
-	  auto __d2 = std::distance(__first2, __last2);
-	  if (__d1 != __d2)
-	    return false;
-	  return _GLIBCXX_STD_A::equal(__first1, __last1, __first2,
-				       __binary_pred);
-	}
-
-      for (; __first1 != __last1 && __first2 != __last2;
-	  ++__first1, (void)++__first2)
-	if (!bool(__binary_pred(*__first1, *__first2)))
-	  return false;
-      return __first1 == __last1 && __first2 == __last2;
+      return _GLIBCXX_STD_A::__equal4(__first1, __last1, __first2, __last2,
+				      __binary_pred);
     }
-#endif
+#endif // C++14
 
   /**
    *  @brief Performs @b dictionary comparison on ranges.
diff --git a/libstdc++-v3/testsuite/28_regex/simple_c++11.cc b/libstdc++-v3/testsuite/28_regex/simple_c++11.cc
new file mode 100644
index 00000000000..2cfa503fc07
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/simple_c++11.cc
@@ -0,0 +1,27 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include <regex>
+
+// Ensure compilation of trivial example still works with C++11.
+// https://gcc.gnu.org/ml/libstdc++/2017-09/msg00040.html
+std::regex r{""};
+std::cmatch m;
+bool b = regex_match("", m, r);
diff mbox series

Patch

commit a97b7fecd319e031ffc489a956b8cf3dc63eeb26
Author: Tim Shen <timshen@google.com>
Date:   Mon Sep 4 03:19:35 2017 -0700

            PR libstdc++/71500
            * include/bits/regex_executor.tcc: Support icase in
            regex_tratis<...> for back reference matches.
            * testsuite/28_regex/regression.cc: Test case.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 226e05856e1..f6149fecf9d 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -335,6 +335,54 @@  namespace __detail
 	  _M_states._M_queue(__state._M_next, _M_cur_results);
     }
 
+  template<typename _BiIter, typename _TraitsT>
+    struct _Backref_matcher
+    {
+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
+      : _M_traits(__traits) { }
+
+      bool
+      _M_apply(_BiIter __expected_begin,
+	       _BiIter __expected_end, _BiIter __actual_begin,
+	       _BiIter __actual_end)
+      {
+	return _M_traits.transform(__expected_begin, __expected_end)
+	    == _M_traits.transform(__actual_begin, __actual_end);
+      }
+
+      const _TraitsT& _M_traits;
+    };
+
+  template<typename _BiIter, typename _CharT>
+    struct _Backref_matcher<_BiIter, std::regex_traits<_CharT>>
+    {
+      using _TraitsT = std::regex_traits<_CharT>;
+      _Backref_matcher(bool __icase, const _TraitsT& __traits)
+      : _M_icase(__icase), _M_traits(__traits) { }
+
+      bool
+      _M_apply(_BiIter __expected_begin,
+	       _BiIter __expected_end, _BiIter __actual_begin,
+	       _BiIter __actual_end)
+      {
+	if (!_M_icase)
+	  return std::equal(__expected_begin, __expected_end,
+			    __actual_begin, __actual_end);
+	typedef std::ctype<_CharT> __ctype_type;
+	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
+	return std::equal(__expected_begin, __expected_end,
+			  __actual_begin, __actual_end,
+			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			  {
+			    return __fctyp.tolower(__lhs)
+				== __fctyp.tolower(__rhs);
+			  });
+      }
+
+      bool _M_icase;
+      const _TraitsT& _M_traits;
+    };
+
   // First fetch the matched result from _M_cur_results as __submatch;
   // then compare it with
   // (_M_current, _M_current + (__submatch.second - __submatch.first)).
@@ -355,9 +403,10 @@  namespace __detail
 	   __last != _M_end && __tmp != __submatch.second;
 	   ++__tmp)
 	++__last;
-      if (_M_re._M_automaton->_M_traits.transform(__submatch.first,
-						  __submatch.second)
-	  == _M_re._M_automaton->_M_traits.transform(_M_current, __last))
+      if (_Backref_matcher<_BiIter, _TraitsT>(
+	      _M_re.flags() & regex_constants::icase,
+	      _M_re._M_automaton->_M_traits)._M_apply(
+		  __submatch.first, __submatch.second, _M_current, __last))
 	{
 	  if (__last != _M_current)
 	    {
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc
index ee4d3e1e6f8..3fa9022eac4 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -93,6 +93,17 @@  test06()
   }
 }
 
+// PR libstdc++/71500
+void
+test07()
+{
+  bool test [[gnu::unused]] = true;
+
+  VERIFY(regex_match_debug("abc abc", regex("([a-z]+) \\1", regex::icase)));
+  VERIFY(regex_match_debug("Abc abc", regex("([a-z]+) \\1", regex::icase)));
+  VERIFY(regex_match_debug("abc Abc", regex("([a-z]+) \\1", regex::icase)));
+}
+
 int
 main()
 {
@@ -102,6 +113,7 @@  main()
   test04();
   test05();
   test06();
+  test07();
   return 0;
 }