diff mbox

[libstdc++/64649] Fix regex_traits::lookup_collatename and regex_traits::lookup_classname

Message ID CAG4ZjN=a3TG-qANYgtgt_e9vhhMCWxvChQRzcBHHYJrwKLszHg@mail.gmail.com
State New
Headers show

Commit Message

Tim Shen Jan. 18, 2015, 7:09 a.m. UTC
Bootstrapped and tested in trunk, but I guess it'll fit 4.9 branch
well. I'll do another test for 4.9 if it's appropriate to commit.

Thanks!

Comments

Jonathan Wakely Jan. 19, 2015, 1:23 p.m. UTC | #1
On 17/01/15 23:09 -0800, Tim Shen wrote:
>Bootstrapped and tested in trunk, but I guess it'll fit 4.9 branch
>well. I'll do another test for 4.9 if it's appropriate to commit.
>
>Thanks!
>
>
>-- 
>Regards,
>Tim Shen

>    	PR libstdc++/64649
>    	* include/bits/regex.tcc (regex_traits<>::lookup_collatename,
>    	regex_traits<>::lookup_classname): Conform forward iterator constrain
>    	for lookup_collatename lookup_classname.

I think this is clearer:

    	* include/bits/regex.tcc (regex_traits<>::lookup_collatename,
    	regex_traits<>::lookup_classname): Support forward iterators.

>    	* testsuite/28_regex/traits/char/lookup_classname.cc: New testcases.
>    	* testsuite/28_regex/traits/char/lookup_collatename.cc: New testcase.
>
>+      string __s(__first, __last);

This assumes the _ForwardIterator value_type is char, or safely
convertible to char. If I'm reading the standard correctly
regex_traits<wchar_t> should be able to accept an iterator range
referring to wide chars, e.g.

  std::forward_list<wchar_t> l{L't', L'i', L'l', L'd', L'e'};
  std::regex_traits<wchar_t>{}.lookup_collatename(l.begin(), l.end())

Maybe it's OK, because all the elements of __collatename use ASCII
chars which have the same value as wide chars? (Probably not true for
all locales).

Your change is definitely an improvement and good enough for now
(maybe we should leave the Bugzilla PR open, with a note about the
wide char issue, and deal with it after stage4).

OK for trunk - thanks.
Tim Shen Jan. 19, 2015, 11:28 p.m. UTC | #2
On Mon, Jan 19, 2015 at 5:23 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> Maybe it's OK, because all the elements of __collatename use ASCII
> chars which have the same value as wide chars? (Probably not true for
> all locales).

I just noticed that if we want to implement it 100% accurately,
narrowing down the input string to char string is clearly wrong, e.g.

  wchar_t wch = 0x1177; // 0x77, the default narrowed result, is 'w'
  std::wcout << std::regex_traits<wchar_t>().lookup_collatename(&wch,
&wch+1) << L"\n"; // prints w

So the correct approach is widening (and use ctype widen so that users
can customize it) all char strings (those statically stored in these
two functions), and compare them to the input range. But it seems to
be slightly slower, so I wrote this implementation out of
instinctively.

Maybe people (now including me) always know that anything concerned
with locales *are* slow? ;)

> Your change is definitely an improvement and good enough for now
> (maybe we should leave the Bugzilla PR open, with a note about the
> wide char issue, and deal with it after stage4).

I'm not sure how emergent we are, so I just go ahead and committed it.
As I mentioned above, the right way to do this is widening the
expected strings and compare them to the input string.


Thanks!
Tim Shen Jan. 19, 2015, 11:52 p.m. UTC | #3
On Mon, Jan 19, 2015 at 3:28 PM, Tim Shen <timshen@google.com> wrote:
> I just noticed that if we want to implement it 100% accurately,
> narrowing down the input string to char string is clearly wrong.

Let me take that back, since [22.4.1.1.2].12 requires that
do_widen(do_narrow(c,0)) == c, which gives us a chance to use narrow
and still be accurate.

I think it'll be a quick fix.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/regex.tcc b/libstdc++-v3/include/bits/regex.tcc
index 4ea8180..3f16e66 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -267,53 +267,17 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  "right-curly-bracket",
 	  "tilde",
 	  "DEL",
-	  ""
 	};
 
-      // same as boost
-      //static const char* __digraphs[] =
-      //  {
-      //    "ae",
-      //    "Ae",
-      //    "AE",
-      //    "ch",
-      //    "Ch",
-      //    "CH",
-      //    "ll",
-      //    "Ll",
-      //    "LL",
-      //    "ss",
-      //    "Ss",
-      //    "SS",
-      //    "nj",
-      //    "Nj",
-      //    "NJ",
-      //    "dz",
-      //    "Dz",
-      //    "DZ",
-      //    "lj",
-      //    "Lj",
-      //    "LJ",
-      //    ""
-      //  };
-
-      std::string __s(__last - __first, '?');
-      __fctyp.narrow(__first, __last, '?', &*__s.begin());
-
-      for (unsigned int __i = 0; *__collatenames[__i]; __i++)
-	if (__s == __collatenames[__i])
-	  return string_type(1, __fctyp.widen(static_cast<char>(__i)));
-
-      //for (unsigned int __i = 0; *__digraphs[__i]; __i++)
-      //  {
-      //    const char* __now = __digraphs[__i];
-      //    if (__s == __now)
-      //      {
-      //	string_type ret(__s.size(), __fctyp.widen('?'));
-      //	__fctyp.widen(__now, __now + 2/* ouch */, &*ret.begin());
-      //	return ret;
-      //      }
-      //  }
+      string __s(__first, __last);
+      for (const auto& __it : __collatenames)
+	if (__s == __it)
+	  return string_type(1, __fctyp.widen(
+	    static_cast<char>(&__it - __collatenames)));
+
+      // TODO Add digraph support:
+      // http://boost.sourceforge.net/libs/regex/doc/collating_names.html
+
       return string_type();
     }
 
@@ -324,12 +288,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     lookup_classname(_Fwd_iter __first, _Fwd_iter __last, bool __icase) const
     {
       typedef std::ctype<char_type> __ctype_type;
-      typedef std::ctype<char> __cctype_type;
-      typedef const pair<const char*, char_class_type> _ClassnameEntry;
       const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
-      const __cctype_type& __cctyp(use_facet<__cctype_type>(_M_locale));
 
-      static _ClassnameEntry __classnames[] =
+      // Mappings from class name to class mask.
+      static const pair<const char*, char_class_type> __classnames[] =
       {
 	{"d", ctype_base::digit},
 	{"w", {ctype_base::alnum, _RegexMask::_S_under}},
@@ -348,22 +310,19 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{"xdigit", ctype_base::xdigit},
       };
 
-      std::string __s(__last - __first, '?');
-      __fctyp.narrow(__first, __last, '?', &__s[0]);
-      __cctyp.tolower(&*__s.begin(), &*__s.begin() + __s.size());
-      for (_ClassnameEntry* __it = __classnames;
-	   __it < *(&__classnames + 1);
-	   ++__it)
-	{
-	  if (__s == __it->first)
-	    {
-	      if (__icase
-		  && ((__it->second
-		       & (ctype_base::lower | ctype_base::upper)) != 0))
-		return ctype_base::alpha;
-	      return __it->second;
-	    }
-	}
+      string __s;
+      for (auto __cur = __first; __cur != __last; ++__cur)
+	__s += __fctyp.narrow(__fctyp.tolower(*__cur), '?');
+
+      for (const auto& __it : __classnames)
+	if (__s == __it.first)
+	  {
+	    if (__icase
+		&& ((__it.second
+		     & (ctype_base::lower | ctype_base::upper)) != 0))
+	      return ctype_base::alpha;
+	    return __it.second;
+	  }
       return 0;
     }
 
diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/lookup_classname.cc b/libstdc++-v3/testsuite/28_regex/traits/char/lookup_classname.cc
index d7216ce..f4c9758 100644
--- a/libstdc++-v3/testsuite/28_regex/traits/char/lookup_classname.cc
+++ b/libstdc++-v3/testsuite/28_regex/traits/char/lookup_classname.cc
@@ -26,6 +26,7 @@ 
 // 28.7(9) Class template regex_traits [re.traits]
 
 #include <regex>
+#include <forward_list>
 #include <testsuite_hooks.h>
 
 void
@@ -47,8 +48,29 @@  test01()
 	VERIFY( c2 == c3 );
 }
 
+// Test forward iterator
+void
+test02()
+{
+  const char strlit[] = "upper";
+  std::forward_list<char> s(strlit, strlit + strlen(strlit));
+  std::regex_traits<char> traits;
+  VERIFY(traits.isctype('C', traits.lookup_classname(s.begin(), s.end(), false)));
+}
+
+// icase
+void
+test03()
+{
+  std::string s("lower");
+  std::regex_traits<char> traits;
+  VERIFY(traits.isctype('C', traits.lookup_classname(s.begin(), s.end(), true)));
+}
+
 int main()
 {
 	test01();
+	test02();
+	test03();
 	return 0;
 }
diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/lookup_collatename.cc b/libstdc++-v3/testsuite/28_regex/traits/char/lookup_collatename.cc
index 56d0576..fac4a24 100644
--- a/libstdc++-v3/testsuite/28_regex/traits/char/lookup_collatename.cc
+++ b/libstdc++-v3/testsuite/28_regex/traits/char/lookup_collatename.cc
@@ -26,6 +26,7 @@ 
 // 28.7 (8) Class template regex_traits [re.traits]
 
 #include <regex>
+#include <forward_list>
 #include <testsuite_hooks.h>
 
 void
@@ -40,8 +41,19 @@  test01()
   VERIFY(t.lookup_collatename(name, name+sizeof(name)-1) == "~");
 }
 
+// Test forward iterator.
+void
+test02()
+{
+  const char strlit[] = "tilde";
+  std::forward_list<char> s(strlit, strlit + strlen(strlit));
+  std::regex_traits<char> traits;
+  VERIFY(traits.lookup_collatename(s.begin(), s.end()) == "~");
+}
+
 int main()
 {
 	test01();
+	test02();
 	return 0;
 }