From patchwork Sat Sep 12 02:04:38 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tim Shen X-Patchwork-Id: 517063 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 8C35F1401AD for ; Sat, 12 Sep 2015 12:05:04 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=PTqw6eSQ; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=VyIFxL6DdNyuuSoOGg B6MuZ9aUe2KOXvcryXVEtFy7xR8+FC/LQhv8zGswV0Y2Z5NVrnZv5RfpD+aiR1G4 XdPYHOe70EVfSxL005v5mkhYbHKRQdkVhVjk7D4I8aJvHmXFQkncd4qwbpM4Q8LZ stLdDeJZ31wkBA8id8D7T9imQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=rvsrfVAOwLHW2XIo42oWBOz5 3+Y=; b=PTqw6eSQeTL72kT61FaK2iHQ17Su6uaRhEsSuknVfY2USFXvfjxy/rZV HHcljTgrxcUiGIKyOvHxBtkWCzLxAQI5XsHf/kGOCzqPANAOiDWBvvwbWywzXX9f Uy+8uWTf3KoP/cVWVAfoh3rgeaI1C8WwirgSVTCnRjMRQUK/RyY= Received: (qmail 74699 invoked by alias); 12 Sep 2015 02:04:46 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 74667 invoked by uid 89); 12 Sep 2015 02:04:43 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.2 required=5.0 tests=AWL, BAYES_50, RCVD_IN_DNSWL_LOW, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mail-qg0-f54.google.com Received: from mail-qg0-f54.google.com (HELO mail-qg0-f54.google.com) (209.85.192.54) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Sat, 12 Sep 2015 02:04:41 +0000 Received: by qgx61 with SMTP id 61so77581603qgx.3 for ; Fri, 11 Sep 2015 19:04:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=/Sr2F/MwTM9dzkHpOj9s5oE+ECZCZTG1gF8d7UoEn6Y=; b=clKOyDoudmdnCFTGJtzjDhJoO9OUyEj9QlMGn7x74jFCHsIxEwx47YZN9Pby88sB6F 1+HitntJepep7VwHi/vKXJOZpx1xKqnyankAZtDD/TGWFbtd/jXKmBDWgcF4UFN9w9W7 9JHMzdBG7uGgk+rcpaz381xRKmOI3XQ6Bc+ZoTqYTJxFCH7gOGpzOjFxkp+82bFlPC2Q Mjexd2bFDwMV8Uj7DJdXxXlQbh4NawB0gazrXoX/Lt21zHGQSsS5TpQCGNObytSTGxwA L1bC4RrguzlgpnRfVkLze69zMGSVqjagE1SanSWbenDyO/KALWIKZAVNNyKzCCg6Y1da qvEw== X-Gm-Message-State: ALoCoQnS7sRIuuFfSoozrxqgR5fqXLiM8hc34OoUSjwvBtk4+1IwZ226L5DWwgF9j+9uCyC2JjH2 MIME-Version: 1.0 X-Received: by 10.140.94.214 with SMTP id g80mr3026214qge.57.1442023478350; Fri, 11 Sep 2015 19:04:38 -0700 (PDT) Received: by 10.55.6.14 with HTTP; Fri, 11 Sep 2015 19:04:38 -0700 (PDT) In-Reply-To: <20150907110614.GG2631@redhat.com> References: <20150828155946.GZ2631@redhat.com> <20150907110614.GG2631@redhat.com> Date: Fri, 11 Sep 2015 19:04:38 -0700 Message-ID: Subject: Re: [Patch, libstdc++] Add specific error message into exceptions From: Tim Shen To: Jonathan Wakely Cc: "libstdc++" , gcc-patches On Mon, Sep 7, 2015 at 4:06 AM, Jonathan Wakely wrote: > On 28/08/15 11:23 -0700, Tim Shen wrote: >> >> On Fri, Aug 28, 2015 at 8:59 AM, Jonathan Wakely >> wrote: >>> >>> There seems to be no need to construct a std::string here, just pass a >>> const char* (see below). >> >> >> To be honest, I wasn't considering performance for a bit, since >> exceptions are already considered slow by me :P. But yes, we can do >> less allocations. >> >>> I wonder if we want to make this more efficient by adding a private >>> member to regex_error that would allow information to be appended to >>> the string, rather then creating a new regex_error with a new string. > > > In case it wasn't clear, I was suggesting to add a private member > *function* not data member. > >> I can add a helper function to _Scanner to construct the exception >> object for only once. For functions that can't access this helper, use >> return value for error handling. >> >>> I suggest adding another overload that takes a const char* rather than >>> std::string. The reason is that when using the new ABI this function >>> will take a std::__cxx11::string, so calling it will allocate memory >>> for the string data, then that string is passed to the regex_error >>> constructor which has to convert it internally to an old std::string, >>> which has to allocate a second time. >> >> >> First, to make it clear: due to _M_get_location_string(), we need >> dynamic allocation. >> >> So is it good to have an owned raw pointer stored in runtime_error, >> pointing to a heap allocated char chunk, which will be deallocated in >> regex_error's dtor? > > > No, adding that pointer is an ABI change. > > If you can't do it without an ABI change then you will have to lose > the _M_get_location_string() functionality. It seems non-essential > anyway. Ok then, let's not append dynamic location information, but use a string literal pointer only. diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h index b6ab307..1f672ee 100644 --- a/libstdc++-v3/include/bits/regex_automaton.h +++ b/libstdc++-v3/include/bits/regex_automaton.h @@ -327,7 +327,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { this->push_back(std::move(__s)); if (this->size() > _GLIBCXX_REGEX_STATE_LIMIT) - __throw_regex_error(regex_constants::error_space); + __throw_regex_error( + regex_constants::error_space, + "Number of NFA states exceeds limit. Please use shorter regex " + "string, or use smaller brace expression, or make " + "_GLIBCXX_REGEX_STATE_LIMIT larger."); return this->size()-1; } diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc index f6f63a1..9bb1164 100644 --- a/libstdc++-v3/include/bits/regex_automaton.tcc +++ b/libstdc++-v3/include/bits/regex_automaton.tcc @@ -149,7 +149,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _NFA<_TraitsT>::_M_insert_backref(size_t __index) { if (this->_M_flags & regex_constants::__polynomial) - __throw_regex_error(regex_constants::error_complexity); + __throw_regex_error(regex_constants::error_complexity, + "Unexpected back-reference in polynomial mode."); // To figure out whether a backref is valid, a stack is used to store // unfinished sub-expressions. For example, when parsing // "(a(b)(c\\1(d)))" at '\\1', _M_subexpr_count is 3, indicating that 3 @@ -158,10 +159,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // _M_paren_stack is {1, 3}, for incomplete "(a.." and "(c..". At this // time, "\\2" is valid, but "\\1" and "\\3" are not. if (__index >= _M_subexpr_count) - __throw_regex_error(regex_constants::error_backref); + __throw_regex_error( + regex_constants::error_backref, + "Back-reference index exceeds current sub-expression count."); for (auto __it : this->_M_paren_stack) if (__index == __it) - __throw_regex_error(regex_constants::error_backref); + __throw_regex_error( + regex_constants::error_backref, + "Back-reference referred to an opened sub-expression."); this->_M_has_backref = true; _StateT __tmp(_S_opcode_backref); __tmp._M_backref_index = __index; diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h index 07a9ed3..1f6348a 100644 --- a/libstdc++-v3/include/bits/regex_compiler.h +++ b/libstdc++-v3/include/bits/regex_compiler.h @@ -392,7 +392,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto __st = _M_traits.lookup_collatename(__s.data(), __s.data() + __s.size()); if (__st.empty()) - __throw_regex_error(regex_constants::error_collate); + __throw_regex_error(regex_constants::error_collate, + "Invalid collate element."); _M_char_set.push_back(_M_translator._M_translate(__st[0])); _GLIBCXX_DEBUG_ONLY(_M_is_ready = false); return __st; @@ -404,7 +405,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto __st = _M_traits.lookup_collatename(__s.data(), __s.data() + __s.size()); if (__st.empty()) - __throw_regex_error(regex_constants::error_collate); + __throw_regex_error(regex_constants::error_collate, + "Invalid equivalence class."); __st = _M_traits.transform_primary(__st.data(), __st.data() + __st.size()); _M_equiv_set.push_back(__st); @@ -419,7 +421,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __s.data() + __s.size(), __icase); if (__mask == 0) - __throw_regex_error(regex_constants::error_ctype); + __throw_regex_error(regex_constants::error_collate, + "Invalid character class."); if (!__neg) _M_class_set |= __mask; else @@ -431,7 +434,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_make_range(_CharT __l, _CharT __r) { if (__l > __r) - __throw_regex_error(regex_constants::error_range); + __throw_regex_error(regex_constants::error_range, + "Invalid range in bracket expression."); _M_range_set.push_back(make_pair(_M_translator._M_transform(__l), _M_translator._M_transform(__r))); _GLIBCXX_DEBUG_ONLY(_M_is_ready = false); diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc index 336a2e8..f7d52fc 100644 --- a/libstdc++-v3/include/bits/regex_compiler.tcc +++ b/libstdc++-v3/include/bits/regex_compiler.tcc @@ -162,7 +162,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto __neg = _M_value[0] == 'n'; this->_M_disjunction(); if (!_M_match_token(_ScannerT::_S_token_subexpr_end)) - __throw_regex_error(regex_constants::error_paren); + __throw_regex_error(regex_constants::error_paren, + "Parenthesis is not closed."); auto __tmp = _M_pop(); __tmp._M_append(_M_nfa->_M_insert_accept()); _M_stack.push( @@ -184,7 +185,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto __init = [this, &__neg]() { if (_M_stack.empty()) - __throw_regex_error(regex_constants::error_badrepeat); + __throw_regex_error(regex_constants::error_badrepeat, + "Nothing to repeat before a quantifier."); __neg = __neg && _M_match_token(_ScannerT::_S_token_opt); }; if (_M_match_token(_ScannerT::_S_token_closure0)) @@ -220,9 +222,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else if (_M_match_token(_ScannerT::_S_token_interval_begin)) { if (_M_stack.empty()) - __throw_regex_error(regex_constants::error_badrepeat); + __throw_regex_error(regex_constants::error_badrepeat, + "Nothing to repeat before a quantifier."); if (!_M_match_token(_ScannerT::_S_token_dup_count)) - __throw_regex_error(regex_constants::error_badbrace); + __throw_regex_error(regex_constants::error_badbrace, + "Unexpected token in brace expression."); _StateSeqT __r(_M_pop()); _StateSeqT __e(*_M_nfa, _M_nfa->_M_insert_dummy()); long __min_rep = _M_cur_int_value(10); @@ -238,7 +242,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else __n = 0; if (!_M_match_token(_ScannerT::_S_token_interval_end)) - __throw_regex_error(regex_constants::error_brace); + __throw_regex_error(regex_constants::error_brace, + "Unexpected end of brace expression."); __neg = __neg && _M_match_token(_ScannerT::_S_token_opt); @@ -257,7 +262,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else { if (__n < 0) - __throw_regex_error(regex_constants::error_badbrace); + __throw_regex_error(regex_constants::error_badbrace, + "Invalid range in brace expression."); auto __end = _M_nfa->_M_insert_dummy(); // _M_alt is the "match more" branch, and _M_next is the // "match less" one. Switch _M_alt and _M_next of all created @@ -324,7 +330,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _StateSeqT __r(*_M_nfa, _M_nfa->_M_insert_dummy()); this->_M_disjunction(); if (!_M_match_token(_ScannerT::_S_token_subexpr_end)) - __throw_regex_error(regex_constants::error_paren); + __throw_regex_error(regex_constants::error_paren, + "Parenthesis is not closed."); __r._M_append(_M_pop()); _M_stack.push(__r); } @@ -333,7 +340,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _StateSeqT __r(*_M_nfa, _M_nfa->_M_insert_subexpr_begin()); this->_M_disjunction(); if (!_M_match_token(_ScannerT::_S_token_subexpr_end)) - __throw_regex_error(regex_constants::error_paren); + __throw_regex_error(regex_constants::error_paren, + "Parenthesis is not closed."); __r._M_append(_M_pop()); __r._M_append(_M_nfa->_M_insert_subexpr_end()); _M_stack.push(__r); @@ -474,7 +482,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (_M_match_token(_ScannerT::_S_token_bracket_end)) return false; - __throw_regex_error(regex_constants::error_range); + __throw_regex_error( + regex_constants::error_range, + "Unexpected dash in bracket expression. For POSIX syntax, " + "a dash is not treated literally only when it is at " + "beginning or end."); } __last_char.first = true; __last_char.second = _M_value[0]; @@ -492,7 +504,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (_M_scanner._M_get_token() != _ScannerT::_S_token_bracket_end) - __throw_regex_error(regex_constants::error_range); + __throw_regex_error( + regex_constants::error_range, + "Unexpected end of bracket expression."); __matcher._M_add_char(_M_value[0]); } } @@ -508,7 +522,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_ctype.is(_CtypeT::upper, _M_value[0])); else - __throw_regex_error(regex_constants::error_brack); + __throw_regex_error(regex_constants::error_brack, + "Unexpected character in bracket expression."); return true; } diff --git a/libstdc++-v3/include/bits/regex_error.h b/libstdc++-v3/include/bits/regex_error.h index 778edd5..be19fc1 100644 --- a/libstdc++-v3/include/bits/regex_error.h +++ b/libstdc++-v3/include/bits/regex_error.h @@ -155,6 +155,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION regex_constants::error_type code() const { return _M_code; } + + private: + regex_error(regex_constants::error_type __ecode, const char* __what) + : std::runtime_error(__what), _M_code(__ecode) + { } + + friend void __throw_regex_error(regex_constants::error_type, const char*); }; //@} // group regex @@ -162,5 +169,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION void __throw_regex_error(regex_constants::error_type __ecode); + inline void + __throw_regex_error(regex_constants::error_type __ecode, const char* __what) + { _GLIBCXX_THROW_OR_ABORT(regex_error(__ecode, __what)); } + _GLIBCXX_END_NAMESPACE_VERSION } // namespace std diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc b/libstdc++-v3/include/bits/regex_scanner.tcc index c158c65..7d24e06 100644 --- a/libstdc++-v3/include/bits/regex_scanner.tcc +++ b/libstdc++-v3/include/bits/regex_scanner.tcc @@ -108,7 +108,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (__c == '\\') { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error( + regex_constants::error_escape, + "Unexpected end of regex when escaping."); if (!_M_is_basic() || (*_M_current != '(' @@ -125,7 +127,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (_M_is_ecma() && *_M_current == '?') { if (++_M_current == _M_end) - __throw_regex_error(regex_constants::error_paren); + __throw_regex_error( + regex_constants::error_paren, + "Unexpected end of regex when in an open parenthesis."); if (*_M_current == ':') { @@ -145,7 +149,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_value.assign(1, 'n'); } else - __throw_regex_error(regex_constants::error_paren); + __throw_regex_error( + regex_constants::error_paren, + "Invalid special open parenthesis."); } else if (_M_flags & regex_constants::nosubs) _M_token = _S_token_subexpr_no_group_begin; @@ -204,14 +210,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_scan_in_bracket() { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_brack); + __throw_regex_error( + regex_constants::error_brack, + "Unexpected end of regex when in bracket expression."); auto __c = *_M_current++; if (__c == '[') { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_brack); + __throw_regex_error(regex_constants::error_brack, + "Unexpected character class open bracket."); if (*_M_current == '.') { @@ -261,7 +270,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_scan_in_brace() { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_brace); + __throw_regex_error( + regex_constants::error_brace, + "Unexpected end of regex when in brace expression."); auto __c = *_M_current++; @@ -285,7 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION ++_M_current; } else - __throw_regex_error(regex_constants::error_badbrace); + __throw_regex_error(regex_constants::error_badbrace, + "Unexpected character in brace expression."); } else if (__c == '}') { @@ -293,7 +305,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_token = _S_token_interval_end; } else - __throw_regex_error(regex_constants::error_badbrace); + __throw_regex_error(regex_constants::error_badbrace, + "Unexpected character in brace expression."); } template @@ -302,7 +315,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_eat_escape_ecma() { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error(regex_constants::error_escape, + "Unexpected end of regex when escaping."); auto __c = *_M_current++; auto __pos = _M_find_escape(_M_ctype.narrow(__c, '\0')); @@ -336,7 +350,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else if (__c == 'c') { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error( + regex_constants::error_escape, + "Unexpected end of regex when reading control code."); _M_token = _S_token_ord_char; _M_value.assign(1, *_M_current++); } @@ -347,7 +363,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (_M_current == _M_end || !_M_ctype.is(_CtypeT::xdigit, *_M_current)) - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error( + regex_constants::error_escape, + "Unexpected end of regex when ascii character."); _M_value += *_M_current++; } _M_token = _S_token_hex_num; @@ -376,7 +394,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_eat_escape_posix() { if (_M_current == _M_end) - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error(regex_constants::error_escape, + "Unexpected end of regex when escaping."); auto __c = *_M_current; auto __pos = std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')); @@ -401,7 +420,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { #ifdef __STRICT_ANSI__ // POSIX says it is undefined to escape ordinary characters - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error(regex_constants::error_escape, + "Unexpected escape character."); #else _M_token = _S_token_ord_char; _M_value.assign(1, __c); @@ -441,7 +461,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return; } else - __throw_regex_error(regex_constants::error_escape); + __throw_regex_error(regex_constants::error_escape, + "Unexpected escape character."); } // Eats a character class or throws an exception. @@ -460,9 +481,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION || *_M_current++ != ']') // skip ']' { if (__ch == ':') - __throw_regex_error(regex_constants::error_ctype); + __throw_regex_error(regex_constants::error_ctype, + "Unexpected end of character class."); else - __throw_regex_error(regex_constants::error_collate); + __throw_regex_error(regex_constants::error_collate, + "Unexpected end of character class."); } }