| Submitter | Ralf Wildenhues |
|---|---|
| Date | Sept. 19, 2010, 8:21 a.m. |
| Message ID | <20100919082136.GE5435@gmx.de> |
| Download | mbox | patch |
| Permalink | /patch/65150/ |
| State | New |
| Headers | show |
Comments
Hi Ralf, > Hi Paolo, all, > > I noticed that C++ header uglification seems to be a slow manual > process. That looks suboptimal, and shouldn't be the case, it > should be mostly automatic and quick (in terms of developer time). > Well, I agree, but it seems to me that the issue normally is rather academic because people working on the C++ runtime full time *know from the outset* that uglification is required, *never* write first un-uglified names and then fix each name in a second pass. Thus I see efforts in this area more as a diagnostic, pre-commit tool for patches contributed by accidental contributors, this kind of situation. In other terms, no perfect accuracy required, etc. When you are happy with a script (I see you clearly understand already all the quirks and special cases) feel free to contribute / commit it right away. Thanks! > My first idea was to just hunt the headers for suspicious words > (avoiding the use of the preprocessor to catch all possible code): > > cd libstdc++-v3 > set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \ > -prune -o -type f -print` > shift > perl -e ' > my %words; > undef $/; # slurp whole file at once, for multiline match > while (<>) { > s/\/\/[^\n]*//g; # good enough for C++ comments > s/\/\*.*?\*\///gs; # good enough for C comments > foreach my $word (split /\b/) { > $words{$word}++ > if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words > } > } > foreach my $word (keys %words) { > print "$words{$word}\t$word\n"; > }' "$@" | > sort -k1n | less > > which already shows lots of potential issues below ext/, but also false > positives from preprocessor statements, string literals, arithmetic > constant prefixes and suffixes. Also, I still have to generate a list > of keywords and C++ API words to exclude, but glancing over them is > fairly easy. > > My next idea would be to instantiate as much code as possible and > extract debugging symbols, maybe that can be an easy second attack > vector. > > Another idea would be a testsuite addition poisoning one- and > two-character identifiers not part of the API? Might be too dangerous > in the presence of broken system headers. > Yes, I agree. > Who designed C++ classes inside <random> with multiple one-character > member names in the API by the way? > Physicists? ;) I'm rather serious actually, the specifications have been largely worked out by physicists at Fermi Lab + Jens Maurer and somehow nobody noticed so far. Bad, I agree that can be a problem. I'm afraid it's a bit too late to change it but if you really think it can be a *serious* problem, you can send a DR to the current LWG Chair, Alisdair Meredith (wg21@alisdairm.net). I'll personally take care of following its iter at the next Meetings. > Anyway, I found a couple of instances with the above approach, patch > below survived bootstrap and regtest on x86_64-unknown-linux-gnu. OK? > Sure. Thanks again, Paolo.
* Paolo Carlini wrote on Sun, Sep 19, 2010 at 11:10:10AM CEST: > > I noticed that C++ header uglification seems to be a slow manual > > process. That looks suboptimal, and shouldn't be the case, it > > should be mostly automatic and quick (in terms of developer time). > > > Well, I agree, but it seems to me that the issue normally is rather > academic because people working on the C++ runtime full time *know from > the outset* that uglification is required, *never* write first > un-uglified names and then fix each name in a second pass. This statement seems to be at odds with several files below include/ext. For example, include/ext/pb_ds/assoc_container.hpp seems not uglified at all. Does it maybe not need uglification for some reason, and if so, is there some rule I'm overlooking for which files don't need it? > Thus I see > efforts in this area more as a diagnostic, pre-commit tool for patches > contributed by accidental contributors, this kind of situation. Sure. > > Who designed C++ classes inside <random> with multiple one-character > > member names in the API by the way? > > > Physicists? ;) I'm rather serious actually, the specifications have been > largely worked out by physicists at Fermi Lab + Jens Maurer and somehow > nobody noticed so far. Bad, I agree that can be a problem. I'm afraid > it's a bit too late to change it but if you really think it can be a > *serious* problem, you can send a DR to the current LWG Chair, Alisdair > Meredith (wg21@alisdairm.net). I'll personally take care of following > its iter at the next Meetings. Well, not serious, it's rather that it looks at odds with the rest of the STL which for the most part uses descriptive names. > > Anyway, I found a couple of instances with the above approach, patch > > below survived bootstrap and regtest on x86_64-unknown-linux-gnu. OK? > > > Sure. Thanks for the super-fast feedback, committed, Ralf
On 09/19/2010 11:59 AM, Ralf Wildenhues wrote: > * Paolo Carlini wrote on Sun, Sep 19, 2010 at 11:10:10AM CEST: > >>> I noticed that C++ header uglification seems to be a slow manual >>> process. That looks suboptimal, and shouldn't be the case, it >>> should be mostly automatic and quick (in terms of developer time). >>> >>> >> Well, I agree, but it seems to me that the issue normally is rather >> academic because people working on the C++ runtime full time *know from >> the outset* that uglification is required, *never* write first >> un-uglified names and then fix each name in a second pass. >> > This statement seems to be at odds with several files below include/ext. > For example, include/ext/pb_ds/assoc_container.hpp seems not uglified at > all. Does it maybe not need uglification for some reason, and if so, is > there some rule I'm overlooking for which files don't need it? > We are exactly talking about one of those "interesting" cases of accidental contributors, at least for ext/pb_ds. This guy contributed the code (we were interested to its substance and we didn't want to enforce our normal policy about uglification as a mandatory requirement before committing) and then disappeared, sigh. For some other bits of ext/ which you may find odd, consider that it also hosts legacy stuff, other code contributed by accidental contributors and moved here, etc. I think something is wrong also in typelist.h. For patches I will personally review and approve, this kind of sloppiness is not going to happen anymore, I promise. Paolo.
> This statement seems to be at odds with several files below > include/ext. For example, include/ext/pb_ds/assoc_container.hpp seems > not uglified at all. Does it maybe not need uglification for some > reason, and if so, is there some rule I'm overlooking for which files > don't need it? It needs/needed uglification. Thanks for your efforts in this area. > > Thus I see > > efforts in this area more as a diagnostic, pre-commit tool for > > patches contributed by accidental contributors, this kind of > > situation. Sounds great. You can even add the BADNAMES entries to a poison list. -benjamin
Patch
diff --git a/libstdc++-v3/include/ext/throw_allocator.h b/libstdc++-v3/include/ext/throw_allocator.h index 669d433..cc34478 100644 --- a/libstdc++-v3/include/ext/throw_allocator.h +++ b/libstdc++-v3/include/ext/throw_allocator.h @@ -737,8 +737,8 @@ namespace std size_t operator()(const __gnu_cxx::throw_value_limit& __val) const { - std::hash<std::size_t> h; - size_t __result = h(__val._M_i); + std::hash<std::size_t> __h; + size_t __result = __h(__val._M_i); return __result; } }; @@ -751,8 +751,8 @@ namespace std size_t operator()(const __gnu_cxx::throw_value_random& __val) const { - std::hash<std::size_t> h; - size_t __result = h(__val._M_i); + std::hash<std::size_t> __h; + size_t __result = __h(__val._M_i); return __result; } }; diff --git a/libstdc++-v3/include/parallel/set_operations.h b/libstdc++-v3/include/parallel/set_operations.h index f6b076f..f552c1d 100644 --- a/libstdc++-v3/include/parallel/set_operations.h +++ b/libstdc++-v3/include/parallel/set_operations.h @@ -103,11 +103,11 @@ namespace __gnu_parallel } _DifferenceType - __count(_IIter __a, _IIter __b, _IIter __c, _IIter d) const + __count(_IIter __a, _IIter __b, _IIter __c, _IIter __d) const { _DifferenceType __counter = 0; - while (__a != __b && __c != d) + while (__a != __b && __c != __d) { if (_M_comp(*__a, *__c)) { @@ -126,12 +126,12 @@ namespace __gnu_parallel } } - return __counter + (__b - __a) + (d - __c); + return __counter + (__b - __a) + (__d - __c); } _OutputIterator - __first_empty(_IIter __c, _IIter d, _OutputIterator __out) const - { return std::copy(__c, d, __out); } + __first_empty(_IIter __c, _IIter __d, _OutputIterator __out) const + { return std::copy(__c, __d, __out); } _OutputIterator __second_empty(_IIter __a, _IIter __b, _OutputIterator __out) const @@ -153,10 +153,10 @@ namespace __gnu_parallel _Compare _M_comp; _OutputIterator - _M_invoke(_IIter __a, _IIter __b, _IIter __c, _IIter d, + _M_invoke(_IIter __a, _IIter __b, _IIter __c, _IIter __d, _OutputIterator __r) const { - while (__a != __b && __c != d) + while (__a != __b && __c != __d) { if (_M_comp(*__a, *__c)) { @@ -177,11 +177,11 @@ namespace __gnu_parallel _DifferenceType __count(_IIter __a, _IIter __b, - _IIter __c, _IIter d) const + _IIter __c, _IIter __d) const { _DifferenceType __counter = 0; - while (__a != __b && __c != d) + while (__a != __b && __c != __d) { if (_M_comp(*__a, *__c)) { diff --git a/libstdc++-v3/include/profile/impl/profiler_node.h b/libstdc++-v3/include/profile/impl/profiler_node.h index d22a3e1..86f541f 100644 --- a/libstdc++-v3/include/profile/impl/profiler_node.h +++ b/libstdc++-v3/include/profile/impl/profiler_node.h @@ -148,7 +148,7 @@ namespace __gnu_profile __stack() const { return _M_stack; } - virtual void __write(FILE* f) const = 0; + virtual void __write(FILE* __f) const = 0; protected: __stack_t _M_stack;
Hi Paolo, all, I noticed that C++ header uglification seems to be a slow manual process. That looks suboptimal, and shouldn't be the case, it should be mostly automatic and quick (in terms of developer time). My first idea was to just hunt the headers for suspicious words (avoiding the use of the preprocessor to catch all possible code): cd libstdc++-v3 set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \ -prune -o -type f -print` shift perl -e ' my %words; undef $/; # slurp whole file at once, for multiline match while (<>) { s/\/\/[^\n]*//g; # good enough for C++ comments s/\/\*.*?\*\///gs; # good enough for C comments foreach my $word (split /\b/) { $words{$word}++ if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words } } foreach my $word (keys %words) { print "$words{$word}\t$word\n"; }' "$@" | sort -k1n | less which already shows lots of potential issues below ext/, but also false positives from preprocessor statements, string literals, arithmetic constant prefixes and suffixes. Also, I still have to generate a list of keywords and C++ API words to exclude, but glancing over them is fairly easy. My next idea would be to instantiate as much code as possible and extract debugging symbols, maybe that can be an easy second attack vector. Another idea would be a testsuite addition poisoning one- and two-character identifiers not part of the API? Might be too dangerous in the presence of broken system headers. Who designed C++ classes inside <random> with multiple one-character member names in the API by the way? Anyway, I found a couple of instances with the above approach, patch below survived bootstrap and regtest on x86_64-unknown-linux-gnu. OK? Thanks, Ralf Uglify C++ headers some more. libstdc++-v3/ChangeLog: 2010-09-19 Ralf Wildenhues <Ralf.Wildenhues@gmx.de> * include/ext/throw_allocator.h (hash<__gnu_cxx::throw_value_limit>::operator()): Uglify local. (hash<__gnu_cxx::throw_value_random>::operator()): Likewise. * include/parallel/set_operations.h (__symmetric_difference_func): Uglify remaining arguments to __count, __first_empty, _M_invoke. (__difference_func): Likewise for __count. * include/profile/impl/profiler_node.h (__object_info_base::__write): Uglify parameter.