Patchwork [v3,headers] variable uglification

login
register
mail settings
Submitter Ralf Wildenhues
Date Sept. 19, 2010, 8:21 a.m.
Message ID <20100919082136.GE5435@gmx.de>
Download mbox | patch
Permalink /patch/65150/
State New
Headers show

Comments

Ralf Wildenhues - Sept. 19, 2010, 8:21 a.m.
Hi Paolo, all,

I noticed that C++ header uglification seems to be a slow manual
process.  That looks suboptimal, and shouldn't be the case, it
should be mostly automatic and quick (in terms of developer time).

My first idea was to just hunt the headers for suspicious words
(avoiding the use of the preprocessor to catch all possible code):

cd libstdc++-v3
set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \
            -prune -o -type f -print`
shift
perl -e '
  my %words;
  undef $/;             # slurp whole file at once, for multiline match
  while (<>) {
    s/\/\/[^\n]*//g;    # good enough for C++ comments
    s/\/\*.*?\*\///gs;  # good enough for C comments
    foreach my $word (split /\b/) {
      $words{$word}++
        if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words
    }
  }
  foreach my $word (keys %words) {
    print "$words{$word}\t$word\n";
  }' "$@" |
sort -k1n | less

which already shows lots of potential issues below ext/, but also false
positives from preprocessor statements, string literals, arithmetic
constant prefixes and suffixes.  Also, I still have to generate a list
of keywords and C++ API words to exclude, but glancing over them is
fairly easy.

My next idea would be to instantiate as much code as possible and
extract debugging symbols, maybe that can be an easy second attack
vector.

Another idea would be a testsuite addition poisoning one- and
two-character identifiers not part of the API?  Might be too dangerous
in the presence of broken system headers.

Who designed C++ classes inside <random> with multiple one-character
member names in the API by the way?

Anyway, I found a couple of instances with the above approach, patch
below survived bootstrap and regtest on x86_64-unknown-linux-gnu.  OK?

Thanks,
Ralf

Uglify C++ headers some more.

libstdc++-v3/ChangeLog:
2010-09-19  Ralf Wildenhues  <Ralf.Wildenhues@gmx.de>

	* include/ext/throw_allocator.h
	(hash<__gnu_cxx::throw_value_limit>::operator()): Uglify local.
	(hash<__gnu_cxx::throw_value_random>::operator()): Likewise.
	* include/parallel/set_operations.h (__symmetric_difference_func):
	Uglify remaining arguments to __count, __first_empty, _M_invoke.
	(__difference_func): Likewise for __count.
	* include/profile/impl/profiler_node.h
	(__object_info_base::__write): Uglify parameter.
Paolo Carlini - Sept. 19, 2010, 9:10 a.m.
Hi Ralf,
> Hi Paolo, all,
>
> I noticed that C++ header uglification seems to be a slow manual
> process.  That looks suboptimal, and shouldn't be the case, it
> should be mostly automatic and quick (in terms of developer time).
>   
Well, I agree, but it seems to me that the issue normally is rather
academic because people working on the C++ runtime full time *know from
the outset* that uglification is required, *never* write first
un-uglified names and then fix each name in a second pass. Thus I see
efforts in this area more as a diagnostic, pre-commit tool for patches
contributed by accidental contributors, this kind of situation. In other
terms, no perfect accuracy required, etc. When you are happy with a
script (I see you clearly understand already all the quirks and special
cases) feel free to contribute / commit it right away. Thanks!
> My first idea was to just hunt the headers for suspicious words
> (avoiding the use of the preprocessor to catch all possible code):
>
> cd libstdc++-v3
> set x `find include \( -name \*.cc -o -name \*.am -o -name \*.in \) \
>             -prune -o -type f -print`
> shift
> perl -e '
>   my %words;
>   undef $/;             # slurp whole file at once, for multiline match
>   while (<>) {
>     s/\/\/[^\n]*//g;    # good enough for C++ comments
>     s/\/\*.*?\*\///gs;  # good enough for C comments
>     foreach my $word (split /\b/) {
>       $words{$word}++
>         if $word =~ m/^[a-zA-Z][a-zA-Z0-9_]*$/; # ignore _words
>     }
>   }
>   foreach my $word (keys %words) {
>     print "$words{$word}\t$word\n";
>   }' "$@" |
> sort -k1n | less
>
> which already shows lots of potential issues below ext/, but also false
> positives from preprocessor statements, string literals, arithmetic
> constant prefixes and suffixes.  Also, I still have to generate a list
> of keywords and C++ API words to exclude, but glancing over them is
> fairly easy.
>
> My next idea would be to instantiate as much code as possible and
> extract debugging symbols, maybe that can be an easy second attack
> vector.
>
> Another idea would be a testsuite addition poisoning one- and
> two-character identifiers not part of the API?  Might be too dangerous
> in the presence of broken system headers.
>   
Yes, I agree.
> Who designed C++ classes inside <random> with multiple one-character
> member names in the API by the way?
>   
Physicists? ;) I'm rather serious actually, the specifications have been
largely worked out by physicists at Fermi Lab + Jens Maurer and somehow
nobody noticed so far. Bad, I agree that can be a problem. I'm afraid
it's a bit too late to change it but if you really think it can be a
*serious* problem, you can send a DR to the current LWG Chair, Alisdair
Meredith (wg21@alisdairm.net). I'll personally take care of following
its iter at the next Meetings.
> Anyway, I found a couple of instances with the above approach, patch
> below survived bootstrap and regtest on x86_64-unknown-linux-gnu.  OK?
>   
Sure.

Thanks again,
Paolo.
Ralf Wildenhues - Sept. 19, 2010, 9:59 a.m.
* Paolo Carlini wrote on Sun, Sep 19, 2010 at 11:10:10AM CEST:
> > I noticed that C++ header uglification seems to be a slow manual
> > process.  That looks suboptimal, and shouldn't be the case, it
> > should be mostly automatic and quick (in terms of developer time).
> >   
> Well, I agree, but it seems to me that the issue normally is rather
> academic because people working on the C++ runtime full time *know from
> the outset* that uglification is required, *never* write first
> un-uglified names and then fix each name in a second pass.

This statement seems to be at odds with several files below include/ext.
For example, include/ext/pb_ds/assoc_container.hpp seems not uglified at
all.  Does it maybe not need uglification for some reason, and if so, is
there some rule I'm overlooking for which files don't need it?

> Thus I see
> efforts in this area more as a diagnostic, pre-commit tool for patches
> contributed by accidental contributors, this kind of situation.

Sure.

> > Who designed C++ classes inside <random> with multiple one-character
> > member names in the API by the way?
> >   
> Physicists? ;) I'm rather serious actually, the specifications have been
> largely worked out by physicists at Fermi Lab + Jens Maurer and somehow
> nobody noticed so far. Bad, I agree that can be a problem. I'm afraid
> it's a bit too late to change it but if you really think it can be a
> *serious* problem, you can send a DR to the current LWG Chair, Alisdair
> Meredith (wg21@alisdairm.net). I'll personally take care of following
> its iter at the next Meetings.

Well, not serious, it's rather that it looks at odds with the rest of
the STL which for the most part uses descriptive names.

> > Anyway, I found a couple of instances with the above approach, patch
> > below survived bootstrap and regtest on x86_64-unknown-linux-gnu.  OK?
> >   
> Sure.

Thanks for the super-fast feedback, committed,
Ralf
Paolo Carlini - Sept. 19, 2010, 10:06 a.m.
On 09/19/2010 11:59 AM, Ralf Wildenhues wrote:
> * Paolo Carlini wrote on Sun, Sep 19, 2010 at 11:10:10AM CEST:
>   
>>> I noticed that C++ header uglification seems to be a slow manual
>>> process.  That looks suboptimal, and shouldn't be the case, it
>>> should be mostly automatic and quick (in terms of developer time).
>>>   
>>>       
>> Well, I agree, but it seems to me that the issue normally is rather
>> academic because people working on the C++ runtime full time *know from
>> the outset* that uglification is required, *never* write first
>> un-uglified names and then fix each name in a second pass.
>>     
> This statement seems to be at odds with several files below include/ext.
> For example, include/ext/pb_ds/assoc_container.hpp seems not uglified at
> all.  Does it maybe not need uglification for some reason, and if so, is
> there some rule I'm overlooking for which files don't need it?
>   
We are exactly talking about one of those "interesting" cases of
accidental contributors, at least for ext/pb_ds. This guy contributed
the code (we were interested to its substance and we didn't want to
enforce our normal policy about uglification as a mandatory requirement
before committing) and then disappeared, sigh.

For some other bits of ext/ which you may find odd, consider that it
also hosts legacy stuff, other code contributed by accidental
contributors and moved here, etc. I think something is wrong also in
typelist.h.

For patches I will personally review and approve, this kind of
sloppiness is not going to happen anymore, I promise.

Paolo.
Benjamin Kosnik - Sept. 23, 2010, 4:58 p.m.
> This statement seems to be at odds with several files below
> include/ext. For example, include/ext/pb_ds/assoc_container.hpp seems
> not uglified at all.  Does it maybe not need uglification for some
> reason, and if so, is there some rule I'm overlooking for which files
> don't need it?

It needs/needed uglification. Thanks for your efforts in this area. 
 
> > Thus I see
> > efforts in this area more as a diagnostic, pre-commit tool for
> > patches contributed by accidental contributors, this kind of
> > situation.

Sounds great. You can even add the BADNAMES entries to a poison list.

-benjamin

Patch

diff --git a/libstdc++-v3/include/ext/throw_allocator.h b/libstdc++-v3/include/ext/throw_allocator.h
index 669d433..cc34478 100644
--- a/libstdc++-v3/include/ext/throw_allocator.h
+++ b/libstdc++-v3/include/ext/throw_allocator.h
@@ -737,8 +737,8 @@  namespace std
       size_t
       operator()(const __gnu_cxx::throw_value_limit& __val) const
       {
-	std::hash<std::size_t> h;
-	size_t __result = h(__val._M_i);
+	std::hash<std::size_t> __h;
+	size_t __result = __h(__val._M_i);
 	return __result;
       }
     };
@@ -751,8 +751,8 @@  namespace std
       size_t
       operator()(const __gnu_cxx::throw_value_random& __val) const
       {
-	std::hash<std::size_t> h;
-	size_t __result = h(__val._M_i);
+	std::hash<std::size_t> __h;
+	size_t __result = __h(__val._M_i);
 	return __result;
       }
     };
diff --git a/libstdc++-v3/include/parallel/set_operations.h b/libstdc++-v3/include/parallel/set_operations.h
index f6b076f..f552c1d 100644
--- a/libstdc++-v3/include/parallel/set_operations.h
+++ b/libstdc++-v3/include/parallel/set_operations.h
@@ -103,11 +103,11 @@  namespace __gnu_parallel
       }
 
       _DifferenceType
-      __count(_IIter __a, _IIter __b, _IIter __c, _IIter d) const
+      __count(_IIter __a, _IIter __b, _IIter __c, _IIter __d) const
       {
 	_DifferenceType __counter = 0;
 
-	while (__a != __b && __c != d)
+	while (__a != __b && __c != __d)
           {
             if (_M_comp(*__a, *__c))
               {
@@ -126,12 +126,12 @@  namespace __gnu_parallel
               }
           }
 
-	return __counter + (__b - __a) + (d - __c);
+	return __counter + (__b - __a) + (__d - __c);
       }
 
       _OutputIterator
-      __first_empty(_IIter __c, _IIter d, _OutputIterator __out) const
-      { return std::copy(__c, d, __out); }
+      __first_empty(_IIter __c, _IIter __d, _OutputIterator __out) const
+      { return std::copy(__c, __d, __out); }
 
       _OutputIterator
       __second_empty(_IIter __a, _IIter __b, _OutputIterator __out) const
@@ -153,10 +153,10 @@  namespace __gnu_parallel
       _Compare _M_comp;
 
       _OutputIterator
-      _M_invoke(_IIter __a, _IIter __b, _IIter __c, _IIter d,
+      _M_invoke(_IIter __a, _IIter __b, _IIter __c, _IIter __d,
 		_OutputIterator __r) const
       {
-	while (__a != __b && __c != d)
+	while (__a != __b && __c != __d)
           {
             if (_M_comp(*__a, *__c))
               {
@@ -177,11 +177,11 @@  namespace __gnu_parallel
 
       _DifferenceType
       __count(_IIter __a, _IIter __b,
-	      _IIter __c, _IIter d) const
+	      _IIter __c, _IIter __d) const
       {
 	_DifferenceType __counter = 0;
 
-	while (__a != __b && __c != d)
+	while (__a != __b && __c != __d)
           {
             if (_M_comp(*__a, *__c))
               {
diff --git a/libstdc++-v3/include/profile/impl/profiler_node.h b/libstdc++-v3/include/profile/impl/profiler_node.h
index d22a3e1..86f541f 100644
--- a/libstdc++-v3/include/profile/impl/profiler_node.h
+++ b/libstdc++-v3/include/profile/impl/profiler_node.h
@@ -148,7 +148,7 @@  namespace __gnu_profile
     __stack() const
     { return _M_stack; }
     
-    virtual void __write(FILE* f) const = 0;
+    virtual void __write(FILE* __f) const = 0;
 
   protected:
     __stack_t _M_stack;