diff mbox

[0/3] Header file reduction.

Message ID 5612DF9D.9050602@redhat.com
State New
Headers show

Commit Message

Bernd Schmidt Oct. 5, 2015, 8:37 p.m. UTC
On 10/05/2015 10:10 PM, Andrew MacLeod wrote:
> Its just an example of the sort of redundant includes the tool removes.
> And your assertion turns out to be incorrect... bitmap.h is barely used
> outside the backend, thus it is included in the backend.h aggregator
> (This is the only header now which includes bitmap.h... Most of this
> many-month effort was to untangle all those indirect includes.)

I said a few headers include obstack.h, not bitmap.h, and that's true in 
my (maybe a week old) checkout. My suggestion was to move the include of 
the former to (as Richi corrected) coretypes.h.

And it's one example, but it does point out a problem with this sort of 
automated approach: realistically no one is going to check the whole 
patch, and it may contain changes that could be done better.

>>  * diff -c is somewhat unusual and I find diff -u much more readable.
>
> unsual? I've been using -cp for the past 2 decades and no one has ever
> mentioned it before...   poking around  the wiki I see it mentions you
> can use either -up or -cp.
>
> I guess I could repackage things using -up...  I don't even know where
> my script is to change it :-).   is -u what everyone uses now? no one
> has mentioned it before that I am aware of.

I'm pretty much used to seeing diff -u, whenever I get a -c diff things 
become harder to work out, because the region in the diff you're looking 
at never tells you the full story. In this case in particular, the 
existence of both reordering and removing changes makes it very hard to 
mentally keep track of what's going on.

Let's take this example:

Index: attribs.c
===================================================================
*** attribs.c   (revision 228331)
--- attribs.c   (working copy)
*************** along with GCC; see the file COPYING3.
*** 20,36 ****
   #include "config.h"
   #include "system.h"
   #include "coretypes.h"
! #include "tm.h"
   #include "tree.h"
- #include "alias.h"
   #include "stringpool.h"
   #include "attribs.h"
   #include "stor-layout.h"
- #include "flags.h"
- #include "diagnostic-core.h"
- #include "tm_p.h"
- #include "cpplib.h"
- #include "target.h"
   #include "langhooks.h"
   #include "plugin.h"

--- 20,31 ----
   #include "config.h"
   #include "system.h"
   #include "coretypes.h"
! #include "target.h"
   #include "tree.h"
   #include "stringpool.h"
+ #include "diagnostic-core.h"
   #include "attribs.h"
   #include "stor-layout.h"
   #include "langhooks.h"
   #include "plugin.h"

You could be misled into thinking that diagnostic-core.h and target.h 
are removed, and the algorithm confuses the issue by showing that lines 
"changed" rather than getting removed or added. With a unified diff, the 
size is cut in half which makes things more readable to begin with:

   Index: attribs.c
and with things closer together it's easier to follow what's actually 
going on. This is a smaller example, many instances in your patch are 
actually about a page long and you have to scroll back and forth to work 
things out, getting confused because everything in the 410k of text 
looks the same.

This was actually one of the reasons I proposed splitting the patch into 
reordering and removal phases, it would alleviate the diff -c disadvantages.

> the reduction on all those files will take the better part of a week.

That's a little concerning due to the possibility of intervening 
commits. I'd like to make one requirement for checkin, that you take the 
revision at which you're committing and then run the script again, 
verifying that the process produces the same changes as the patch you 
committed. (Or do things in smaller chunks.).


Bernd

Comments

Andrew MacLeod Oct. 5, 2015, 9:11 p.m. UTC | #1
On 10/05/2015 04:37 PM, Bernd Schmidt wrote:
> On 10/05/2015 10:10 PM, Andrew MacLeod wrote:
>> Its just an example of the sort of redundant includes the tool removes.
>> And your assertion turns out to be incorrect... bitmap.h is barely used
>> outside the backend, thus it is included in the backend.h aggregator
>> (This is the only header now which includes bitmap.h... Most of this
>> many-month effort was to untangle all those indirect includes.)
>
> I said a few headers include obstack.h, not bitmap.h, and that's true 
> in my (maybe a week old) checkout. My suggestion was to move the 
> include of the former to (as Richi corrected) coretypes.h.
>

Ah, sorry the parsing was non-deterministic and I parsed it the other 
way.   My comments refer to the "true" dependencies in the code after 
all the un-needed headers have been trimmed out... i've little doubt 
mainline shows obstack.h and bitmap.h being included everywhere.

In any case, a direct include of obstack.h in coretypes.h was considered 
earlier in the aggregation process and it didn't show up as something 
that would be a win.  It is included a couple of common places that we 
have no control over..  in particular libcpp/include/symtab.h includes 
obstack.h and  is included by tree-core.h.  A very significant number of 
files bring that in.  If we included obstack.h in coretypes.h then those 
files would be including it again for a second time for no particularly 
good reason.  So I made the judgement call to not put it in coretypes.h.

> And it's one example, but it does point out a problem with this sort 
> of automated approach: realistically no one is going to check the 
> whole patch, and it may contain changes that could be done better.

The point being that the aggregation *wasn't* automated... and has 
nothing to do with this patch set.    I analyzed and preformed all that 
sort of thing  earlier. Sure judgment calls were made, but it wasn't 
automated in the slightest.   There are certainly further aggregation 
improvements that could be made... and either I or someone else could do 
more down the road.,  The heavy lifting has all been done now.

So  the *only* thing that is automated is removing include files which 
are not needed so that we can get an idea of what the true dependencies 
in the source base are.




>>>  * diff -c is somewhat unusual and I find diff -u much more readable.
>>
>> unsual? I've been using -cp for the past 2 decades and no one has ever
>> mentioned it before...   poking around  the wiki I see it mentions you
>> can use either -up or -cp.
>>
>> I guess I could repackage things using -up...  I don't even know where
>> my script is to change it :-).   is -u what everyone uses now? no one
>> has mentioned it before that I am aware of.
>
> I'm pretty much used to seeing diff -u, whenever I get a -c diff 
> things become harder to work out, because the region in the diff 
> you're looking at never tells you the full story. In this case in 
> particular, the existence of both reordering and removing changes 
> makes it very hard to mentally keep track of what's going on.
>

I can switch to -u.. I've just never seen the request before.

I can regenerate the patches with -u if you want.
>
>> the reduction on all those files will take the better part of a week.
>
> That's a little concerning due to the possibility of intervening 
> commits. I'd like to make one requirement for checkin, that you take 
> the revision at which you're committing and then run the script again, 
> verifying that the process produces the same changes as the patch you 
> committed. (Or do things in smaller chunks.).
>

Well, sure there are intervening commits.. the only ones that actually 
matter are the ones which fail to compile because someone made a code 
change which now requires a header that wasn't needed before.  which is 
really a state we're looking for I think.   I fix those up before 
committing.   Its *possible* a conditional compilation issue could creep 
in, but highly unlikely.

I can rerun everything on that revision from just before I committed and 
see if everything is the same.  It'll take a week to find out :-) but 
that seems like a reasonable sanity check.

Andrew
Andrew MacLeod Oct. 6, 2015, 3:02 a.m. UTC | #2
On 10/05/2015 05:11 PM, Andrew MacLeod wrote:
>
> I can switch to -u.. I've just never seen the request before.
>
> I can regenerate the patches with -u if you want.

You are right, the patches are significantly easier to read with -u..  
I've changed my svn diff script.    here's all 3 patches:

Andrew
Jeff Law Oct. 6, 2015, 9:55 p.m. UTC | #3
On 10/05/2015 03:11 PM, Andrew MacLeod wrote:
>
> In any case, a direct include of obstack.h in coretypes.h was considered
> earlier in the aggregation process and it didn't show up as something
> that would be a win.  It is included a couple of common places that we
> have no control over..  in particular libcpp/include/symtab.h includes
> obstack.h and  is included by tree-core.h.  A very significant number of
> files bring that in.  If we included obstack.h in coretypes.h then those
> files would be including it again for a second time for no particularly
> good reason.  So I made the judgement call to not put it in coretypes.h.
And just as important, we can revisit the aggregators and when we do so, 
we ought to be able to answer the question, "if obstack.h is put into 
coretypes.h" does that clean things up elsewhere and re-run the tools to 
clean things up.


>
>> And it's one example, but it does point out a problem with this sort
>> of automated approach: realistically no one is going to check the
>> whole patch, and it may contain changes that could be done better.
>
> The point being that the aggregation *wasn't* automated... and has
> nothing to do with this patch set.    I analyzed and preformed all that
> sort of thing  earlier. Sure judgment calls were made, but it wasn't
> automated in the slightest.   There are certainly further aggregation
> improvements that could be made... and either I or someone else could do
> more down the road.,  The heavy lifting has all been done now.
Agreed.

>
> So  the *only* thing that is automated is removing include files which
> are not needed so that we can get an idea of what the true dependencies
> in the source base are.
Also agreed.

>>> the reduction on all those files will take the better part of a week.
>>
>> That's a little concerning due to the possibility of intervening
>> commits. I'd like to make one requirement for checkin, that you take
>> the revision at which you're committing and then run the script again,
>> verifying that the process produces the same changes as the patch you
>> committed. (Or do things in smaller chunks.).
>>
>
> Well, sure there are intervening commits.. the only ones that actually
> matter are the ones which fail to compile because someone made a code
> change which now requires a header that wasn't needed before.  which is
> really a state we're looking for I think.   I fix those up before
> committing.   Its *possible* a conditional compilation issue could creep
> in, but highly unlikely.
More likely is conditional compilation will be removed :-)  We're trying 
to get away from conditional compilation as a general direction.

Intervening commits are always a problem with this kind of large patch 
that hits many places.   But IMHO, they're an easily managed problem.

jeff
diff mbox

Patch

===================================================================
--- attribs.c   (revision 228331)
+++ attribs.c   (working copy)
@@ -20,17 +20,12 @@ 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
-#include "tm.h"
+#include "target.h"
  #include "tree.h"
-#include "alias.h"
  #include "stringpool.h"
+#include "diagnostic-core.h"
  #include "attribs.h"
  #include "stor-layout.h"
-#include "flags.h"
-#include "diagnostic-core.h"
-#include "tm_p.h"
-#include "cpplib.h"
-#include "target.h"
  #include "langhooks.h"
  #include "plugin.h"