Patchwork linux-next: origin tree build failure

login
register
mail settings
Submitter Stephen Rothwell
Date June 12, 2009, 12:24 a.m.
Message ID <20090612102427.32582baa.sfr@canb.auug.org.au>
Download mbox | patch
Permalink /patch/28613/
State Accepted, archived
Headers show

Comments

Stephen Rothwell - June 12, 2009, 12:24 a.m.
Hi all,

Today's linux-next build (powerpc ppc64_defconfig) failed like this:

include/linux/perf_counter.h:677: error: redefinition of 'perf_counter_do_pending'
arch/powerpc/include/asm/hw_irq.h:170: note: previous definition of 'perf_counter_do_pending' was here

Caused by commit 925d519ab82b6dd7aca9420d809ee83819c08db2 ("perf_counter:
unify and fix delayed counter wakeup") which added the former definitions
but forgot the remove the latter.

I applied the following patch.
Paul Mackerras - June 12, 2009, 12:53 a.m.
Stephen Rothwell writes:

> Subject: [PATCH] perfcounters: remove powerpc definitions of perf_counter_do_pending
> 
> Commit 925d519ab82b6dd7aca9420d809ee83819c08db2 ("perf_counter:
> unify and fix delayed counter wakeup") added global definitions.
> 
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Paul Mackerras <paulus@samba.org>
Benjamin Herrenschmidt - June 12, 2009, 1 a.m.
On Fri, 2009-06-12 at 10:24 +1000, Stephen Rothwell wrote:

> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Fri, 12 Jun 2009 10:14:22 +1000
> Subject: [PATCH] perfcounters: remove powerpc definitions of perf_counter_do_pending
> 
> Commit 925d519ab82b6dd7aca9420d809ee83819c08db2 ("perf_counter:
> unify and fix delayed counter wakeup") added global definitions.
> 
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Linus, please apply. BTW, This is _EXACTLY_ why this should have been in
-next for a few days before being merged :-(

Cheers,
Ben.

> ---
>  arch/powerpc/include/asm/hw_irq.h |    3 ---
>  arch/powerpc/kernel/irq.c         |    1 +
>  2 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 20a44d0..5351237 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -156,8 +156,6 @@ static inline void clear_perf_counter_pending(void)
>  		"i" (offsetof(struct paca_struct, perf_counter_pending)));
>  }
>  
> -extern void perf_counter_do_pending(void);
> -
>  #else
>  
>  static inline unsigned long test_perf_counter_pending(void)
> @@ -167,7 +165,6 @@ static inline unsigned long test_perf_counter_pending(void)
>  
>  static inline void set_perf_counter_pending(void) {}
>  static inline void clear_perf_counter_pending(void) {}
> -static inline void perf_counter_do_pending(void) {}
>  #endif /* CONFIG_PERF_COUNTERS */
>  
>  #endif	/* __KERNEL__ */
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index feff792..844d3f8 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -53,6 +53,7 @@
>  #include <linux/bootmem.h>
>  #include <linux/pci.h>
>  #include <linux/debugfs.h>
> +#include <linux/perf_counter.h>
>  
>  #include <asm/uaccess.h>
>  #include <asm/system.h>
Ingo Molnar - June 12, 2009, 9:20 a.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2009-06-12 at 10:24 +1000, Stephen Rothwell wrote:
> 
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > Date: Fri, 12 Jun 2009 10:14:22 +1000
> > Subject: [PATCH] perfcounters: remove powerpc definitions of perf_counter_do_pending
> > 
> > Commit 925d519ab82b6dd7aca9420d809ee83819c08db2 ("perf_counter:
> > unify and fix delayed counter wakeup") added global definitions.
> > 
> > Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> 
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Ah - thanks. The bug was caused by me being a bit too optimistic in 
applying the shiny-new Power7 support patches on the last day. (nice 
CPU btw.)

> Linus, please apply. BTW, This is _EXACTLY_ why this should have 
> been in -next for a few days before being merged :-(

Not really: for example current upstream is build-broken on x86 due 
to an integration artifact via the kmemleak tree - despite it having 
been in linux-next for months.

Paulus was building and booting powerpc on a daily basis and i ran 
cross-builds as well.

Such bugs happen, and they are easy enough to fix. What matters 
arent the 1-2 short-lived bugs that do happen when a new combination 
of trees is created, but the long-lived combination bugs and 
conflicts.

	Ingo
Benjamin Herrenschmidt - June 12, 2009, 9:33 a.m.
> Ah - thanks. The bug was caused by me being a bit too optimistic in 
> applying the shiny-new Power7 support patches on the last day. (nice 
> CPU btw.)

In that case paulus tells me it's actually Peter screwing up moving
something from the powerpc code to generic :-)

 .../...

> Such bugs happen, and they are easy enough to fix. What matters 
> arent the 1-2 short-lived bugs that do happen when a new combination 
> of trees is created, but the long-lived combination bugs and 
> conflicts.

I'm not saying -next would fix world hunger ... but in this case we have
two sets of issues, perfctr and the init ordering change which both got
merged totally bypassing -next... We should at least -try- to follow the
process we've defined, don't you think ?

Cheers,
Ben.
Peter Zijlstra - June 12, 2009, 9:43 a.m.
On Fri, 2009-06-12 at 19:33 +1000, Benjamin Herrenschmidt wrote:
> We should at least -try- to follow the
> process we've defined, don't you think ?

So you're saying -next should include whole new subsystems even though
its not clear they will be merged?

That'll invariably create the opposite case where a tree doesn't get
pulled and breaks bits due to its absence.

-next does a great job of sorting the existing subsystem trees, but I
don't think its Stephens job to decide if things will get merged.

Therefore when things are in limbo (there was no definite ACK from Linus
on perf counters) both inclusion and exclusion from -next can lead to
trouble.
Ingo Molnar - June 12, 2009, 9:55 a.m.
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Fri, 2009-06-12 at 19:33 +1000, Benjamin Herrenschmidt wrote:
> > We should at least -try- to follow the
> > process we've defined, don't you think ?
>
> So you're saying -next should include whole new subsystems even 
> though its not clear they will be merged?
> 
> That'll invariably create the opposite case where a tree doesn't 
> get pulled and breaks bits due to its absence.
> 
> -next does a great job of sorting the existing subsystem trees, 
> but I don't think its Stephens job to decide if things will get 
> merged.
> 
> Therefore when things are in limbo (there was no definite ACK from 
> Linus on perf counters) both inclusion and exclusion from -next 
> can lead to trouble.

Precisely. linux-next is for the uncontroversial stuff from existing 
subsystems. Sometimes for features pushed by or approved by existing 
subsystem maintainers. But it is not for controversial stuff - Linus 
is the upstream maintainer, not Stephen.

We had a real mess with perfmon3 which was included into linux-next 
in a rouge way without Cc:-ing the affected maintainers and against 
the maintainers. There was a repeat incident recently as well, where 
a tree was included into linux-next without the approval (and 
without the Cc:) of affected maintainers. linux-next needs to be 
more careful about adding trees.

All in one, we did the same with perfcounters that we expected of 
perfmonv3. No double standard.

Nor is there any real issue here. The bug was my fault, it was 
trivial to fix, it affects a small subset of testers and it is 
already upstream, applied on the same day perfcounters were pulled.

	Ingo
Benjamin Herrenschmidt - June 12, 2009, 9:57 a.m.
On Fri, 2009-06-12 at 11:43 +0200, Peter Zijlstra wrote:
> On Fri, 2009-06-12 at 19:33 +1000, Benjamin Herrenschmidt wrote:
> > We should at least -try- to follow the
> > process we've defined, don't you think ?
> 
> So you're saying -next should include whole new subsystems even though
> its not clear they will be merged?

Maybe yes. And if there's some debate as to whether it should be merged
or not, maybe Linus should make the decision, let -next carry it for a
few days to iron out those problems, and -then- merge it.

> That'll invariably create the opposite case where a tree doesn't get
> pulled and breaks bits due to its absence.
> 
> -next does a great job of sorting the existing subsystem trees, but I
> don't think its Stephens job to decide if things will get merged.

No, it's not, but then, maybe Linus could play the game and -tell- us
whether he intend to merge or not at least a few days in advance :-)

> Therefore when things are in limbo (there was no definite ACK from Linus
> on perf counters) both inclusion and exclusion from -next can lead to
> trouble.

Well, Linus did ACK by merging :-) So he should have been able to give
that ack a few days in advance too..

Cheers,
Ben.
Ingo Molnar - June 12, 2009, 12:53 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> > Ah - thanks. The bug was caused by me being a bit too optimistic 
> > in applying the shiny-new Power7 support patches on the last 
> > day. (nice CPU btw.)
> 
> In that case paulus tells me it's actually Peter screwing up 
> moving something from the powerpc code to generic :-)

Yes, but i committed it and it's my task to make sure that the thing 
works as a whole so it's my fault still :)

>  .../...
> 
> > Such bugs happen, and they are easy enough to fix. What matters 
> > arent the 1-2 short-lived bugs that do happen when a new 
> > combination of trees is created, but the long-lived combination 
> > bugs and conflicts.
> 
> I'm not saying -next would fix world hunger ... but in this case 
> we have two sets of issues, perfctr and the init ordering change 
> which both got merged totally bypassing -next... We should at 
> least -try- to follow the process we've defined, don't you think ?

You are trying to define a process that does not exist in that form 
and which never existed in that form.

It was never true that new code _MUST_ go via linux-next - and i 
hope it will never be true.

linux-next has integration testing so that interactions between 
maintainer trees are mapped and that architectures that otherwise 
few people use get build-tested too (well beyond their practical 
relevance, i have to add) - but there's little critical review done 
in linux-next. Nor should it be the forum for that, it simply 
contains way too much stuff and has a weird history format with 
daily rebases that makes review hard and expensive in that form.

linux-next should not be second-guessing maintainers and should not 
act as an "approval forum" for controversial features, increasing 
the (already quite substantial) pressure on maintainers to apply 
more crap.

And that is true even if it's a new feature that i happen to support 
- as in this case - it sure would have been handy to have more 
perfcounters test coverage, every little bit of extra testing helps.

If linux-next wants to do that then it should be renamed to 
something else and not called linux-next.

	Ingo
Benjamin Herrenschmidt - June 12, 2009, 1:10 p.m.
On Fri, 2009-06-12 at 14:53 +0200, Ingo Molnar wrote:
> * Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> linux-next has integration testing so that interactions between 
> maintainer trees are mapped and that architectures that otherwise 
> few people use get build-tested too (well beyond their practical 
> relevance, i have to add) - but there's little critical review done 
> in linux-next. Nor should it be the forum for that, it simply 
> contains way too much stuff and has a weird history format with 
> daily rebases that makes review hard and expensive in that form.

I think you are mixing several issues. One is integration testing, one
is the problem of remote architecture of subsystems testing...

> linux-next should not be second-guessing maintainers and should not 
> act as an "approval forum" for controversial features, increasing 
> the (already quite substantial) pressure on maintainers to apply 
> more crap.

I agree here. That's not the point. The idea is that for things that
-are- approved by their respective maintainers, to get some integration
testing and ironing of those mechanical bugs so that by the time they
hit mainstream, they don't break bisection among others.

Yes, next is -not- the place to debate controversial features. That's
not, I believe, why it was initiated (I may be wrong, Stephen will
correct me if I am), but the way I see things is that stuff that is
meant to be merged gets a chance to get some of that integration testing
against all the other stuff that is also meant to be merged to limit the
amount of clash and problems once we hit Linus tree.

> And that is true even if it's a new feature that i happen to support 
> - as in this case - it sure would have been handy to have more 
> perfcounters test coverage, every little bit of extra testing helps.

That doesn't invalidate my point. We are not talking about whether
perfcounters is worth merging or not, testing more or not, but strictly,
imho, about getting a chance (a couple of days at least) to do that
integration testing and catch the simple issues like the one that
triggered my initial rant -before- they hit mainline.

To some extent, here, the issue is on Linus side and it's up to him (Hey
Linus ! still listening ?) to maybe be more proactive at giving an ack
or nack so that we can get a chance to do that final pass of ironing out
the mechanical bugs before we hit the main tree.

Cheers,
Ben.

> If linux-next wants to do that then it should be renamed to 
> something else and not called linux-next.
> 
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Herrenschmidt - June 12, 2009, 1:29 p.m.
On Fri, 2009-06-12 at 23:10 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2009-06-12 at 14:53 +0200, Ingo Molnar wrote:

> To some extent, here, the issue is on Linus side and it's up to him (Hey
> Linus ! still listening ?) to maybe be more proactive at giving an ack
> or nack so that we can get a chance to do that final pass of ironing out
> the mechanical bugs before we hit the main tree.

Let me add a little bit more background to my reasoning here and why I
think having this integration testing step is so valuable...

It all boils down to bisection and having a bisectable tree.

Yes, I hate bisecting and I'm sure you are the same. It's a major PITA
and in most cases, I'm better off tracking down the actual bug and then
finding how it came into being.

However, what the ability to have a reasonably bisectable tree buys us
is all those users, testers, good wills, etc... people who do not have
the knowledge, skill, familiarity with the code etc... to track the bug
down, to be able to still find out what precise patch brought that pesky
regression that doesn't happen on anybody else machine, and thus brings
us some useful material to work with when we cannot reproduce the exact
same setup on our own machines.

Yes, I and I'm sure you can deal with a bisection breakage caused by a
minor screweup like the one we are talking about. But our testers often
can't and will just give up.

It has -nothing- to do with whether the patches are controversial or
not, it is purely about trying to make sure that things going into linus
tree had at least a few days of churning by the various involved parties
to try to get closer to the graal of a fully bisectable tree.

At least that's how I see it.

Now, we may disagree and I'm happy to discuss that more around a beer at
next KS, and to some extent, what is done is done, and if we screwed up
with -next vs. perfmon, then so be it and let's learn from our mistakes,
but I believe it makes a lot of sense to have that staging area that
helps us making sure that within a merge window with gazillion things
being merged pretty much at once, we keep this ability for our users and
testers to track down which individual patch broke something.

Cheers,
Ben.
Ingo Molnar - June 12, 2009, 1:44 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> > linux-next should not be second-guessing maintainers and should 
> > not act as an "approval forum" for controversial features, 
> > increasing the (already quite substantial) pressure on 
> > maintainers to apply more crap.
> 
> I agree here. That's not the point. The idea is that for things 
> that -are- approved by their respective maintainers, to get some 
> integration testing and ironing of those mechanical bugs so that 
> by the time they hit mainstream, they don't break bisection among 
> others.

This is certainly doable for agreeable features - which is the bulk 
- and it is being done.

But this is a catch-22 for _controversial_ new features - which 
perfcounters clearly was, in case you turned off your lkml 
subscription ;-)

And if you hit that build breakage during bisection you can do:

   git cherry-pick e14112d

Also, you seem to brush off the notion that far more bugs slip 
through linux-next than get caught by it.

So if you think linux-next matters in terms of _regression_ testing, 
the numbers dont seem to support that notion. This particular 
incident does support that notion though, granted - but it's taken 
out of context IMHO:

In terms of test coverage, at least for our trees, less than 1% of 
the bugs we handle get reported in a linux-next context - and most 
of the bugs that get reported (against say the scheduler tree) are 
related to rare architectures.

In fact, i checked, there were _zero_ x86 bugs reported against 
linux-next and solved against it between v2.6.30-rc1 and v2.6.30:

   git log --grep=next -i v2.6.30-rc1..v2.6.30 arch/x86/

Doing it over the full cycle shows one commit altogether - a Xen 
build failure. In fact, i just checked the whole stabilization cycle 
for the whole kernel (v2.6.30-rc1..v2.6.30-final), and there were 
only 5 linux-next originated patches, most of them build failures.

I did this by looking at all occurances of 'next', in all commit 
logs:

   git log --grep=next -i v2.6.30-rc1..v2.6.30

and then manually checking the context of all 'next' matches and 
counting the linux-next related commits.

So lets be generous and say that because some people dont put the 
bug report originator into the changelog it was four times as many, 
20 - but that's still dwarved by the sheer amount of post-rc1 
changes: thousands of changes and hundreds of regressions.

linux-next is mostly useful (to me at least) not for the 
cross-builds it does, but in terms of mapping out upcoming conflicts 
- which also drives early detection of problematic patches and 
problematic conflicts.

	Ingo
Ingo Molnar - June 12, 2009, 1:49 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2009-06-12 at 23:10 +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2009-06-12 at 14:53 +0200, Ingo Molnar wrote:
> 
> > To some extent, here, the issue is on Linus side and it's up to him (Hey
> > Linus ! still listening ?) to maybe be more proactive at giving an ack
> > or nack so that we can get a chance to do that final pass of ironing out
> > the mechanical bugs before we hit the main tree.
> 
> Let me add a little bit more background to my reasoning here and why I
> think having this integration testing step is so valuable...
> 
> It all boils down to bisection and having a bisectable tree.

I think you are way too concentrated on this particular incident, 
and you are generalizing it into something that is not so in 
practice.

Even in this particular case, there's just 3 other commit points in 
the Git tree between commit 8a1ca8c (the breakage on PowerPC) and 
e14112d (the fix). We'll have up to 10,000 commits.

I bisect on an almost daily basis, and i'm not seeing unreasonable 
problems.

	Ingo
Benjamin Herrenschmidt - June 12, 2009, 1:56 p.m.
On Fri, 2009-06-12 at 15:44 +0200, Ingo Molnar wrote:

> This is certainly doable for agreeable features - which is the bulk 
> - and it is being done.
> 
> But this is a catch-22 for _controversial_ new features - which 
> perfcounters clearly was, in case you turned off your lkml 
> subscription ;-)

I didn't :-) My point here is that Linus can make a decision with an
email -before- merging so that -next gets a chance, at least for a
couple of days, to do the integration testing once the controversy has
been sorted by his highness.

> And if you hit that build breakage during bisection you can do:
> 
>    git cherry-pick e14112d

Right, I can, you can, but can random tester who wants to track down
what his problem is ? I'm not sure...

> Also, you seem to brush off the notion that far more bugs slip 
> through linux-next than get caught by it.

Less than without linux-next. We aren't perfect and no process will
solve everything. But this could have been easily avoided.

> So if you think linux-next matters in terms of _regression_ testing, 
> the numbers dont seem to support that notion. This particular 
> incident does support that notion though, granted - but it's taken 
> out of context IMHO:
> 
> In terms of test coverage, at least for our trees, less than 1% of 
> the bugs we handle get reported in a linux-next context - and most 
> of the bugs that get reported (against say the scheduler tree) are 
> related to rare architectures.

But most obvious bugs will have been caught way before that, which
leaves the hard to catch ones or the configuration-specific ones. Those
will pass linux-next, I agree. But that isn't my point and that isn't
what linux-next will catch.  What is will catch is that kind of really
simple mechanical problems, such as build breakage for other archs.

If perfcounters had been 1 or 2 days in -next before being merged, we
would have avoided that problem and made everybody's bisecting life
easier.

> In fact, i checked, there were _zero_ x86 bugs reported against 
> linux-next and solved against it between v2.6.30-rc1 and v2.6.30:

No but Stephen caught a bunch of mechanical compile fails due to
integration problems.

>    git log --grep=next -i v2.6.30-rc1..v2.6.30 arch/x86/
> 
> Doing it over the full cycle shows one commit altogether - a Xen 
> build failure. In fact, i just checked the whole stabilization cycle 
> for the whole kernel (v2.6.30-rc1..v2.6.30-final), and there were 
> only 5 linux-next originated patches, most of them build failures.
> 
> I did this by looking at all occurances of 'next', in all commit 
> logs:
> 
>    git log --grep=next -i v2.6.30-rc1..v2.6.30
> 
> and then manually checking the context of all 'next' matches and 
> counting the linux-next related commits.
> 
> So lets be generous and say that because some people dont put the 
> bug report originator into the changelog it was four times as many, 
> 20 - but that's still dwarved by the sheer amount of post-rc1 
> changes: thousands of changes and hundreds of regressions.
> 
> linux-next is mostly useful (to me at least) not for the 
> cross-builds it does, but in terms of mapping out upcoming conflicts 
> - which also drives early detection of problematic patches and 
> problematic conflicts.

Yes, it does. The problem is that it helps -you- that way, but won't
help -us- vs. that kind of mechanical problems unless -you- also play
the game and get your stuff in there for a little while before merging
it :-)

Cheers,
Ben.
Benjamin Herrenschmidt - June 12, 2009, 2:06 p.m.
On Fri, 2009-06-12 at 15:49 +0200, Ingo Molnar wrote:
> * Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Fri, 2009-06-12 at 23:10 +1000, Benjamin Herrenschmidt wrote:
> > > On Fri, 2009-06-12 at 14:53 +0200, Ingo Molnar wrote:
> > 
> > > To some extent, here, the issue is on Linus side and it's up to him (Hey
> > > Linus ! still listening ?) to maybe be more proactive at giving an ack
> > > or nack so that we can get a chance to do that final pass of ironing out
> > > the mechanical bugs before we hit the main tree.
> > 
> > Let me add a little bit more background to my reasoning here and why I
> > think having this integration testing step is so valuable...
> > 
> > It all boils down to bisection and having a bisectable tree.
> 
> I think you are way too concentrated on this particular incident, 
> and you are generalizing it into something that is not so in 
> practice.

Maybe. But maybe it's representative... so far in this merge window,
100% of the powerpc build and runtime breakage upstream comes from stuff
that didn't get into -next before.

Some of the runtime breakage in powerpc-next comes from my own bugs,
indeed, and fortunately I caught it before I asked Linus to pull.

Cheers,
Ben.
Ingo Molnar - June 12, 2009, 2:07 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2009-06-12 at 15:44 +0200, Ingo Molnar wrote:
> 
> > This is certainly doable for agreeable features - which is the bulk 
> > - and it is being done.
> > 
> > But this is a catch-22 for _controversial_ new features - which 
> > perfcounters clearly was, in case you turned off your lkml 
> > subscription ;-)
> 
> I didn't :-) My point here is that Linus can make a decision with 
> an email -before- merging so that -next gets a chance, at least 
> for a couple of days, to do the integration testing once the 
> controversy has been sorted by his highness.

Uhm, the bug you are making a big deal of would have been found and 
fixed by Paulus a few hours after any such mail - and probably by me 
too as i do daily cross builds to Power.

So yes, we had a bug, but any extra linux-next hoops would not have 
prevented it: i could still have messed up by getting lured by that 
nice piece of Power7 hardware enablement patch on the last day ;-)

So the bug was my fault for being too fast-and-loose with that 
particular patch, creating a ~5-commits-hop build breakage bisection 
window on Power.

Now that i'm sufficiently chastised, can we now move on please? :)

	Ingo
Ingo Molnar - June 12, 2009, 2:11 p.m.
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2009-06-12 at 15:49 +0200, Ingo Molnar wrote:
> > * Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > 
> > > On Fri, 2009-06-12 at 23:10 +1000, Benjamin Herrenschmidt wrote:
> > > > On Fri, 2009-06-12 at 14:53 +0200, Ingo Molnar wrote:
> > > 
> > > > To some extent, here, the issue is on Linus side and it's up to him (Hey
> > > > Linus ! still listening ?) to maybe be more proactive at giving an ack
> > > > or nack so that we can get a chance to do that final pass of ironing out
> > > > the mechanical bugs before we hit the main tree.
> > > 
> > > Let me add a little bit more background to my reasoning here and why I
> > > think having this integration testing step is so valuable...
> > > 
> > > It all boils down to bisection and having a bisectable tree.
> > 
> > I think you are way too concentrated on this particular incident, 
> > and you are generalizing it into something that is not so in 
> > practice.
> 
> Maybe. But maybe it's representative... so far in this merge 
> window, 100% of the powerpc build and runtime breakage upstream 
> comes from stuff that didn't get into -next before.

But that's axiomatic, isnt it? linux-next build-tests PowerPC as the 
first in the row of tests - so no change that was in linux-next can 
ever cause a build failure on PowerPC, right?

	Ingo
Benjamin Herrenschmidt - June 12, 2009, 2:19 p.m.
> Uhm, the bug you are making a big deal of would have been found and 
> fixed by Paulus a few hours after any such mail - and probably by me 
> too as i do daily cross builds to Power.
> 
> So yes, we had a bug, but any extra linux-next hoops would not have 
> prevented it: i could still have messed up by getting lured by that 
> nice piece of Power7 hardware enablement patch on the last day ;-)
> 
> So the bug was my fault for being too fast-and-loose with that 
> particular patch, creating a ~5-commits-hop build breakage bisection 
> window on Power.
> 
> Now that i'm sufficiently chastised, can we now move on please? :)

Sure we can :-) My point is, get a break before you merge upstream :-)

Cheers,
Ben.
Benjamin Herrenschmidt - June 12, 2009, 2:23 p.m.
On Fri, 2009-06-12 at 16:11 +0200, Ingo Molnar wrote:
> > Maybe. But maybe it's representative... so far in this merge 
> > window, 100% of the powerpc build and runtime breakage upstream 
> > comes from stuff that didn't get into -next before.
> 
> But that's axiomatic, isnt it? linux-next build-tests PowerPC as the 
> first in the row of tests - so no change that was in linux-next can 
> ever cause a build failure on PowerPC, right?

I'd have to check with Stephen but I think linux-next tests a whole
bunch of archs each round. Anyway, the idea is, just don't get things
upstream before the at least had a chance to go through that little bit
of integration testing .. Is it -that- hard ?

Oh and before you ask me, yes, I do the same mistakes, and I have been
caught too merging things at the last minute that ended up broken and
that could have been caught by -next... I'm just trying to advocate the
idea that we all try to improve in that area :-)

Cheers,
Ben.
Stephen Rothwell - June 13, 2009, 4:54 a.m.
Hi Ingo,

On Fri, 12 Jun 2009 15:44:28 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> In terms of test coverage, at least for our trees, less than 1% of 
> the bugs we handle get reported in a linux-next context - and most 
> of the bugs that get reported (against say the scheduler tree) are 
> related to rare architectures.

I expect that most bugs get reported and fixed before code gets to
linux-next (in fact one of the prerequisites for being in linux-next is
that code has been tested as well as possible).

> In fact, i checked, there were _zero_ x86 bugs reported against 
> linux-next and solved against it between v2.6.30-rc1 and v2.6.30:
> 
>    git log --grep=next -i v2.6.30-rc1..v2.6.30 arch/x86/
> 
> Doing it over the full cycle shows one commit altogether - a Xen 
> build failure. In fact, i just checked the whole stabilization cycle 
> for the whole kernel (v2.6.30-rc1..v2.6.30-final), and there were 
> only 5 linux-next originated patches, most of them build failures.

Nice set of figures.  For some other context, between April 6 and June 9
(2.6.30-rc1 to 2.6.30) I sent 50 emails with subjects like "linux-next:
xxx tree build failure".  What results from those emails?  I sometimes
don't even hear back.  Almost all of the failures get fixed.

A lot of these probably also get discovered independently.  I don't
really care as long as they do get fixed.

One of those failures was a sparc build failure due to a change in the
tip-core tree (see commit d2de688891909b148efe83a6fc9520a9cd6015f0).
Another report produced commit 27b19565fe4ca5b0e9d2ae98ce4b81ca728bf445.
Stephen Rothwell - June 13, 2009, 5:06 a.m.
Hi Ingo,

On Fri, 12 Jun 2009 16:11:18 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> But that's axiomatic, isnt it? linux-next build-tests PowerPC as the 
> first in the row of tests - so no change that was in linux-next can 
> ever cause a build failure on PowerPC, right?

Not really.  I build a powerpc ppc64_defconfig and an x86_64 allmodconfig
between merging most trees.  At the end of the day, I do the following
builds before releasing linux-next:

powerpc allnoconfig
powerpc64 allnoconfig
powerpc ppc44x_defconfig
powerpc allyesconfig
i386 defconfig
sparc64 defconfig
sparc defconfig

Which clearly doesn't cover all possible configs, but is a start and
catches a lot (the powerpc allyesconfig is only 64 bit).

Then after release, linux-next gets built for a lot of architectures and
configs (see http://kisskb.ellerman.id.au/kisskb/branch/9/).  A couple of
people also do randconfig builds which find all sorts of things.

Patch

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 20a44d0..5351237 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -156,8 +156,6 @@  static inline void clear_perf_counter_pending(void)
 		"i" (offsetof(struct paca_struct, perf_counter_pending)));
 }
 
-extern void perf_counter_do_pending(void);
-
 #else
 
 static inline unsigned long test_perf_counter_pending(void)
@@ -167,7 +165,6 @@  static inline unsigned long test_perf_counter_pending(void)
 
 static inline void set_perf_counter_pending(void) {}
 static inline void clear_perf_counter_pending(void) {}
-static inline void perf_counter_do_pending(void) {}
 #endif /* CONFIG_PERF_COUNTERS */
 
 #endif	/* __KERNEL__ */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index feff792..844d3f8 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -53,6 +53,7 @@ 
 #include <linux/bootmem.h>
 #include <linux/pci.h>
 #include <linux/debugfs.h>
+#include <linux/perf_counter.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>