diff mbox

Re: Linkstation Mini and __machine_arch_type problem, not booting since 3.8

Message ID 1434593555.13334.14.camel@dolka.fr
State New
Headers show

Commit Message

Benjamin Cama June 18, 2015, 2:12 a.m. UTC
Hi again (feeling lonely),

I have got some news, I think I found the culprit.

Le mardi 16 juin 2015 à 11:20 +0200, Benjamin Cama a écrit :
> Le lundi 15 juin 2015 à 15:51 +0200, Benjamin Cama a écrit :
> > it didn't boot anymore at
> > all: no message at all displayed, not even with earlyprintk
> 
> Note that I tried the earlyprintk=serial,ttyS0,115200 with a working
> kernel as I do for the non-working ones, and it hangs at boot too, so
> this is no indication, sorry.

(Note for later: read the manual first, earlyprintk on this arch is for
a special tool to handle on the serial port on the host side. So it
looks like it hangs when running with it.)

> > I bisected
> > the faulty commit down to b8b499c86be58cb309964fcab5b62ac4a240a878 
> > “ARM:
> > 7602/1: Pass real "__machine_arch_type" variable to 
> > setup_machine_tags()
> > procedure” which looks like a quite broad change, and makes me 1) 
> > not
> > really understand what it does 2) astonished not to see someone 
> > else
> > affected (judging by the time since it doesn't work).
> 
> I dug some more and found what changed: it simply passes the bootloader
> -provided machine ID instead of the kernel-configured one! This is
> quite a change indeed, which is not mentionned in the commit text. This
> may seem good for vendor-provided support which assigns IDs early with
> upstream, but for user one (like for this machine), the machine ID will
> in general be some ID that has nothing to do with the one passed by the
> bootloader. So this ought to be mentioned somewhere!

No comment on this? I continued with my hack which hardcodes the
machine ID, but has anyone some advice on how it should be done? Should
the behavior be reverted, or is there a clean way to force a machine ID
at build time?

> How I found this? Well, my bootloader passes a bad machine ID (526
> instead of 1858), and thus the code doesn't recognize it. What I cannot
> understand is why the decompressor doesn't even run with a wrong
> machine ID (MMU initialization that depend on it maybe?…). This is very
> strange.

(that was because I enabled earlyprintk, which for some reason removes
messages from the decompressor… or maybe I mixed things up while
configuring the kernel)

> And also, I don't get why a more recent kernel with a hardcoded machine
> ID (personal modification) doesn't work either. Maybe I should bisect
> that too.

And I did it. And I stumbled upon commit
a71b092a9c68685a270ebdde7b5986ba8787e575 “ARM: Convert handle_IRQ to
use __handle_domain_irq” (author Cc'ed). The new function
__handle_domain_irq (in kernel/irq/irqdesc.c, which comes just two
commits before, with 76ba59f8366f2d9282cb5bda9de75b4b68cbe55f) is
subtly different from the one it replaces, handle_IRQ, as it checks if
the irq is not 0 as well as checking for an upper bound. Removing the
check for 0 makes my machine work again! But honestly, I do not know if
a zero irq is legit, so I hope some more knowledgeable people will tell
me if this is OK.

-- >8 --
Subject: [PATCH] Make __handle_domain_irq accept IRQ 0

The same as handle_IRQ did before.

Signed-off-by: Benjamin Cama <benoar@dolka.fr>
---
 kernel/irq/irqdesc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Marc Zyngier June 18, 2015, 7:52 a.m. UTC | #1
Hi Benjamin,

On 18/06/15 03:12, Benjamin Cama wrote:
> Hi again (feeling lonely),

[...]

> And I did it. And I stumbled upon commit
> a71b092a9c68685a270ebdde7b5986ba8787e575 “ARM: Convert handle_IRQ to
> use __handle_domain_irq” (author Cc'ed). The new function
> __handle_domain_irq (in kernel/irq/irqdesc.c, which comes just two
> commits before, with 76ba59f8366f2d9282cb5bda9de75b4b68cbe55f) is
> subtly different from the one it replaces, handle_IRQ, as it checks if
> the irq is not 0 as well as checking for an upper bound. Removing the
> check for 0 makes my machine work again! But honestly, I do not know if
> a zero irq is legit, so I hope some more knowledgeable people will tell
> me if this is OK.
> 
> -- >8 --
> Subject: [PATCH] Make __handle_domain_irq accept IRQ 0
> 
> The same as handle_IRQ did before.
> 
> Signed-off-by: Benjamin Cama <benoar@dolka.fr>
> ---
>  kernel/irq/irqdesc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index a1782f8..bfbeeb6 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -365,7 +365,7 @@ int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
>          * Some hardware gives randomly wrong interrupts.  Rather
>          * than crashing, do something sensible.
>          */
> -       if (unlikely(!irq || irq >= nr_irqs)) {
> +       if (unlikely(irq >= nr_irqs)) {
>                 ack_bad_irq(irq);
>                 ret = -EINVAL;
>         } else {
> 

Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
for a very long time, and actually represents the lack of interrupt.

The way you can address this is by making sure your favourite platform
does not use IRQ0 at all, which is done by not assuming that Linux IRQ
number (which is always completely virtual) is the same as the number
designating the actual HW interrupt line.

For example, have a look at 18f3aec (ARM: 8230/1: sa1100: shift IRQs by
one) for an example of such a (very simple) conversion. You'll need to
tweak irq.c too.

Other commits for sa1100 will hopefully convince you to switch to irq
domains altogether. This will greatly facilitate a possible further
transition to DT if you wish to do so.

Looking forward to reviewing your patches,

	M.
Arnd Bergmann June 18, 2015, 8:14 a.m. UTC | #2
On Thursday 18 June 2015 08:52:08 Marc Zyngier wrote:
> 
> On 18/06/15 03:12, Benjamin Cama wrote:
> 
> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
> for a very long time, and actually represents the lack of interrupt.
> 
> The way you can address this is by making sure your favourite platform
> does not use IRQ0 at all, which is done by not assuming that Linux IRQ
> number (which is always completely virtual) is the same as the number
> designating the actual HW interrupt line.
> 
> For example, have a look at 18f3aec (ARM: 8230/1: sa1100: shift IRQs by
> one) for an example of such a (very simple) conversion. You'll need to
> tweak irq.c too.
> 
> Other commits for sa1100 will hopefully convince you to switch to irq
> domains altogether. This will greatly facilitate a possible further
> transition to DT if you wish to do so.
> 
> Looking forward to reviewing your patches,

Converting to DT should indeed solve the problem, as that uses
a more modern irqchip driver that does not use IRQ0.

Thomas Petazzoni has worked on converting orion5x machines to DT
in the past, I've cc'd him and the mvebu maintainers here, they
should be able to comment on what is actually required to do the
conversion. It's possible that all drivers you need have already
been converted and you just need to add a new dts files similar to
the other arch/arm/boot/dts/orion5x-*.dts files and can remove
that other board file.

	Arnd
Andrew Lunn June 18, 2015, 1:23 p.m. UTC | #3
> Thomas Petazzoni has worked on converting orion5x machines to DT
> in the past, I've cc'd him and the mvebu maintainers here, they
> should be able to comment on what is actually required to do the
> conversion. It's possible that all drivers you need have already
> been converted and you just need to add a new dts files similar to
> the other arch/arm/boot/dts/orion5x-*.dts files and can remove
> that other board file.

I had a quick look at lsmini-setup.c. Everything it uses is DT ready,
so the conversion should be painless.

   Andrew
Benjamin Cama June 19, 2015, 1:38 a.m. UTC | #4
Hi Marc,

Le jeudi 18 juin 2015 à 08:52 +0100, Marc Zyngier a écrit :
> On 18/06/15 03:12, Benjamin Cama wrote:
> > And I did it. And I stumbled upon commit
> > a71b092a9c68685a270ebdde7b5986ba8787e575 “ARM: Convert handle_IRQ to
> > use __handle_domain_irq” (author Cc'ed). The new function
> > __handle_domain_irq (in kernel/irq/irqdesc.c, which comes just two
> > commits before, with 76ba59f8366f2d9282cb5bda9de75b4b68cbe55f) is
> > subtly different from the one it replaces, handle_IRQ, as it checks if
> > the irq is not 0 as well as checking for an upper bound. Removing the
> > check for 0 makes my machine work again! But honestly, I do not know if
> > a zero irq is legit, so I hope some more knowledgeable people will tell
> > me if this is OK.
> > 
> > -- >8 --
> > Subject: [PATCH] Make __handle_domain_irq accept IRQ 0
> > 
> > The same as handle_IRQ did before.
> > 
> > Signed-off-by: Benjamin Cama <benoar@dolka.fr>
> > ---
> >  kernel/irq/irqdesc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> > index a1782f8..bfbeeb6 100644
> > --- a/kernel/irq/irqdesc.c
> > +++ b/kernel/irq/irqdesc.c
> > @@ -365,7 +365,7 @@ int __handle_domain_irq(struct irq_domain 
> > *domain, unsigned int hwirq,
> >          * Some hardware gives randomly wrong interrupts.  Rather
> >          * than crashing, do something sensible.
> >          */
> > -       if (unlikely(!irq || irq >= nr_irqs)) {
> > +       if (unlikely(irq >= nr_irqs)) {
> >                 ack_bad_irq(irq);
> >                 ret = -EINVAL;
> >         } else {
> > 
> 
> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
> for a very long time, and actually represents the lack of interrupt.

OK, sorry for the mistake, I didn't know. Shouldn't the IRQ
initialization routine check this and warn the user that it may cause
problems? “Silently” making IRQ0 forbidden doesn't help for the
platforms that are not fixed yet.

> The way you can address this is by making sure your favourite platform
> does not use IRQ0 at all, which is done by not assuming that Linux IRQ
> number (which is always completely virtual) is the same as the number
> designating the actual HW interrupt line.
>
> For example, have a look at 18f3aec (ARM: 8230/1: sa1100: shift IRQs by
> one) for an example of such a (very simple) conversion. You'll need to
> tweak irq.c too.
> 
> Other commits for sa1100 will hopefully convince you to switch to irq
> domains altogether. This will greatly facilitate a possible further
> transition to DT if you wish to do so.

I read quite a bit about all this virtual/hardware IRQ stuff, and the
irq domains, interesting. I see that orion5x seems to be one of the
last platforms not using it (appart from DT-converted boards); is it a
bad sign about its durability?…

> Looking forward to reviewing your patches,

Thanks for your help and support. As the others suggested (thanks to
all of you too), I will try the DT way directly then.

Regards,
--
benjamin
Marc Zyngier June 19, 2015, 9:13 a.m. UTC | #5
On 19/06/15 02:38, Benjamin Cama wrote:
> Hi Marc,
> 
> Le jeudi 18 juin 2015 à 08:52 +0100, Marc Zyngier a écrit :
>> On 18/06/15 03:12, Benjamin Cama wrote:
>>> And I did it. And I stumbled upon commit
>>> a71b092a9c68685a270ebdde7b5986ba8787e575 “ARM: Convert handle_IRQ to
>>> use __handle_domain_irq” (author Cc'ed). The new function
>>> __handle_domain_irq (in kernel/irq/irqdesc.c, which comes just two
>>> commits before, with 76ba59f8366f2d9282cb5bda9de75b4b68cbe55f) is
>>> subtly different from the one it replaces, handle_IRQ, as it checks if
>>> the irq is not 0 as well as checking for an upper bound. Removing the
>>> check for 0 makes my machine work again! But honestly, I do not know if
>>> a zero irq is legit, so I hope some more knowledgeable people will tell
>>> me if this is OK.
>>>
>>> -- >8 --
>>> Subject: [PATCH] Make __handle_domain_irq accept IRQ 0
>>>
>>> The same as handle_IRQ did before.
>>>
>>> Signed-off-by: Benjamin Cama <benoar@dolka.fr>
>>> ---
>>>  kernel/irq/irqdesc.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>>> index a1782f8..bfbeeb6 100644
>>> --- a/kernel/irq/irqdesc.c
>>> +++ b/kernel/irq/irqdesc.c
>>> @@ -365,7 +365,7 @@ int __handle_domain_irq(struct irq_domain 
>>> *domain, unsigned int hwirq,
>>>          * Some hardware gives randomly wrong interrupts.  Rather
>>>          * than crashing, do something sensible.
>>>          */
>>> -       if (unlikely(!irq || irq >= nr_irqs)) {
>>> +       if (unlikely(irq >= nr_irqs)) {
>>>                 ack_bad_irq(irq);
>>>                 ret = -EINVAL;
>>>         } else {
>>>
>>
>> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
>> for a very long time, and actually represents the lack of interrupt.
> 
> OK, sorry for the mistake, I didn't know. Shouldn't the IRQ
> initialization routine check this and warn the user that it may cause
> problems? “Silently” making IRQ0 forbidden doesn't help for the
> platforms that are not fixed yet.

Well, this is hardly a new rule. It has been like this for quite a long
time: http://yarchive.net/comp/linux/zero.html

As for the checking and warning, this is on a very hot path, for an
error case that really shouldn't occur.

<rant>
I've also come to the conclusion that people often choose to ignore
warnings. It boots, so who cares? Funnily enough, they react when their
toy doesn't work any more.
</rant>

> 
>> The way you can address this is by making sure your favourite platform
>> does not use IRQ0 at all, which is done by not assuming that Linux IRQ
>> number (which is always completely virtual) is the same as the number
>> designating the actual HW interrupt line.
>>
>> For example, have a look at 18f3aec (ARM: 8230/1: sa1100: shift IRQs by
>> one) for an example of such a (very simple) conversion. You'll need to
>> tweak irq.c too.
>>
>> Other commits for sa1100 will hopefully convince you to switch to irq
>> domains altogether. This will greatly facilitate a possible further
>> transition to DT if you wish to do so.
> 
> I read quite a bit about all this virtual/hardware IRQ stuff, and the
> irq domains, interesting. I see that orion5x seems to be one of the
> last platforms not using it (appart from DT-converted boards); is it a
> bad sign about its durability?…

No, we have platforms that will most probably never be converted to DT,
and they are very far from rotting in the tree.

Thanks,

	M.
Benjamin Cama June 19, 2015, 12:16 p.m. UTC | #6
Le vendredi 19 juin 2015 à 10:13 +0100, Marc Zyngier a écrit :
> On 19/06/15 02:38, Benjamin Cama wrote:
> > Le jeudi 18 juin 2015 à 08:52 +0100, Marc Zyngier a écrit :
> >> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
> >> for a very long time, and actually represents the lack of interrupt.
> > 
> > OK, sorry for the mistake, I didn't know. Shouldn't the IRQ
> > initialization routine check this and warn the user that it may cause
> > problems? “Silently” making IRQ0 forbidden doesn't help for the
> > platforms that are not fixed yet.
> 
> Well, this is hardly a new rule. It has been like this for quite a long
> time: http://yarchive.net/comp/linux/zero.html
> 
> As for the checking and warning, this is on a very hot path, for an
> error case that really shouldn't occur.

I was not talking about the irq handler, but the irq initialization
routine (on orion5x, orion_irq_init calls irq_alloc_generic_chip with
0), which takes the starting irq number and may warn when it is zero
(well, it may also start allocating at zero but never use it, so this
may not be a totally correct assumption, but I think this comes close,
and it's just a warning).

> <rant>
> I've also come to the conclusion that people often choose to ignore
> warnings. It boots, so who cares? Funnily enough, they react when their
> toy doesn't work any more.
> </rant>

Well, original Orion support seems to come from Marvell itself…

> > 
> >> The way you can address this is by making sure your favourite platform
> >> does not use IRQ0 at all, which is done by not assuming that Linux IRQ
> >> number (which is always completely virtual) is the same as the number
> >> designating the actual HW interrupt line.
> >>
> >> For example, have a look at 18f3aec (ARM: 8230/1: sa1100: shift IRQs by
> >> one) for an example of such a (very simple) conversion. You'll need to
> >> tweak irq.c too.
> >>
> >> Other commits for sa1100 will hopefully convince you to switch to irq
> >> domains altogether. This will greatly facilitate a possible further
> >> transition to DT if you wish to do so.
> > 
> > I read quite a bit about all this virtual/hardware IRQ stuff, and the
> > irq domains, interesting. I see that orion5x seems to be one of the
> > last platforms not using it (appart from DT-converted boards); is it a
> > bad sign about its durability?…
> 
> No, we have platforms that will most probably never be converted to DT,
> and they are very far from rotting in the tree.

Nice to know. One of the reasons for converting my board to DT was fear
that it may one day be mandatory. Is there any particular reason for
keeping some boards non-DT while still being maintained?

Regards,
--
benjamin
Jason Cooper June 19, 2015, 1:01 p.m. UTC | #7
Benjamin,

On Fri, Jun 19, 2015 at 02:16:34PM +0200, Benjamin Cama wrote:
> Le vendredi 19 juin 2015 ?? 10:13 +0100, Marc Zyngier a ??crit :
> > On 19/06/15 02:38, Benjamin Cama wrote:
...
> > > I read quite a bit about all this virtual/hardware IRQ stuff, and the
> > > irq domains, interesting. I see that orion5x seems to be one of the
> > > last platforms not using it (appart from DT-converted boards); is it a
> > > bad sign about its durability????
> > 
> > No, we have platforms that will most probably never be converted to DT,
> > and they are very far from rotting in the tree.
> 
> Nice to know. One of the reasons for converting my board to DT was fear
> that it may one day be mandatory. Is there any particular reason for
> keeping some boards non-DT while still being maintained?

If folks are using the sub-arch and interested in maintaining them, they
stick around.  mach-orion5x/ is a wierd case because there aren't many
users.  The mvebu folks have put a lot of work into supporting /
converting the legacy platforms (kirkwood, dove).  Thomas did most of
the work converting orion5x, but there simply aren't many users to test.
So it's in a half-converted state.

iow, we're glad to see you contributing patches for orion5x.  ;-)

thx,

Jason.
Russell King - ARM Linux June 19, 2015, 1:13 p.m. UTC | #8
On Fri, Jun 19, 2015 at 02:16:34PM +0200, Benjamin Cama wrote:
> Le vendredi 19 juin 2015 à 10:13 +0100, Marc Zyngier a écrit :
> > On 19/06/15 02:38, Benjamin Cama wrote:
> > > Le jeudi 18 juin 2015 à 08:52 +0100, Marc Zyngier a écrit :
> > >> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, has been
> > >> for a very long time, and actually represents the lack of interrupt.
> > > 
> > > OK, sorry for the mistake, I didn't know. Shouldn't the IRQ
> > > initialization routine check this and warn the user that it may cause
> > > problems? “Silently” making IRQ0 forbidden doesn't help for the
> > > platforms that are not fixed yet.
> > 
> > Well, this is hardly a new rule. It has been like this for quite a long
> > time: http://yarchive.net/comp/linux/zero.html
> > 
> > As for the checking and warning, this is on a very hot path, for an
> > error case that really shouldn't occur.
> 
> I was not talking about the irq handler, but the irq initialization
> routine (on orion5x, orion_irq_init calls irq_alloc_generic_chip with
> 0), which takes the starting irq number and may warn when it is zero
> (well, it may also start allocating at zero but never use it, so this
> may not be a totally correct assumption, but I think this comes close,
> and it's just a warning).

It needs fixing nevertheless - arguments along the lines of "this used
to work" don't work for this topic.

The simple answer is to adjust the initialisation to bump the IRQ
numbers up by one, and them adjust the interrupt numbers in
arch/arm/mach-whatever/include/asm/irqs.h also up by one.  That's
far easier to do than spending ages trying to argue against the
"IRQ0 is not valid" issue, only to ultimately get nowhere, and end
up with that as the only way forward anyway.
Benjamin Cama June 19, 2015, 1:46 p.m. UTC | #9
Hi Russell,

Le 2015-06-19 15:13, Russell King - ARM Linux a écrit :
> On Fri, Jun 19, 2015 at 02:16:34PM +0200, Benjamin Cama wrote:
>> Le vendredi 19 juin 2015 à 10:13 +0100, Marc Zyngier a écrit :
>> > On 19/06/15 02:38, Benjamin Cama wrote:
>> > > Le jeudi 18 juin 2015 à 08:52 +0100, Marc Zyngier a écrit :
>> > >> Unfortunately, this is the wrong thing to do. IRQ0 is invalid, 
>> has been
>> > >> for a very long time, and actually represents the lack of 
>> interrupt.
>> > >
>> > > OK, sorry for the mistake, I didn't know. Shouldn't the IRQ
>> > > initialization routine check this and warn the user that it may 
>> cause
>> > > problems? “Silently” making IRQ0 forbidden doesn't help for the
>> > > platforms that are not fixed yet.
>> >
>> > Well, this is hardly a new rule. It has been like this for quite a 
>> long
>> > time: http://yarchive.net/comp/linux/zero.html
>> >
>> > As for the checking and warning, this is on a very hot path, for 
>> an
>> > error case that really shouldn't occur.
>>
>> I was not talking about the irq handler, but the irq initialization
>> routine (on orion5x, orion_irq_init calls irq_alloc_generic_chip 
>> with
>> 0), which takes the starting irq number and may warn when it is zero
>> (well, it may also start allocating at zero but never use it, so 
>> this
>> may not be a totally correct assumption, but I think this comes 
>> close,
>> and it's just a warning).
>
> It needs fixing nevertheless - arguments along the lines of "this 
> used
> to work" don't work for this topic.
>
> The simple answer is to adjust the initialisation to bump the IRQ
> numbers up by one, and them adjust the interrupt numbers in
> arch/arm/mach-whatever/include/asm/irqs.h also up by one.  That's
> far easier to do than spending ages trying to argue against the
> "IRQ0 is not valid" issue, only to ultimately get nowhere, and end
> up with that as the only way forward anyway.

Do not misunderstand me: I am not at all for keeping the situation like 
this!
What I ask is just for users to be notified of this new requirement: 
for my
case, my board simply couldn't boot anymore, without any explanation. 
If there
was a message along the lines “You are setting up IRQs starting from 0, 
which
is not supported by the kernel anymore” just before crashing, maybe it 
would
help debugging the issue.

I could try to write a patch for it, but I was first wondering if this 
is a
good idea or not.

Regards,
--
benjamin
Jason Cooper June 19, 2015, 3:25 p.m. UTC | #10
On Fri, Jun 19, 2015 at 03:46:45PM +0200, Benjamin Cama wrote:
> Le 2015-06-19 15:13, Russell King - ARM Linux a ??crit??:
> >On Fri, Jun 19, 2015 at 02:16:34PM +0200, Benjamin Cama wrote:
...
> >>I was not talking about the irq handler, but the irq initialization
> >>routine (on orion5x, orion_irq_init calls irq_alloc_generic_chip
> >>with 0), which takes the starting irq number and may warn when it is
> >>zero (well, it may also start allocating at zero but never use it,
> >>so this may not be a totally correct assumption, but I think this
> >>comes close, and it's just a warning).
> >
> >It needs fixing nevertheless - arguments along the lines of "this
> >used to work" don't work for this topic.
> >
> >The simple answer is to adjust the initialisation to bump the IRQ
> >numbers up by one, and them adjust the interrupt numbers in
> >arch/arm/mach-whatever/include/asm/irqs.h also up by one.  That's
> >far easier to do than spending ages trying to argue against the
> >"IRQ0 is not valid" issue, only to ultimately get nowhere, and end
> >up with that as the only way forward anyway.
> 
> Do not misunderstand me: I am not at all for keeping the situation
> like this!  What I ask is just for users to be notified of this new
> requirement: for my case, my board simply couldn't boot anymore,
> without any explanation. If there was a message along the lines ???You
> are setting up IRQs starting from 0, which is not supported by the
> kernel anymore??? just before crashing, maybe it would help debugging
> the issue.
> 
> I could try to write a patch for it, but I was first wondering if this
> is a good idea or not.

Let's just get the dts patch reviewed and merged first.  Russell
actually wrote the patch to do what he's describing for mach-dove.

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html

Although, it looks like it never got updated for submission...

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html

thx,

Jason.
Russell King - ARM Linux June 19, 2015, 3:44 p.m. UTC | #11
On Fri, Jun 19, 2015 at 03:46:45PM +0200, Benjamin Cama wrote:
> Hi Russell,
> Do not misunderstand me: I am not at all for keeping the situation like
> this! What I ask is just for users to be notified of this new requirement:
> for my case, my board simply couldn't boot anymore, without any
> explanation.

It's not a new requirement.  It's something that's been there all along,
and is finally starting to bite places where it never used to bite before.
It's a latent bug in platform code, that's all.

This "IRQ0 is bad" has been known for quite some while - you've already
been pointed at Linus' rant about it, dated 25 Jan 2007.  If you do a bit
of research, you'll find that the orion code was contributed _after_ that
date.

While it's easy to stick a warning into orion_irq_init() (or whatever the
function is called), it's utterly pointless now that it's been found -
just fix all the users of orion_irq_init() now and be done with it.

As for "shouldn't this have had a warning in the past", well, maybe, but
consider this task:

	Please go and look at the other mach-* directories, and find
	all those which might use IRQ0, and arrange for them to print
	a warning.

I'm sure you'll see what a task it is that you're asking for.

What you're asking for is actually unreasonable.


(I am aware that Dove broke exactly for this reason, and I've been
carrying a patch since December 2014 to fix Dove - but I just don't have
the time to get it merged - like most of my other Dove work.)
Russell King - ARM Linux June 19, 2015, 3:48 p.m. UTC | #12
On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> Let's just get the dts patch reviewed and merged first.  Russell
> actually wrote the patch to do what he's describing for mach-dove.
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> 
> Although, it looks like it never got updated for submission...
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html

I do still have that patch, it's based on v4.0 though - like all my
Dove work, it only builds on the previous -final kernel release.

"Getting it out there" is really low on my priority list because of
the amount of hassle involved in trying to get anything merged (read
that as: I've basically given up trying anymore because of the amount
of back pressure from people who don't read the comments from the
previous submission, and end up making the same old comments every
time - thereby ensuring that there is little in the way of forward
progress.  No, we're not using syscon nor regmap for the PMU...)
Jason Cooper June 19, 2015, 5:13 p.m. UTC | #13
On Fri, Jun 19, 2015 at 04:48:50PM +0100, Russell King - ARM Linux wrote:
> On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> > Let's just get the dts patch reviewed and merged first.  Russell
> > actually wrote the patch to do what he's describing for mach-dove.
> > 
> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> > 
> > Although, it looks like it never got updated for submission...
> > 
> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html
> 
> I do still have that patch, it's based on v4.0 though - like all my
> Dove work, it only builds on the previous -final kernel release.

Benjamin, after the dts, would you mind picking up Russell's dove patch and
making a cooresponding orion5x patch?

thx,

Jason.
Benjamin Cama June 20, 2015, 1:01 a.m. UTC | #14
Le vendredi 19 juin 2015 à 16:44 +0100, Russell King - ARM Linux a
écrit :
> On Fri, Jun 19, 2015 at 03:46:45PM +0200, Benjamin Cama wrote:
> > Hi Russell,
> > Do not misunderstand me: I am not at all for keeping the situation like
> > this! What I ask is just for users to be notified of this new requirement:
> > for my case, my board simply couldn't boot anymore, without any
> > explanation.
> 
> It's not a new requirement.  It's something that's been there all along,
> and is finally starting to bite places where it never used to bite before.
> It's a latent bug in platform code, that's all.
> 
> This "IRQ0 is bad" has been known for quite some while - you've already
> been pointed at Linus' rant about it, dated 25 Jan 2007.  If you do a bit
> of research, you'll find that the orion code was contributed _after_ that
> date.

Yes, I understand.

> While it's easy to stick a warning into orion_irq_init() (or whatever the
> function is called), it's utterly pointless now that it's been found -
> just fix all the users of orion_irq_init() now and be done with it.

Well, I could do that but I do not have all the hardware to test it on.

> As for "shouldn't this have had a warning in the past", well, maybe, but
> consider this task:
> 
> 	> Please go and look at the other mach-* directories, and find
> 	> all those which might use IRQ0, and arrange for them to print
> 	> a warning.
> 
> I'm sure you'll see what a task it is that you're asking for.
> 
> What you're asking for is actually unreasonable.

I was not at all asking for that, sorry for the misunderstanding again:
I was imagining putting a warning or BUG_ON in e.g.
irq_alloc_generic_chip (and just one place; I am not even sure this is
the right one: I am quite a beginner in kernel code, to put things in
context). I tried that but with bad results (just a hard reboot for
BUG_ON without message) maybe because without earlyprintk, it happens
too early in the boot process to display anything to the user. So this
is a bad idea indeed.

> (I am aware that Dove broke exactly for this reason, and I've been
> carrying a patch since December 2014 to fix Dove - but I just don't have
> the time to get it merged - like most of my other Dove work.)

I tried a quick grep'ing of platforms allocating IRQ0, and all Orion is
indeed doing it, as well as davinci. But apart from them, the rest
seems OK, so the platforms having this problem seem indeed to be very
few.

Regards,
--
benjamin
Benjamin Cama June 21, 2015, 5:37 p.m. UTC | #15
Le vendredi 19 juin 2015 à 17:13 +0000, Jason Cooper a écrit :
> On Fri, Jun 19, 2015 at 04:48:50PM +0100, Russell King - ARM Linux wrote:
> > On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> > > Let's just get the dts patch reviewed and merged first.  Russell
> > > actually wrote the patch to do what he's describing for mach-dove.
> > > 
> > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> > > 
> > > Although, it looks like it never got updated for submission...
> > > 
> > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html
> > 
> > I do still have that patch, it's based on v4.0 though - like all my
> > Dove work, it only builds on the previous -final kernel release.
> 
> Benjamin, after the dts, would you mind picking up Russell's dove patch and
> making a cooresponding orion5x patch?

If the plan is to move to DT, is it still needed?

--
benjamin
Jason Cooper June 22, 2015, 12:08 p.m. UTC | #16
On Sun, Jun 21, 2015 at 07:37:55PM +0200, Benjamin Cama wrote:
> Le vendredi 19 juin 2015 ?? 17:13 +0000, Jason Cooper a ??crit :
> > On Fri, Jun 19, 2015 at 04:48:50PM +0100, Russell King - ARM Linux wrote:
> > > On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> > > > Let's just get the dts patch reviewed and merged first.  Russell
> > > > actually wrote the patch to do what he's describing for mach-dove.
> > > > 
> > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> > > > 
> > > > Although, it looks like it never got updated for submission...
> > > > 
> > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html
> > > 
> > > I do still have that patch, it's based on v4.0 though - like all my
> > > Dove work, it only builds on the previous -final kernel release.
> > 
> > Benjamin, after the dts, would you mind picking up Russell's dove patch and
> > making a cooresponding orion5x patch?
> 
> If the plan is to move to DT, is it still needed?

Sure, because orion5x is slow-going.  There aren't that many users.
Since we prefer tested patches, we need to grab the opportunity when it
comes a long.  It could be another year or more before we see more
patches for orion5x.  Fully converting orion5x to DT doesn't have a
schedule, it's as we encounter devs willing to post patches for their
boards.

The opportunity here is that you were just booting a legacy board file
on an orion5x SoC.  Therefore, you're in the best position to write/test
the irq change.

thx,

Jason.
Benjamin Cama June 22, 2015, 5:49 p.m. UTC | #17
Le lundi 22 juin 2015 à 12:08 +0000, Jason Cooper a écrit :
> On Sun, Jun 21, 2015 at 07:37:55PM +0200, Benjamin Cama wrote:
> > Le vendredi 19 juin 2015 ?? 17:13 +0000, Jason Cooper a ??crit :
> > > On Fri, Jun 19, 2015 at 04:48:50PM +0100, Russell King - ARM Linux wrote:
> > > > On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> > > > > Let's just get the dts patch reviewed and merged first.  Russell
> > > > > actually wrote the patch to do what he's describing for mach-dove.
> > > > > 
> > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> > > > > 
> > > > > Although, it looks like it never got updated for submission...
> > > > > 
> > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html
> > > > 
> > > > I do still have that patch, it's based on v4.0 though - like all my
> > > > Dove work, it only builds on the previous -final kernel release.
> > > 
> > > Benjamin, after the dts, would you mind picking up Russell's dove patch and
> > > making a cooresponding orion5x patch?
> > 
> > If the plan is to move to DT, is it still needed?
> 
> Sure, because orion5x is slow-going.  There aren't that many users.
> Since we prefer tested patches, we need to grab the opportunity when it
> comes a long.  It could be another year or more before we see more
> patches for orion5x.  Fully converting orion5x to DT doesn't have a
> schedule, it's as we encounter devs willing to post patches for their
> boards.

According to rmk, letting their board break may be a quicker way to get
the patches ;-) (and I agree with that: I wouldn't have moved to DT so
fast if the build didn't break, it is a good incentive).

> The opportunity here is that you were just booting a legacy board file
> on an orion5x SoC.  Therefore, you're in the best position to write/test
> the irq change.

OK, I understand. I'll try to find the time to test the non-DT case and
offer a patch. About Russell's one: how is the Signed-off-by going in
this case? I put his and then mine?

Regards,
--
benjamin
Russell King - ARM Linux June 22, 2015, 5:58 p.m. UTC | #18
On Mon, Jun 22, 2015 at 07:49:39PM +0200, Benjamin Cama wrote:
> OK, I understand. I'll try to find the time to test the non-DT case and
> offer a patch. About Russell's one: how is the Signed-off-by going in
> this case? I put his and then mine?

If you produce an entirely separate patch to address Orion5x in the same
way that I've done with Dove, then there's no reason to have my sign-off
on it, because none of it will contain work that I've done.

Also note that I've actually re-sent that patch today, and Andrew says
he'll apply it around -rc1 time, so you don't have to include the Dove
bits in your patch.
Jason Cooper June 22, 2015, 6:04 p.m. UTC | #19
On Mon, Jun 22, 2015 at 07:49:39PM +0200, Benjamin Cama wrote:
> Le lundi 22 juin 2015 ?? 12:08 +0000, Jason Cooper a ??crit :
> > On Sun, Jun 21, 2015 at 07:37:55PM +0200, Benjamin Cama wrote:
> > > Le vendredi 19 juin 2015 ?? 17:13 +0000, Jason Cooper a ??crit :
> > > > On Fri, Jun 19, 2015 at 04:48:50PM +0100, Russell King - ARM Linux wrote:
> > > > > On Fri, Jun 19, 2015 at 03:25:52PM +0000, Jason Cooper wrote:
> > > > > > Let's just get the dts patch reviewed and merged first.  Russell
> > > > > > actually wrote the patch to do what he's describing for mach-dove.
> > > > > > 
> > > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/309684.html
> > > > > > 
> > > > > > Although, it looks like it never got updated for submission...
> > > > > > 
> > > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-December/311800.html
> > > > > 
> > > > > I do still have that patch, it's based on v4.0 though - like all my
> > > > > Dove work, it only builds on the previous -final kernel release.
> > > > 
> > > > Benjamin, after the dts, would you mind picking up Russell's dove patch and
> > > > making a cooresponding orion5x patch?
> > > 
> > > If the plan is to move to DT, is it still needed?
> > 
> > Sure, because orion5x is slow-going.  There aren't that many users.
> > Since we prefer tested patches, we need to grab the opportunity when it
> > comes a long.  It could be another year or more before we see more
> > patches for orion5x.  Fully converting orion5x to DT doesn't have a
> > schedule, it's as we encounter devs willing to post patches for their
> > boards.
> 
> According to rmk, letting their board break may be a quicker way to get
> the patches ;-) (and I agree with that: I wouldn't have moved to DT so
> fast if the build didn't break, it is a good incentive).

Oddly enough, that's how I got roped into being a maintainer.  :-P

> > The opportunity here is that you were just booting a legacy board file
> > on an orion5x SoC.  Therefore, you're in the best position to write/test
> > the irq change.
> 
> OK, I understand. I'll try to find the time to test the non-DT case and
> offer a patch. About Russell's one: how is the Signed-off-by going in
> this case? I put his and then mine?

Well, it looks like it applies cleanly to linux-next, so we can probably
take it as is.  My request was wrt the original patch I linked to, which
would have needed a rebase.

So, no need to do anything with rmk's patch.  Just use it as a base for
the orion5x case if you need to.

thx,

Jason.
Arnd Bergmann July 14, 2015, 8:50 p.m. UTC | #20
On Tuesday 14 July 2015 16:25:58 Benjamin Cama wrote:
> Hi everyone,
> 
> I finally nailed it: my attempt at copying Russell's patch for dove did
> not work because MULTI_IRQ_HANDLER was not set. It is not in the
> defconfig; maybe it should be now? I hesitated to put it as a
> dependency for PLAT_ORION_LEGACY, but as the platform is globally
> broken without it, I added it.
> 
> What do you think?
> 

I think it makes sense the way you did here.

See also my patch at

http://git.kernel.org/cgit/linux/kernel/git/arnd/playground.git/commit/?h=multiplatform-4.2-next&id=44b55cab3bd74739e4c39205d41e1b2ed8953155

I'd be happy if we could just merge that one, along with the other
plat-orion multiplatform patches at
http://git.kernel.org/cgit/linux/kernel/git/arnd/playground.git/log/?h=b1f5b2f1a738d7f78b94856bfe7a1a7d4da1fbd1

I did not manage to get them merged before my parental leave and
certainly won't be able to do it before I get back to work, but maybe
someone else can have a look. I think the five patches should all
be fine there, they mainly need testing on real hardware.

	Arnd
Gregory CLEMENT Aug. 14, 2015, 3:46 p.m. UTC | #21
Hi Arnd, Benjamin,

On 14/07/2015 22:50, Arnd Bergmann wrote:
> On Tuesday 14 July 2015 16:25:58 Benjamin Cama wrote:
>> Hi everyone,
>>
>> I finally nailed it: my attempt at copying Russell's patch for dove did
>> not work because MULTI_IRQ_HANDLER was not set. It is not in the
>> defconfig; maybe it should be now? I hesitated to put it as a
>> dependency for PLAT_ORION_LEGACY, but as the platform is globally
>> broken without it, I added it.
>>
>> What do you think?
>>
> 
> I think it makes sense the way you did here.
> 
> See also my patch at
> 
> http://git.kernel.org/cgit/linux/kernel/git/arnd/playground.git/commit/?h=multiplatform-4.2-next&id=44b55cab3bd74739e4c39205d41e1b2ed8953155
> 
> I'd be happy if we could just merge that one, along with the other
> plat-orion multiplatform patches at
> http://git.kernel.org/cgit/linux/kernel/git/arnd/playground.git/log/?h=b1f5b2f1a738d7f78b94856bfe7a1a7d4da1fbd1
> 
> I did not manage to get them merged before my parental leave and
> certainly won't be able to do it before I get back to work, but maybe
> someone else can have a look. I think the five patches should all
> be fine there, they mainly need testing on real hardware.

Benjamin's patch definitely needs to be applied, as another orion user got
an issue too:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-August/364503.html


The remaining question in this thread was about only taking Benjamin patch
or the full series from Arnd. But now it is too late to take the full series
and currently there is a bug in 4.2-rc. So first I will apply this patch with
the hope that it could be merged before the 4.2 release.

Later we definitely will need to take care of Arnd series.

Thanks,

Gregory

> 
> 	Arnd
>
diff mbox

Patch

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index a1782f8..bfbeeb6 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -365,7 +365,7 @@  int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
         * Some hardware gives randomly wrong interrupts.  Rather
         * than crashing, do something sensible.
         */
-       if (unlikely(!irq || irq >= nr_irqs)) {
+       if (unlikely(irq >= nr_irqs)) {
                ack_bad_irq(irq);
                ret = -EINVAL;
        } else {