diff mbox

ARM: imx6: allow booting with old DT

Message ID 1432658616-13733-1-git-send-email-l.stach@pengutronix.de
State New
Headers show

Commit Message

Lucas Stach May 26, 2015, 4:43 p.m. UTC
The GPC rewrite to IRQ domains has been on the premise that it may break
suspend/resume for new kernels on old DT, but otherwise keep things working
from a user perspective. This was an accepted compromise to be able to move
the GIC cleanup forward.

What actually happened was that booting a new kernel on an old DT crashes
before even the console is up, so the user does not even see the warning
that the DT is too old. The warning message suggests that this has been
known before, which is clearly unacceptable.

Fix the early crash by mapping the GPC memory space if the IRQ controller
doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup
workarounds working. With this fixed the system is able to boot up
properly minus the expected suspend/resume breakage.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
Shawn, this is a critical fix for 4.1, to keep the combination of old
DT + new kernel at least somewhat working, to allow users to resolve
the situation on their systems. Please make sure this gets forwarded
ASAP.
---
 arch/arm/mach-imx/gpc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Shawn Guo May 27, 2015, 7:34 a.m. UTC | #1
On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
> The GPC rewrite to IRQ domains has been on the premise that it may break
> suspend/resume for new kernels on old DT, but otherwise keep things working
> from a user perspective. This was an accepted compromise to be able to move
> the GIC cleanup forward.
> 
> What actually happened was that booting a new kernel on an old DT crashes
> before even the console is up, so the user does not even see the warning
> that the DT is too old. The warning message suggests that this has been
> known before, which is clearly unacceptable.

To see any early message like this one, low-level debug support is
expected to be turned on.

But anyway, with your change, we get a much better situation.  Will send
it to arm-soc soon.  Thanks for the patch.

Shawn

> 
> Fix the early crash by mapping the GPC memory space if the IRQ controller
> doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup
> workarounds working. With this fixed the system is able to boot up
> properly minus the expected suspend/resume breakage.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
> Shawn, this is a critical fix for 4.1, to keep the combination of old
> DT + new kernel at least somewhat working, to allow users to resolve
> the situation on their systems. Please make sure this gets forwarded
> ASAP.
> ---
>  arch/arm/mach-imx/gpc.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
> index 4d60005e9277..bbf015056221 100644
> --- a/arch/arm/mach-imx/gpc.c
> +++ b/arch/arm/mach-imx/gpc.c
> @@ -280,9 +280,15 @@ void __init imx_gpc_check_dt(void)
>  	struct device_node *np;
>  
>  	np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc");
> -	if (WARN_ON(!np ||
> -		    !of_find_property(np, "interrupt-controller", NULL)))
> -		pr_warn("Outdated DT detected, system is about to crash!!!\n");
> +	if (WARN_ON(!np))
> +		return;
> +
> +	if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) {
> +		pr_warn("Outdated DT detected, suspend/resume will NOT work\n");
> +
> +		/* map GPC, so that at least CPUidle and WARs keep working */
> +		gpc_base = of_iomap(np, 0);
> +	}
>  }
>  
>  #ifdef CONFIG_PM_GENERIC_DOMAINS
> -- 
> 2.1.4
>
Lucas Stach May 27, 2015, 7:52 a.m. UTC | #2
Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
> > The GPC rewrite to IRQ domains has been on the premise that it may break
> > suspend/resume for new kernels on old DT, but otherwise keep things working
> > from a user perspective. This was an accepted compromise to be able to move
> > the GIC cleanup forward.
> > 
> > What actually happened was that booting a new kernel on an old DT crashes
> > before even the console is up, so the user does not even see the warning
> > that the DT is too old. The warning message suggests that this has been
> > known before, which is clearly unacceptable.
> 
> To see any early message like this one, low-level debug support is
> expected to be turned on.
> 
Using low-level debug might be acceptable for a developer.

>From a user perspective a kernel update without a corresponding DT
update should never render the machine completely broken. Keep in mind
that i.MX6 isn't only used in deeply embedded system, but is already in
devices where non-developer users might update the kernel. They might
even use prebuilt kernels where enabling low-level debug is not an
option.
We are not free to break the existing DT stability rules in such a
drastic way, especially if keeping things working to some extent is
easily done.

I have another patch to fix a power domain problem with new kernel on
old DT. Will send out in a minute.

Regards,
Lucas

> But anyway, with your change, we get a much better situation.  Will send
> it to arm-soc soon.  Thanks for the patch.
> 
> Shawn
> 
> > 
> > Fix the early crash by mapping the GPC memory space if the IRQ controller
> > doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup
> > workarounds working. With this fixed the system is able to boot up
> > properly minus the expected suspend/resume breakage.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> > Shawn, this is a critical fix for 4.1, to keep the combination of old
> > DT + new kernel at least somewhat working, to allow users to resolve
> > the situation on their systems. Please make sure this gets forwarded
> > ASAP.
> > ---
> >  arch/arm/mach-imx/gpc.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
> > index 4d60005e9277..bbf015056221 100644
> > --- a/arch/arm/mach-imx/gpc.c
> > +++ b/arch/arm/mach-imx/gpc.c
> > @@ -280,9 +280,15 @@ void __init imx_gpc_check_dt(void)
> >  	struct device_node *np;
> >  
> >  	np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc");
> > -	if (WARN_ON(!np ||
> > -		    !of_find_property(np, "interrupt-controller", NULL)))
> > -		pr_warn("Outdated DT detected, system is about to crash!!!\n");
> > +	if (WARN_ON(!np))
> > +		return;
> > +
> > +	if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) {
> > +		pr_warn("Outdated DT detected, suspend/resume will NOT work\n");
> > +
> > +		/* map GPC, so that at least CPUidle and WARs keep working */
> > +		gpc_base = of_iomap(np, 0);
> > +	}
> >  }
> >  
> >  #ifdef CONFIG_PM_GENERIC_DOMAINS
> > -- 
> > 2.1.4
> >
Marc Zyngier May 27, 2015, 8:07 a.m. UTC | #3
On 27/05/15 08:52, Lucas Stach wrote:
> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
>>> The GPC rewrite to IRQ domains has been on the premise that it may break
>>> suspend/resume for new kernels on old DT, but otherwise keep things working
>>> from a user perspective. This was an accepted compromise to be able to move
>>> the GIC cleanup forward.
>>>
>>> What actually happened was that booting a new kernel on an old DT crashes
>>> before even the console is up, so the user does not even see the warning
>>> that the DT is too old. The warning message suggests that this has been
>>> known before, which is clearly unacceptable.
>>
>> To see any early message like this one, low-level debug support is
>> expected to be turned on.
>>
> Using low-level debug might be acceptable for a developer.
> 
> From a user perspective a kernel update without a corresponding DT
> update should never render the machine completely broken. Keep in mind
> that i.MX6 isn't only used in deeply embedded system, but is already in
> devices where non-developer users might update the kernel. They might
> even use prebuilt kernels where enabling low-level debug is not an
> option.

I'd imagine that whoever provides this pre-build kernel would also
deliver some form of release notes indicating the procedure. Even
better, an installation script.

> We are not free to break the existing DT stability rules in such a
> drastic way, especially if keeping things working to some extent is
> easily done.

That would be on the condition that the DT was correct the first place,
and accurately described the hardware. It didn't, breaking the contract
we have the first place.

We can argue for years about DT stability, history proves that it
doesn't lead anywhere.

Thanks,

	M.
Lucas Stach May 27, 2015, 8:20 a.m. UTC | #4
Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier:
> On 27/05/15 08:52, Lucas Stach wrote:
> > Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
> >> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
> >>> The GPC rewrite to IRQ domains has been on the premise that it may break
> >>> suspend/resume for new kernels on old DT, but otherwise keep things working
> >>> from a user perspective. This was an accepted compromise to be able to move
> >>> the GIC cleanup forward.
> >>>
> >>> What actually happened was that booting a new kernel on an old DT crashes
> >>> before even the console is up, so the user does not even see the warning
> >>> that the DT is too old. The warning message suggests that this has been
> >>> known before, which is clearly unacceptable.
> >>
> >> To see any early message like this one, low-level debug support is
> >> expected to be turned on.
> >>
> > Using low-level debug might be acceptable for a developer.
> > 
> > From a user perspective a kernel update without a corresponding DT
> > update should never render the machine completely broken. Keep in mind
> > that i.MX6 isn't only used in deeply embedded system, but is already in
> > devices where non-developer users might update the kernel. They might
> > even use prebuilt kernels where enabling low-level debug is not an
> > option.
> 
> I'd imagine that whoever provides this pre-build kernel would also
> deliver some form of release notes indicating the procedure. Even
> better, an installation script.
> 
> > We are not free to break the existing DT stability rules in such a
> > drastic way, especially if keeping things working to some extent is
> > easily done.
> 
> That would be on the condition that the DT was correct the first place,
> and accurately described the hardware. It didn't, breaking the contract
> we have the first place.
> 
> We can argue for years about DT stability, history proves that it
> doesn't lead anywhere.
> 
The fact that we are still in a learning phase with regard to DT and
have made mistakes in the past (and I'm sure we will still do some in
the future) isn't an excuse for not even trying to keep the stability.

In this case there was a clear consent that we break DT stability by
rendering suspend/resume broken. This was the accepted compromise
between DT stability and the highly needed GIC cleanup.
Completely breaking an entire SoC family in early boot was not part of
the the compromise! Especially as it isn't really hard to keep things
working to the specified extent, as this patch proves.

Regards,
Lucas
Marc Zyngier May 27, 2015, 8:33 a.m. UTC | #5
On 27/05/15 09:20, Lucas Stach wrote:
> Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier:
>> On 27/05/15 08:52, Lucas Stach wrote:
>>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
>>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
>>>>> The GPC rewrite to IRQ domains has been on the premise that it may break
>>>>> suspend/resume for new kernels on old DT, but otherwise keep things working
>>>>> from a user perspective. This was an accepted compromise to be able to move
>>>>> the GIC cleanup forward.
>>>>>
>>>>> What actually happened was that booting a new kernel on an old DT crashes
>>>>> before even the console is up, so the user does not even see the warning
>>>>> that the DT is too old. The warning message suggests that this has been
>>>>> known before, which is clearly unacceptable.
>>>>
>>>> To see any early message like this one, low-level debug support is
>>>> expected to be turned on.
>>>>
>>> Using low-level debug might be acceptable for a developer.
>>>
>>> From a user perspective a kernel update without a corresponding DT
>>> update should never render the machine completely broken. Keep in mind
>>> that i.MX6 isn't only used in deeply embedded system, but is already in
>>> devices where non-developer users might update the kernel. They might
>>> even use prebuilt kernels where enabling low-level debug is not an
>>> option.
>>
>> I'd imagine that whoever provides this pre-build kernel would also
>> deliver some form of release notes indicating the procedure. Even
>> better, an installation script.
>>
>>> We are not free to break the existing DT stability rules in such a
>>> drastic way, especially if keeping things working to some extent is
>>> easily done.
>>
>> That would be on the condition that the DT was correct the first place,
>> and accurately described the hardware. It didn't, breaking the contract
>> we have the first place.
>>
>> We can argue for years about DT stability, history proves that it
>> doesn't lead anywhere.
>>
> The fact that we are still in a learning phase with regard to DT and
> have made mistakes in the past (and I'm sure we will still do some in
> the future) isn't an excuse for not even trying to keep the stability.
> 
> In this case there was a clear consent that we break DT stability by
> rendering suspend/resume broken. This was the accepted compromise
> between DT stability and the highly needed GIC cleanup.
> Completely breaking an entire SoC family in early boot was not part of
> the the compromise! Especially as it isn't really hard to keep things
> working to the specified extent, as this patch proves.

This patch series has been on the list for months, in -next for several
weeks, and now in mainline for about a month. I would have appreciated
your input and your patch during that time so that your pet platform
would have been kept in an acceptable shape. Others have done so, you
didn't.

If you have such a vested interest in keeping this SoC on a smooth
upgrade path, then I suggest you at least follow the various patch
series potentially touching it, and contribute in a timely manner.
Understand that people don't necessarily have access to your favourite
piece of HW, and may overlook that kind of detail. I rely on others to
let me know about it, and I encourage you to do so.

Thanks,

	M.
Lucas Stach May 27, 2015, 9:05 a.m. UTC | #6
Am Mittwoch, den 27.05.2015, 09:33 +0100 schrieb Marc Zyngier:
> On 27/05/15 09:20, Lucas Stach wrote:
> > Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier:
> >> On 27/05/15 08:52, Lucas Stach wrote:
> >>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
> >>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
> >>>>> The GPC rewrite to IRQ domains has been on the premise that it may break
> >>>>> suspend/resume for new kernels on old DT, but otherwise keep things working
> >>>>> from a user perspective. This was an accepted compromise to be able to move
> >>>>> the GIC cleanup forward.
> >>>>>
> >>>>> What actually happened was that booting a new kernel on an old DT crashes
> >>>>> before even the console is up, so the user does not even see the warning
> >>>>> that the DT is too old. The warning message suggests that this has been
> >>>>> known before, which is clearly unacceptable.
> >>>>
> >>>> To see any early message like this one, low-level debug support is
> >>>> expected to be turned on.
> >>>>
> >>> Using low-level debug might be acceptable for a developer.
> >>>
> >>> From a user perspective a kernel update without a corresponding DT
> >>> update should never render the machine completely broken. Keep in mind
> >>> that i.MX6 isn't only used in deeply embedded system, but is already in
> >>> devices where non-developer users might update the kernel. They might
> >>> even use prebuilt kernels where enabling low-level debug is not an
> >>> option.
> >>
> >> I'd imagine that whoever provides this pre-build kernel would also
> >> deliver some form of release notes indicating the procedure. Even
> >> better, an installation script.
> >>
> >>> We are not free to break the existing DT stability rules in such a
> >>> drastic way, especially if keeping things working to some extent is
> >>> easily done.
> >>
> >> That would be on the condition that the DT was correct the first place,
> >> and accurately described the hardware. It didn't, breaking the contract
> >> we have the first place.
> >>
> >> We can argue for years about DT stability, history proves that it
> >> doesn't lead anywhere.
> >>
> > The fact that we are still in a learning phase with regard to DT and
> > have made mistakes in the past (and I'm sure we will still do some in
> > the future) isn't an excuse for not even trying to keep the stability.
> > 
> > In this case there was a clear consent that we break DT stability by
> > rendering suspend/resume broken. This was the accepted compromise
> > between DT stability and the highly needed GIC cleanup.
> > Completely breaking an entire SoC family in early boot was not part of
> > the the compromise! Especially as it isn't really hard to keep things
> > working to the specified extent, as this patch proves.
> 
> This patch series has been on the list for months, in -next for several
> weeks, and now in mainline for about a month. I would have appreciated
> your input and your patch during that time so that your pet platform
> would have been kept in an acceptable shape. Others have done so, you
> didn't.
> 
We are still in the RC phase of 4.1, so no harm has been done other than
upsetting some developers and potential adventurous users. So we are
still right in time to fix this.

I don't have the bandwidth to test every patchset flying by and while I
didn't provide an explicit ack, I was okay with it by looking at it.
Honestly I trusted you that the patchset does what was stated in the
commit message, which is break suspend/resume but otherwise keep things
working. 

> If you have such a vested interest in keeping this SoC on a smooth
> upgrade path, then I suggest you at least follow the various patch
> series potentially touching it, and contribute in a timely manner.
> Understand that people don't necessarily have access to your favourite
> piece of HW, and may overlook that kind of detail. I rely on others to
> let me know about it, and I encourage you to do so.
> 
I'm fine with fixing things at the RC stages and I did not blame anyone
personally. What is not okay is that the commit messages of the series
didn't provide a hint for this kind of breakage.
Until v6 of this series the warning message read "Outdated DT detected,
suspend/resume will NOT work", only later it was changed to "Outdated DT
detected, system is about to crash!!!", which seems to indicate that you
were aware of the breakage, but neglected to explicitly state this in a
commit message where it is easily accessible for other developers
looking at the patches or resolve the problem.

Also the attitude of saying "DT stability has been broken in the past so
why bother with it in the future" is troubling IMHO.

Regards,
Lucas
Marc Zyngier May 27, 2015, 9:24 a.m. UTC | #7
On 27/05/15 10:05, Lucas Stach wrote:
> Am Mittwoch, den 27.05.2015, 09:33 +0100 schrieb Marc Zyngier:
>> On 27/05/15 09:20, Lucas Stach wrote:
>>> Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier:
>>>> On 27/05/15 08:52, Lucas Stach wrote:
>>>>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo:
>>>>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote:
>>>>>>> The GPC rewrite to IRQ domains has been on the premise that it may break
>>>>>>> suspend/resume for new kernels on old DT, but otherwise keep things working
>>>>>>> from a user perspective. This was an accepted compromise to be able to move
>>>>>>> the GIC cleanup forward.
>>>>>>>
>>>>>>> What actually happened was that booting a new kernel on an old DT crashes
>>>>>>> before even the console is up, so the user does not even see the warning
>>>>>>> that the DT is too old. The warning message suggests that this has been
>>>>>>> known before, which is clearly unacceptable.
>>>>>>
>>>>>> To see any early message like this one, low-level debug support is
>>>>>> expected to be turned on.
>>>>>>
>>>>> Using low-level debug might be acceptable for a developer.
>>>>>
>>>>> From a user perspective a kernel update without a corresponding DT
>>>>> update should never render the machine completely broken. Keep in mind
>>>>> that i.MX6 isn't only used in deeply embedded system, but is already in
>>>>> devices where non-developer users might update the kernel. They might
>>>>> even use prebuilt kernels where enabling low-level debug is not an
>>>>> option.
>>>>
>>>> I'd imagine that whoever provides this pre-build kernel would also
>>>> deliver some form of release notes indicating the procedure. Even
>>>> better, an installation script.
>>>>
>>>>> We are not free to break the existing DT stability rules in such a
>>>>> drastic way, especially if keeping things working to some extent is
>>>>> easily done.
>>>>
>>>> That would be on the condition that the DT was correct the first place,
>>>> and accurately described the hardware. It didn't, breaking the contract
>>>> we have the first place.
>>>>
>>>> We can argue for years about DT stability, history proves that it
>>>> doesn't lead anywhere.
>>>>
>>> The fact that we are still in a learning phase with regard to DT and
>>> have made mistakes in the past (and I'm sure we will still do some in
>>> the future) isn't an excuse for not even trying to keep the stability.
>>>
>>> In this case there was a clear consent that we break DT stability by
>>> rendering suspend/resume broken. This was the accepted compromise
>>> between DT stability and the highly needed GIC cleanup.
>>> Completely breaking an entire SoC family in early boot was not part of
>>> the the compromise! Especially as it isn't really hard to keep things
>>> working to the specified extent, as this patch proves.
>>
>> This patch series has been on the list for months, in -next for several
>> weeks, and now in mainline for about a month. I would have appreciated
>> your input and your patch during that time so that your pet platform
>> would have been kept in an acceptable shape. Others have done so, you
>> didn't.
>>
> We are still in the RC phase of 4.1, so no harm has been done other than
> upsetting some developers and potential adventurous users. So we are
> still right in time to fix this.
> 
> I don't have the bandwidth to test every patchset flying by and while I
> didn't provide an explicit ack, I was okay with it by looking at it.
> Honestly I trusted you that the patchset does what was stated in the
> commit message, which is break suspend/resume but otherwise keep things
> working. 

The cover letter explicitly said that I didn't test it on such HW (only
OMAP4/5 and Tegra-2). I don't have any FSL board in my zoo.

>> If you have such a vested interest in keeping this SoC on a smooth
>> upgrade path, then I suggest you at least follow the various patch
>> series potentially touching it, and contribute in a timely manner.
>> Understand that people don't necessarily have access to your favourite
>> piece of HW, and may overlook that kind of detail. I rely on others to
>> let me know about it, and I encourage you to do so.
>>
> I'm fine with fixing things at the RC stages and I did not blame anyone
> personally. What is not okay is that the commit messages of the series
> didn't provide a hint for this kind of breakage.
> Until v6 of this series the warning message read "Outdated DT detected,
> suspend/resume will NOT work", only later it was changed to "Outdated DT
> detected, system is about to crash!!!", which seems to indicate that you
> were aware of the breakage, but neglected to explicitly state this in a
> commit message where it is easily accessible for other developers
> looking at the patches or resolve the problem.

You really are reading *way* too much into a message. I would have
happily happily inserted a quote by Frank Zappa instead, but the meaning
could have been slightly obscure. As to know about such a breakage, do
you really think I care that little about the kernel to deliberately
leave a bug in and paper over it? Please give me a break...

> Also the attitude of saying "DT stability has been broken in the past so
> why bother with it in the future" is troubling IMHO.

Yes, this is a conspiracy. Read all about it.

	M.
diff mbox

Patch

diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
index 4d60005e9277..bbf015056221 100644
--- a/arch/arm/mach-imx/gpc.c
+++ b/arch/arm/mach-imx/gpc.c
@@ -280,9 +280,15 @@  void __init imx_gpc_check_dt(void)
 	struct device_node *np;
 
 	np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc");
-	if (WARN_ON(!np ||
-		    !of_find_property(np, "interrupt-controller", NULL)))
-		pr_warn("Outdated DT detected, system is about to crash!!!\n");
+	if (WARN_ON(!np))
+		return;
+
+	if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) {
+		pr_warn("Outdated DT detected, suspend/resume will NOT work\n");
+
+		/* map GPC, so that at least CPUidle and WARs keep working */
+		gpc_base = of_iomap(np, 0);
+	}
 }
 
 #ifdef CONFIG_PM_GENERIC_DOMAINS