diff mbox series

efi: Call bootm_disable_interrupts earlier in efi_exit_boot_services

Message ID 20211119213304.2824837-1-trini@konsulko.com
State Accepted, archived
Commit 3f73e79de83ecc78b9a9c823b8118ab1fba63b0a
Delegated to: Heinrich Schuchardt
Headers show
Series efi: Call bootm_disable_interrupts earlier in efi_exit_boot_services | expand

Commit Message

Tom Rini Nov. 19, 2021, 9:33 p.m. UTC
If we look at the path that bootm/booti take when preparing to boot the
OS, we see that as part of (or prior to calling do_bootm_states,
explicitly) the process, bootm_disable_interrupts() is called prior to
announce_and_cleanup() which is where udc_disconnect() /
board_quiesce_devices() / dm_remove_devices_flags() are called from.  In
the EFI path, these are called afterwards.  In efi_exit_boot_services()
however we have been calling bootm_disable_interrupts() after the above
functions, as part of ensuring that we disable interrupts as required
by the spec.  However, bootm_disable_interrupts() is also where we go
and call usb_stop().  While this has been fine before, on the TI J721E
platform this leads us to an exception.  This exception seems likely to
be the case that we're trying to stop devices that we have already
disabled clocks for.  The most direct way to handle this particular
problem is to make EFI behave like the do_bootm_states() process and
ensure we call bootm_disable_interrupts() prior to ending up in
usb_stop().

Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Simon Glass <sjg@chromium.org>
Suggested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Tom Rini <trini@konsulko.com>
---
First up, as my wording above implies, I'm assuming rather than being
100% confident in why calling usb_stop() is leading to the exception I
get.  It's this call:
                /* Locate root hub device */
                device_find_first_child(bus, &rh);
that causes the exception.  This board is also a little odd in that,
borrowing from dm_dump_all():
 nop           5  [   ]   cdns-ti               |   |-- cdns-usb@4114000
 usb           0  [ + ]   cdns-usb3-host        |   |   `-- usb@6400000
 usb_hub       0  [ + ]   usb_hub               |   |       `-- usb_hub
 usb_hub       1  [ + ]   usb_hub               |   |           `-- usb_hub
 usb_mass_s    0  [ + ]   usb_mass_storage      |   |               `-- usb_mass_storage
 blk           2  [ + ]   usb_storage_blk       |   |                   `-- usb_mass_storage.lun0
and physically, only that mass storage device is attached to the board
itself, the rest is on-device.

Second, while talking with Ilias he's said he'll see if there can be
some common function / abstractions done here between the
do_bootm_states() code and the efi_exit_boot_services() code as this
change shows other common code that's in arch/*/lib/bootm.c.  The call
to bootm_disable_interrupts() however I have tried to make clear is not
handled in a common way as bootm/booti spell out the call as they don't
use the BOOTM_STATE_LOADOS flag.

Third, I'm not 100% sure that 3d71c81a9bb0 is still the correct / ideal
thing to be doing, and that's what usb_stop() is all about, at that
point in the boot cycle.  It might well be the kind of quirk that we
have handled now, via DM.

Fourth, I really would like to figure out a fix here appropriate to
v2022.01 and I think this is the least invasive approach.
---
 lib/efi_loader/efi_boottime.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Heinrich Schuchardt Nov. 19, 2021, 9:52 p.m. UTC | #1
Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini <trini@konsulko.com>:
>If we look at the path that bootm/booti take when preparing to boot the
>OS, we see that as part of (or prior to calling do_bootm_states,
>explicitly) the process, bootm_disable_interrupts() is called prior to
>announce_and_cleanup() which is where udc_disconnect() /
>board_quiesce_devices() / dm_remove_devices_flags() are called from.  In
>the EFI path, these are called afterwards.  In efi_exit_boot_services()
>however we have been calling bootm_disable_interrupts() after the above
>functions, as part of ensuring that we disable interrupts as required
>by the spec.  However, bootm_disable_interrupts() is also where we go
>and call usb_stop().  While this has been fine before, on the TI J721E
>platform this leads us to an exception.  This exception seems likely to
>be the case that we're trying to stop devices that we have already
>disabled clocks for.  The most direct way to handle this particular

This patch may hide an error on your board but obviously does not address the real problem.

If dependencies in the shut down sequence should exist, we need to consider them in the driver model.

What is your plan to analyze the problem?

Best regards

Heinrich 

>problem is to make EFI behave like the do_bootm_states() process and
>ensure we call bootm_disable_interrupts() prior to ending up in
>usb_stop().
>
>Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
>Cc: Simon Glass <sjg@chromium.org>
>Suggested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>Signed-off-by: Tom Rini <trini@konsulko.com>
>---
>First up, as my wording above implies, I'm assuming rather than being
>100% confident in why calling usb_stop() is leading to the exception I
>get.  It's this call:
>                /* Locate root hub device */
>                device_find_first_child(bus, &rh);
>that causes the exception.  This board is also a little odd in that,
>borrowing from dm_dump_all():
> nop           5  [   ]   cdns-ti               |   |-- cdns-usb@4114000
> usb           0  [ + ]   cdns-usb3-host        |   |   `-- usb@6400000
> usb_hub       0  [ + ]   usb_hub               |   |       `-- usb_hub
> usb_hub       1  [ + ]   usb_hub               |   |           `-- usb_hub
> usb_mass_s    0  [ + ]   usb_mass_storage      |   |               `-- usb_mass_storage
> blk           2  [ + ]   usb_storage_blk       |   |                   `-- usb_mass_storage.lun0
>and physically, only that mass storage device is attached to the board
>itself, the rest is on-device.
>
>Second, while talking with Ilias he's said he'll see if there can be
>some common function / abstractions done here between the
>do_bootm_states() code and the efi_exit_boot_services() code as this
>change shows other common code that's in arch/*/lib/bootm.c.  The call
>to bootm_disable_interrupts() however I have tried to make clear is not
>handled in a common way as bootm/booti spell out the call as they don't
>use the BOOTM_STATE_LOADOS flag.
>
>Third, I'm not 100% sure that 3d71c81a9bb0 is still the correct / ideal
>thing to be doing, and that's what usb_stop() is all about, at that
>point in the boot cycle.  It might well be the kind of quirk that we
>have handled now, via DM.
>
>Fourth, I really would like to figure out a fix here appropriate to
>v2022.01 and I think this is the least invasive approach.
>---
> lib/efi_loader/efi_boottime.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
>diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c
>index 1823990d9bd5..0485870d34d9 100644
>--- a/lib/efi_loader/efi_boottime.c
>+++ b/lib/efi_loader/efi_boottime.c
>@@ -2154,6 +2154,7 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle,
> 	}
> 
> 	if (!efi_st_keep_devices) {
>+		bootm_disable_interrupts();
> 		if (IS_ENABLED(CONFIG_USB_DEVICE))
> 			udc_disconnect();
> 		board_quiesce_devices();
>@@ -2166,9 +2167,6 @@ static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle,
> 	/* Fix up caches for EFI payloads if necessary */
> 	efi_exit_caches();
> 
>-	/* This stops all lingering devices */
>-	bootm_disable_interrupts();
>-
> 	/* Disable boot time services */
> 	systab.con_in_handle = NULL;
> 	systab.con_in = NULL;
Tom Rini Nov. 19, 2021, 10:09 p.m. UTC | #2
On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
> 
> 
> Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini <trini@konsulko.com>:
> >If we look at the path that bootm/booti take when preparing to boot the
> >OS, we see that as part of (or prior to calling do_bootm_states,
> >explicitly) the process, bootm_disable_interrupts() is called prior to
> >announce_and_cleanup() which is where udc_disconnect() /
> >board_quiesce_devices() / dm_remove_devices_flags() are called from.  In
> >the EFI path, these are called afterwards.  In efi_exit_boot_services()
> >however we have been calling bootm_disable_interrupts() after the above
> >functions, as part of ensuring that we disable interrupts as required
> >by the spec.  However, bootm_disable_interrupts() is also where we go
> >and call usb_stop().  While this has been fine before, on the TI J721E
> >platform this leads us to an exception.  This exception seems likely to
> >be the case that we're trying to stop devices that we have already
> >disabled clocks for.  The most direct way to handle this particular
> 
> This patch may hide an error on your board but obviously does not address the real problem.
> 
> If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
> 
> What is your plan to analyze the problem?

I'm not sure there is a different problem to solve here.  It's unsafe to
call the "shut everything down" function, which is what usb_stop() is,
after having shut everything down.  We may be able to stop calling
usb_stop() as any sort of shutdown should already have happened via
driver model, which is what we see now.
Ilias Apalodimas Nov. 20, 2021, 8:20 a.m. UTC | #3
On Sat, 20 Nov 2021 at 00:09, Tom Rini <trini@konsulko.com> wrote:
>
> On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
> >
> >
> > Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini <trini@konsulko.com>:
> > >If we look at the path that bootm/booti take when preparing to boot the
> > >OS, we see that as part of (or prior to calling do_bootm_states,
> > >explicitly) the process, bootm_disable_interrupts() is called prior to
> > >announce_and_cleanup() which is where udc_disconnect() /
> > >board_quiesce_devices() / dm_remove_devices_flags() are called from.  In
> > >the EFI path, these are called afterwards.  In efi_exit_boot_services()
> > >however we have been calling bootm_disable_interrupts() after the above
> > >functions, as part of ensuring that we disable interrupts as required
> > >by the spec.  However, bootm_disable_interrupts() is also where we go
> > >and call usb_stop().  While this has been fine before, on the TI J721E
> > >platform this leads us to an exception.  This exception seems likely to
> > >be the case that we're trying to stop devices that we have already
> > >disabled clocks for.  The most direct way to handle this particular
> >
> > This patch may hide an error on your board but obviously does not address the real problem.

I don't think it 'hides' it.  I think it's the other way around, this
board exposes the problem.  In any case I think disabling irq's etc
make sense to run before we disable devices completely.

> >
> > If dependencies in the shut down sequence should exist, we need to consider them in the driver model.

Yes agree 100% here.

> >
> > What is your plan to analyze the problem?
>
> I'm not sure there is a different problem to solve here.  It's unsafe to
> call the "shut everything down" function, which is what usb_stop() is,
> after having shut everything down.  We may be able to stop calling
> usb_stop() as any sort of shutdown should already have happened via
> driver model, which is what we see now.

Linux has CONFIG_EFI_DISABLE_PCI_DMA to deal with similar problems on
PCI devices.  Basically it disables busmastering until linux properly
configures the SMMU.  I think disabling the devices before handing
over to the OS (when possible) is a good policy.  In any case I think
this patch is reasonable since it at lest makes EFI and bootm/i behave
similarly.  I'll go have a look on bootm/i and cleanup the whole thing
at some point.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>
> --
> Tom
Mark Kettenis Nov. 20, 2021, 11:11 a.m. UTC | #4
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Sat, 20 Nov 2021 10:20:23 +0200
> 
> On Sat, 20 Nov 2021 at 00:09, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Fri, Nov 19, 2021 at 10:52:27PM +0100, Heinrich Schuchardt wrote:
> > >
> > >
> > > Am 19. November 2021 22:33:04 MEZ schrieb Tom Rini <trini@konsulko.com>:
> > > >If we look at the path that bootm/booti take when preparing to boot the
> > > >OS, we see that as part of (or prior to calling do_bootm_states,
> > > >explicitly) the process, bootm_disable_interrupts() is called prior to
> > > >announce_and_cleanup() which is where udc_disconnect() /
> > > >board_quiesce_devices() / dm_remove_devices_flags() are called from.  In
> > > >the EFI path, these are called afterwards.  In efi_exit_boot_services()
> > > >however we have been calling bootm_disable_interrupts() after the above
> > > >functions, as part of ensuring that we disable interrupts as required
> > > >by the spec.  However, bootm_disable_interrupts() is also where we go
> > > >and call usb_stop().  While this has been fine before, on the TI J721E
> > > >platform this leads us to an exception.  This exception seems likely to
> > > >be the case that we're trying to stop devices that we have already
> > > >disabled clocks for.  The most direct way to handle this particular
> > >
> > > This patch may hide an error on your board but obviously does not address the real problem.
> 
> I don't think it 'hides' it.  I think it's the other way around, this
> board exposes the problem.  In any case I think disabling irq's etc
> make sense to run before we disable devices completely.
> 
> > >
> > > If dependencies in the shut down sequence should exist, we need to consider them in the driver model.
> 
> Yes agree 100% here.
> 
> > >
> > > What is your plan to analyze the problem?
> >
> > I'm not sure there is a different problem to solve here.  It's unsafe to
> > call the "shut everything down" function, which is what usb_stop() is,
> > after having shut everything down.  We may be able to stop calling
> > usb_stop() as any sort of shutdown should already have happened via
> > driver model, which is what we see now.
> 
> Linux has CONFIG_EFI_DISABLE_PCI_DMA to deal with similar problems on
> PCI devices.  Basically it disables busmastering until linux properly
> configures the SMMU.  I think disabling the devices before handing
> over to the OS (when possible) is a good policy.  In any case I think
> this patch is reasonable since it at lest makes EFI and bootm/i behave
> similarly.  I'll go have a look on bootm/i and cleanup the whole thing
> at some point.

Have to be careful here.  Some OSes may assume (implicitly or
explicitly) that the "firmware" has initialized the hardware to a
point where it can be used.  This is especially important for serial
ports in order to have a debugging channel available in the OS as
early as possible.  But it is also good for things like USB PHYs and
PCIe root complexes.  If you tear those down completely (by doing a
full reset and/or disabling associated clocks) you risk breaking OSes.
This happened to OpenBSD on the Raspberry Pi 4 because earlier this
year a change was committed that resets the PCIe root complex.  And I
think this has been responsible for some of the USB issues with
Rockchip platforms as well.

But yes, quiescing DMA masters is necessary, and disabling interrupts
is probably a good idea as well.
diff mbox series

Patch

diff --git a/lib/efi_loader/efi_boottime.c b/lib/efi_loader/efi_boottime.c
index 1823990d9bd5..0485870d34d9 100644
--- a/lib/efi_loader/efi_boottime.c
+++ b/lib/efi_loader/efi_boottime.c
@@ -2154,6 +2154,7 @@  static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle,
 	}
 
 	if (!efi_st_keep_devices) {
+		bootm_disable_interrupts();
 		if (IS_ENABLED(CONFIG_USB_DEVICE))
 			udc_disconnect();
 		board_quiesce_devices();
@@ -2166,9 +2167,6 @@  static efi_status_t EFIAPI efi_exit_boot_services(efi_handle_t image_handle,
 	/* Fix up caches for EFI payloads if necessary */
 	efi_exit_caches();
 
-	/* This stops all lingering devices */
-	bootm_disable_interrupts();
-
 	/* Disable boot time services */
 	systab.con_in_handle = NULL;
 	systab.con_in = NULL;