diff mbox series

[RFC,1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already

Message ID 20231213003614.1648343-2-imammedo@redhat.com
State New
Series PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job

Commit Message

Igor Mammedov Dec. 13, 2023, 12:36 a.m. UTC
When SCSI_SCAN_ASYNC is enabled (either via config or via the kernel
command line), adding a device to the bus and enabling it kicks off an
async host scan:

 scsi_scan_host+0x21/0x1f0
 virtscsi_probe+0x2dd/0x350
 ..
 driver_probe_device+0x19/0x80
 ...
 driver_probe_device+0x19/0x80
 pci_bus_add_device+0x53/0x80
 pci_bus_add_devices+0x2b/0x70
 ...

which schedules a job for the async scan. That, however, breaks
if there is more than one SCSI host behind the bridge, since
acpiphp_check_bridge() walks over all slots and tries to
enable each of them regardless of whether it was already
enabled.
As a result, the bridge might be reconfigured several times,
triggering the following sequence:

  [cpu 0] acpiphp_check_bridge()
  [cpu 0]   enable_slot(a)
  [cpu 0]     configure bridge
  [cpu 0]     pci_bus_add_devices() -> scsi_scan_host(a1)
  [cpu 0]   enable_slot(b)
  ...
  [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
  ...
  [cpu 0]     configure bridge <- temporarily disables bridge

and causes do_scsi_scan_host() to fail.
The same race affects SHPC (but it avoids hitting the race thanks to the
1-second delay when enabling a slot).
To cover the case of single-device hotplug (one device at a time), do not
attempt to enable a slot that has already been enabled.

Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Reported-by: iona Ebner <f.ebner@proxmox.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
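
The effect described in the commit message can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the
model_* names, MODEL_SLOT_ENABLED and MODEL_MAX_SLOTS are hypothetical
stand-ins rather than kernel definitions, every slot is assumed to be
occupied, and the guard in the model tests the enabled bit with a mask. It
counts how often the shared bridge would be reconfigured by one walk over
the slots, with and without skipping slots that are already enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; the name and value are assumptions for this sketch. */
#define MODEL_SLOT_ENABLED 0x1u
#define MODEL_MAX_SLOTS 32

struct model_slot {
	unsigned int flags;
};

struct model_bridge {
	struct model_slot slots[MODEL_MAX_SLOTS];
	size_t nr_slots;
	unsigned int reconfig_count;	/* times the bridge was reconfigured */
};

/* Stand-in for enable_slot(): every call reconfigures the shared bridge. */
void model_enable_slot(struct model_bridge *bridge, struct model_slot *slot)
{
	bridge->reconfig_count++;
	slot->flags |= MODEL_SLOT_ENABLED;
}

/*
 * Stand-in for the slot walk in acpiphp_check_bridge(), assuming every slot
 * is occupied. skip_enabled == false models the behaviour before the patch;
 * skip_enabled == true models the guard the patch adds.
 */
void model_check_bridge(struct model_bridge *bridge, bool skip_enabled)
{
	for (size_t i = 0; i < bridge->nr_slots; i++) {
		struct model_slot *slot = &bridge->slots[i];

		if (skip_enabled && (slot->flags & MODEL_SLOT_ENABLED))
			continue;
		model_enable_slot(bridge, slot);
	}
}

/*
 * Count bridge reconfigurations caused by one hotplug event on a bridge that
 * already has `enabled` active slots and receives `added` new devices.
 */
unsigned int model_count_reconfigs(size_t enabled, size_t added, bool skip_enabled)
{
	struct model_bridge bridge = { .nr_slots = 0, .reconfig_count = 0 };

	if (enabled + added > MODEL_MAX_SLOTS)
		return 0;	/* out of range for this sketch */

	for (size_t i = 0; i < enabled; i++)
		bridge.slots[bridge.nr_slots++].flags = MODEL_SLOT_ENABLED;
	for (size_t i = 0; i < added; i++)
		bridge.slots[bridge.nr_slots++].flags = 0;

	model_check_bridge(&bridge, skip_enabled);
	return bridge.reconfig_count;
}
```

In this model, hot-adding one device behind a bridge with two enabled slots
reconfigures the bridge three times without the guard and once with it. Each
extra reconfiguration is a window in which an already scheduled async scan
could find the bridge temporarily disabled.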

Comments

Fiona Ebner Dec. 13, 2023, 9:47 a.m. UTC | #1
On 13.12.23 at 01:36, Igor Mammedov wrote:
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>

Missing an F here ;)

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Thank you! Works for me:

Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Igor Mammedov Dec. 13, 2023, 10:07 a.m. UTC | #2
On Wed, 13 Dec 2023 10:47:27 +0100
Fiona Ebner <f.ebner@proxmox.com> wrote:

> > Reported-by: iona Ebner <f.ebner@proxmox.com>  
> 
> Missing an F here ;)

Sorry for the copy-paste mistake; I'll fix it up in the next submission.

> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Thank you! Works for me:
> 
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
>
Rafael J. Wysocki Dec. 13, 2023, 1:01 p.m. UTC | #3
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 601129772b2d..6b11609927d6 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>                                         trim_stale_devices(dev);
>
>                         /* configure all functions */
> -                       enable_slot(slot, true);
> +                       if (slot->flags != SLOT_ENABLED) {
> +                               enable_slot(slot, true);
> +                       }

Shouldn't this be following the acpiphp_enable_slot() pattern, that is

if (!(slot->flags & SLOT_ENABLED))
         enable_slot(slot, true);

Also the braces are redundant.

>                 } else {
>                         disable_slot(slot);
>                 }
> --
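
The difference between the two checks discussed above only shows up when
some other bit is set in the flags word alongside the enabled bit. A minimal
sketch, assuming illustrative bit values (EX_SLOT_ENABLED and
EX_SLOT_OTHER_FLAG are hypothetical names, not the definitions from the
acpiphp headers):

```c
#include <stdbool.h>

/* Illustrative bit values; treat them as assumptions, not kernel definitions. */
#define EX_SLOT_ENABLED		0x1u
#define EX_SLOT_OTHER_FLAG	0x2u	/* hypothetical second flag bit */

/* Check as posted: true unless flags is exactly EX_SLOT_ENABLED. */
bool ex_needs_enable_equality(unsigned int flags)
{
	return flags != EX_SLOT_ENABLED;
}

/* Check as suggested in review: true only if the enabled bit is clear. */
bool ex_needs_enable_mask(unsigned int flags)
{
	return !(flags & EX_SLOT_ENABLED);
}
```

With only the enabled bit set, both helpers agree that the slot needs no
enabling. With a second bit set as well, the equality form would still ask
for the slot to be enabled again, while the mask form would not.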
Igor Mammedov Dec. 13, 2023, 4:06 p.m. UTC | #4
On Wed, Dec 13, 2023 at 2:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index 601129772b2d..6b11609927d6 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >                                         trim_stale_devices(dev);
> >
> >                         /* configure all functions */
> > -                       enable_slot(slot, true);
> > +                       if (slot->flags != SLOT_ENABLED) {
> > +                               enable_slot(slot, true);
> > +                       }
>
> Shouldn't this be following the acpiphp_enable_slot() pattern, that is
>
> if (!(slot->flags & SLOT_ENABLED))
>          enable_slot(slot, true);
>
> Also the braces are redundant.

I'll fix it up on the respin if Bjorn is fine with the approach in general.

The patches need a respin anyway to fix the botched whitespace.

>
> >                 } else {
> >                         disable_slot(slot);
> >                 }
> > --
>

Patch

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 601129772b2d..6b11609927d6 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -722,7 +722,9 @@  static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
 					trim_stale_devices(dev);
 
 			/* configure all functions */
-			enable_slot(slot, true);
+			if (slot->flags != SLOT_ENABLED) {
+				enable_slot(slot, true);
+			}
 		} else {
 			disable_slot(slot);
 		}