diff mbox

[PATCHv3,2/5] pci: Add is_removed state

Message ID 20161021162010.GB4221@wunner.de
State Superseded
Headers show

Commit Message

Lukas Wunner Oct. 21, 2016, 4:20 p.m. UTC
On Tue, Sep 27, 2016 at 04:23:32PM -0400, Keith Busch wrote:
> This adds a new state for devices that were once in the system, but
> unexpectedly removed. This is so device tear down functions can observe
> the device is not accessible so it may skip attempting to initialize
> the hardware.
[...]
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -337,6 +337,7 @@ struct pci_dev {
>  	unsigned int	multifunction:1;/* Part of multi-function device */
>  	/* keep track of device state */
>  	unsigned int	is_added:1;
> +	unsigned int	is_removed:1;	/* device was surprise removed */
>  	unsigned int	is_busmaster:1; /* device is busmaster */
>  	unsigned int	no_msi:1;	/* device may not use msi */
>  	unsigned int	no_64bit_msi:1; /* device may only use 32-bit MSIs */

The tg3 driver (as well as many other drivers) already check reachability
of a device before accessing its mmio space by calling pci_channel_offline().

I've found that the following simple change on top of your series is
already sufficient to make hot-removal of the Apple Gigabit Ethernet
adapter "just work" (no more soft lockups, which is a giant improvement):




This got me thinking:  We've got three pci_channel_state values defined
in include/linux/pci.h, "normal", "frozen" and "perm_failure".  Instead
of adding a new "is_removed" bit to struct pci_dev, would it perhaps
make more sense to just add a new type of pci_channel_state for removed
devices?  Then the above change to pci_channel_offline() wouldn't even
be necessary.  The pciehp and dpc drivers would just change the channel
status to "removed" and all the drivers already querying it with
pci_channel_offline() would pick up the change automatically.

Thoughts?

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Keith Busch Oct. 21, 2016, 5:08 p.m. UTC | #1
On Fri, Oct 21, 2016 at 06:20:10PM +0200, Lukas Wunner wrote:
> I've found that the following simple change on top of your series is
> already sufficient to make hot-removal of the Apple Gigabit Ethernet
> adapter "just work" (no more soft lockups, which is a giant improvement):
> 
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5c43012..cc8b234 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -406,7 +406,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus);
>  
>  static inline int pci_channel_offline(struct pci_dev *pdev)
>  {
> -	return (pdev->error_state != pci_channel_io_normal);
> +	return pdev->error_state != pci_channel_io_normal || pdev->is_removed;
>  }
>  
>  struct pci_host_bridge {
> 
> 
> This got me thinking:  We've got three pci_channel_state values defined
> in include/linux/pci.h, "normal", "frozen" and "perm_failure".  Instead
> of adding a new "is_removed" bit to struct pci_dev, would it perhaps
> make more sense to just add a new type of pci_channel_state for removed
> devices?  Then the above change to pci_channel_offline() wouldn't even
> be necessary.  The pciehp and dpc drivers would just change the channel
> status to "removed" and all the drivers already querying it with
> pci_channel_offline() would pick up the change automatically.
> 
> Thoughts?

I'd be happy if we can reuse that, but concerned about overloading
error_state's intended purpose for AER. The conditions under which an
'is_removed' may be set can also create AER events, and the aer driver
overrides the error_state.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5c43012..cc8b234 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -406,7 +406,7 @@  struct pci_dev *pci_alloc_dev(struct pci_bus *bus);
 
 static inline int pci_channel_offline(struct pci_dev *pdev)
 {
-	return (pdev->error_state != pci_channel_io_normal);
+	return pdev->error_state != pci_channel_io_normal || pdev->is_removed;
 }
 
 struct pci_host_bridge {