diff mbox series

net: ena: Add PCI shutdown handler to allow safe kexec

Message ID 20200320125534.28966-1-gpiccoli@canonical.com
State Accepted
Delegated to: David Miller
Headers show
Series net: ena: Add PCI shutdown handler to allow safe kexec | expand

Commit Message

Guilherme G. Piccoli March 20, 2020, 12:55 p.m. UTC
Currently ENA only provides the PCI remove() handler, used during rmmod
for example. This is not called on shutdown/kexec path; we are potentially
creating a failure scenario on kexec:

(a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
instead pci_device_shutdown() clears the master bit of the PCI device,
stopping all DMA transactions;

(b) Kexec reboot happens and the device gets enabled again, likely having
its FW with that DMA transaction buffered; then it may trigger the (now
invalid) memory operation in the new kernel, corrupting kernel memory area.

This patch aims to prevent this, by implementing a shutdown() handler
quite similar to the remove() one - the difference being the handling
of the netdev, which is unregistered on remove(), but following the
convention observed in other drivers, it's only detached on shutdown().

This prevents an odd issue in AWS Nitro instances, in which after the 2nd
kexec the next one will fail with an initrd corruption, caused by a wild
DMA write to invalid kernel memory. The lspci output for the adapter
present in my instance is:

00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
Adapter (ENA) [1d0f:ec20]

Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---


The idea for this patch came from an informal conversation with my
friend Gavin Shan, based on his past experience with similar issues.
I'd like to thank him for the great suggestion!

As a test metric, I've performed 1000 kexecs with this patch, whereas
without this one, the 3rd kexec failed with initrd corruption. Also,
one test that I've done before writing the patch was just to rmmod
the driver before the kexecs, and it worked fine too.

I suggest we add this patch in stable releases as well.
Thanks in advance for reviews,

Guilherme


 drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 ++++++++++++++++----
 1 file changed, 41 insertions(+), 10 deletions(-)

Comments

David Miller March 24, 2020, 4:04 a.m. UTC | #1
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Date: Fri, 20 Mar 2020 09:55:34 -0300

> Currently ENA only provides the PCI remove() handler, used during rmmod
> for example. This is not called on shutdown/kexec path; we are potentially
> creating a failure scenario on kexec:
> 
> (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
> instead pci_device_shutdown() clears the master bit of the PCI device,
> stopping all DMA transactions;
> 
> (b) Kexec reboot happens and the device gets enabled again, likely having
> its FW with that DMA transaction buffered; then it may trigger the (now
> invalid) memory operation in the new kernel, corrupting kernel memory area.
> 
> This patch aims to prevent this, by implementing a shutdown() handler
> quite similar to the remove() one - the difference being the handling
> of the netdev, which is unregistered on remove(), but following the
> convention observed in other drivers, it's only detached on shutdown().
> 
> This prevents an odd issue in AWS Nitro instances, in which after the 2nd
> kexec the next one will fail with an initrd corruption, caused by a wild
> DMA write to invalid kernel memory. The lspci output for the adapter
> present in my instance is:
> 
> 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
> Adapter (ENA) [1d0f:ec20]
> 
> Suggested-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Amazon folks, please review.
Bhupesh Sharma March 24, 2020, 8:04 p.m. UTC | #2
Hi Guilherme,

On 03/20/2020 06:25 PM, Guilherme G. Piccoli wrote:
> Currently ENA only provides the PCI remove() handler, used during rmmod
> for example. This is not called on shutdown/kexec path; we are potentially
> creating a failure scenario on kexec:
> 
> (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
> instead pci_device_shutdown() clears the master bit of the PCI device,
> stopping all DMA transactions;
> 
> (b) Kexec reboot happens and the device gets enabled again, likely having
> its FW with that DMA transaction buffered; then it may trigger the (now
> invalid) memory operation in the new kernel, corrupting kernel memory area.
> 
> This patch aims to prevent this, by implementing a shutdown() handler
> quite similar to the remove() one - the difference being the handling
> of the netdev, which is unregistered on remove(), but following the
> convention observed in other drivers, it's only detached on shutdown().
> 
> This prevents an odd issue in AWS Nitro instances, in which after the 2nd
> kexec the next one will fail with an initrd corruption, caused by a wild
> DMA write to invalid kernel memory. The lspci output for the adapter
> present in my instance is:
> 
> 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
> Adapter (ENA) [1d0f:ec20]

Thanks for the patch.

> Suggested-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
> 
> 
> The idea for this patch came from an informal conversation with my
> friend Gavin Shan, based on his past experience with similar issues.
> I'd like to thank him for the great suggestion!
> 
> As a test metric, I've performed 1000 kexecs with this patch, whereas
> without this one, the 3rd kexec failed with initrd corruption. Also,
> one test that I've done before writing the patch was just to rmmod
> the driver before the kexecs, and it worked fine too.
> 
> I suggest we add this patch in stable releases as well.
> Thanks in advance for reviews,

This patch fixes the repetitive kexec reboot issues that I was facing 
for some time on the aws nitro (t3) machines. Normally the kexec reboots 
would not runs more than ~ 3 to 5 times on the machine.

Now with this patch, I can runs hundreds of repetitive nested kexec 
reboots on the aws nitro machines without any failure.

So, I think this is a really good patch and should be applied to stable 
trees as well.

Please feel free to add:

Tested-and-Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>

Thanks,
Bhupesh

> Guilherme
> 
> 
>   drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 ++++++++++++++++----
>   1 file changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 0b2fd96b93d7..7a5c01ff2ee8 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -4325,13 +4325,15 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   
>   /*****************************************************************************/
>   
> -/* ena_remove - Device Removal Routine
> +/* __ena_shutoff - Helper used in both PCI remove/shutdown routines
>    * @pdev: PCI device information struct
> + * @shutdown: Is it a shutdown operation? If false, means it is a removal
>    *
> - * ena_remove is called by the PCI subsystem to alert the driver
> - * that it should release a PCI device.
> + * __ena_shutoff is a helper routine that does the real work on shutdown and
> + * removal paths; the difference between those paths is with regards to whether
> + * dettach or unregister the netdevice.
>    */
> -static void ena_remove(struct pci_dev *pdev)
> +static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
>   {
>   	struct ena_adapter *adapter = pci_get_drvdata(pdev);
>   	struct ena_com_dev *ena_dev;
> @@ -4350,13 +4352,17 @@ static void ena_remove(struct pci_dev *pdev)
>   
>   	cancel_work_sync(&adapter->reset_task);
>   
> -	rtnl_lock();
> +	rtnl_lock(); /* lock released inside the below if-else block */
>   	ena_destroy_device(adapter, true);
> -	rtnl_unlock();
> -
> -	unregister_netdev(netdev);
> -
> -	free_netdev(netdev);
> +	if (shutdown) {
> +		netif_device_detach(netdev);
> +		dev_close(netdev);
> +		rtnl_unlock();
> +	} else {
> +		rtnl_unlock();
> +		unregister_netdev(netdev);
> +		free_netdev(netdev);
> +	}
>   
>   	ena_com_rss_destroy(ena_dev);
>   
> @@ -4371,6 +4377,30 @@ static void ena_remove(struct pci_dev *pdev)
>   	vfree(ena_dev);
>   }
>   
> +/* ena_remove - Device Removal Routine
> + * @pdev: PCI device information struct
> + *
> + * ena_remove is called by the PCI subsystem to alert the driver
> + * that it should release a PCI device.
> + */
> +
> +static void ena_remove(struct pci_dev *pdev)
> +{
> +	__ena_shutoff(pdev, false);
> +}
> +
> +/* ena_shutdown - Device Shutdown Routine
> + * @pdev: PCI device information struct
> + *
> + * ena_shutdown is called by the PCI subsystem to alert the driver that
> + * a shutdown/reboot (or kexec) is happening and device must be disabled.
> + */
> +
> +static void ena_shutdown(struct pci_dev *pdev)
> +{
> +	__ena_shutoff(pdev, true);
> +}
> +
>   #ifdef CONFIG_PM
>   /* ena_suspend - PM suspend callback
>    * @pdev: PCI device information struct
> @@ -4420,6 +4450,7 @@ static struct pci_driver ena_pci_driver = {
>   	.id_table	= ena_pci_tbl,
>   	.probe		= ena_probe,
>   	.remove		= ena_remove,
> +	.shutdown	= ena_shutdown,
>   #ifdef CONFIG_PM
>   	.suspend    = ena_suspend,
>   	.resume     = ena_resume,
>
David Miller March 25, 2020, 7:04 p.m. UTC | #3
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Date: Fri, 20 Mar 2020 09:55:34 -0300

> Currently ENA only provides the PCI remove() handler, used during rmmod
> for example. This is not called on shutdown/kexec path; we are potentially
> creating a failure scenario on kexec:
> 
> (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
> instead pci_device_shutdown() clears the master bit of the PCI device,
> stopping all DMA transactions;
> 
> (b) Kexec reboot happens and the device gets enabled again, likely having
> its FW with that DMA transaction buffered; then it may trigger the (now
> invalid) memory operation in the new kernel, corrupting kernel memory area.
> 
> This patch aims to prevent this, by implementing a shutdown() handler
> quite similar to the remove() one - the difference being the handling
> of the netdev, which is unregistered on remove(), but following the
> convention observed in other drivers, it's only detached on shutdown().
> 
> This prevents an odd issue in AWS Nitro instances, in which after the 2nd
> kexec the next one will fail with an initrd corruption, caused by a wild
> DMA write to invalid kernel memory. The lspci output for the adapter
> present in my instance is:
> 
> 00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
> Adapter (ENA) [1d0f:ec20]
> 
> Suggested-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Applied and queued up for -stable, thank you.
Guilherme G. Piccoli March 25, 2020, 7:06 p.m. UTC | #4
On 25/03/2020 16:04, David Miller wrote:
>[...]
> Applied and queued up for -stable, thank you.
> 

Thanks David and all involved in reviewing and testing the patch!
Cheers,


Guilherme
diff mbox series

Patch

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0b2fd96b93d7..7a5c01ff2ee8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -4325,13 +4325,15 @@  static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 /*****************************************************************************/
 
-/* ena_remove - Device Removal Routine
+/* __ena_shutoff - Helper used in both PCI remove/shutdown routines
  * @pdev: PCI device information struct
+ * @shutdown: Is it a shutdown operation? If false, means it is a removal
  *
- * ena_remove is called by the PCI subsystem to alert the driver
- * that it should release a PCI device.
+ * __ena_shutoff is a helper routine that does the real work on shutdown and
+ * removal paths; the difference between those paths is with regards to whether
+ * dettach or unregister the netdevice.
  */
-static void ena_remove(struct pci_dev *pdev)
+static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
 {
 	struct ena_adapter *adapter = pci_get_drvdata(pdev);
 	struct ena_com_dev *ena_dev;
@@ -4350,13 +4352,17 @@  static void ena_remove(struct pci_dev *pdev)
 
 	cancel_work_sync(&adapter->reset_task);
 
-	rtnl_lock();
+	rtnl_lock(); /* lock released inside the below if-else block */
 	ena_destroy_device(adapter, true);
-	rtnl_unlock();
-
-	unregister_netdev(netdev);
-
-	free_netdev(netdev);
+	if (shutdown) {
+		netif_device_detach(netdev);
+		dev_close(netdev);
+		rtnl_unlock();
+	} else {
+		rtnl_unlock();
+		unregister_netdev(netdev);
+		free_netdev(netdev);
+	}
 
 	ena_com_rss_destroy(ena_dev);
 
@@ -4371,6 +4377,30 @@  static void ena_remove(struct pci_dev *pdev)
 	vfree(ena_dev);
 }
 
+/* ena_remove - Device Removal Routine
+ * @pdev: PCI device information struct
+ *
+ * ena_remove is called by the PCI subsystem to alert the driver
+ * that it should release a PCI device.
+ */
+
+static void ena_remove(struct pci_dev *pdev)
+{
+	__ena_shutoff(pdev, false);
+}
+
+/* ena_shutdown - Device Shutdown Routine
+ * @pdev: PCI device information struct
+ *
+ * ena_shutdown is called by the PCI subsystem to alert the driver that
+ * a shutdown/reboot (or kexec) is happening and device must be disabled.
+ */
+
+static void ena_shutdown(struct pci_dev *pdev)
+{
+	__ena_shutoff(pdev, true);
+}
+
 #ifdef CONFIG_PM
 /* ena_suspend - PM suspend callback
  * @pdev: PCI device information struct
@@ -4420,6 +4450,7 @@  static struct pci_driver ena_pci_driver = {
 	.id_table	= ena_pci_tbl,
 	.probe		= ena_probe,
 	.remove		= ena_remove,
+	.shutdown	= ena_shutdown,
 #ifdef CONFIG_PM
 	.suspend    = ena_suspend,
 	.resume     = ena_resume,