Message ID | 1541702278-12795-1-git-send-email-kamal@canonical.com |
---|---|
State | New |
Headers | show |
Series | [SRU,XBCD] UBUNTU: SAUCE: net: ena: fix crash during ena_remove() | expand |
On 08/11/2018 18:37, Kamal Mostafa wrote: > From: Arthur Kiyanovski <akiyano@amazon.com> > > BugLink: http://bugs.launchpad.net/bugs/1802341 > > In ena_remove() we have the following stack call: > ena_remove() > unregister_netdev() > ena_destroy_device() > netif_carrier_off() > > Calling netif_carrier_off() causes linkwatch to try to handle the > link change event on the already unregistered netdev, which leads > to a read from an unreadable memory address. > > This patch switches the order of the two functions, so that > netif_carrier_off() is called on a regiestered netdev. > > To accomplish this fix we also had to: > 1. Remove the set bit ENA_FLAG_TRIGGER_RESET > 2. Add a sanitiy check in ena_close() > both to prevent double device reset (when calling unregister_netdev() > ena_close is called, but the device was already deleted in > ena_destroy_device()). > 3. Set the admin_queue running state to false to avoid using it after > device was reset (for example when calling ena_destroy_all_io_queues() > right after ena_com_dev_reset() in ena_down) > > Finally, driver version is also updated. > > Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148 > > Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> > Signed-off-by: Kamal Mostafa <kamal@canonical.com> > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 ++++++++++----------- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +- > 2 files changed, 11 insertions(+), 12 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 7739cca..08a05f3 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -1849,6 +1849,8 @@ static void ena_down(struct ena_adapter *adapter) > rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > if (rc) > dev_err(&adapter->pdev->dev, "Device reset failed\n"); > + /* stop submitting admin commands on a device that was reset */ > + ena_com_set_admin_running_state(adapter->ena_dev, false); > } > > ena_destroy_all_io_queues(adapter); > @@ -1915,6 +1917,9 @@ static int ena_close(struct net_device *netdev) > > netif_dbg(adapter, ifdown, netdev, "%s\n", __func__); > > + if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > + return 0; > + > if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags)) > ena_down(adapter); > > @@ -2613,9 +2618,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful) > ena_down(adapter); > > /* Stop the device from sending AENQ events (in case reset flag is set > - * and device is up, ena_close already reset the device > - * In case the reset flag is set and the device is up, ena_down() > - * already perform the reset, so it can be skipped. > + * and device is up, ena_down() already reset the device. > */ > if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up)) > ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > @@ -3452,6 +3455,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > ena_com_rss_destroy(ena_dev); > err_free_msix: > ena_com_dev_reset(ena_dev, ENA_REGS_RESET_INIT_ERR); > + /* stop submitting admin commands on a device that was reset */ > + ena_com_set_admin_running_state(ena_dev, false); > ena_free_mgmnt_irq(adapter); > ena_disable_msix(adapter); > err_worker_destroy: > @@ -3524,18 +3529,12 @@ static void ena_remove(struct pci_dev *pdev) > > cancel_work_sync(&adapter->reset_task); > > - unregister_netdev(netdev); > - > - /* If the device is running then we want to make sure the device will be > - * reset to make sure no more events will be issued by the device. > - */ > - if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > - set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags); > - > rtnl_lock(); > ena_destroy_device(adapter, true); > rtnl_unlock(); > > + unregister_netdev(netdev); > + > free_netdev(netdev); > > ena_com_rss_destroy(ena_dev); > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h > index 5218736..dc8b617 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h > @@ -45,7 +45,7 @@ > > #define DRV_MODULE_VER_MAJOR 2 > #define DRV_MODULE_VER_MINOR 0 > -#define DRV_MODULE_VER_SUBMINOR 1 > +#define DRV_MODULE_VER_SUBMINOR 2 > > #define DRV_MODULE_NAME "ena" > #ifndef DRV_MODULE_VERSION > Acked-by: Colin Ian King <colin.king@canonical.com>
On 2018-11-08 10:37:58, Kamal Mostafa wrote: > From: Arthur Kiyanovski <akiyano@amazon.com> > > BugLink: http://bugs.launchpad.net/bugs/1802341 > > In ena_remove() we have the following stack call: > ena_remove() > unregister_netdev() > ena_destroy_device() > netif_carrier_off() > > Calling netif_carrier_off() causes linkwatch to try to handle the > link change event on the already unregistered netdev, which leads > to a read from an unreadable memory address. > > This patch switches the order of the two functions, so that > netif_carrier_off() is called on a regiestered netdev. > > To accomplish this fix we also had to: > 1. Remove the set bit ENA_FLAG_TRIGGER_RESET > 2. Add a sanitiy check in ena_close() > both to prevent double device reset (when calling unregister_netdev() > ena_close is called, but the device was already deleted in > ena_destroy_device()). > 3. Set the admin_queue running state to false to avoid using it after > device was reset (for example when calling ena_destroy_all_io_queues() > right after ena_com_dev_reset() in ena_down) > > Finally, driver version is also updated. > > Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148 > > Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> > Signed-off-by: Kamal Mostafa <kamal@canonical.com> Seems reasonable to me. Acked-by: Tyler Hicks <tyhicks@canonical.com> Tyler
On 2018-11-08 10:37:58 , Kamal Mostafa wrote: > From: Arthur Kiyanovski <akiyano@amazon.com> > > BugLink: http://bugs.launchpad.net/bugs/1802341 > > In ena_remove() we have the following stack call: > ena_remove() > unregister_netdev() > ena_destroy_device() > netif_carrier_off() > > Calling netif_carrier_off() causes linkwatch to try to handle the > link change event on the already unregistered netdev, which leads > to a read from an unreadable memory address. > > This patch switches the order of the two functions, so that > netif_carrier_off() is called on a regiestered netdev. > > To accomplish this fix we also had to: > 1. Remove the set bit ENA_FLAG_TRIGGER_RESET > 2. Add a sanitiy check in ena_close() > both to prevent double device reset (when calling unregister_netdev() > ena_close is called, but the device was already deleted in > ena_destroy_device()). > 3. Set the admin_queue running state to false to avoid using it after > device was reset (for example when calling ena_destroy_all_io_queues() > right after ena_com_dev_reset() in ena_down) > > Finally, driver version is also updated. > > Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148 > > Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> > Signed-off-by: Kamal Mostafa <kamal@canonical.com> > --- > drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 ++++++++++----------- > drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +- > 2 files changed, 11 insertions(+), 12 deletions(-) > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c > index 7739cca..08a05f3 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > @@ -1849,6 +1849,8 @@ static void ena_down(struct ena_adapter *adapter) > rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > if (rc) > dev_err(&adapter->pdev->dev, "Device reset failed\n"); > + /* stop submitting admin commands on a device that was reset */ > + ena_com_set_admin_running_state(adapter->ena_dev, false); > } > > ena_destroy_all_io_queues(adapter); > @@ -1915,6 +1917,9 @@ static int ena_close(struct net_device *netdev) > > netif_dbg(adapter, ifdown, netdev, "%s\n", __func__); > > + if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > + return 0; > + > if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags)) > ena_down(adapter); > > @@ -2613,9 +2618,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful) > ena_down(adapter); > > /* Stop the device from sending AENQ events (in case reset flag is set > - * and device is up, ena_close already reset the device > - * In case the reset flag is set and the device is up, ena_down() > - * already perform the reset, so it can be skipped. > + * and device is up, ena_down() already reset the device. > */ > if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up)) > ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > @@ -3452,6 +3455,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > ena_com_rss_destroy(ena_dev); > err_free_msix: > ena_com_dev_reset(ena_dev, ENA_REGS_RESET_INIT_ERR); > + /* stop submitting admin commands on a device that was reset */ > + ena_com_set_admin_running_state(ena_dev, false); > ena_free_mgmnt_irq(adapter); > ena_disable_msix(adapter); > err_worker_destroy: > @@ -3524,18 +3529,12 @@ static void ena_remove(struct pci_dev *pdev) > > cancel_work_sync(&adapter->reset_task); > > - unregister_netdev(netdev); > - > - /* If the device is running then we want to make sure the device will be > - * reset to make sure no more events will be issued by the device. > - */ > - if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > - set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags); > - > rtnl_lock(); > ena_destroy_device(adapter, true); > rtnl_unlock(); > > + unregister_netdev(netdev); > + > free_netdev(netdev); > > ena_com_rss_destroy(ena_dev); > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h > index 5218736..dc8b617 100644 > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h > @@ -45,7 +45,7 @@ > > #define DRV_MODULE_VER_MAJOR 2 > #define DRV_MODULE_VER_MINOR 0 > -#define DRV_MODULE_VER_SUBMINOR 1 > +#define DRV_MODULE_VER_SUBMINOR 2 > > #define DRV_MODULE_NAME "ena" > #ifndef DRV_MODULE_VERSION > -- > 2.7.4 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
Just to clarify, this was applied to XBC but NOT Disco On 2018-11-09 11:12:45 , Khaled Elmously wrote: > On 2018-11-08 10:37:58 , Kamal Mostafa wrote: > > From: Arthur Kiyanovski <akiyano@amazon.com> > > > > BugLink: http://bugs.launchpad.net/bugs/1802341 > > > > In ena_remove() we have the following stack call: > > ena_remove() > > unregister_netdev() > > ena_destroy_device() > > netif_carrier_off() > > > > Calling netif_carrier_off() causes linkwatch to try to handle the > > link change event on the already unregistered netdev, which leads > > to a read from an unreadable memory address. > > > > This patch switches the order of the two functions, so that > > netif_carrier_off() is called on a regiestered netdev. > > > > To accomplish this fix we also had to: > > 1. Remove the set bit ENA_FLAG_TRIGGER_RESET > > 2. Add a sanitiy check in ena_close() > > both to prevent double device reset (when calling unregister_netdev() > > ena_close is called, but the device was already deleted in > > ena_destroy_device()). > > 3. Set the admin_queue running state to false to avoid using it after > > device was reset (for example when calling ena_destroy_all_io_queues() > > right after ena_com_dev_reset() in ena_down) > > > > Finally, driver version is also updated. > > > > Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148 > > > > Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> > > Signed-off-by: Kamal Mostafa <kamal@canonical.com> > > --- > > drivers/net/ethernet/amazon/ena/ena_netdev.c | 21 ++++++++++----------- > > drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +- > > 2 files changed, 11 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c > > index 7739cca..08a05f3 100644 > > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c > > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c > > @@ -1849,6 +1849,8 @@ static void ena_down(struct ena_adapter *adapter) > > rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > > if (rc) > > dev_err(&adapter->pdev->dev, "Device reset failed\n"); > > + /* stop submitting admin commands on a device that was reset */ > > + ena_com_set_admin_running_state(adapter->ena_dev, false); > > } > > > > ena_destroy_all_io_queues(adapter); > > @@ -1915,6 +1917,9 @@ static int ena_close(struct net_device *netdev) > > > > netif_dbg(adapter, ifdown, netdev, "%s\n", __func__); > > > > + if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > > + return 0; > > + > > if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags)) > > ena_down(adapter); > > > > @@ -2613,9 +2618,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful) > > ena_down(adapter); > > > > /* Stop the device from sending AENQ events (in case reset flag is set > > - * and device is up, ena_close already reset the device > > - * In case the reset flag is set and the device is up, ena_down() > > - * already perform the reset, so it can be skipped. > > + * and device is up, ena_down() already reset the device. > > */ > > if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up)) > > ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); > > @@ -3452,6 +3455,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > > ena_com_rss_destroy(ena_dev); > > err_free_msix: > > ena_com_dev_reset(ena_dev, ENA_REGS_RESET_INIT_ERR); > > + /* stop submitting admin commands on a device that was reset */ > > + ena_com_set_admin_running_state(ena_dev, false); > > ena_free_mgmnt_irq(adapter); > > ena_disable_msix(adapter); > > err_worker_destroy: > > @@ -3524,18 +3529,12 @@ static void ena_remove(struct pci_dev *pdev) > > > > cancel_work_sync(&adapter->reset_task); > > > > - unregister_netdev(netdev); > > - > > - /* If the device is running then we want to make sure the device will be > > - * reset to make sure no more events will be issued by the device. > > - */ > > - if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) > > - set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags); > > - > > rtnl_lock(); > > ena_destroy_device(adapter, true); > > rtnl_unlock(); > > > > + unregister_netdev(netdev); > > + > > free_netdev(netdev); > > > > ena_com_rss_destroy(ena_dev); > > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h > > index 5218736..dc8b617 100644 > > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h > > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h > > @@ -45,7 +45,7 @@ > > > > #define DRV_MODULE_VER_MAJOR 2 > > #define DRV_MODULE_VER_MINOR 0 > > -#define DRV_MODULE_VER_SUBMINOR 1 > > +#define DRV_MODULE_VER_SUBMINOR 2 > > > > #define DRV_MODULE_NAME "ena" > > #ifndef DRV_MODULE_VERSION > > -- > > 2.7.4 > > > > > > -- > > kernel-team mailing list > > kernel-team@lists.ubuntu.com > > https://lists.ubuntu.com/mailman/listinfo/kernel-team
On Thu, Nov 08, 2018 at 10:37:58AM -0800, Kamal Mostafa wrote: > From: Arthur Kiyanovski <akiyano@amazon.com> > > BugLink: http://bugs.launchpad.net/bugs/1802341 > > In ena_remove() we have the following stack call: > ena_remove() > unregister_netdev() > ena_destroy_device() > netif_carrier_off() > > Calling netif_carrier_off() causes linkwatch to try to handle the > link change event on the already unregistered netdev, which leads > to a read from an unreadable memory address. > > This patch switches the order of the two functions, so that > netif_carrier_off() is called on a regiestered netdev. > > To accomplish this fix we also had to: > 1. Remove the set bit ENA_FLAG_TRIGGER_RESET > 2. Add a sanitiy check in ena_close() > both to prevent double device reset (when calling unregister_netdev() > ena_close is called, but the device was already deleted in > ena_destroy_device()). > 3. Set the admin_queue running state to false to avoid using it after > device was reset (for example when calling ena_destroy_all_io_queues() > right after ena_com_dev_reset() in ena_down) > > Finally, driver version is also updated. > > Change-Id: I3cc1aafe9cb3701a6eaee44e00add0e175c93148 > > Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> > Signed-off-by: Kamal Mostafa <kamal@canonical.com> Applied to unstable/master, thanks!
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 7739cca..08a05f3 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1849,6 +1849,8 @@ static void ena_down(struct ena_adapter *adapter) rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); if (rc) dev_err(&adapter->pdev->dev, "Device reset failed\n"); + /* stop submitting admin commands on a device that was reset */ + ena_com_set_admin_running_state(adapter->ena_dev, false); } ena_destroy_all_io_queues(adapter); @@ -1915,6 +1917,9 @@ static int ena_close(struct net_device *netdev) netif_dbg(adapter, ifdown, netdev, "%s\n", __func__); + if (!test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) + return 0; + if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags)) ena_down(adapter); @@ -2613,9 +2618,7 @@ static void ena_destroy_device(struct ena_adapter *adapter, bool graceful) ena_down(adapter); /* Stop the device from sending AENQ events (in case reset flag is set - * and device is up, ena_close already reset the device - * In case the reset flag is set and the device is up, ena_down() - * already perform the reset, so it can be skipped. + * and device is up, ena_down() already reset the device. */ if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up)) ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); @@ -3452,6 +3455,8 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) ena_com_rss_destroy(ena_dev); err_free_msix: ena_com_dev_reset(ena_dev, ENA_REGS_RESET_INIT_ERR); + /* stop submitting admin commands on a device that was reset */ + ena_com_set_admin_running_state(ena_dev, false); ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); err_worker_destroy: @@ -3524,18 +3529,12 @@ static void ena_remove(struct pci_dev *pdev) cancel_work_sync(&adapter->reset_task); - unregister_netdev(netdev); - - /* If the device is running then we want to make sure the device will be - * reset to make sure no more events will be issued by the device. - */ - if (test_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags)) - set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags); - rtnl_lock(); ena_destroy_device(adapter, true); rtnl_unlock(); + unregister_netdev(netdev); + free_netdev(netdev); ena_com_rss_destroy(ena_dev); diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 5218736..dc8b617 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -45,7 +45,7 @@ #define DRV_MODULE_VER_MAJOR 2 #define DRV_MODULE_VER_MINOR 0 -#define DRV_MODULE_VER_SUBMINOR 1 +#define DRV_MODULE_VER_SUBMINOR 2 #define DRV_MODULE_NAME "ena" #ifndef DRV_MODULE_VERSION