[next-queue,v2,0/2] Address IRQ related crash seen due to io_perm_failure

Message ID 20191011153219.22313.60179.stgit@localhost.localdomain

Message

Alexander H Duyck Oct. 11, 2019, 3:34 p.m. UTC
David Dai had submitted a patch[1] to address a reported issue with e1000e
calling pci_disable_msi without first freeing the interrupts. Looking over
the issue, it seems the problem was that e1000e_down was being called in
e1000_io_error_detected without calling e1000_free_irq. That left the
__E1000_DOWN bit set, which resulted in e1000e_close skipping over the
calls to e1000e_down and e1000_free_irq.
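
As a rough illustration of that sequence, here is a small user-space C model.
It is not driver code: the struct and every model_* name are hypothetical
stand-ins, and the only behavior modeled is a "down" flag gating the close
path, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for adapter state; not the real struct e1000_adapter. */
struct model_adapter {
	bool down;          /* models the __E1000_DOWN bit */
	bool irq_requested; /* models an IRQ that has been requested, not yet freed */
};

/* Models e1000e_down(): marks the interface down, leaves the IRQ untouched. */
static void model_down(struct model_adapter *a)
{
	a->down = true;
}

/* Models e1000_free_irq(); freeing twice would be a bug, so assert on it. */
static void model_free_irq(struct model_adapter *a)
{
	assert(a->irq_requested);
	a->irq_requested = false;
}

/* Models the error handler path: down is called, free_irq is not. */
static void model_io_error_detected(struct model_adapter *a)
{
	model_down(a);
}

/* Models the old close path: all teardown is gated on the down flag. */
static void model_close_old(struct model_adapter *a)
{
	if (!a->down) {
		model_down(a);
		model_free_irq(a);
	}
}

/* Returns true if the IRQ is still requested after error + close. */
static bool model_irq_leaked_after_error_then_close(void)
{
	struct model_adapter a = { .down = false, .irq_requested = true };

	model_io_error_detected(&a);
	model_close_old(&a); /* skipped entirely: the flag is already set */
	return a.irq_requested;
}
```

In this model the IRQ is still marked as requested once close returns, which
corresponds to the state in which pci_disable_msi would later be called.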

The use of the __E1000_DOWN flag for the close test seems to have come from
the runtime power management changes that were made some time ago. From
what I can tell, in the close path we should be disabling runtime power
management via a call to pm_runtime_get_sync. As such, we can remove the
test for the __E1000_DOWN bit. However, in comparing this with other
drivers, we do need to avoid freeing the IRQs more than once. To address
that, I have copied the approach taken in igb and taken it a bit further,
so that we will always detach the interface and, if the interface is up,
bring it down and free the IRQs. In addition, we are able to reuse some of
the power management code, so I have taken the opportunity to merge those
bits.
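
The "always detach, tear down only if up, never free twice" idea can be
sketched as a user-space C model. This is not the code from the patches: the
struct and the model_* names are hypothetical, and having the close path
check a "present" flag is an assumption about how the double free is
avoided. In the kernel both paths would run under rtnl_lock, which is not
modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for net_device/adapter state. */
struct model_netdev {
	bool present;       /* models netif_device_present() */
	bool running;       /* models netif_running() */
	bool irq_requested; /* an IRQ has been requested and not yet freed */
	int irq_free_count; /* how many times the IRQ was freed */
};

/* Models bringing the interface down and freeing its IRQ exactly once. */
static void model_down_and_free_irq(struct model_netdev *d)
{
	assert(d->irq_requested); /* a second free would trip this */
	d->irq_requested = false;
	d->irq_free_count++;
}

/* Models the shared error/PM path: always detach, tear down only if up. */
static void model_detach_and_quiesce(struct model_netdev *d)
{
	bool was_present = d->present;

	d->present = false; /* always detach */
	if (was_present && d->running)
		model_down_and_free_irq(d);
}

/* Models close: assumed to skip teardown when the device is detached. */
static void model_close(struct model_netdev *d)
{
	if (d->present)
		model_down_and_free_irq(d);
	d->running = false;
}

/* Error path followed by close: returns the number of IRQ frees. */
static int model_frees_after_error_then_close(void)
{
	struct model_netdev d = { true, true, true, 0 };

	model_detach_and_quiesce(&d);
	model_close(&d);
	return d.irq_free_count;
}

/* Plain close with no error in between: returns the number of IRQ frees. */
static int model_frees_after_plain_close(void)
{
	struct model_netdev d = { true, true, true, 0 };

	model_close(&d);
	return d.irq_free_count;
}
```

In both orderings the model frees the IRQ exactly once, and no "down" flag
is consulted in the close path.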

[1]: https://lore.kernel.org/lkml/1570121672-12172-1-git-send-email-zdai@linux.vnet.ibm.com/

v2: Move e1000e_pm_thaw out of CONFIG_PM region to fix build issue on Sparc64

---

Alexander Duyck (2):
      e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
      e1000e: Drop unnecessary __E1000_DOWN bit twiddling


 drivers/net/ethernet/intel/e1000e/netdev.c |   75 +++++++++++++---------------
 1 file changed, 36 insertions(+), 39 deletions(-)

--

Comments

David Dai Oct. 29, 2019, 2:43 p.m. UTC | #1
I am not familiar with the process, and I don't mean to push you in any way.
I just want to check whether these two v2 patches will be accepted upstream,
or whether anything else needs to be done to finish the process.

Thanks! - David