[RFC,2/2] igb/ixgbe: add code to trigger function reset if reset_devices is set

Message ID 20100731005910.32625.89518.stgit@localhost.localdomain
State RFC, archived
Delegated to: David Miller

Commit Message

Kirsher, Jeffrey T July 31, 2010, 12:59 a.m. UTC
From: Alexander Duyck <alexander.h.duyck@intel.com>

This change lets both igb and ixgbe trigger a full PCIe function reset
if the reset_devices kernel parameter is set.  The main
reason for adding this is that kdump can cause serious issues when the
kdump kernel resets the IOMMU while DMA transactions are still occurring.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c     |    3 +++
 drivers/net/ixgbe/ixgbe_main.c |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
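The patch keys the reset on reset_devices, a boolean kernel parameter that is switched on by its mere presence on the kernel command line (kdump setups commonly pass it to the capture kernel). As a rough userspace sketch, assuming a whitespace-separated command line such as the contents of /proc/cmdline and ignoring quoted arguments, a check for the flag could look like this. The function name cmdline_has_reset_devices() is hypothetical, and the kernel's own early-parameter parsing differs in details.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `cmdline` contains the bare token
 * "reset_devices". Tokens are split on spaces, tabs and newlines, as in
 * /proc/cmdline. Simplification: quoted arguments are not handled, and
 * only an exact token match counts. */
static bool cmdline_has_reset_devices(const char *cmdline)
{
    const char *needle = "reset_devices";
    size_t n = strlen(needle);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        /* Find the end of the current token. */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == n && strncmp(start, needle, n) == 0)
            return true;
    }
    return false;
}
```

For example, read /proc/cmdline into a buffer with fgets() and pass the buffer in; a kdump capture kernel started with "... irqpoll reset_devices" would report the flag as set.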

Comments

David Miller Aug. 1, 2010, 8:15 a.m. UTC | #1
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 30 Jul 2010 17:59:12 -0700

I tend to disagree with the essence of this change, which is that we
should add workaround after workaround for things that aren't
functioning properly in kdump and kexec.

They should have a pass that shuts devices down properly, so that this
kind of stuff doesn't need to happen in the kernel we then boot into.

What happens on non-PCIE systems then?  Do they just lose when this
happens?

No, you dun goof'd.  :-) Find another way to fix this please.

Thanks.
David Woodhouse Aug. 5, 2010, 4:27 p.m. UTC | #2
On Sun, 2010-08-01 at 01:15 -0700, David Miller wrote:
> I tend to disagree with the essence of this change.
> 
> Which is that we should add workaround after workaround for things
> that aren't functioning properly in kdump and kexec.
> 
> They should have a pass that shuts devices down properly, so that this
> kind of stuff doesn't need to happen in the kernel we then boot into.

For a normal kexec, arguably true.

But in the kdump case, the original kernel has *crashed* and we really
don't have that option -- we need to jump *straight* to the new kernel
and have it reset the hardware.

The device driver really *ought* to be able to reset the hardware from
whatever state it's in when the new kernel starts up. Anything less is
broken, and reminds me of those crappy drivers that only work after a
soft-reboot from Windows.

Most drivers *do* quite happily initialise their device and reliably get
it into a known state; it's just that this particular hardware goes into
a *particularly* stroppy fit when it gets a DMA master abort (which is
what happens when the IOMMU stops it from scribbling into memory after
the new kernel has taken over).

> What happens on non-PCIE systems then?  Do they just lose when this
> happens?

If they have a device that's this broken, and the driver can't get it
into a working state any other way, then yes -- I don't see any way to
*avoid* them losing.

I don't like the reset_devices thing though -- the device driver ought
to cope (and reset the device with a full PCIe reset if that's the only
way to make it stop sulking) *regardless* of that option, if it's
necessary.
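Miller's question about non-PCIe systems comes down to which reset mechanisms a given function actually offers. The sketch below is a hypothetical userspace model, loosely following the order in which the Linux PCI core's function-reset code tries methods: a device-specific quirk, PCIe FLR, conventional-PCI Advanced Features FLR, a PM D3hot-to-D0 reset, and finally a reset of the parent bridge's secondary bus. The struct fn_caps flags and the pick_reset_method() name are illustrative assumptions, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical capability summary for one PCI function. */
struct fn_caps {
    bool has_quirk_reset; /* device-specific reset quirk */
    bool has_pcie_flr;    /* PCIe Function Level Reset */
    bool has_af_flr;      /* conventional PCI Advanced Features FLR */
    bool has_pm_reset;    /* PM capability; D3hot -> D0 resets the function */
    bool on_root_bus;     /* no upstream bridge whose secondary bus can be reset */
};

enum reset_method {
    RESET_NONE = 0,
    RESET_QUIRK,
    RESET_PCIE_FLR,
    RESET_AF_FLR,
    RESET_PM,
    RESET_SECONDARY_BUS,
};

/* Return the first available method, most targeted first. */
static enum reset_method pick_reset_method(const struct fn_caps *c)
{
    if (c->has_quirk_reset)
        return RESET_QUIRK;
    if (c->has_pcie_flr)
        return RESET_PCIE_FLR;
    if (c->has_af_flr)
        return RESET_AF_FLR;
    if (c->has_pm_reset)
        return RESET_PM;
    if (!c->on_root_bus)
        return RESET_SECONDARY_BUS; /* resets everything behind the bridge */
    return RESET_NONE;              /* nothing left to try */
}
```

In this model a conventional PCI device with only a PM capability still gets a reset, while a root-bus device with none of these capabilities gets none, which is the case where, as Woodhouse says, such systems simply lose.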
Kenji Kaneshige Aug. 10, 2010, 9:31 a.m. UTC | #3
(2010/08/06 1:27), David Woodhouse wrote:
> On Sun, 2010-08-01 at 01:15 -0700, David Miller wrote:
>> What happens on non-PCIE systems then?  Do they just lose when this
>> happens?
>
> If they have a device that's this broken, and the driver can't get it
> into a working state any other way, then yes -- I don't see any way to
> *avoid* them losing.

What about asserting secondary RST# on the bridge?
It would not work for devices on the root bus though.

Thanks,
Kenji Kaneshige
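Asserting secondary RST# means setting the Secondary Bus Reset bit (bit 6, mask 0x40) in the Bridge Control register at offset 0x3E of the bridge's type 1 configuration header, holding it, then clearing it. The hypothetical userspace model below works on a simulated configuration-space buffer rather than real hardware; struct bridge, struct pci_fn and set_secondary_reset() are illustrative names. It also shows why the approach fails on the root bus: there is no parent bridge to program. Note that this resets every device on the bridge's secondary bus, not just the one function.

```c
#include <stddef.h>
#include <stdint.h>

#define BRIDGE_CONTROL_OFFSET 0x3e   /* Bridge Control register (type 1 header) */
#define BRIDGE_CTL_BUS_RESET  0x0040 /* bit 6: Secondary Bus Reset, drives RST# */

/* Hypothetical stand-in for a PCI-to-PCI bridge's config space. */
struct bridge {
    uint8_t cfg[256];
};

/* Hypothetical function; parent is NULL when it sits on the root bus. */
struct pci_fn {
    struct bridge *parent;
};

static uint16_t cfg_read16(const struct bridge *b, size_t off)
{
    /* PCI configuration space is little-endian. */
    return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

static void cfg_write16(struct bridge *b, size_t off, uint16_t v)
{
    b->cfg[off] = (uint8_t)(v & 0xff);
    b->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Assert (on != 0) or deassert secondary RST# on fn's parent bridge,
 * preserving the other Bridge Control bits. Returns -1 on the root bus.
 * On real hardware the caller must hold reset for the required minimum
 * time and then let the devices settle before accessing them. */
static int set_secondary_reset(struct pci_fn *fn, int on)
{
    struct bridge *br = fn->parent;
    if (br == NULL)
        return -1; /* root bus: no bridge above us to drive RST# */

    uint16_t ctl = cfg_read16(br, BRIDGE_CONTROL_OFFSET);
    if (on)
        ctl |= BRIDGE_CTL_BUS_RESET;
    else
        ctl &= (uint16_t)~BRIDGE_CTL_BUS_RESET;
    cfg_write16(br, BRIDGE_CONTROL_OFFSET, ctl);
    return 0;
}
```

A caller would assert, wait, deassert and wait again; the read-modify-write keeps unrelated bits such as SERR# enable intact.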

Patch

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 667b527..b924443 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1731,6 +1731,9 @@  static int __devinit igb_probe(struct pci_dev *pdev,
 		return -EINVAL;
 	}
 
+	if (reset_devices && pci_reset_device_function(pdev))
+		return -ENODEV;
+
 	err = pci_enable_device_mem(pdev);
 	if (err)
 		return err;
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 7d6a415..f459f24 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -6548,6 +6548,9 @@  static int __devinit ixgbe_probe(struct pci_dev *pdev,
 		return -EINVAL;
 	}
 
+	if (reset_devices && pci_reset_device_function(pdev))
+		return -ENODEV;
+
 	err = pci_enable_device_mem(pdev);
 	if (err)
 		return err;