diff mbox

pci, Add AER_panic sysfs file

Message ID 1337350606-32648-1-git-send-email-prarit@redhat.com
State Deferred
Headers show

Commit Message

Prarit Bhargava May 18, 2012, 2:16 p.m. UTC
Bjorn, ... [v2] with missing Doc file.

P.

----8<----

Consider the following case

		[ RP ]
		  |
		  |
	+---------+-----------+
	|	  |	      |
       [H1]      [H2]        [X1]

where RP is a PCIE Root Port, H1 and H2 are devices with drivers that support
PCIE AER driver error handling (ie, they have pci_error_handlers defined in
the driver), and X1 is a device with a driver that does not support PCIE
AER driver error handling.

If the Root Port takes an error what currently happens is that the
bus resets and H1 & H2 call their slot_reset functions.  X1 does nothing.

In some cases a user may not wish the system to continue because X1 is
an unhardened driver.  In these cases, the system should not do a bus reset,
but rather the system should panic to avoid any further possible data
corruption.

This patch implements an AER_panic sysfs entry for each root port which
a user can modify.  AER_panic = 1, means the system will panic on a
PCIE error which would have normally resulted in a secondary bus reset.
The default is AER_panic = 0.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Shyam Iyer <Shyam_Iyer@Dell.com>
Cc: gregkh@linuxfoundation.org
Cc: ddutile@redhat.com

[v2]: added missing Documentation/ABI/testing/sysfs-bus-pci
---
 Documentation/ABI/testing/sysfs-bus-pci |   10 +++++++
 drivers/pci/pci-sysfs.c                 |   42 ++++++++++++++++++++++++++++++-
 drivers/pci/pcie/aer/aerdrv.c           |    3 ++
 include/linux/pci.h                     |    1 +
 4 files changed, 55 insertions(+), 1 deletions(-)

Comments

Greg Kroah-Hartman May 18, 2012, 3:47 p.m. UTC | #1
On Fri, May 18, 2012 at 10:16:46AM -0400, Prarit Bhargava wrote:
> Bjorn, ... [v2] with missing Doc file.
> 
> P.
> 
> ----8<----
> 
> Consider the following case
> 
> 		[ RP ]
> 		  |
> 		  |
> 	+---------+-----------+
> 	|	  |	      |
>        [H1]      [H2]        [X1]
> 
> where RP is a PCIE Root Port, H1 and H2 are devices with drivers that support
> PCIE AER driver error handling (ie, they have pci_error_handlers defined in
> the driver), and X1 is a device with a driver that does not support PCIE
> AER driver error handling.

Why can't we provide "default" error handlers that recover from such
errors?

> If the Root Port takes an error what currently happens is that the
> bus resets and H1 & H2 call their slot_reset functions.  X1 does nothing.
> 
> In some cases a user may not wish the system to continue because X1 is
> an unhardened driver.

Please define "unhardened".  Why aren't all drivers "hardened"?

> In these cases, the system should not do a bus reset, but rather the
> system should panic to avoid any further possible data corruption.

Really?  You really want to panic the whole system and shut down and
potentially loose everything?  That does not sound like a good idea at
all to me, is there really no way to recover from this?

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava May 18, 2012, 5:17 p.m. UTC | #2
> 
> Please define "unhardened".  Why aren't all drivers "hardened"?

Most drivers _currently_ do not handle reading all f's (or -1) from hardware.
Some drivers do handle some situations but definitely not all of them.
Hardening a driver involves making the driver "-1" safe.

Some companies do ship hardened drivers, but the ones in the tree are not hardened.

[The above comment is in no way an approval of shipping drivers outside of the
kernel.  I'm just stating a fact.]

The effort involved in hardening this drivers is significant.  It will be a long
time before anyone considers the in-tree drivers hardened.  We should start with
a baby-step of acknowledging the problem and giving current users a way of
protecting their data.

> 
>> In these cases, the system should not do a bus reset, but rather the
>> system should panic to avoid any further possible data corruption.
> 
> Really?  You really want to panic the whole system and shut down and
> potentially loose everything?  That does not sound like a good idea at
> all to me, is there really no way to recover from this?

Yes, that's _exactly_ what I want to do.  Having a driver that is capable of
writing corrupted data to a disk or corrupting memory is much worse than
panicking and stopping the system for a short period of time.

The default is to handle an AER through a bus reset so a user must actively
request the panic.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava May 18, 2012, 5:26 p.m. UTC | #3
> Prarit Bhargava <prarit <at> redhat.com> writes:
> 
>>  /**
>> diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
>> index 58ad791..dd6b352 100644
>> --- a/drivers/pci/pcie/aer/aerdrv.c
>> +++ b/drivers/pci/pcie/aer/aerdrv.c
>> @@ -346,6 +346,9 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>>      u32 reg32;
>>      int pos;
>>
>> +    if (dev->rp_AER_panic)
>> +        panic("%s: AER detected on Root Port", pci_name(dev));
>> +
>>      pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
>>
> 
> 
> I really like this idea. I just wonder if the panic can happen in do_recovery in
> aerdrv_core.c before the broadcast_error_message() is done that invokes all of
> the error_detected callbacks. It would be best to panic as soon as possible to
> increase error containment.

Hi Betty,

Thanks for the review.

IMO that sounds like a better idea.  As you said, panicking early as possible in
order to increase error containment is the best option.

Bjorn, any objection to moving the panic up as far as possible?

I'll let the rest of the discussion settle before putting out a [v3].

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman May 18, 2012, 6:13 p.m. UTC | #4
On Fri, May 18, 2012 at 01:17:54PM -0400, Prarit Bhargava wrote:
> 
> > 
> > Please define "unhardened".  Why aren't all drivers "hardened"?
> 
> Most drivers _currently_ do not handle reading all f's (or -1) from hardware.
> Some drivers do handle some situations but definitely not all of them.
> Hardening a driver involves making the driver "-1" safe.

That's not "hardening", it should be written as, "fixing broken
drivers".  It's a bug if a PCI driver can not handle this as that is
exactly what happens when a PCI device is removed from the system
without the driver knowing about it.

> Some companies do ship hardened drivers, but the ones in the tree are not hardened.

Why are there out-of-tree drivers that are so-called "hardened" and why
are those bug fixes not merged into the kernel tree?

> [The above comment is in no way an approval of shipping drivers outside of the
> kernel.  I'm just stating a fact.]

Any specific drivers you are referring to so that I can go and kick
someone to get their act together?

Seriously, this is a bug in the PCI drivers, not anything else, it needs
to be fixed there, not papered over with a kernel crash from the PCI
core.

> The effort involved in hardening this drivers is significant.

It shouldn't be, this has been well known for what, 13+ years now?  This
is nothing new at all, and again, is a bug if the driver can't handle
this.

> It will be a long time before anyone considers the in-tree drivers
> hardened.  We should start with a baby-step of acknowledging the
> problem and giving current users a way of protecting their data.

No, we need to fix the drivers, again, this is a well-known issue.

What specific drivers do you see in the kernel tree right now that can
not handle this type of thing.  A list would be great so that we can fix
them now.

> >> In these cases, the system should not do a bus reset, but rather the
> >> system should panic to avoid any further possible data corruption.
> > 
> > Really?  You really want to panic the whole system and shut down and
> > potentially loose everything?  That does not sound like a good idea at
> > all to me, is there really no way to recover from this?
> 
> Yes, that's _exactly_ what I want to do.  Having a driver that is capable of
> writing corrupted data to a disk or corrupting memory is much worse than
> panicking and stopping the system for a short period of time.

But by panicking, you just lost data and have potentially corrupt data
written to the disk in a half-finished manner, plus you now have a
broken system that is stuck and needs to be rebooted :)

> The default is to handle an AER through a bus reset so a user must actively
> request the panic.

Fair enough, I can understand why some people might want this type of
control over a system, and if they reboot-on-panic, they can recover
quickly and get back up and running.

But again, this needs to be fixed in the drivers themselves, otherwise
they are broken on systems that, again, have been shipping for 13+ years
now.  It's unacceptable for the driver authors to be that sloppy.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava May 18, 2012, 7:28 p.m. UTC | #5
On 05/18/2012 02:13 PM, Greg KH wrote:
> On Fri, May 18, 2012 at 01:17:54PM -0400, Prarit Bhargava wrote:
>>
>>>
>>> Please define "unhardened".  Why aren't all drivers "hardened"?
>>
>> Most drivers _currently_ do not handle reading all f's (or -1) from hardware.
>> Some drivers do handle some situations but definitely not all of them.
>> Hardening a driver involves making the driver "-1" safe.
> 
> That's not "hardening", it should be written as, "fixing broken
> drivers".  It's a bug if a PCI driver can not handle this as that is
> exactly what happens when a PCI device is removed from the system
> without the driver knowing about it.

Sorry Greg, I should have stated this at the beginning.  Without question
patches should be _and will be_ sent to drivers as problems are found.

> 
>> Some companies do ship hardened drivers, but the ones in the tree are not hardened.
> 
> Why are there out-of-tree drivers that are so-called "hardened" and why
> are those bug fixes not merged into the kernel tree?

See comment below.
> 
>> [The above comment is in no way an approval of shipping drivers outside of the
>> kernel.  I'm just stating a fact.]

I'm just stating a fact.  I have no idea why patches are not on the list.  Nor
am I condoning the activity of keeping fixes outside of the tree.  I've _just
started_ working with a group who has patches and am going to do some work with
them on getting patches out to the tree.

> 
> Any specific drivers you are referring to so that I can go and kick
> someone to get their act together?

I'll get a list together and hopefully we can get some patches out.

> 
>> The effort involved in hardening this drivers is significant.
> 
> It shouldn't be, this has been well known for what, 13+ years now?  This
> is nothing new at all, and again, is a bug if the driver can't handle
> this.

Most drivers cannot handle surprise removals, and in this case that's what we're
effectively talking about.  There may be a few that can, and even those might be
able to handle a few cases of surprise removal or reset events.

>>>> In these cases, the system should not do a bus reset, but rather the
>>>> system should panic to avoid any further possible data corruption.
>>>
>>> Really?  You really want to panic the whole system and shut down and
>>> potentially loose everything?  That does not sound like a good idea at
>>> all to me, is there really no way to recover from this?
>>
>> Yes, that's _exactly_ what I want to do.  Having a driver that is capable of
>> writing corrupted data to a disk or corrupting memory is much worse than
>> panicking and stopping the system for a short period of time.
> 
> But by panicking, you just lost data and have potentially corrupt data
> written to the disk in a half-finished manner, plus you now have a
> broken system that is stuck and needs to be rebooted :)

Fair enough -- I suppose this comes down to:  Which is worse?  Coming back to a
system with corrupt data or memory, or rebooting and losing data (and waiting
for a reboot)? :)

In my view, if a user *KNOWS* that a driver isn't going to play well with a
reset then let them make the call -- it shouldn't be up to us to decide what is
best for them.  If they want a panic, let them have it.  As time goes on the
drivers will get better but that isn't going to happen overnight.

> 
>> The default is to handle an AER through a bus reset so a user must actively
>> request the panic.
> 
> Fair enough, I can understand why some people might want this type of
> control over a system, and if they reboot-on-panic, they can recover
> quickly and get back up and running.
> 
> But again, this needs to be fixed in the drivers themselves, otherwise
> they are broken on systems that, again, have been shipping for 13+ years
> now.  It's unacceptable for the driver authors to be that sloppy.

I agree with you Greg.  I 100% agree with you.  But getting fixes into the
drivers will take some time.  Getting them to a state where
commercial/enterprise customers consider them reliable for surprise events is
going to take some time... so I'm arguing that we should go with a simple
user-based policy and fix the drivers as we move along.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman May 18, 2012, 11:19 p.m. UTC | #6
On Fri, May 18, 2012 at 03:28:36PM -0400, Prarit Bhargava wrote:
> 
> 
> On 05/18/2012 02:13 PM, Greg KH wrote:
> > On Fri, May 18, 2012 at 01:17:54PM -0400, Prarit Bhargava wrote:
> >>
> >>>
> >>> Please define "unhardened".  Why aren't all drivers "hardened"?
> >>
> >> Most drivers _currently_ do not handle reading all f's (or -1) from hardware.
> >> Some drivers do handle some situations but definitely not all of them.
> >> Hardening a driver involves making the driver "-1" safe.
> > 
> > That's not "hardening", it should be written as, "fixing broken
> > drivers".  It's a bug if a PCI driver can not handle this as that is
> > exactly what happens when a PCI device is removed from the system
> > without the driver knowing about it.
> 
> Sorry Greg, I should have stated this at the beginning.  Without question
> patches should be _and will be_ sent to drivers as problems are found.

Great, just do that then.

Seriously, that's all you need to do, no modifying the PCI core needed.

There shouldn't be these problems in any "real" drivers anyway, a number
of us did a lot of testing for this very problem years ago (yanking out
huge drawers of PCI slots, having fun with ExpressCard devices, etc.)

Any driver that can't handle this is fundimentally broken, and needs to
be fixed now, no messing around with this "enterprise" crud.

> >> Some companies do ship hardened drivers, but the ones in the tree are not hardened.
> > 
> > Why are there out-of-tree drivers that are so-called "hardened" and why
> > are those bug fixes not merged into the kernel tree?
> 
> See comment below.
> > 
> >> [The above comment is in no way an approval of shipping drivers outside of the
> >> kernel.  I'm just stating a fact.]
> 
> I'm just stating a fact.  I have no idea why patches are not on the list.  Nor
> am I condoning the activity of keeping fixes outside of the tree.  I've _just
> started_ working with a group who has patches and am going to do some work with
> them on getting patches out to the tree.

Send them today.  What's keeping them from going in right now?

> > Any specific drivers you are referring to so that I can go and kick
> > someone to get their act together?
> 
> I'll get a list together and hopefully we can get some patches out.
> 
> > 
> >> The effort involved in hardening this drivers is significant.
> > 
> > It shouldn't be, this has been well known for what, 13+ years now?  This
> > is nothing new at all, and again, is a bug if the driver can't handle
> > this.
> 
> Most drivers cannot handle surprise removals, and in this case that's what we're
> effectively talking about.  There may be a few that can, and even those might be
> able to handle a few cases of surprise removal or reset events.

I think it's really the other way around, most should be able to handle
this as it was tested by a lot of people.

Unless new code crept in that wasn't tested, but no, no company would
ever submit stuff like that, would they?  Nevermind...

> >>>> In these cases, the system should not do a bus reset, but rather the
> >>>> system should panic to avoid any further possible data corruption.
> >>>
> >>> Really?  You really want to panic the whole system and shut down and
> >>> potentially loose everything?  That does not sound like a good idea at
> >>> all to me, is there really no way to recover from this?
> >>
> >> Yes, that's _exactly_ what I want to do.  Having a driver that is capable of
> >> writing corrupted data to a disk or corrupting memory is much worse than
> >> panicking and stopping the system for a short period of time.
> > 
> > But by panicking, you just lost data and have potentially corrupt data
> > written to the disk in a half-finished manner, plus you now have a
> > broken system that is stuck and needs to be rebooted :)
> 
> Fair enough -- I suppose this comes down to:  Which is worse?  Coming back to a
> system with corrupt data or memory, or rebooting and losing data (and waiting
> for a reboot)? :)
> 
> In my view, if a user *KNOWS* that a driver isn't going to play well with a
> reset then let them make the call -- it shouldn't be up to us to decide what is
> best for them.  If they want a panic, let them have it.  As time goes on the
> drivers will get better but that isn't going to happen overnight.

How can a user know that?  You haven't even told us what drivers can't
handle this!  That's not fair at all, what you are really asking us to
do is:
	Take this core patch to cause oopses to work around some unnamed
	crappy drivers that are broken and can't handle basic,
	fundamental, aspects of handling their hardware.

That's not acceptable at all, fix the problem at the root cause, in the
drivers, right now.  There should not be anything stopping this.

> >> The default is to handle an AER through a bus reset so a user must actively
> >> request the panic.
> > 
> > Fair enough, I can understand why some people might want this type of
> > control over a system, and if they reboot-on-panic, they can recover
> > quickly and get back up and running.
> > 
> > But again, this needs to be fixed in the drivers themselves, otherwise
> > they are broken on systems that, again, have been shipping for 13+ years
> > now.  It's unacceptable for the driver authors to be that sloppy.
> 
> I agree with you Greg.  I 100% agree with you.  But getting fixes into the
> drivers will take some time.  Getting them to a state where
> commercial/enterprise customers consider them reliable for surprise events is
> going to take some time... so I'm arguing that we should go with a simple
> user-based policy and fix the drivers as we move along.

I don't care about "enterprise" customers (hint, there is no such thing,
everyone is an "enterprise").  I care about people who don't know how to
write basic PCI drivers to handle this type of event that has been known
for well over a decade[1].

You are also making the claim that somehow it is easier, and safer, to
modify the PCI core, and that these skittish "enterprise" people will
take that, instead of fixing their drivers.

So I say NAK to this patch, fix the root problem now, there is no excuse
for this work around.

greg k-h

[1] Actually longer than that, this went back to PCMCIA cards which came
out what, in the 1980's?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 34f5110..e64d434 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -210,3 +210,13 @@  Users:
 		firmware assigned instance number of the PCI
 		device that can help in understanding the firmware
 		intended order of the PCI device.
+
+What:		/sys/bus/pci/devices/.../AER_panic
+Date:		May 2012
+Contact:	linux-pci@vger.kernel.org, Prarit Bhargava <prarit@redhat.com>
+Description:
+		This file is present for PCIe Root Ports only and changes the
+		behavior when an AER event targets the port.
+		This attribute can have two values.  If the value is 0, this
+		PCIe bus will issue a bus reset on the secondary bus.  If the
+		value is 1, the kernel will panic.  The default value is 0.
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index a55e248..8c6d525 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1135,6 +1135,35 @@  static ssize_t reset_store(struct device *dev,
 
 static struct device_attribute reset_attr = __ATTR(reset, 0200, NULL, reset_store);
 
+static ssize_t AER_panic_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%d\n", pdev->rp_AER_panic);
+}
+
+static ssize_t AER_panic_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if ((val > 1) || (val < 0))
+		return -EINVAL;
+
+	pdev->rp_AER_panic = val;
+
+	return count;
+}
+
+static struct device_attribute rp_AER_panic_attr =
+	      __ATTR(AER_panic, 0600, AER_panic_show, AER_panic_store);
+
 static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 {
 	int retval;
@@ -1169,8 +1198,16 @@  static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 			goto error;
 		dev->reset_fn = 1;
 	}
-	return 0;
 
+	/* PCIE Root Port panic-on-AER allows a user to configure each root
+	 * port to panic on an AER error instead of issuing a bus reset.
+	 */
+	if (dev->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+		retval = device_create_file(&dev->dev, &rp_AER_panic_attr);
+		if (retval)
+			goto error;
+	}
+	return 0;
 error:
 	pcie_aspm_remove_sysfs_dev_files(dev);
 	if (dev->vpd && dev->vpd->attr) {
@@ -1279,6 +1316,9 @@  static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
 		device_remove_file(&dev->dev, &reset_attr);
 		dev->reset_fn = 0;
 	}
+
+	if (dev->pcie_type == PCI_EXP_TYPE_ROOT_PORT)
+		device_remove_file(&dev->dev, &rp_AER_panic_attr);
 }
 
 /**
diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 58ad791..dd6b352 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -346,6 +346,9 @@  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	u32 reg32;
 	int pos;
 
+	if (dev->rp_AER_panic)
+		panic("%s: AER detected on Root Port", pci_name(dev));
+
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
 
 	/* Disable Root's interrupt in response to error messages */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e444f5b..a4e6a5a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -324,6 +324,7 @@  struct pci_dev {
 	unsigned int    is_hotplug_bridge:1;
 	unsigned int    __aer_firmware_first_valid:1;
 	unsigned int	__aer_firmware_first:1;
+	unsigned int	rp_AER_panic:1; /* if 1, panic on AER bus reset */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */