diff mbox

Disable Bus Master on PCI device shutdown

Message ID 20120427190033.GA17588@ldl.usa.hp.com
State Accepted
Headers show

Commit Message

Khalid Aziz April 27, 2012, 7 p.m. UTC
Disable Bus Master bit on the device in
pci_device_shutdown() to ensure PCI devices do not continue
to DMA data after shutdown. This can cause memory
corruption in case of a kexec where the current kernel
shuts down and transfers control to a new kernel while a
PCI device continues to DMA to memory that does not belong
to it any more in the new kernel.

I have tested this code on two laptops, two workstations and
a 16-socket server. kexec worked correctly on all of them.


Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
---
 drivers/pci/pci-driver.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Comments

Bjorn Helgaas May 3, 2012, 11:52 p.m. UTC | #1
On Fri, Apr 27, 2012 at 1:00 PM, Khalid Aziz <khalid.aziz@hp.com> wrote:
> Disable Bus Master bit on the device in
> pci_device_shutdown() to ensure PCI devices do not continue
> to DMA data after shutdown. This can cause memory
> corruption in case of a kexec where the current kernel
> shuts down and transfers control to a new kernel while a
> PCI device continues to DMA to memory that does not belong
> to it any more in the new kernel.
>
> I have tested this code on two laptops, two workstations and
> a 16-socket server. kexec worked correctly on all of them.
>
>
> Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
> ---
>  drivers/pci/pci-driver.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6b54b23..9db5940 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -420,6 +420,12 @@ static void pci_device_shutdown(struct device *dev)
>        pci_msi_shutdown(pci_dev);
>        pci_msix_shutdown(pci_dev);
>
> +       /*
> +        * Turn off Bus Master bit on the device to tell it to not
> +        * continue to do DMA
> +        */
> +       pci_disable_device(pci_dev);
> +
>        /*
>         * Devices may be enabled to wake up by runtime PM, but they need not
>         * be supposed to wake up the system from its "power off" state (e.g.

Any comment on this, Eric?  It seems reasonable to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 4, 2012, 5:15 p.m. UTC | #2
On Thu, May 3, 2012 at 5:52 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Apr 27, 2012 at 1:00 PM, Khalid Aziz <khalid.aziz@hp.com> wrote:
>> Disable Bus Master bit on the device in
>> pci_device_shutdown() to ensure PCI devices do not continue
>> to DMA data after shutdown. This can cause memory
>> corruption in case of a kexec where the current kernel
>> shuts down and transfers control to a new kernel while a
>> PCI device continues to DMA to memory that does not belong
>> to it any more in the new kernel.
>>
>> I have tested this code on two laptops, two workstations and
>> a 16-socket server. kexec worked correctly on all of them.
>>
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
>> ---
>>  drivers/pci/pci-driver.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 6b54b23..9db5940 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -420,6 +420,12 @@ static void pci_device_shutdown(struct device *dev)
>>        pci_msi_shutdown(pci_dev);
>>        pci_msix_shutdown(pci_dev);
>>
>> +       /*
>> +        * Turn off Bus Master bit on the device to tell it to not
>> +        * continue to do DMA
>> +        */
>> +       pci_disable_device(pci_dev);
>> +
>>        /*
>>         * Devices may be enabled to wake up by runtime PM, but they need not
>>         * be supposed to wake up the system from its "power off" state (e.g.
>
> Any comment on this, Eric?  It seems reasonable to me.

Applied to my "next" branch, thanks.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 6, 2012, 1:50 p.m. UTC | #3
On Fri, Apr 27, 2012 at 01:00:33PM -0600, Khalid Aziz wrote:
> Disable Bus Master bit on the device in
> pci_device_shutdown() to ensure PCI devices do not continue
> to DMA data after shutdown. This can cause memory
> corruption in case of a kexec where the current kernel
> shuts down and transfers control to a new kernel while a
> PCI device continues to DMA to memory that does not belong
> to it any more in the new kernel.

This protects against the case where a piece of hardware is continuing 
to DMA even after the driver shutdown method has been called? I'm not 
convinced this is safe. Some Broadcom parts will crash if busmastering 
is disabled while they're still performing DMA, and they'll then hang 
the bus if reenabled. There's also the risk that the hardware will start 
DMAing again if it's reenabled after being shut down. It seems like 
you're covering over the case where the driver didn't correctly quiesce 
the hardware, but you risk triggering other bugs instead.
Khalid Aziz June 6, 2012, 4:17 p.m. UTC | #4
On Wed, 2012-06-06 at 14:50 +0100, Matthew Garrett wrote:
> On Fri, Apr 27, 2012 at 01:00:33PM -0600, Khalid Aziz wrote:
> > Disable Bus Master bit on the device in
> > pci_device_shutdown() to ensure PCI devices do not continue
> > to DMA data after shutdown. This can cause memory
> > corruption in case of a kexec where the current kernel
> > shuts down and transfers control to a new kernel while a
> > PCI device continues to DMA to memory that does not belong
> > to it any more in the new kernel.
> 
> This protects against the case where a piece of hardware is continuing 
> to DMA even after the driver shutdown method has been called? I'm not 
> convinced this is safe. Some Broadcom parts will crash if busmastering 
> is disabled while they're still performing DMA, and they'll then hang 
> the bus if reenabled. There's also the risk that the hardware will start 
> DMAing again if it's reenabled after being shut down. It seems like 
> you're covering over the case where the driver didn't correctly quiesce 
> the hardware, but you risk triggering other bugs instead.

Hi Matthew,

That is a good piece of information. I see your concern and agree with
it. My take is shutdown method for the drivers will end all active I/O
and clear the I/O queue. This should take care of any DMA caused by an
I/O request originating in the kernel. For devices like NIC, a DMA can
be triggered by an incoming packet and I am trying to stop that by
disabling Bus Master bit. This is the issue that was reported on kexec
mailing list in July of last year and it involved qla driver. I observed
similar problem with kexec on ia64 many years ago and had written a
patch to disable Bus Master bit on kexec. This patch was in ia64 tree
for some time before it was removed. HP shipped kernels with this patch
for many years and those kernels have been in deployment in field for
some 7+ years with no problems.

So it seems we do have a real problem. I understand there are devices
with quirks related to Bus Master bit and it really helps to know about
those. I have found disabling Bus Master bit has worked very well for
all of the systems I have deployed kernels with this patch on but I have
not come even close to having tried all PCI devices out there. I am open
to other suggestions on how to solve this problem and make kexec
reliable.

Thanks Matthew! I appreciate the feedback.
Matthew Garrett June 6, 2012, 4:27 p.m. UTC | #5
On Wed, Jun 06, 2012 at 10:17:43AM -0600, Khalid Aziz wrote:

> That is a good piece of information. I see your concern and agree with
> it. My take is shutdown method for the drivers will end all active I/O
> and clear the I/O queue. This should take care of any DMA caused by an
> I/O request originating in the kernel. For devices like NIC, a DMA can
> be triggered by an incoming packet and I am trying to stop that by
> disabling Bus Master bit. This is the issue that was reported on kexec
> mailing list in July of last year and it involved qla driver. I observed
> similar problem with kexec on ia64 many years ago and had written a
> patch to disable Bus Master bit on kexec. This patch was in ia64 tree
> for some time before it was removed. HP shipped kernels with this patch
> for many years and those kernels have been in deployment in field for
> some 7+ years with no problems.

If the qla driver knows that this is safe, then can't this just be done 
in the qla driver?
Khalid Aziz June 6, 2012, 5:32 p.m. UTC | #6
On Wed, 2012-06-06 at 17:27 +0100, Matthew Garrett wrote:
> If the qla driver knows that this is safe, then can't this just be done 
> in the qla driver?
> 

That is one way of doing it. It makes for a safe change but potentially
leaves out a number of other cases that could be helped by wider scope
change of disabling Bus Master bit on all PCI devices, until we
laboriously debug every one of those cases and then add code to disable
Bus Master bit. Sounds to me like it is not a clear win in either case.

Do we agree that if device shutdown routine cleanly shuts down all I/O,
clearing PCI Bus Mster bit should be safe? If yes, then we only have to
deal with broken devices. So the approach could be to disable Bus Master
bit unless the device ID matches a blacklist which we update as we find
broken devices. I really don't like the idea of maintaining blacklists
in the kernel for such things but is that a more practical approach? If
blacklist does not sound good, maybe we can ask drivers to tell PCI
subsystem if they are not ok with clearing Bus Master bit and then PCI
subsystem could skip those devices.
Matthew Garrett June 6, 2012, 5:42 p.m. UTC | #7
On Wed, Jun 06, 2012 at 11:32:36AM -0600, Khalid Aziz wrote:

> Do we agree that if device shutdown routine cleanly shuts down all I/O,
> clearing PCI Bus Mster bit should be safe?

In the absence of hardware that dislikes the bus master bit ever being 
disabled, yes. Do we know if hardware is ever tested in that situation?

> If yes, then we only have to deal with broken devices. So the approach 
> could be to disable Bus Master bit unless the device ID matches a 
> blacklist which we update as we find broken devices. I really don't 
> like the idea of maintaining blacklists in the kernel for such things 
> but is that a more practical approach? If blacklist does not sound 
> good, maybe we can ask drivers to tell PCI subsystem if they are not 
> ok with clearing Bus Master bit and then PCI subsystem could skip 
> those devices.

Or we could just put responsibility on the drivers to ensure that the 
hardware won't continue doing any DMA, either by shutting down the 
engines or clearing the bit.
Khalid Aziz June 6, 2012, 6:07 p.m. UTC | #8
On Wed, 2012-06-06 at 18:42 +0100, Matthew Garrett wrote:
> On Wed, Jun 06, 2012 at 11:32:36AM -0600, Khalid Aziz wrote:
> 
> > Do we agree that if device shutdown routine cleanly shuts down all I/O,
> > clearing PCI Bus Mster bit should be safe?
> 
> In the absence of hardware that dislikes the bus master bit ever being 
> disabled, yes. Do we know if hardware is ever tested in that situation?

I will wait for device vendors to comment on that. I can't claim I have
tested more than a few devices that way.

> 
> > If yes, then we only have to deal with broken devices. So the approach 
> > could be to disable Bus Master bit unless the device ID matches a 
> > blacklist which we update as we find broken devices. I really don't 
> > like the idea of maintaining blacklists in the kernel for such things 
> > but is that a more practical approach? If blacklist does not sound 
> > good, maybe we can ask drivers to tell PCI subsystem if they are not 
> > ok with clearing Bus Master bit and then PCI subsystem could skip 
> > those devices.
> 
> Or we could just put responsibility on the drivers to ensure that the 
> hardware won't continue doing any DMA, either by shutting down the 
> engines or clearing the bit.
> 

I assume device shutdown routine should stop all I/O and shutting down
DMA engine. Disabling Bus Master bit is just an extra measure of safety.
I do like the idea of disabling Bus Master bit in device shutdown
routine. After all, drivers know their hardware best. On the other hand,
it is change to lots of driver code to implement this which means it
will end up happening slowly over period of time. I don't mind doing the
work up front on a good number of drivers I feel comfortable modifying.
I am ok with pulling out code to clear bus master bit from PCI subsystem
and replacing it with modified shutdown routines for a few drivers to
start with.

Does any one see any other issues with modifying driver shutdown
routines for disabling Bus Master bit? Bjorn, any opinions?

====================================================================
Khalid Aziz                                         Unix Systems Lab
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman June 6, 2012, 7:42 p.m. UTC | #9
Khalid Aziz <khalid.aziz@hp.com> writes:

> On Wed, 2012-06-06 at 18:42 +0100, Matthew Garrett wrote:
>> On Wed, Jun 06, 2012 at 11:32:36AM -0600, Khalid Aziz wrote:
>> 
>> > Do we agree that if device shutdown routine cleanly shuts down all I/O,
>> > clearing PCI Bus Mster bit should be safe?
>> 
>> In the absence of hardware that dislikes the bus master bit ever being 
>> disabled, yes. Do we know if hardware is ever tested in that situation?
>
> I will wait for device vendors to comment on that. I can't claim I have
> tested more than a few devices that way.

Testing is easy.  kexec into a new kernel.  Shrug.  A long standing
useful kernel feature.  In all other cases I expec the firmware triggers
a board level reset of the hardware to avoid issues during reboot.

>> > If yes, then we only have to deal with broken devices. So the approach 
>> > could be to disable Bus Master bit unless the device ID matches a 
>> > blacklist which we update as we find broken devices. I really don't 
>> > like the idea of maintaining blacklists in the kernel for such things 
>> > but is that a more practical approach? If blacklist does not sound 
>> > good, maybe we can ask drivers to tell PCI subsystem if they are not 
>> > ok with clearing Bus Master bit and then PCI subsystem could skip 
>> > those devices.
>> 
>> Or we could just put responsibility on the drivers to ensure that the 
>> hardware won't continue doing any DMA, either by shutting down the 
>> engines or clearing the bit.

Where the responsibily has squarely been for the last decade, and we
still have issues in the common case.

> I assume device shutdown routine should stop all I/O and shutting down
> DMA engine. Disabling Bus Master bit is just an extra measure of safety.
> I do like the idea of disabling Bus Master bit in device shutdown
> routine. After all, drivers know their hardware best. On the other hand,
> it is change to lots of driver code to implement this which means it
> will end up happening slowly over period of time. I don't mind doing the
> work up front on a good number of drivers I feel comfortable modifying.
> I am ok with pulling out code to clear bus master bit from PCI subsystem
> and replacing it with modified shutdown routines for a few drivers to
> start with.

Absent anyone even knowing if there are devices that exist that can not
tolerate their bus master bit being flipped when DMA is not ongoing I
think the current state of the code is good.  When we find the broken
hardware that can not tolerate a standard PCI bit being used in a
standard way we can add a flag in the core to avoid doing that.

pci_device_shutdown calls drv->shutdown before calling
pci_device_disable.  Which means that only devices that have trouble
with this bit being flipped while DMA is ongoing and don't bother
to stop their own DMA will have a problem.

As for shifting problems I do think we have shifted the problem in a
very positive way.  Now instead of having a random failure at a random
location caused by DMA happing at a random moment for no expected reason
we have failures happening when we disable or enable a device, which
should be much more debugable.

If we encounter devices that can't have their bus master bit disabled at
all we can move that functionality into the drivers or add some sort of
flag so that pci_device_shutdown avoids this on real hardware.

> Does any one see any other issues with modifying driver shutdown
> routines for disabling Bus Master bit? Bjorn, any opinions?

I don't have a problem with moving it all of the way into the drivers
I just think it might be a little bit silly at this point.

Ultimately I don't see the complaint raised by this thread.  Either
the drivers for the broadcom devices in questoin are buggy before we
added the pci_disable_device or those drivers are not buggy.

If we really want to do something to reduce the testing burden and make
certain things work better in general we need to merge the device
shutdown and the device remove methods.  Shrug.  People keep getting
squeamish when I suggest that.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 6, 2012, 8:09 p.m. UTC | #10
On Wed, Jun 06, 2012 at 12:42:07PM -0700, Eric W. Biederman wrote:

> pci_device_shutdown calls drv->shutdown before calling
> pci_device_disable.  Which means that only devices that have trouble
> with this bit being flipped while DMA is ongoing and don't bother
> to stop their own DMA will have a problem.

drv->shutdown should already be quiescing the hardware. If it isn't, it 
should be. If it is, what does this patch fix? Many drivers 
call pci_enable_device() early enough that they clearly expect the 
hardware to be quiescent when they do so, so this really does seem to 
simply handle the kexec case without handling any other cases that could 
be similarly problematic.
Myron Stowe June 6, 2012, 8:16 p.m. UTC | #11
On Wed, Jun 6, 2012 at 12:07 PM, Khalid Aziz <khalid.aziz@hp.com> wrote:
> On Wed, 2012-06-06 at 18:42 +0100, Matthew Garrett wrote:
>> On Wed, Jun 06, 2012 at 11:32:36AM -0600, Khalid Aziz wrote:
>>
>> > Do we agree that if device shutdown routine cleanly shuts down all I/O,
>> > clearing PCI Bus Mster bit should be safe?
>>
>> In the absence of hardware that dislikes the bus master bit ever being
>> disabled, yes. Do we know if hardware is ever tested in that situation?
>
> I will wait for device vendors to comment on that. I can't claim I have
> tested more than a few devices that way.
>
>>
>> > If yes, then we only have to deal with broken devices. So the approach
>> > could be to disable Bus Master bit unless the device ID matches a
>> > blacklist which we update as we find broken devices. I really don't
>> > like the idea of maintaining blacklists in the kernel for such things
>> > but is that a more practical approach? If blacklist does not sound
>> > good, maybe we can ask drivers to tell PCI subsystem if they are not
>> > ok with clearing Bus Master bit and then PCI subsystem could skip
>> > those devices.
>>
>> Or we could just put responsibility on the drivers to ensure that the
>> hardware won't continue doing any DMA, either by shutting down the
>> engines or clearing the bit.
>>
>
> I assume device shutdown routine should stop all I/O and shutting down
> DMA engine. Disabling Bus Master bit is just an extra measure of safety.
> I do like the idea of disabling Bus Master bit in device shutdown
> routine. After all, drivers know their hardware best. On the other hand,
> it is change to lots of driver code to implement this which means it
> will end up happening slowly over period of time. I don't mind doing the
> work up front on a good number of drivers I feel comfortable modifying.
> I am ok with pulling out code to clear bus master bit from PCI subsystem
> and replacing it with modified shutdown routines for a few drivers to
> start with.
>
> Does any one see any other issues with modifying driver shutdown
> routines for disabling Bus Master bit? Bjorn, any opinions?

Hay Khalid,

I've been following this discussion but still do not understand what the
perceived correct fix is for the situation you were originally trying to
circumvent - namely, stopping NIC devices from triggering DMA activity
upon receipt of an incoming packet.

From the discussion it sounds like a driver, upon shutdown, can end
all active I/O
clearing the I/O queue, and also (?) shut down its DMA engine, and may
also go as far as disabling the Bus Master bit.

What was occurring in the original issue - was the driver in question not
shutting down its DMA engine for some, possibly valid, reason?

Thanks,
 Myron
>
> ====================================================================
> Khalid Aziz                                         Unix Systems Lab
> (970)898-9214                                        Hewlett-Packard
> khalid.aziz@hp.com                                  Fort Collins, CO
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox June 6, 2012, 8:50 p.m. UTC | #12
> This protects against the case where a piece of hardware is continuing 
> to DMA even after the driver shutdown method has been called? I'm not 

It doesn't. We also have hardware which craps itself if you clear the bus
mastering bit and we have platforms where the BIOS gets most upset if you
do that on suspend paths. There are also lots of devices that simply
ignore the bus mastering bit !

> convinced this is safe. Some Broadcom parts will crash if busmastering 
> is disabled while they're still performing DMA, and they'll then hang 
> the bus if reenabled.

This is very common, disabling the bus master bit isn't exactly a
tested or well defined pathway. A lot of device FIFOs will also happily
belch out their queue when you flip the bit. So it might look like
progress but it probably isn't.

Unfortunately if you've got a device peeing into memory you need to fix
the driver, or sometimes the firmware. You can probably use the IOMMU for
some protection on newer systems.

And then you get things like the CS5520 where disabling bus mastering on
the IDE controller crashes the machine, and some UMA video devices that
honour the bit for video fetch.

From the IDE experience you really need to be careful here. Changing the
default is asking for nasty and weird regressions.

This isn't a fix, its a band aid to cover over broken driver shutdown
methods. Those driver shutdown methods need fixing.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Khalid Aziz June 6, 2012, 11:03 p.m. UTC | #13
On Wed, 2012-06-06 at 14:16 -0600, Myron Stowe wrote:
> Hay Khalid,
> 
> I've been following this discussion but still do not understand what the
> perceived correct fix is for the situation you were originally trying to
> circumvent - namely, stopping NIC devices from triggering DMA activity
> upon receipt of an incoming packet.
> 
> From the discussion it sounds like a driver, upon shutdown, can end
> all active I/O
> clearing the I/O queue, and also (?) shut down its DMA engine, and may
> also go as far as disabling the Bus Master bit.
> 
> What was occurring in the original issue - was the driver in question not
> shutting down its DMA engine for some, possibly valid, reason?
> 
> Thanks,
>  Myron

Hi Myron,

The last driver issue with kexec was reported on kexec mailing list in
July of last year and that was with qla driver. From what H Peter Anvin
described, the driver left DMA engine running.

For the issue I found on an ia64 platform, it was so many years ago that
you are now taxing my old and decrepit memory :) I am digging through
old internal bug reports and will let you know once I find that original
bug report.

====================================================================
Khalid Aziz                                         Unix Systems Lab
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Myron Stowe June 6, 2012, 11:18 p.m. UTC | #14
On Wed, Jun 6, 2012 at 5:03 PM, Khalid Aziz <khalid.aziz@hp.com> wrote:
> On Wed, 2012-06-06 at 14:16 -0600, Myron Stowe wrote:
>> Hay Khalid,
>>
>> I've been following this discussion but still do not understand what the
>> perceived correct fix is for the situation you were originally trying to
>> circumvent - namely, stopping NIC devices from triggering DMA activity
>> upon receipt of an incoming packet.
>>
>> From the discussion it sounds like a driver, upon shutdown, can end
>> all active I/O
>> clearing the I/O queue, and also (?) shut down its DMA engine, and may
>> also go as far as disabling the Bus Master bit.
>>
>> What was occurring in the original issue - was the driver in question not
>> shutting down its DMA engine for some, possibly valid, reason?
>>
>> Thanks,
>>  Myron
>
> Hi Myron,
>
> The last driver issue with kexec was reported on kexec mailing list in
> July of last year and that was with qla driver. From what H Peter Anvin
> described, the driver left DMA engine running.
>
> For the issue I found on an ia64 platform, it was so many years ago that
> you are now taxing my old and decrepit memory :)

Yeah, if it was more than a couple of weeks ago then I'm likely to have
forgotten myself.

If you can find it great but don't spend too much time digging.  I just wanted
some validation of my understanding -
  o  That drivers must consider both "active I/O" and "DMA" (when
shutting down),
  o  That the Bus Master disablement was trying to compensate for a driver not
quiescing "DMA" as it probably should have (I wasn't sure if these were separate
capabilities and/or if there was possibly a valid reason for a driver
leaving "DMA"
in an enabled state upon shutdown - i.e. Wake on LAN or something similar).

Myron

> I am digging through
> old internal bug reports and will let you know once I find that original
> bug report.
>
> ====================================================================
> Khalid Aziz                                         Unix Systems Lab
> (970)898-9214                                        Hewlett-Packard
> khalid.aziz@hp.com                                  Fort Collins, CO
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Khalid Aziz June 7, 2012, 2:21 p.m. UTC | #15
On Wed, 2012-06-06 at 12:42 -0700, Eric W. Biederman wrote:
> Absent anyone even knowing if there are devices that exist that can not
> tolerate their bus master bit being flipped when DMA is not ongoing I
> think the current state of the code is good.  When we find the broken
> hardware that can not tolerate a standard PCI bit being used in a
> standard way we can add a flag in the core to avoid doing that.
> 
> pci_device_shutdown calls drv->shutdown before calling
> pci_device_disable.  Which means that only devices that have trouble
> with this bit being flipped while DMA is ongoing and don't bother
> to stop their own DMA will have a problem.
> 
> As for shifting problems I do think we have shifted the problem in a
> very positive way.  Now instead of having a random failure at a random
> location caused by DMA happing at a random moment for no expected reason
> we have failures happening when we disable or enable a device, which
> should be much more debugable.

That is a very good point. Failure at a predictable point is much better
than random failures with the root cause being elsewhere from the point
of failure in time and code. This at least gives us a good shot at being
able to debug buggy hardware and drivers.
Andi Kleen June 7, 2012, 5:07 p.m. UTC | #16
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

>> This protects against the case where a piece of hardware is continuing 
>> to DMA even after the driver shutdown method has been called? I'm not 
>
> It doesn't. We also have hardware which craps itself if you clear the bus
> mastering bit and we have platforms where the BIOS gets most upset if you
> do that on suspend paths. There are also lots of devices that simply
> ignore the bus mastering bit !

But it also makes my system do kexec successfully for the first time.

Maybe can make it an option.

n-Andi
Andi Kleen June 7, 2012, 5:08 p.m. UTC | #17
Matthew Garrett <mjg@redhat.com> writes:
>
> This protects against the case where a piece of hardware is continuing 
> to DMA even after the driver shutdown method has been called? I'm not 
> convinced this is safe. Some Broadcom parts will crash if busmastering 
> is disabled while they're still performing DMA, and they'll then hang 
> the bus if reenabled. There's also the risk that the hardware will start 
> DMAing again if it's reenabled after being shut down. It seems like 
> you're covering over the case where the driver didn't correctly quiesce 
> the hardware, but you risk triggering other bugs instead.

One alternative I've been pondering some time is to use AER link reset
instead. But this is mainly on servers, a lot of clients don't have it.

-Andi
Alan Cox June 7, 2012, 5:13 p.m. UTC | #18
> But it also makes my system do kexec successfully for the first time.
> 
> Maybe can make it an option.

More useful might be to report which devices have the bus master bit left
on at the time you tried the kexec, then we might actually find the bug.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Khalid Aziz June 7, 2012, 5:36 p.m. UTC | #19
On Thu, 2012-06-07 at 10:07 -0700, Andi Kleen wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> 
> >> This protects against the case where a piece of hardware is continuing 
> >> to DMA even after the driver shutdown method has been called? I'm not 
> >
> > It doesn't. We also have hardware which craps itself if you clear the bus
> > mastering bit and we have platforms where the BIOS gets most upset if you
> > do that on suspend paths. There are also lots of devices that simply
> > ignore the bus mastering bit !
> 
> But it also makes my system do kexec successfully for the first time.
> 
> Maybe can make it an option.
> 
> n-Andi
> 

Hi Andi,

That makes my day :) It will be very helpful to see which PCI devices
your system has.

I was thinking about kernel command line option this morning and had
started looking at reset_devices option to see if I could leverage that
one in any way.
Khalid Aziz June 7, 2012, 5:43 p.m. UTC | #20
On Wed, 2012-06-06 at 21:09 +0100, Matthew Garrett wrote:
> On Wed, Jun 06, 2012 at 12:42:07PM -0700, Eric W. Biederman wrote:
> 
> > pci_device_shutdown calls drv->shutdown before calling
> > pci_device_disable.  Which means that only devices that have trouble
> > with this bit being flipped while DMA is ongoing and don't bother
> > to stop their own DMA will have a problem.
> 
> drv->shutdown should already be quiescing the hardware. If it isn't, it 
> should be. If it is, what does this patch fix? Many drivers 
> call pci_enable_device() early enough that they clearly expect the 
> hardware to be quiescent when they do so, so this really does seem to 
> simply handle the kexec case without handling any other cases that could 
> be similarly problematic.                               
> 

I agree drv->shutdown should quiesce the hardware (although anecdotal
evidence suggests that is not happening), so turning Bus Master bit off
is an additional safety measure. As Alan pointed out, doing that can be
fraught with danger. Clearing Bus Master bit seems like the right thing
to do, but due to buggy hardware/firmware it can cause problems,
although those problems happen at predictable times and thus become
easier to diagnose.

I am considering other options. Andi mentioned command line option which
does allow for some control instead of blindly clearing Bus Master bit
on all system. I am also wondering if clearing Bus Master bit on just
the PCI bridges is an option. If Bus Master bit is clear on a bridge, it
is not supposed to allow DMA transactions through it. That can prevent
random memory corruption by broken devices and also not upset the
devices that do not like their Bus Master bit cleared. Any opinions on
this option?

I have also acquired a broadcom NIC that is causing what looks like
random DMAs into kernel memory on a kexec, although this is with a
kernel that does not have the Bus Master reset patch. I will run some
experiments on this NIC and see what I can figure out.

====================================================================
Khalid Aziz                                         Unix Systems Lab
(970)898-9214                                        Hewlett-Packard
khalid.aziz@hp.com                                  Fort Collins, CO

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 6b54b23..9db5940 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -420,6 +420,12 @@  static void pci_device_shutdown(struct device *dev)
 	pci_msi_shutdown(pci_dev);
 	pci_msix_shutdown(pci_dev);
 
+	/* 
+	 * Turn off Bus Master bit on the device to tell it to not
+	 * continue to do DMA
+	 */
+	pci_disable_device(pci_dev);
+
 	/*
 	 * Devices may be enabled to wake up by runtime PM, but they need not
 	 * be supposed to wake up the system from its "power off" state (e.g.