diff mbox

PCI: add a quirk for keeping Bus Master bit on shutdown

Message ID 20131126183750.GA29265@concerto
State Not Applicable
Headers show

Commit Message

Khalid Aziz Nov. 26, 2013, 6:37 p.m. UTC
On Tue, Nov 26, 2013 at 05:48:06PM +0000, Matthew Garrett wrote:
> On Tue, Nov 26, 2013 at 10:35:26AM -0700, Bjorn Helgaas wrote:
> > On Tue, Nov 26, 2013 at 9:40 AM, Khalid Aziz <khalid.aziz@oracle.com> wrote:
> > > Disabling Bus Master bit is effectively a brute force and not an elegant way
> > > to stop unwanted DMA. It can have side effects as Alan and others pointed
> > > out in the original discussion, and we are now seeing one with Lynx Point on
> > > Acer.
> > 
> > I'm getting more queasy all the time about disabling Bus Master.  I
> > don't think RHEL does it, and that's probably where most kexec use is.
> >  So I doubt we really have much experience with it yet.
> 
> Does Windows disable the BM bit on shutdown? If not, it's likely that 
> there are platforms where the SMM code assumes it's still enabled. We 
> also know that there are devices that hang if BM is disabled while their 
> DMA engines are still running.
> 
> Unless we verify that Windows does this, I think there's no way we can 
> guarantee that firmware won't make assumptions about the state of PCI. 
> The easiest compromise would probably be to set a flag that disables 
> busmastering purely when we're performing a kexec.
> 

This is the approach that is most appealing to me. If we confine
clearing Bus Master bit to just the kexec path, we can side step all
the land mines in normal shutdown path. Here is an informal patch
just for comments and discussion. It has been tested lightly. If
this approach is acceptable, I will create a more formal patch.



--
Khalid
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Konstantin Khlebnikov Nov. 26, 2013, 7:38 p.m. UTC | #1
On Tue, Nov 26, 2013 at 10:37 PM, Khalid Aziz <khalid.aziz@oracle.com> wrote:
> On Tue, Nov 26, 2013 at 05:48:06PM +0000, Matthew Garrett wrote:
>> On Tue, Nov 26, 2013 at 10:35:26AM -0700, Bjorn Helgaas wrote:
>> > On Tue, Nov 26, 2013 at 9:40 AM, Khalid Aziz <khalid.aziz@oracle.com> wrote:
>> > > Disabling Bus Master bit is effectively a brute force and not an elegant way
>> > > to stop unwanted DMA. It can have side effects as Alan and others pointed
>> > > out in the original discussion, and we are now seeing one with Lynx Point on
>> > > Acer.
>> >
>> > I'm getting more queasy all the time about disabling Bus Master.  I
>> > don't think RHEL does it, and that's probably where most kexec use is.
>> >  So I doubt we really have much experience with it yet.
>>
>> Does Windows disable the BM bit on shutdown? If not, it's likely that
>> there are platforms where the SMM code assumes it's still enabled. We
>> also know that there are devices that hang if BM is disabled while their
>> DMA engines are still running.
>>
>> Unless we verify that Windows does this, I think there's no way we can
>> guarantee that firmware won't make assumptions about the state of PCI.
>> The easiest compromise would probably be to set a flag that disables
>> busmastering purely when we're performing a kexec.
>>
>
> This is the approach that is most appealing to me. If we confine
> clearing Bus Master bit to just the kexec path, we can side step all
> the land mines in normal shutdown path. Here is an informal patch
> just for comments and discussion. It has been tested lightly. If
> this approach is acceptable, I will create a more formal patch.

ACK. This is perfect solution.

>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 9042fdb..c605be0 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -400,10 +400,11 @@ static void pci_device_shutdown(struct device *dev)
>         pci_msix_shutdown(pci_dev);
>
>         /*
> -        * Turn off Bus Master bit on the device to tell it to not
> -        * continue to do DMA. Don't touch devices in D3cold or unknown states.
> +        * If this is a kexec reboot, turn off Bus Master bit on the
> +        * device to tell it to not continue to do DMA. Don't touch
> +        * devices in D3cold or unknown states.
>          */
> -       if (pci_dev->current_state <= PCI_D3hot)
> +       if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
>                 pci_clear_master(pci_dev);
>  }
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 9c91ecc..7d85733 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -9,6 +9,9 @@
>  extern const unsigned char pcix_bus_speed[];
>  extern const unsigned char pcie_link_speed[];
>
> +/* flag to track if kexec reboot is in progress */
> +extern unsigned long kexec_in_progress;
> +
>  /* Functions internal to the PCI core code */
>
>  int pci_create_sysfs_dev_files(struct pci_dev *pdev);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 490afc0..fd2d63e 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -47,6 +47,9 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
>  size_t vmcoreinfo_size;
>  size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
>
> +/* Flag to indicate we are going to kexec a new kernel */
> +unsigned long kexec_in_progress = 0;
> +
>  /* Location of the reserved area for the crash kernel */
>  struct resource crashk_res = {
>         .name  = "Crash kernel",
> @@ -1675,6 +1678,7 @@ int kernel_kexec(void)
>         } else
>  #endif
>         {
> +               kexec_in_progress = 1;
>                 kernel_restart_prepare(NULL);
>                 printk(KERN_EMERG "Starting new kernel\n");
>                 machine_shutdown();
>
>
> --
> Khalid
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 9042fdb..c605be0 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -400,10 +400,11 @@  static void pci_device_shutdown(struct device *dev)
 	pci_msix_shutdown(pci_dev);
 
 	/*
-	 * Turn off Bus Master bit on the device to tell it to not
-	 * continue to do DMA. Don't touch devices in D3cold or unknown states.
+	 * If this is a kexec reboot, turn off Bus Master bit on the
+	 * device to tell it to not continue to do DMA. Don't touch
+	 * devices in D3cold or unknown states.
 	 */
-	if (pci_dev->current_state <= PCI_D3hot)
+	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
 		pci_clear_master(pci_dev);
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9c91ecc..7d85733 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -9,6 +9,9 @@ 
 extern const unsigned char pcix_bus_speed[];
 extern const unsigned char pcie_link_speed[];
 
+/* flag to track if kexec reboot is in progress */
+extern unsigned long kexec_in_progress;
+
 /* Functions internal to the PCI core code */
 
 int pci_create_sysfs_dev_files(struct pci_dev *pdev);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 490afc0..fd2d63e 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -47,6 +47,9 @@  u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
 size_t vmcoreinfo_size;
 size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
 
+/* Flag to indicate we are going to kexec a new kernel */
+unsigned long kexec_in_progress = 0;
+
 /* Location of the reserved area for the crash kernel */
 struct resource crashk_res = {
 	.name  = "Crash kernel",
@@ -1675,6 +1678,7 @@  int kernel_kexec(void)
 	} else
 #endif
 	{
+		kexec_in_progress = 1;
 		kernel_restart_prepare(NULL);
 		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();