PCI: DPC: Clear AER status bits before disabling port containment

Message ID: 20180516213306.27027-1-mr.nuke.me@gmail.com
State: Not Applicable
Delegated to: Bjorn Helgaas

Commit Message

Alexandru Gagniuc May 16, 2018, 9:33 p.m.
AER status bits are sticky, and they survive system resets. Downstream
devices are usually taken care of after re-enumerating the downstream
busses, as the AER bits are cleared during probe().

However, nothing clears the bits of the port which contained the
error. These sticky bits may lead some BIOSes to think that something
bad happened, and print ominous messages on the next boot. To prevent
this, tidy up the AER status bits before releasing containment.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pcie/dpc.c | 4 ++++
 1 file changed, 4 insertions(+)
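
For readers unfamiliar with why a read followed by a write-back clears these registers: AER status bits are write-1-to-clear and sticky (RW1CS in PCIe terms). The following is a minimal userspace sketch of that behaviour, not kernel code. The `rw1c_*` names and the bit value used are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a single RW1CS status register. */
struct rw1c_reg {
	uint32_t bits;
};

/* Hardware side: an error latches a status bit. */
static void rw1c_latch(struct rw1c_reg *reg, uint32_t err)
{
	reg->bits |= err;
}

/* Software read: returns whatever is currently latched. */
static uint32_t rw1c_read(const struct rw1c_reg *reg)
{
	return reg->bits;
}

/* Software write: each 1 written clears that bit, each 0 is ignored. */
static void rw1c_write(struct rw1c_reg *reg, uint32_t val)
{
	reg->bits &= ~val;
}

/* Clear everything currently set by writing the read value back. */
static uint32_t rw1c_clear_all(struct rw1c_reg *reg)
{
	uint32_t status = rw1c_read(reg);

	rw1c_write(reg, status);
	return status;
}

/* Small self-check that walks through the sequence. */
static void rw1c_demo(void)
{
	struct rw1c_reg uncor = { 0 };

	rw1c_latch(&uncor, 0x4000u);		/* illustrative error bit */
	rw1c_write(&uncor, 0);			/* writing 0 clears nothing */
	assert(rw1c_read(&uncor) == 0x4000u);

	assert(rw1c_clear_all(&uncor) == 0x4000u);
	assert(rw1c_read(&uncor) == 0);		/* bit is gone after write-back */
}
```

Calling `rw1c_demo()` from a test harness or `main()` exercises the sequence. The point of the model is that simply reading the register, or writing zero, leaves the error latched, which is why an explicit cleanup step is needed on the containing port.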

Comments

Sinan Kaya May 16, 2018, 10:44 p.m. | #1
On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> AER status bits are sticky, and they survive system resets. Downstream
> devices are usually taken care of after re-enumerating the downstream
> busses, as the AER bits are cleared during probe().
> 
> However, nothing clears the bits of the port which contained the
> error. These sticky bits may lead some BIOSes to think that something
> bad happened, and print ominous messages on next boot. To prevent this,
> tidy up the AER status bits before releasing containment.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/pci/pcie/dpc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 8c57d607e603..bf82d6936556 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
>  		dpc->rp_pio_status = 0;
>  	}
>  
> +	/* DPC event made a mess of our AER status bits. Clean them up. */
> +	pci_cleanup_aer_error_status_regs(pdev);
> +	/* TODO: Should we also use aer_print_error to log the event? */
> +
>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>  		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
>  
> 

I think Keith has a patch to fix this. It was under review at some point.

Keith Busch May 16, 2018, 11:12 p.m. | #2
On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
> I think Keith has a patch to fix this. It was under review at some point.

Right, I do intend to follow up on this, but I've had some trouble
finding time the last few weeks. Sorry about that, things will clear up
for me shortly.

Bjorn Helgaas June 19, 2018, 9:57 p.m. | #3
On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
> Right, I do intend to follow up on this, but I've had some trouble
> finding time the last few weeks. Sorry about that, things will clear up
> for me shortly.

I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.

Alexandru Gagniuc June 26, 2018, 8:51 p.m. | #4
On 06/19/2018 04:57 PM, Bjorn Helgaas wrote:
> I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.

I wonder if clearing AER status bits is mutually exclusive with
refactoring other parts of DPC handling?

Alex

Patch

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 8c57d607e603..bf82d6936556 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
 		dpc->rp_pio_status = 0;
 	}
 
+	/* DPC event made a mess of our AER status bits. Clean them up. */
+	pci_cleanup_aer_error_status_regs(pdev);
+	/* TODO: Should we also use aer_print_error to log the event? */
+
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
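
The ordering the patch aims for (clear the port's AER status first, then release containment) can be sketched with a small userspace model. This is only an illustration under stated assumptions: `struct model_port`, the `model_*` helpers, and the `MODEL_DPC_*` bit values are hypothetical stand-ins, not the kernel's definitions, and the model treats every status bit as write-1-to-clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions for the model only. */
#define MODEL_DPC_TRIGGER	0x0001u
#define MODEL_DPC_INTERRUPT	0x0008u

/* Hypothetical snapshot of the status registers on a containing port. */
struct model_port {
	uint32_t aer_uncor_status;
	uint32_t aer_cor_status;
	uint16_t dpc_status;
};

/* The port is in containment while the trigger bit is latched. */
static bool model_port_contained(const struct model_port *p)
{
	return (p->dpc_status & MODEL_DPC_TRIGGER) != 0;
}

/* Write each AER status value back to itself, clearing the set bits. */
static void model_clear_aer(struct model_port *p)
{
	p->aer_uncor_status &= ~p->aer_uncor_status;
	p->aer_cor_status &= ~p->aer_cor_status;
}

/* Clearing the trigger and interrupt bits releases containment. */
static void model_release_containment(struct model_port *p)
{
	p->dpc_status &= (uint16_t)~(MODEL_DPC_TRIGGER | MODEL_DPC_INTERRUPT);
}

/* Recovery order from the patch: tidy AER status, then release. */
static void model_dpc_recover(struct model_port *p)
{
	model_clear_aer(p);
	assert(model_port_contained(p));	/* still contained at this point */
	model_release_containment(p);
}
```

After `model_dpc_recover()` returns, the modelled port has no latched AER bits and is no longer contained, so firmware inspecting it on the next boot would find nothing left over from the event.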