
[v3,3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

Message ID 20220928105946.12469-4-chenzhuo.1@bytedance.com
State New
Series PCI/AER: Fix and optimize usage of status clearing api

Commit Message

Zhuo Chen Sept. 28, 2022, 10:59 a.m. UTC
There is no need to clear error status during init code, so remove it.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Serge Semin Sept. 28, 2022, 11:03 a.m. UTC | #1
On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.

Why do you think there isn't? Justify in more details.

-Sergey

> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>  	ret = pci_enable_pcie_error_reporting(pdev);
>  	if (ret != 0)
>  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> -	else /* Cleanup nonfatal error status before getting to init */
> -		pci_aer_clear_nonfatal_status(pdev);
>  
>  	/* First enable the PCI device */
>  	ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
>
Bjorn Helgaas Dec. 6, 2022, 6:09 p.m. UTC | #2
On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > There is no need to clear error status during init code, so remove it.
> 
> Why do you think there isn't? Justify in more details.

Thanks for taking a look, Sergey!  I agree we should leave it or add
the rationale here.

> > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > ---
> >  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > index 0ed6f809ff2e..fed03217289d 100644
> > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> >  	ret = pci_enable_pcie_error_reporting(pdev);
> >  	if (ret != 0)
> >  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > -	else /* Cleanup nonfatal error status before getting to init */
> > -		pci_aer_clear_nonfatal_status(pdev);

I do think drivers should not need to clear errors; I think the PCI
core should be responsible for that.

And I think the core *does* do that in this path:

  pci_init_capabilities
    pci_aer_init
      pci_aer_clear_status
        pci_aer_raw_clear_status
          pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
          pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)

pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
errors, while pci_aer_init() clears all correctable and all
uncorrectable errors, so the PCI core is already doing more than
idt_init_pci() does.

So I think this change is good because it removes some work from the
driver, but let me know if you think otherwise.
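
To illustrate the difference described above, here is a minimal user-space sketch that models the AER status registers as write-1-to-clear (RW1C) words. It is not the kernel implementation: struct aer_model, rw1c_write(), model_clear_nonfatal() and model_raw_clear() are hypothetical names invented for this example. Only the three bit values are taken from the PCIe AER register layout.

```c
#include <stdint.h>

/* Bit values follow the PCIe AER status register layout. */
#define ERR_UNC_DLP   0x00000010u /* Data Link Protocol Error (uncorrectable) */
#define ERR_UNC_UNSUP 0x00100000u /* Unsupported Request (uncorrectable) */
#define ERR_COR_RCVR  0x00000001u /* Receiver Error (correctable) */

/* Hypothetical model of one function's AER registers, not a kernel type. */
struct aer_model {
	uint32_t uncor_status; /* RW1C: uncorrectable error status */
	uint32_t uncor_sever;  /* per-bit severity: 1 = fatal, 0 = non-fatal */
	uint32_t cor_status;   /* RW1C: correctable error status */
};

/* RW1C write: every 1 bit in val clears the matching status bit. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Models the removed driver call: clear only the non-fatal uncorrectable bits. */
static void model_clear_nonfatal(struct aer_model *m)
{
	uint32_t status = m->uncor_status & ~m->uncor_sever;

	if (status)
		rw1c_write(&m->uncor_status, status);
}

/* Models the core path: write each status register back to itself. */
static void model_raw_clear(struct aer_model *m)
{
	rw1c_write(&m->cor_status, m->cor_status);
	rw1c_write(&m->uncor_status, m->uncor_status);
}
```

In this model, starting from uncor_status = ERR_UNC_DLP | ERR_UNC_UNSUP, uncor_sever = ERR_UNC_DLP and cor_status = ERR_COR_RCVR, model_clear_nonfatal() removes only the UnsupReq bit and leaves the fatal DLP bit and the correctable bit set, while model_raw_clear() leaves all three registers' status bits at zero. The severity setting here is chosen for illustration only.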

> >  
> >  	/* First enable the PCI device */
> >  	ret = pcim_enable_device(pdev);
> > -- 
> > 2.30.1 (Apple Git-130)
> >
Serge Semin Dec. 6, 2022, 9:41 p.m. UTC | #3
Hi Bjorn

On Tue, Dec 06, 2022 at 12:09:56PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> > On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > > There is no need to clear error status during init code, so remove it.
> > 
> > Why do you think there isn't? Justify in more details.
> 
> Thanks for taking a look, Sergey!  I agree we should leave it or add
> the rationale here.
> 
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
> > > ---
> > >  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 0ed6f809ff2e..fed03217289d 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >  	ret = pci_enable_pcie_error_reporting(pdev);
> > >  	if (ret != 0)
> > >  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > > -	else /* Cleanup nonfatal error status before getting to init */
> > > -		pci_aer_clear_nonfatal_status(pdev);
> 
> I do think drivers should not need to clear errors; I think the PCI
> core should be responsible for that.
> 
> And I think the core *does* do that in this path:
> 
>   pci_init_capabilities
>     pci_aer_init
>       pci_aer_clear_status
>         pci_aer_raw_clear_status
>           pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
>           pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)
> 
> pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
> errors, while pci_aer_init() clears all correctable and all
> uncorrectable errors, so the PCI core is already doing more than
> idt_init_pci() does.
> 
> So I think this change is good because it removes some work from the
> driver, but let me know if you think otherwise.

It's hard to remember all the details now, but IIRC back when this
driver was developed the "Unsupported Request" flag was left uncleared
on our platform even after probe completion. Most likely an
erroneous TLP was generated by some action performed at the device
probe stage. The forced cleanup of the AER status solved that problem.
On the other hand, the problem of having the UnsupReq+ flag set was
solved some time after the driver was merged into the kernel (it
was caused by vendor-specific behavior of the IDT PCIe switch placed
on the path between an RP and the PCIe NTB). So since the original reason
for calling pci_aer_clear_nonfatal_status() here was
platform-specific and is fixed now anyway, and the AER flags cleanup is
done by the core, I have no reason to be against the patch. It
would be good to add your clarification to the commit message though.

Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

-Serge(y)

> 
> > >  
> > >  	/* First enable the PCI device */
> > >  	ret = pcim_enable_device(pdev);
> > > -- 
> > > 2.30.1 (Apple Git-130)
> > >
Bjorn Helgaas March 15, 2023, 9:31 p.m. UTC | #4
On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> There is no need to clear error status during init code, so remove it.
> 
> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>

Can you send this to the NTB folks?  It doesn't depend on anything, so
no real reason to merge via the PCI tree.

To help reviewers, ideally the commit log would mention where the PCI
core clears the non-fatal errors so the driver doesn't have to.

> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 0ed6f809ff2e..fed03217289d 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>  	ret = pci_enable_pcie_error_reporting(pdev);
>  	if (ret != 0)
>  		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> -	else /* Cleanup nonfatal error status before getting to init */
> -		pci_aer_clear_nonfatal_status(pdev);
>  
>  	/* First enable the PCI device */
>  	ret = pcim_enable_device(pdev);
> -- 
> 2.30.1 (Apple Git-130)
>

Patch

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 0ed6f809ff2e..fed03217289d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,6 @@  static int idt_init_pci(struct idt_ntb_dev *ndev)
 	ret = pci_enable_pcie_error_reporting(pdev);
 	if (ret != 0)
 		dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
-	else /* Cleanup nonfatal error status before getting to init */
-		pci_aer_clear_nonfatal_status(pdev);
 
 	/* First enable the PCI device */
 	ret = pcim_enable_device(pdev);