Message ID | 153194245964.191586.14782253252654776509.stgit@bhelgaas-glaptop.roam.corp.google.com |
---|---|
Series | Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL |
On 2018-07-19 01:14, Bjorn Helgaas wrote:
> This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
> git.
>
> v3 changes:
>   - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only called
>     from pcie_do_fatal_recovery().  Moved to first in series to avoid a
>     window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  Visible
>     only inside the PCI core.
>   - Instead of having pci_cleanup_aer_uncorrect_error_status() do different
>     things based on dev->error_state, use this only for ERR_NONFATAL bits.
>     I didn't change the name because it's used by many drivers.
>   - Rename pci_cleanup_aer_error_device_status() to
>     pci_aer_clear_device_status(), make it void, and make it visible only
>     inside the PCI core.
>   - Remove pcie_portdrv_err_handler.slot_reset altogether instead of making
>     it a stub function.  Possibly pcie_portdrv_err_handler could be removed
>     completely?
>
> [1] https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-poza@codeaurora.org
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer
>
> ---
>
> Bjorn Helgaas (1):
>       PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
>
> Oza Pawandeep (6):
>       PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
>       PCI/AER: Factor out ERR_NONFATAL status bit clearing
>       PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
>       PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
>       PCI/AER: Clear device status bits during ERR_COR handling
>       PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
>
>
>  drivers/pci/pci.h              |    5 ++++
>  drivers/pci/pcie/aer.c         |   47 +++++++++++++++++++++++++++-------------
>  drivers/pci/pcie/err.c         |   15 +++++--------
>  drivers/pci/pcie/portdrv_pci.c |   25 ---------------------
>  4 files changed, 43 insertions(+), 49 deletions(-)

looks good to me.
Thanks for the corrections.
some x86 compilation errors, you want me to fix it and push v4?

Regards,
Oza.
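To illustrate the ERR_FATAL / ERR_NONFATAL split described in the v3 changes, here is a minimal user-space sketch. It is not the code from the series. It assumes the usual AER layout, in which a set bit in the Uncorrectable Error Severity register marks the matching Uncorrectable Error Status bit as fatal, and in which status bits are cleared by writing 1 to them. The `struct mock_aer_regs` type and all `mock_*` helpers are hypothetical stand-ins for the real config-space accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device's AER registers (not a kernel type). */
struct mock_aer_regs {
	uint32_t uncor_status;   /* models PCI_ERR_UNCOR_STATUS, write-1-to-clear */
	uint32_t uncor_severity; /* models PCI_ERR_UNCOR_SEVER, 1 = fatal */
};

/* Model a write-1-to-clear register: each 1 bit in val clears that status bit. */
void mock_write_uncor_status(struct mock_aer_regs *regs, uint32_t val)
{
	assert(regs != NULL);
	regs->uncor_status &= ~val;
}

/* Clear only the logged errors whose severity bit says "fatal". */
void mock_clear_fatal_status(struct mock_aer_regs *regs)
{
	uint32_t status = regs->uncor_status & regs->uncor_severity;

	if (status)
		mock_write_uncor_status(regs, status);
}

/* Clear only the logged errors whose severity bit says "non-fatal". */
void mock_clear_nonfatal_status(struct mock_aer_regs *regs)
{
	uint32_t status = regs->uncor_status & ~regs->uncor_severity;

	if (status)
		mock_write_uncor_status(regs, status);
}
```

With both kinds of error logged, each helper leaves the other kind's bits untouched, which is the property the reordering in v3 is meant to preserve during recovery.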
On 2018-07-19 01:14, Bjorn Helgaas wrote:
> This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
> git.
> [...]

Hi Bjorn,

I am planning on some things to do after this series.
Your text:

"
1) I don't think the driver slot_reset callbacks should be responsible
   for clearing these AER status bits.  Can we clear them somewhere in
   the pcie_do_nonfatal_recovery() path and remove these calls from the
   drivers?
"

Oza: We can do the following. In broadcast_error_message(), the

         if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {

     branch should do

         pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, NULL);

     and then we update all the drivers and remove their calls to
     pci_cleanup_aer_uncorrect_error_status().

2) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
   device when handling an error.  We currently read it three times:

    aer_isr
      aer_isr_one_error
        find_source_device
          find_device_iter
            is_error_source
              read PCI_ERR_UNCOR_STATUS              # 1

Oza: this is the first legitimate read.

        aer_process_err_devices
          get_device_error_info(e_info->dev[i])
            read PCI_ERR_UNCOR_STATUS                # 2

Oza: I see this read is used to check whether the link is healthy, so its
     purpose looks different to me.

          handle_error_source
            pcie_do_nonfatal_recovery
              ...
                report_slot_reset
                  driver->err_handler->slot_reset
                    pci_cleanup_aer_uncorrect_error_status
                      read PCI_ERR_UNCOR_STATUS      # 3

Oza: pci_cleanup_aer_uncorrect_error_status() is generic and able to clear
     the status. E.g., in point 4 as I suggested, if we have to do

         pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, NULL);

     then we have to read them.

3) We need to get rid of pci_channel_io_frozen permanently.

Regards,
Oza.
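The bus-walk idea in point 1 can be sketched in user space as follows. This is only an illustration under stated assumptions, not a proposed patch: `struct mock_dev`, `mock_walk_bus()` and `mock_clear_nonfatal_cb()` are hypothetical stand-ins for `struct pci_dev`, `pci_walk_bus()` and the status-clearing helper. One detail it tries to show is that `pci_walk_bus()` callbacks take `(struct pci_dev *, void *)`, so a one-argument function such as `pci_cleanup_aer_uncorrect_error_status()` would presumably need a thin adapter rather than being passed directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device node: siblings share a bus, children sit below a bridge. */
struct mock_dev {
	uint32_t uncor_status;     /* models PCI_ERR_UNCOR_STATUS */
	uint32_t uncor_severity;   /* models PCI_ERR_UNCOR_SEVER, 1 = fatal */
	struct mock_dev *next;     /* next device on the same bus */
	struct mock_dev *children; /* first device on the subordinate bus, or NULL */
};

/* Same shape as a pci_walk_bus() callback: device plus opaque user data. */
typedef int (*mock_walk_cb)(struct mock_dev *dev, void *userdata);

/* Visit every device on the bus and below it; a nonzero return stops the walk. */
static int mock_walk(struct mock_dev *first, mock_walk_cb cb, void *userdata)
{
	for (struct mock_dev *dev = first; dev; dev = dev->next) {
		if (cb(dev, userdata))
			return 1;
		if (mock_walk(dev->children, cb, userdata))
			return 1;
	}
	return 0;
}

void mock_walk_bus(struct mock_dev *first, mock_walk_cb cb, void *userdata)
{
	assert(cb != NULL);
	(void)mock_walk(first, cb, userdata);
}

/*
 * Adapter with the two-argument callback signature. It clears the
 * non-fatal bits of one device and counts devices that had any set.
 */
int mock_clear_nonfatal_cb(struct mock_dev *dev, void *userdata)
{
	unsigned int *cleared = userdata;
	uint32_t status = dev->uncor_status & ~dev->uncor_severity;

	if (status) {
		dev->uncor_status &= ~status; /* models the write-1-to-clear write */
		if (cleared)
			(*cleared)++;
	}
	return 0; /* keep walking */
}
```

In this model the core would call `mock_walk_bus()` on a bridge's subordinate devices once, so individual drivers would no longer need to clear the status themselves. Note that the adapter still reads the status of every device it visits, which is the trade-off raised under read # 3.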
On Thu, Jul 19, 2018 at 09:23:47AM +0530, poza@codeaurora.org wrote:
> On 2018-07-19 01:14, Bjorn Helgaas wrote:
> > This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
> > git.
> > [...]
>
> looks good to me.
> Thanks for the corrections.
> some x86 compilation errors, you want me to fix it and push v4?

I fixed those already.

I moved these all to the pci/aer branch for v4.19.  I'll merge them into
"next" soon.

Thanks!

Bjorn