
[v2] PCI/AER: enable SERR# forwarding for bridges and switches

Message ID 1449074998-9664-1-git-send-email-okaya@codeaurora.org
State Changes Requested

Commit Message

Sinan Kaya Dec. 2, 2015, 4:49 p.m. UTC
A PCIe card behind a switch is unable to report its errors when SERR#
forwarding is not enabled on the PCIe switch's secondary interface
because, according to the spec, the switch does not forward ERR_
messages upstream unless that bit is set. This patch enables SERR#
forwarding when the PCI header type is bridge.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pcie/aer/aerdrv_core.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
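The check the patch performs can be modeled in userspace. The sketch below is a hypothetical stand-in, not kernel code: `struct fake_dev`, `cfg_read16()`, `cfg_write16()` and `set_serr_forwarding()` are illustrative names, and only the register offsets and bit values come from `pci_regs.h`. It masks bit 7 of the Header Type register (the multi-function flag) before comparing against `PCI_HEADER_TYPE_BRIDGE`; inside the kernel, `dev->hdr_type` already holds the masked value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offsets and bits as defined in the kernel's pci_regs.h. */
#define PCI_HEADER_TYPE         0x0e
#define PCI_HEADER_TYPE_BRIDGE  1
#define PCI_BRIDGE_CONTROL      0x3e
#define PCI_BRIDGE_CTL_SERR     0x02

/* Hypothetical stand-in for a device's 256-byte config header. */
struct fake_dev {
	uint8_t cfg[256];
};

/* PCI config space is little-endian. */
static uint16_t cfg_read16(const struct fake_dev *d, int off)
{
	return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8));
}

static void cfg_write16(struct fake_dev *d, int off, uint16_t val)
{
	d->cfg[off] = (uint8_t)(val & 0xff);
	d->cfg[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Set or clear SERR# forwarding on a type 1 (bridge) header using a
 * read-modify-write of Bridge Control. Bit 7 of the Header Type byte is
 * the multi-function flag, so it is masked off before the comparison.
 * Returns 1 if the device is a bridge and was updated, else 0.
 */
static int set_serr_forwarding(struct fake_dev *d, int enable)
{
	uint16_t ctl;

	if ((d->cfg[PCI_HEADER_TYPE] & 0x7f) != PCI_HEADER_TYPE_BRIDGE)
		return 0;

	ctl = cfg_read16(d, PCI_BRIDGE_CONTROL);
	if (enable)
		ctl |= PCI_BRIDGE_CTL_SERR;
	else
		ctl &= (uint16_t)~PCI_BRIDGE_CTL_SERR;
	cfg_write16(d, PCI_BRIDGE_CONTROL, ctl);
	return 1;
}
```

For example, a multi-function bridge reports 0x81 in its Header Type register, so an unmasked comparison against `PCI_HEADER_TYPE_BRIDGE` would skip it.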

Comments

Bjorn Helgaas Dec. 4, 2015, 9:06 p.m. UTC | #1
Hi Sinan,

On Wed, Dec 02, 2015 at 11:49:56AM -0500, Sinan Kaya wrote:
> A PCIe card behind a switch is unable to report its errors when SERR#
> forwarding is not enabled on the PCIe# switch's secondary interface
> according to the spec. This patch enables SERR# forwarding when the PCI
> header type is bridge.
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

I'm sitting on this for the moment because if you have _HPP, it seems
like that should be enough to get SERR# forwarding turned on, and if
it's not, I'd like to understand why.  So no hurry, but I'm waiting on
your investigation.

Bjorn

> ---
>  drivers/pci/pcie/aer/aerdrv_core.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9803e3d..f248c17 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -37,21 +37,53 @@ module_param(nosourceid, bool, 0);
>  
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> +	u8 header_type;
> +
>  	if (pcie_aer_get_firmware_first(dev))
>  		return -EIO;
>  
>  	if (!pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR))
>  		return -EIO;
>  
> +	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
> +
> +	/* needs to be a bridge/switch */
> +	if (header_type == PCI_HEADER_TYPE_BRIDGE) {
> +		u16 control;
> +
> +		/*
> +		 * A switch will not forward ERR_ messages coming from an
> +		 * endpoint if SERR# forwarding is not enabled.
> +		 * AER driver is checking the errors at the root only.
> +		 */
> +		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
> +		control |= PCI_BRIDGE_CTL_SERR;
> +		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +	}
> +
>  	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
>  
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev)
>  {
> +	u8 header_type;
> +
>  	if (pcie_aer_get_firmware_first(dev))
>  		return -EIO;
>  
> +	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
> +
> +	/* needs to be a bridge/switch */
> +	if (header_type == PCI_HEADER_TYPE_BRIDGE) {
> +		u16 control;
> +
> +		/* clear serr forwarding */
> +		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
> +		control &= ~PCI_BRIDGE_CTL_SERR;
> +		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
> +	}
> +
>  	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  					  PCI_EXP_AER_FLAGS);
>  }
> -- 
> Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sinan Kaya Dec. 6, 2015, 4:19 a.m. UTC | #2
On 12/4/2015 4:06 PM, Bjorn Helgaas wrote:
> I'm sitting on this for the moment because if you have _HPP, it seems
> like that should be enough to get SERR# forwarding turned on, and if
> it's not, I'd like to understand why.  So no hurry, but I'm waiting on
> your investigation.
> 
> Bjorn

OK. I'll find out; I haven't had a chance to debug it yet. My guess is
that I wrote this patch before enabling hotplug support, so I was not
aware of any _HPP implementation in the kernel.
Sinan Kaya Dec. 10, 2015, 8:28 p.m. UTC | #3
Hi Bjorn,

On 12/4/2015 4:06 PM, Bjorn Helgaas wrote:
> I'm sitting on this for the moment because if you have _HPP, it seems
> like that should be enough to get SERR# forwarding turned on, and if
> it's not, I'd like to understand why.  So no hurry, but I'm waiting on
> your investigation.
> 
> Bjorn

Here is the overall summary after my investigation.

It looks like the kernel covers the hotplug use case. This patch is
needed for systems without hotplug support where the firmware does not
set up SERR#.

after boot

/# dmesg | grep hpp

[    3.115227] pci 0004:01:00.0: program_hpp_type0:1376
[    3.128870] pci 0004:01:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.149597] pci 0004:01:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.191601] pci 0004:02:08.0: program_hpp_type0:1376
[    3.191611] pci 0004:02:08.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.206630] pci 0004:02:08.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.253760] pci 0004:03:00.0: program_hpp_type0:1376
[    3.267335] pci 0004:03:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.288046] pci 0004:03:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.355296] pci 0004:04:00.0: program_hpp_type0:1376
[    3.355306] pci 0004:04:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.370334] pci 0004:04:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR

/ # lspci
00:00.0 Class 0604: 17cb:0400
01:00.0 Class 0604: 10b5:8732
02:08.0 Class 0604: 10b5:8732
03:00.0 Class 0604: 10b5:8732
04:00.0 Class 0604: 10b5:8732
/ #


Without _HPP in the ACPI table, SERR# is not enabled:

/# dmesg | grep type0
/#

Powering up with _HPP present, after boot:

[    3.129325] pci 0004:01:00.0: program_hpp_type0:1376
[    3.143286] pci 0004:01:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.164016] pci 0004:01:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.206019] pci 0004:02:08.0: program_hpp_type0:1376
[    3.206028] pci 0004:02:08.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.220609] pci 0004:02:08.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.267783] pci 0004:03:00.0: program_hpp_type0:1376
[    3.281420] pci 0004:03:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.302197] pci 0004:03:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[    3.369684] pci 0004:04:00.0: program_hpp_type0:1376
[    3.369694] pci 0004:04:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[    3.384080] pci 0004:04:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR

hotplug eject

hotplug insert

[   98.338131] pci 0004:01:00.0: program_hpp_type0:1376
[   98.351813] pci 0004:01:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[   98.373147] pci 0004:01:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[   98.452051] pci 0004:02:08.0: program_hpp_type0:1376
[   98.465772] pci 0004:02:08.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[   98.487142] pci 0004:02:08.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[   98.597579] pci 0004:03:00.0: program_hpp_type0:1376
[   98.611290] pci 0004:03:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[   98.632181] pci 0004:03:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
[   98.736153] pci 0004:04:00.0: program_hpp_type0:1376
[   98.750437] pci 0004:04:00.0: program_hpp_type0:1378 pci_cmd |= PCI_COMMAND_SERR
[   98.771202] pci 0004:04:00.0: program_hpp_type0:1391 pci_bctl |= PCI_BRIDGE_CTL_SERR
/ #
Bjorn Helgaas Dec. 10, 2015, 10:37 p.m. UTC | #4
On Thu, Dec 10, 2015 at 03:28:35PM -0500, Sinan Kaya wrote:
> Hi Bjorn,
> 
> On 12/4/2015 4:06 PM, Bjorn Helgaas wrote:
> > I'm sitting on this for the moment because if you have _HPP, it seems
> > like that should be enough to get SERR# forwarding turned on, and if
> > it's not, I'd like to understand why.  So no hurry, but I'm waiting on
> > your investigation.
> > 
> > Bjorn
> 
> Here is the overall summary after my investigation.
> 
> It looks like the kernel covers the hotplug use case. This patch is
> needed for systems without hotplug support and when the firmware is not
> setting up the SERR.

Here's how I understand your results:

  Firmware   _HPP or     Devices      Hot-added    Hot-added
  enables    _HPX sets   present at   root ports   endpoints
  SERR#      SERR#       boot work    work         work
  --------   ---------   ----------   ----------   ---------
  no         no          no (1)       no (2)       no (4)
  no         yes         yes          yes          yes
  yes        no          yes          no (3)       no (5)
  yes        yes         yes          yes          yes

Your patch fixes cases 1-3 above, but I don't think it fixes cases 4
or 5.

The difference is that in cases 2 and 3, when we hot-add a root port,
the AER driver binds to the root port and (with your patch) enables
SERR for anything below it.

But in cases 4 and 5, the root port is already there, the AER driver
has already bound to it.  The AER driver tried to enable SERR for the
hierarchy below the root port, but there was nothing there.  Now we
add the endpoint, and the AER driver isn't involved, so I don't think
anything will enable SERR for the new endpoint.

I think the best way to fix all the cases would be to do something
in pci_configure_device().  Then we could drop the AER bus walk in
set_downstream_devices_error_reporting().  A bus walk like that is
always an issue for hotplug.
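A toy model of the difference between a one-time bus walk at driver bind and a per-device hook at enumeration time. Everything here (`struct model_bus`, `aer_bind_walk()`, `configure_device()`, `hot_add()`) is hypothetical and only stands in for the AER bus walk and for `pci_configure_device()`; it shows why an endpoint added after the walk, as in cases 4 and 5 of the table above, is missed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8

/* Hypothetical model of one device below a Root Port. */
struct model_dev {
	bool serr_enabled;
};

/* Hypothetical model of the hierarchy below one Root Port. */
struct model_bus {
	struct model_dev devs[MAX_DEVS];
	int ndevs;
	bool per_device_hook;  /* true: configure each device when it is added */
};

/* Stand-in for the one-time AER bus walk done when the driver binds. */
static void aer_bind_walk(struct model_bus *bus)
{
	for (int i = 0; i < bus->ndevs; i++)
		bus->devs[i].serr_enabled = true;
}

/* Stand-in for a per-device hook such as pci_configure_device(). */
static void configure_device(struct model_dev *dev)
{
	dev->serr_enabled = true;
}

/* Hot-add one device; returns its index, or -1 if the model is full. */
static int hot_add(struct model_bus *bus)
{
	int idx;

	if (bus->ndevs >= MAX_DEVS)
		return -1;
	idx = bus->ndevs++;
	bus->devs[idx].serr_enabled = false;  /* new device starts unconfigured */
	if (bus->per_device_hook)
		configure_device(&bus->devs[idx]);
	return idx;
}
```

In this model the walk only ever sees devices that exist when it runs, while the per-device hook runs for every device no matter when it appears.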

In principle, we should be able to just enable PCI_COMMAND_SERR and
PCI_BRIDGE_CTL_SERR for everything, and then errors would get
forwarded to the root port, and if/when the AER driver claimed the
root port, it would start collecting them.
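The forwarding rule can be sketched as a walk over the bridges between an endpoint and its Root Port, assuming, as the patch comment states, that a bridge forwards ERR_ messages upstream only when its Bridge Control SERR# Enable bit is set. The function name and array layout are hypothetical; only `PCI_BRIDGE_CTL_SERR` comes from `pci_regs.h`. The model deliberately ignores the endpoint's own reporting enables and the Root Port's configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_BRIDGE_CTL_SERR 0x02   /* value from pci_regs.h */

/*
 * Simplified model: bridge_ctl[0] is the Bridge Control value of the
 * Switch Downstream Port directly above the endpoint, and
 * bridge_ctl[n - 1] is the bridge just below the Root Port.
 */
static bool err_msg_reaches_root(const uint16_t *bridge_ctl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!(bridge_ctl[i] & PCI_BRIDGE_CTL_SERR))
			return false;   /* message stops at this bridge */
	return true;
}
```

With a switch between the endpoint and the Root Port, both the Downstream Port and the Upstream Port appear in the array, so a single bridge with SERR# forwarding disabled is enough to hide the error.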

But I'm a little leery of doing it unconditionally because there are a
lot of platform- and driver-specific uses of those bits, and I'm
afraid of breaking something.  It might be possible, but it'll take
some care to do it safely.

Sinan Kaya Dec. 11, 2015, 11:30 p.m. UTC | #5
On 12/10/2015 5:37 PM, Bjorn Helgaas wrote:
> On Thu, Dec 10, 2015 at 03:28:35PM -0500, Sinan Kaya wrote:
>> Hi Bjorn,
>>
>> On 12/4/2015 4:06 PM, Bjorn Helgaas wrote:
>>> I'm sitting on this for the moment because if you have _HPP, it seems
>>> like that should be enough to get SERR# forwarding turned on, and if
>>> it's not, I'd like to understand why.  So no hurry, but I'm waiting on
>>> your investigation.
>>>
>>> Bjorn
>>
>> Here is the overall summary after my investigation.
>>
>> It looks like the kernel covers the hotplug use case. This patch is
>> needed for systems without hotplug support and when the firmware is not
>> setting up the SERR.
> 
> Here's how I understand your results:
> 
>   Firmware   _HPP or     Devices      Hot-added    Hot-added
>   enables    _HPX sets   present at   root ports   endpoints
>   SERR#      SERR#       boot work    work         work
>   --------   ---------   ----------   ----------   ---------
>   no         no          no (1)       no (2)       no (4)
>   no         yes         yes          yes          yes
>   yes        no          yes          no (3)       no (5)
>   yes        yes         yes          yes          yes
> 
> Your patch fixes cases 1-3 above, but I don't think it fixes cases 4
> or 5.
> 

I think so. We don't support cases 4 and 5 on our platform, but I
could write up a patch if somebody can test it on such a platform.

> The difference is that in cases 2 and 3, when we hot-add a root port,
> the AER driver binds to the root port and (with your patch) enables
> SERR for anything below it.
> 
> But in cases 4 and 5, the root port is already there, the AER driver
> has already bound to it.  The AER driver tried to enable SERR for the
> hierarchy below the root port, but there was nothing there.  Now we
> add the endpoint, and the AER driver isn't involved, so I don't think
> anything will enable SERR for the new endpoint.
> 
> I think the best way to fix all the cases would be to do something in
> in pci_configure_device().  Then we could drop the AER bus walk in
> set_downstream_devices_error_reporting().  A bus walk like that is
> always an issue for hotplug.
> 

Let me read some code.

> In principle, we should be able to just enable PCI_COMMAND_SERR and
> PCI_BRIDGE_CTL_SERR for everything, and then errors would get
> forwarded to the root port, and if/when the AER driver claimed the
> root port, it would start collecting them.
> 
> But I'm a little leery of doing it unconditionally because there are a
> lot of platform- and driver-specific uses of those bits, and I'm
> afraid of breaking something.  It might be possible, but it'll take
> some care to do it safely.

Sure. When we were talking about ECRC the other day, you said we could
enable it on platforms newer than 2000 using some SMBIOS API. We could
go the same route here.
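One possible shape for such a policy is a firmware-date cutoff. The sketch below is hypothetical: `bios_year()` and `auto_enable_serr()` are illustrative names, and the cutoff year is a parameter rather than a value taken from the kernel. It parses the SMBIOS BIOS release date string, which the SMBIOS specification defines as `mm/dd/yy` or `mm/dd/yyyy`, treating a two-digit year as 19yy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse an SMBIOS BIOS release date ("mm/dd/yy" or "mm/dd/yyyy").
 * Returns the four-digit year, or -1 if the string cannot be parsed.
 */
static int bios_year(const char *date)
{
	int mm, dd, yy;

	if (sscanf(date, "%d/%d/%d", &mm, &dd, &yy) != 3 || yy < 0)
		return -1;
	(void)mm;
	(void)dd;
	return yy < 100 ? 1900 + yy : yy;   /* two-digit year means 19yy */
}

/* Hypothetical policy: only auto-enable on firmware dated >= cutoff. */
static bool auto_enable_serr(const char *bios_date, int cutoff_year)
{
	/* An unparseable date yields -1, so the bits are left alone. */
	return bios_year(bios_date) >= cutoff_year;
}
```

Defaulting to "leave alone" when the date is missing keeps the conservative behavior Bjorn asked for on platforms that cannot be identified.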

Sinan Kaya Dec. 14, 2015, 6:22 p.m. UTC | #6
On 12/11/2015 6:30 PM, Sinan Kaya wrote:
>> I think the best way to fix all the cases would be to do something in
>> > in pci_configure_device().  Then we could drop the AER bus walk in
>> > set_downstream_devices_error_reporting().  A bus walk like that is
>> > always an issue for hotplug.
>> > 
> Let me read some code.
> 

OK, if I understand it right, pci_configure_device() is where
program_hpp_type0() is called, and you also want to enable AER in this
function.

Should I move the contents of set_device_error_reporting() into
pci_configure_device(), like this?

...
+	int type = pci_pcie_type(dev);

 	pci_configure_mps(dev);

+	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+	    (type == PCI_EXP_TYPE_UPSTREAM) ||
+	    (type == PCI_EXP_TYPE_DOWNSTREAM)) {
+		pci_enable_pcie_error_reporting(dev);
+	}
+
+	pcie_set_ecrc_checking(dev);
Bjorn Helgaas Dec. 28, 2015, 10:29 p.m. UTC | #7
Hi Sinan,

Sorry for the delay in responding; I was on vacation when you sent
this, and I missed it when I returned.

On Mon, Dec 14, 2015 at 01:22:39PM -0500, Sinan Kaya wrote:
> On 12/11/2015 6:30 PM, Sinan Kaya wrote:
> >> I think the best way to fix all the cases would be to do something in
> >> > in pci_configure_device().  Then we could drop the AER bus walk in
> >> > set_downstream_devices_error_reporting().  A bus walk like that is
> >> > always an issue for hotplug.
> >> > 
> > Let me read some code.
> 
> OK, If I understand it right; pci_configure_device is where
> program_hpp_type0 called. You also want to enable AER in this function.
> 
> Move the contents of set_device_error_reporting into
> pci_configure_device like this below ?
> 
> ...
> + int type = pci_pcie_type(dev);
> 
> pci_configure_mps(dev);
> 
> +	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
> +	    (type == PCI_EXP_TYPE_UPSTREAM) ||
> +	    (type == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		pci_enable_pcie_error_reporting(dev);
> +	}
> 
> +	pcie_set_ecrc_checking(dev);

Yep, that's the sort of thing I'm thinking.

I think there are some subtleties to consider.

_HPP/_HPX can twiddle some of the same bits.  Should we allow _HPP to
clear a bit that pci_enable_pcie_error_reporting() would set?  What
about the reverse?
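The ordering question can be made concrete. An _HPX Type 2 record carries AND and OR masks for the PCIe Device Control register, so firmware can clear bits as well as set them. In the sketch below, `apply_and_or()` and `enable_aer_flags()` are hypothetical helpers; the bit values are the `PCI_EXP_DEVCTL_*` constants from `pci_regs.h`, and `PCI_EXP_AER_FLAGS` mirrors the AER driver's definition. Applying the masks before or after setting the AER enable bits gives different results when the AND mask clears one of them.

```c
#include <assert.h>
#include <stdint.h>

/* PCIe Device Control error-reporting enables (pci_regs.h values). */
#define PCI_EXP_DEVCTL_CERE  0x0001   /* correctable */
#define PCI_EXP_DEVCTL_NFERE 0x0002   /* non-fatal */
#define PCI_EXP_DEVCTL_FERE  0x0004   /* fatal */
#define PCI_EXP_DEVCTL_URRE  0x0008   /* unsupported request */
#define PCI_EXP_AER_FLAGS (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
			   PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

/* _HPX-style masks: keep the bits in and_mask, then set or_mask. */
static uint16_t apply_and_or(uint16_t reg, uint16_t and_mask, uint16_t or_mask)
{
	return (uint16_t)((reg & and_mask) | or_mask);
}

/* What enabling error reporting does to Device Control. */
static uint16_t enable_aer_flags(uint16_t devctl)
{
	return (uint16_t)(devctl | PCI_EXP_AER_FLAGS);
}
```

Whichever step runs last wins for any bit both of them touch, so the policy question becomes a question of call order inside the configuration path.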

There are a ridiculous number of places that call
pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR).  We should
do it once and cache the result in struct pci_dev.
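Caching the lookup could look like the sketch below: walk the extended capability list once, starting at offset 0x100, and store the AER offset in the device structure. `struct cap_dev`, its `aer_cap` and `lookups` fields, and the helpers are hypothetical userspace stand-ins. The header layout (ID in bits 15:0, next pointer in bits 31:20) follows the PCIe extended capability format, and the loop bound mirrors the one the kernel's lookup uses to avoid cycling on a corrupt list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CFG_SPACE_SIZE     256
#define PCI_CFG_SPACE_EXP_SIZE 4096
#define PCI_EXT_CAP_ID_ERR     0x01

/* Hypothetical device model with a cached AER capability offset. */
struct cap_dev {
	uint8_t cfg[PCI_CFG_SPACE_EXP_SIZE];
	int aer_cap;   /* 0 = absent; filled in once at enumeration */
	int lookups;   /* counts config-space walks, for illustration */
};

static uint32_t rd32(const struct cap_dev *d, int off)
{
	return (uint32_t)d->cfg[off] | ((uint32_t)d->cfg[off + 1] << 8) |
	       ((uint32_t)d->cfg[off + 2] << 16) |
	       ((uint32_t)d->cfg[off + 3] << 24);
}

static void wr32(struct cap_dev *d, int off, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		d->cfg[off + i] = (uint8_t)(v >> (8 * i));
}

/* Walk the extended capability list; returns the offset or 0. */
static int find_ext_cap(struct cap_dev *d, uint16_t id)
{
	int pos = PCI_CFG_SPACE_SIZE;
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	d->lookups++;
	while (pos >= PCI_CFG_SPACE_SIZE && ttl-- > 0) {
		uint32_t hdr = rd32(d, pos);

		if (hdr == 0)
			return 0;              /* no extended capabilities */
		if ((hdr & 0xffff) == id)
			return pos;
		pos = (int)((hdr >> 20) & 0xffc);  /* dword-aligned next pointer */
	}
	return 0;
}

/* Do the walk once and remember the answer for later callers. */
static void cache_aer_cap(struct cap_dev *d)
{
	d->aer_cap = find_ext_cap(d, PCI_EXT_CAP_ID_ERR);
}
```

Callers that today repeat the walk would then test the cached field instead, so the config-space walk happens once per device.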

Bjorn
Sinan Kaya Dec. 30, 2015, 1:26 p.m. UTC | #8
On 12/28/2015 5:29 PM, Bjorn Helgaas wrote:
> Hi Sinan,
> 
> Sorry for the delay in responding; I was on vacation when you sent
> this, and I missed it when I returned.
> 
> On Mon, Dec 14, 2015 at 01:22:39PM -0500, Sinan Kaya wrote:
>> On 12/11/2015 6:30 PM, Sinan Kaya wrote:
>>>> I think the best way to fix all the cases would be to do something in
>>>>> in pci_configure_device().  Then we could drop the AER bus walk in
>>>>> set_downstream_devices_error_reporting().  A bus walk like that is
>>>>> always an issue for hotplug.
>>>>>
>>> Let me read some code.
>>
>> OK, If I understand it right; pci_configure_device is where
>> program_hpp_type0 called. You also want to enable AER in this function.
>>
>> Move the contents of set_device_error_reporting into
>> pci_configure_device like this below ?
>>
>> ...
>> + int type = pci_pcie_type(dev);
>>
>> pci_configure_mps(dev);
>>
>> +	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
>> +	    (type == PCI_EXP_TYPE_UPSTREAM) ||
>> +	    (type == PCI_EXP_TYPE_DOWNSTREAM)) {
>> +		pci_enable_pcie_error_reporting(dev);
>> +	}
>>
>> +	pcie_set_ecrc_checking(dev);
> 
> Yep, that's the sort of thing I'm thinking.
> 
> I think there are some subtleties to consider.
> 
> _HPP/_HPX can twiddle some of the same bits.  Should we allow _HPP to
> clear a bit that pci_enable_pcie_error_reporting() would set?  What
> about the reverse?
> 
> There are a ridiculous number of places that call
> pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR).  We should
> do it once and cache the result in struct pci_dev.
> 
> Bjorn

Thanks for the heads-up. I had another pending question for you on a
different patch; I just sent a ping.

I'll start working on this patch after the holidays.

Patch

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 9803e3d..f248c17 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -37,21 +37,53 @@  module_param(nosourceid, bool, 0);
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
+	u8 header_type;
+
 	if (pcie_aer_get_firmware_first(dev))
 		return -EIO;
 
 	if (!pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR))
 		return -EIO;
 
+	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
+
+	/* needs to be a bridge/switch; bit 7 is the multi-function flag */
+	if ((header_type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
+		u16 control;
+
+		/*
+		 * A switch will not forward ERR_ messages coming from an
+		 * endpoint if SERR# forwarding is not enabled.
+		 * The AER driver checks for errors at the Root Port only.
+		 */
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+		control |= PCI_BRIDGE_CTL_SERR;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+	}
+
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
 }
 EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
+	u8 header_type;
+
 	if (pcie_aer_get_firmware_first(dev))
 		return -EIO;
 
+	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
+
+	/* needs to be a bridge/switch; bit 7 is the multi-function flag */
+	if ((header_type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
+		u16 control;
+
+		/* clear SERR# forwarding */
+		pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &control);
+		control &= ~PCI_BRIDGE_CTL_SERR;
+		pci_write_config_word(dev, PCI_BRIDGE_CONTROL, control);
+	}
+
 	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
 					  PCI_EXP_AER_FLAGS);
 }