PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

Message ID BN6PR12MB1249312205E07A670C54D2CFECE20@BN6PR12MB1249.namprd12.prod.outlook.com
State Not Applicable
Commit Message

Cheng, Collins May 12, 2017, 2:50 a.m. UTC
Hi Helgaas,

Some AMD GPUs have hardware support for graphics SR-IOV.
If an SR-IOV capable GPU is plugged into an SR-IOV incapable
platform, it causes a problem with PCI resource allocation in
the current Linux kernel.

Therefore, to allow the PF (Physical Function) device of an
SR-IOV capable GPU to work on an SR-IOV incapable platform,
the kernel needs to verify some conditions before initializing
BAR resources on AMD SR-IOV capable GPUs.

If the device is an AMD graphics device and it supports
SR-IOV, it will require a large amount of resources.
Before calling sriov_init(), the kernel must ensure that the
system BIOS also supports SR-IOV and that the system BIOS has
been able to allocate enough resources.
If the VF BARs are zero, then the system BIOS does not
support SR-IOV or it could not allocate the resources,
and this platform will not support AMD graphics SR-IOV.
Therefore, do not call sriov_init().
If the system BIOS does support SR-IOV, then the VF BARs
will be properly initialized to non-zero values.

Below is the patch against kernels 4.8 and 4.9. Please review.

I checked drivers/pci/quirks.c. It looks like the workarounds/fixes in
quirks.c are for specific devices, with one or more device IDs defined
for each of them. However, my patch is for all AMD SR-IOV
capable GPUs, which includes all existing and future AMD server GPUs.
So quirks.c doesn't seem like a good fit for this fix.

Signed-off-by: Collins Cheng <collins.cheng@amd.com>
---
 drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 3 deletions(-)

Comments

Alex Williamson May 12, 2017, 3:20 a.m. UTC | #1
On Fri, 12 May 2017 02:50:32 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Helgaas,
> 
> Some AMD GPUs have hardware support for graphics SR-IOV.
> If the SR-IOV capable GPU is plugged into the SR-IOV incapable
> platform. It would cause a problem on PCI resource allocation in
> current Linux kernel.
> 
> Therefore in order to allow the PF (Physical Function) device of
> SR-IOV capable GPU to work on the SR-IOV incapable platform,
> it is required to verify conditions for initializing BAR resources
> on AMD SR-IOV capable GPUs.
> 
> If the device is an AMD graphics device and it supports
> SR-IOV it will require a large amount of resources.
> Before calling sriov_init() must ensure that the system
> BIOS also supports SR-IOV and that system BIOS has been
> able to allocate enough resources.
> If the VF BARs are zero then the system BIOS does not
> support SR-IOV or it could not allocate the resources
> and this platform will not support AMD graphics SR-IOV.
> Therefore do not call sriov_init().
> If the system BIOS does support SR-IOV then the VF BARs
> will be properly initialized to non-zero values.
> 
> Below is the patch against to Kernel 4.8 & 4.9. Please review.
> 
> I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in
> quirks.c are for specific devices and one or more device ID are defined
> for the specific devices. However my patch is for all AMD SR-IOV
> capable GPUs, that includes all existing and future AMD server GPUs.
> So it doesn't seem like a good fit to put the fix in quirks.c.


Why is an AMD graphics card unique here?  Doesn't sriov_init() always
need to be able to deal with devices of any type where the BIOS hasn't
initialized the SR-IOV capability?  Some SR-IOV devices can fit their
VFs within a minimum bridge aperture, most cannot.  I don't understand
why the VF resource requirements being exceptionally large dictates
that they receive special handling.  Thanks,

Alex

> Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> ---
>  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index e30f05c..e4f1405 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>  		msleep(100);
>  }
>  
> +/*
> + * pci_vf_bar_valid - check if VF BARs have resource allocated
> + * @dev: the PCI device
> + * @pos: register offset of SR-IOV capability in PCI config space
> + * Returns true any VF BAR has resource allocated, false
> + * if all VF BARs are empty.
> + */
> +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
> +{
> +	int i;
> +	u32 bar_value;
> +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> +
> +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> +		if (bar_value & bar_size_mask)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> + * @dev: the PCI device
> + *
> + * Returns true if device is an AMD/ATI display adapter,
> + * otherwise return false.
> + */
> +
> +static bool is_amd_display_adapter(struct pci_dev *dev)
> +{
> +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> +		dev->vendor == PCI_VENDOR_ID_AMD));
> +}
> +
>  /**
>   * pci_iov_init - initialize the IOV capability
>   * @dev: the PCI device
> @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>  		return -ENODEV;
>  
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> -	if (pos)
> -		return sriov_init(dev, pos);
> -
> +	if (pos) {
> +	/*
> +	 * If the device is an AMD graphics device and it supports
> +	 * SR-IOV it will require a large amount of resources.
> +	 * Before calling sriov_init() must ensure that the system
> +	 * BIOS also supports SR-IOV and that system BIOS has been
> +	 * able to allocate enough resources.
> +	 * If the VF BARs are zero then the system BIOS does not
> +	 * support SR-IOV or it could not allocate the resources
> +	 * and this platform will not support AMD graphics SR-IOV.
> +	 * Therefore do not call sriov_init().
> +	 * If the system BIOS does support SR-IOV then the VF BARs
> +	 * will be properly initialized to non-zero values.
> +	 */
> +		if (is_amd_display_adapter(dev)) {
> +			if (pci_vf_bar_valid(dev, pos))
> +				return sriov_init(dev, pos);
> +		} else {
> +			return sriov_init(dev, pos);
> +		}
> +	}
>  	return -ENODEV;
>  }
>
Cheng, Collins May 12, 2017, 3:42 a.m. UTC | #2
Hi Williamson,

A GPU card needs more BAR aperture than other PCI devices. For example, an Intel SR-IOV network card only requires 512KB of memory resource for all VFs, whereas an AMD SR-IOV GPU card needs 256MB x 16 VFs = 4GB of memory resource for the frame buffer BAR aperture.
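
To make the arithmetic concrete, here is a small sketch that computes the total VF aperture from the per-VF BAR size and the VF count quoted above. The helper name vf_aperture_bytes() and the MIB macro are assumptions for illustration only. Note that 16 x 256MB is exactly 4GB (2^32 bytes), so the aperture by itself is as large as the whole 32-bit address space.

```c
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/*
 * Hypothetical helper: total MMIO aperture needed when every VF gets
 * its own copy of a single BAR of per_vf_bar_bytes bytes.
 */
static uint64_t vf_aperture_bytes(uint64_t per_vf_bar_bytes,
				  unsigned int num_vfs)
{
	return per_vf_bar_bytes * num_vfs;
}
```

Calling vf_aperture_bytes(256 * MIB, 16) gives 4096 * MIB, which does not fit in a 32-bit value.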

If the system BIOS supports SR-IOV, it will reserve enough resources for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate enough resources for the VF BARs, only the PF BAR will be assigned and the VF BARs are empty. The system then boots into the Linux kernel, and the kernel doesn't check whether the VF BARs are empty or valid. The kernel will re-assign the BAR resources for the PF and all VFs. The problem I saw is that the kernel fails to allocate the PF BAR resource because some resources are assigned to the VFs, which is not expected. So the kernel might need to do some checks before re-assigning the PF/VF resources, so that the PF device is correctly assigned its BAR resource and the user can use the PF device.

-Collins Cheng

Zytaruk, Kelly May 12, 2017, 3:44 a.m. UTC | #3
>-----Original Message-----
>From: Alex Williamson [mailto:alex.williamson@redhat.com]
>Sent: Thursday, May 11, 2017 11:21 PM
>To: Cheng, Collins
>Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
>Deucher, Alexander; Zytaruk, Kelly
>Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>incapable platform
>
>On Fri, 12 May 2017 02:50:32 +0000
>"Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>
>> Hi Helgaas,
>>
>> Some AMD GPUs have hardware support for graphics SR-IOV.
>> If the SR-IOV capable GPU is plugged into the SR-IOV incapable
>> platform. It would cause a problem on PCI resource allocation in
>> current Linux kernel.
>>
>> Therefore in order to allow the PF (Physical Function) device of
>> SR-IOV capable GPU to work on the SR-IOV incapable platform, it is
>> required to verify conditions for initializing BAR resources on AMD
>> SR-IOV capable GPUs.
>>
>> If the device is an AMD graphics device and it supports SR-IOV it will
>> require a large amount of resources.
>> Before calling sriov_init() must ensure that the system BIOS also
>> supports SR-IOV and that system BIOS has been able to allocate enough
>> resources.
>> If the VF BARs are zero then the system BIOS does not support SR-IOV
>> or it could not allocate the resources and this platform will not
>> support AMD graphics SR-IOV.
>> Therefore do not call sriov_init().
>> If the system BIOS does support SR-IOV then the VF BARs will be
>> properly initialized to non-zero values.
>>
>> Below is the patch against to Kernel 4.8 & 4.9. Please review.
>>
>> I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in
>> quirks.c are for specific devices and one or more device ID are
>> defined for the specific devices. However my patch is for all AMD
>> SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
>> So it doesn't seem like a good fit to put the fix in quirks.c.
>
>
>Why is an AMD graphics card unique here?  Doesn't sriov_init() always need to be
>able to deal with devices of any type where the BIOS hasn't initialized the SR-IOV
>capability?  Some SR-IOV devices can fit their VFs within a minimum bridge
>aperture, most cannot.  I don't understand why the VF resource requirements
>being exceptionally large dictates that they receive special handling.  Thanks,
>
>Alex
>

Hi Alex,
Many system BIOSes are problematic in that they don't know how to fully support SR-IOV.  Until recently, SR-IOV devices, such as NICs, typically used a small amount of resources.
The AMD SR-IOV GPU uses significant resources and many system BIOSes cannot handle this properly.  A faulty SBIOS will attempt to initialize, run out of resources, and not indicate any error.
Even though we are not enabling SR-IOV on these platforms, this prevents us from running our SR-IOV GPUs in non-SR-IOV mode on them.

Seen from the outside, there is no indication that the SBIOS had any problems, and the capability is set.  We have been able to detect these problematic SBIOSes by noticing that they don't initialize our BARs as we expect them to be initialized.

If you have an alternative solution I would love to hear about it.

Thanks,
Kelly


>> Signed-off-by: Collins Cheng <collins.cheng@amd.com>
>> ---
>>  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 60 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> index e30f05c..e4f1405 100644
>> --- a/drivers/pci/iov.c
>> +++ b/drivers/pci/iov.c
>> @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>>  		msleep(100);
>>  }
>>
>> +/*
>> + * pci_vf_bar_valid - check if VF BARs have resource allocated
>> + * @dev: the PCI device
>> + * @pos: register offset of SR-IOV capability in PCI config space
>> + * Returns true any VF BAR has resource allocated, false
>> + * if all VF BARs are empty.
>> + */
>> +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
>> +{
>> +	int i;
>> +	u32 bar_value;
>> +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>> +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
>> +			PCI_BASE_ADDRESS_MEM_PREFETCH);
>> +
>> +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
>> +		if (bar_value & bar_size_mask)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
>> + * @dev: the PCI device
>> + *
>> + * Returns true if device is an AMD/ATI display adapter,
>> + * otherwise return false.
>> + */
>> +
>> +static bool is_amd_display_adapter(struct pci_dev *dev)
>> +{
>> +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>> +		(dev->vendor == PCI_VENDOR_ID_ATI ||
>> +		dev->vendor == PCI_VENDOR_ID_AMD));
>> +}
>> +
>>  /**
>>   * pci_iov_init - initialize the IOV capability
>>   * @dev: the PCI device
>> @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>>  		return -ENODEV;
>>
>>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> -	if (pos)
>> -		return sriov_init(dev, pos);
>> -
>> +	if (pos) {
>> +	/*
>> +	 * If the device is an AMD graphics device and it supports
>> +	 * SR-IOV it will require a large amount of resources.
>> +	 * Before calling sriov_init() must ensure that the system
>> +	 * BIOS also supports SR-IOV and that system BIOS has been
>> +	 * able to allocate enough resources.
>> +	 * If the VF BARs are zero then the system BIOS does not
>> +	 * support SR-IOV or it could not allocate the resources
>> +	 * and this platform will not support AMD graphics SR-IOV.
>> +	 * Therefore do not call sriov_init().
>> +	 * If the system BIOS does support SR-IOV then the VF BARs
>> +	 * will be properly initialized to non-zero values.
>> +	 */
>> +		if (is_amd_display_adapter(dev)) {
>> +			if (pci_vf_bar_valid(dev, pos))
>> +				return sriov_init(dev, pos);
>> +		} else {
>> +			return sriov_init(dev, pos);
>> +		}
>> +	}
>>  	return -ENODEV;
>>  }
>>
Alex Williamson May 12, 2017, 4:01 a.m. UTC | #4
On Fri, 12 May 2017 03:42:46 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Williamson,
> 
> GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
> 
> If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.

So the problem is that something bad happens when the kernel is trying
to reallocate resources in order to fulfill the requirements of the
VFs, leaving the PF resources incorrectly programmed?  Why not just fix
that bug rather than creating special handling for this vendor/class of
device which disables any attempt to fixup resources for SR-IOV?  IOW,
this patch just avoids the problem for your devices rather than fixing
the bug.  I'd suggest fixing the bug such that the PF is left in a
functional state if the kernel is unable to allocate sufficient
resources for the VFs.  Thanks,

Alex
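
As a rough illustration of the behaviour suggested above (the PF stays functional when the VFs cannot be placed), here is a toy user-space model. It is only a sketch under strong assumptions: the bridge window is reduced to a byte budget with no alignment or fragmentation, and struct window, window_claim() and assign_pf_then_vfs() are hypothetical names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one bridge window: a byte budget, no alignment rules. */
struct window {
	uint64_t size;
	uint64_t used;
};

/* Claim bytes from the window; leaves the window untouched on failure. */
static bool window_claim(struct window *w, uint64_t bytes)
{
	if (bytes > w->size - w->used)
		return false;
	w->used += bytes;
	return true;
}

enum assign_result { ASSIGN_PF_AND_VFS, ASSIGN_PF_ONLY, ASSIGN_FAILED };

/*
 * Hypothetical policy: the PF is placed first and its claim is never
 * given up for the VFs. If the VF aperture does not fit, the caller
 * gets ASSIGN_PF_ONLY and can simply leave SR-IOV disabled.
 */
static enum assign_result assign_pf_then_vfs(struct window *w,
					     uint64_t pf_bytes,
					     uint64_t vf_bytes)
{
	if (!window_claim(w, pf_bytes))
		return ASSIGN_FAILED;
	if (!window_claim(w, vf_bytes))
		return ASSIGN_PF_ONLY;	/* VFs unavailable, PF still usable */
	return ASSIGN_PF_AND_VFS;
}
```

In this model a 512MB window with a 256MB PF BAR and a 4GB VF aperture ends in ASSIGN_PF_ONLY, with only the PF's 256MB claimed.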

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com] 
> Sent: Friday, May 12, 2017 11:21 AM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
> 
> On Fri, 12 May 2017 02:50:32 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Helgaas,
> > 
> > Some AMD GPUs have hardware support for graphics SR-IOV.
> > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 
> > platform. It would cause a problem on PCI resource allocation in 
> > current Linux kernel.
> > 
> > Therefore in order to allow the PF (Physical Function) device of 
> > SR-IOV capable GPU to work on the SR-IOV incapable platform, it is 
> > required to verify conditions for initializing BAR resources on AMD 
> > SR-IOV capable GPUs.
> > 
> > If the device is an AMD graphics device and it supports SR-IOV it will 
> > require a large amount of resources.
> > Before calling sriov_init() must ensure that the system BIOS also 
> > supports SR-IOV and that system BIOS has been able to allocate enough 
> > resources.
> > If the VF BARs are zero then the system BIOS does not support SR-IOV 
> > or it could not allocate the resources and this platform will not 
> > support AMD graphics SR-IOV.
> > Therefore do not call sriov_init().
> > If the system BIOS does support SR-IOV then the VF BARs will be 
> > properly initialized to non-zero values.
> > 
> > Below is the patch against to Kernel 4.8 & 4.9. Please review.
> > 
> > I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in 
> > quirks.c are for specific devices and one or more device ID are 
> > defined for the specific devices. However my patch is for all AMD 
> > SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
> > So it doesn't seem like a good fit to put the fix in quirks.c.  
> 
> 
> Why is an AMD graphics card unique here?  Doesn't sriov_init() always need to be able to deal with devices of any type where the BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV devices can fit their VFs within a minimum bridge aperture, most cannot.  I don't understand why the VF resource requirements being exceptionally large dictates that they receive special handling.  Thanks,
> 
> Alex
> 
> > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> > ---
> >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 60 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index e30f05c..e4f1405 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> >  		msleep(100);
> >  }
> >  
> > +/*
> > + * pci_vf_bar_valid - check if VF BARs have resource allocated
> > + * @dev: the PCI device
> > + * @pos: register offset of SR-IOV capability in PCI config space
> > + * Returns true any VF BAR has resource allocated, false
> > + * if all VF BARs are empty.
> > + */
> > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
> > +{
> > +	int i;
> > +	u32 bar_value;
> > +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> > +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> > +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> > +
> > +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> > +		if (bar_value & bar_size_mask)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> > + * @dev: the PCI device
> > + *
> > + * Returns true if device is an AMD/ATI display adapter,
> > + * otherwise return false.
> > + */
> > +
> > +static bool is_amd_display_adapter(struct pci_dev *dev)
> > +{
> > +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> > +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> > +		dev->vendor == PCI_VENDOR_ID_AMD));
> > +}
> > +
> >  /**
> >   * pci_iov_init - initialize the IOV capability
> >   * @dev: the PCI device
> > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> >  		return -ENODEV;
> >  
> >  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> > -	if (pos)
> > -		return sriov_init(dev, pos);
> > -
> > +	if (pos) {
> > +	/*
> > +	 * If the device is an AMD graphics device and it supports
> > +	 * SR-IOV it will require a large amount of resources.
> > +	 * Before calling sriov_init() must ensure that the system
> > +	 * BIOS also supports SR-IOV and that system BIOS has been
> > +	 * able to allocate enough resources.
> > +	 * If the VF BARs are zero then the system BIOS does not
> > +	 * support SR-IOV or it could not allocate the resources
> > +	 * and this platform will not support AMD graphics SR-IOV.
> > +	 * Therefore do not call sriov_init().
> > +	 * If the system BIOS does support SR-IOV then the VF BARs
> > +	 * will be properly initialized to non-zero values.
> > +	 */
> > +		if (is_amd_display_adapter(dev)) {
> > +			if (pci_vf_bar_valid(dev, pos))
> > +				return sriov_init(dev, pos);
> > +		} else {
> > +			return sriov_init(dev, pos);
> > +		}
> > +	}
> >  	return -ENODEV;
> >  }
> >    
>
Cheng, Collins May 12, 2017, 4:51 a.m. UTC | #5
Hi Williamson,

I verified that the patch works for both the AMD SR-IOV GPU and the Intel SR-IOV NIC. I don't think it is redundant to check that the VF BARs are valid before calling sriov_init(): it is safe, it saves boot time, and there is no better method to know whether the system BIOS has correctly initialized the SR-IOV capability.

I did not try to fix the issue from the kernel resource allocation perspective, because:
	1. I am not very familiar with the PCI resource allocation scheme in the kernel. For example, in sriov_init(), the kernel will re-assign the PCI resources for both the VFs and the PF. I don't understand why the kernel allocates resources for the VFs first and then for the PF. If the PF came first, this issue could be avoided.
	2. I am not sure whether the kernel has an error handler for a failed PCI resource allocation. In this case, the kernel cannot allocate enough resources to the PF. It should trigger some error handler to either keep the original BAR values set by the system BIOS, or disable this device and log errors.
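
The "keep the original BAR values set by the system BIOS" option in point 2 can be sketched as a save-and-restore wrapper around the reassignment step. This is a user-space illustration only: struct bar_state, reassign_fn, reassign_or_restore() and the two demo callbacks are hypothetical and do not correspond to existing kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BARS 6

struct bar_state {
	uint32_t bar[NUM_BARS];
};

/*
 * Hypothetical reassignment step: returns false when allocation fails
 * and may leave *state partially modified in that case.
 */
typedef bool (*reassign_fn)(struct bar_state *state);

/* Try to reassign; on failure put back the values firmware programmed. */
static bool reassign_or_restore(struct bar_state *state, reassign_fn reassign)
{
	struct bar_state saved = *state;

	if (reassign(state))
		return true;
	*state = saved;		/* roll back to the BIOS-assigned BARs */
	return false;
}

/* Demo callback: clobbers BAR0 and then reports failure. */
static bool demo_reassign_fail(struct bar_state *state)
{
	state->bar[0] = 0;
	return false;
}

/* Demo callback: moves BAR0 to a new address and reports success. */
static bool demo_reassign_ok(struct bar_state *state)
{
	state->bar[0] = 0xd0000000u;
	return true;
}
```

With demo_reassign_fail the state ends up exactly as it started, which is the "keep original BAR values" behaviour; the alternative of disabling the device and logging an error is not modelled here.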

-Collins Cheng


-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Friday, May 12, 2017 12:01 PM
To: Cheng, Collins <Collins.Cheng@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

On Fri, 12 May 2017 03:42:46 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Williamson,
> 
> GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
> 
> If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.

So the problem is that something bad happens when the kernel is trying to reallocate resources in order to fulfill the requirements of the VFs, leaving the PF resources incorrectly programmed?  Why not just fix that bug rather than creating special handling for this vendor/class of device which disables any attempt to fixup resources for SR-IOV?  IOW, this patch just avoids the problem for your devices rather than fixing the bug.  I'd suggest fixing the bug such that the PF is left in a functional state if the kernel is unable to allocate sufficient resources for the VFs.  Thanks,

Alex

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, May 12, 2017 11:21 AM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> SR-IOV incapable platform
> 
> On Fri, 12 May 2017 02:50:32 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Helgaas,
> > 
> > Some AMD GPUs have hardware support for graphics SR-IOV.
> > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 
> > platform. It would cause a problem on PCI resource allocation in 
> > current Linux kernel.
> > 
> > Therefore in order to allow the PF (Physical Function) device of 
> > SR-IOV capable GPU to work on the SR-IOV incapable platform, it is 
> > required to verify conditions for initializing BAR resources on AMD 
> > SR-IOV capable GPUs.
> > 
> > If the device is an AMD graphics device and it supports SR-IOV it 
> > will require a large amount of resources.
> > Before calling sriov_init() must ensure that the system BIOS also 
> > supports SR-IOV and that system BIOS has been able to allocate 
> > enough resources.
> > If the VF BARs are zero then the system BIOS does not support SR-IOV 
> > or it could not allocate the resources and this platform will not 
> > support AMD graphics SR-IOV.
> > Therefore do not call sriov_init().
> > If the system BIOS does support SR-IOV then the VF BARs will be 
> > properly initialized to non-zero values.
> > 
> > Below is the patch against to Kernel 4.8 & 4.9. Please review.
> > 
> > I checked the drivers/pci/quirks.c, it looks the workarounds/fixes 
> > in quirks.c are for specific devices and one or more device ID are 
> > defined for the specific devices. However my patch is for all AMD 
> > SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
> > So it doesn't seem like a good fit to put the fix in quirks.c.  
> 
> 
> Why is an AMD graphics card unique here?  Doesn't sriov_init() always 
> need to be able to deal with devices of any type where the BIOS hasn't 
> initialized the SR-IOV capability?  Some SR-IOV devices can fit their 
> VFs within a minimum bridge aperture, most cannot.  I don't understand 
> why the VF resource requirements being exceptionally large dictates 
> that they receive special handling.  Thanks,
> 
> Alex
> 
> > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> > ---
> >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 60 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index e30f05c..e4f1405 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> >  		msleep(100);
> >  }
> >  
> > +/*
> > + * pci_vf_bar_valid - check if VF BARs have resource allocated
> > + * @dev: the PCI device
> > + * @pos: register offset of SR-IOV capability in PCI config space
> > + * Returns true any VF BAR has resource allocated, false
> > + * if all VF BARs are empty.
> > + */
> > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
> > +	int i;
> > +	u32 bar_value;
> > +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> > +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> > +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> > +
> > +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> > +		if (bar_value & bar_size_mask)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> > + * @dev: the PCI device
> > + *
> > + * Returns true if device is an AMD/ATI display adapter,
> > + * otherwise return false.
> > + */
> > +
> > +static bool is_amd_display_adapter(struct pci_dev *dev) {
> > +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> > +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> > +		dev->vendor == PCI_VENDOR_ID_AMD)); }
> > +
> >  /**
> >   * pci_iov_init - initialize the IOV capability
> >   * @dev: the PCI device
> > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> >  		return -ENODEV;
> >  
> >  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> > -	if (pos)
> > -		return sriov_init(dev, pos);
> > -
> > +	if (pos) {
> > +	/*
> > +	 * If the device is an AMD graphics device and it supports
> > +	 * SR-IOV it will require a large amount of resources.
> > +	 * Before calling sriov_init() must ensure that the system
> > +	 * BIOS also supports SR-IOV and that system BIOS has been
> > +	 * able to allocate enough resources.
> > +	 * If the VF BARs are zero then the system BIOS does not
> > +	 * support SR-IOV or it could not allocate the resources
> > +	 * and this platform will not support AMD graphics SR-IOV.
> > +	 * Therefore do not call sriov_init().
> > +	 * If the system BIOS does support SR-IOV then the VF BARs
> > +	 * will be properly initialized to non-zero values.
> > +	 */
> > +		if (is_amd_display_adapter(dev)) {
> > +			if (pci_vf_bar_valid(dev, pos))
> > +				return sriov_init(dev, pos);
> > +		} else {
> > +			return sriov_init(dev, pos);
> > +		}
> > +	}
> >  	return -ENODEV;
> >  }
> >    
>
Alex Williamson May 12, 2017, 2:43 p.m. UTC | #6
On Fri, 12 May 2017 04:51:43 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Williamson,
> 
> I verified that the patch works for both an AMD SR-IOV GPU and an Intel SR-IOV NIC. I don't think it is redundant to check that the VF BARs are valid before calling sriov_init(): it is safe and saves boot time, and there is no better way to know whether the system BIOS has correctly initialized the SR-IOV capability.

It also masks an underlying bug and creates a maintenance issue, since
we won't know when it's safe to remove this workaround.  I don't think
faster boot is a valid rationale: in one case SR-IOV is completely
disabled, in the other we attempt to allocate the resources the BIOS
failed to provide.  I expect this is also a corner case; the BIOS
should typically support SR-IOV, so this situation should be an
exception.
 
> I did not try to fix the issue from the kernel resource allocation perspective because:
> 	1. I am not very familiar with the PCI resource allocation scheme in the kernel. For example, in sriov_init(), the kernel re-assigns the PCI resources for both the VFs and the PF. I don't understand why the kernel allocates resources for the VFs first and then the PF. If the PF were allocated first, this issue could be avoided.
> 	2. I am not sure whether the kernel has an error handler for failed PCI resource allocation. In this case, the kernel cannot allocate enough resources for the PF. It should trigger some error handler to either keep the original BAR values set by the system BIOS, or disable the device and log errors.

I think these are the issues we should be trying to solve and I'm sure
folks on the linux-pci list can help us identify the bug.  Minimally,
failure to allocate VF resources should leave the device in no worse
condition than before it tried.  Perhaps you could post more details
about the issue: boot with pci=earlydump, post dmesg of a boot where
the PF resources are incorrectly re-allocated, and include lspci -vvv
for the SR-IOV device.  Also, please test with the latest upstream
kernel; upstream only patches old kernels through stable backports of
commits to the latest kernel.  Adding Yinghai as a resource allocation
expert.  Thanks,

Alex
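
[Annotation, not part of the original message.] The requirement that a failed VF allocation leave the device "in no worse condition than before it tried" can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct window, window_alloc() and assign_pf_then_vfs() are hypothetical names, the bump allocator stands in for the kernel's real resource code, and nothing here is taken from drivers/pci. It places the PF first, then attempts the VF aperture, and rolls back to the PF-only state if the VFs do not fit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bridge window served by a bump allocator. */
struct window {
	uint64_t base;	/* first address of the window */
	uint64_t size;	/* window length in bytes */
	uint64_t used;	/* bytes consumed so far, counted from base */
};

/* Result of one assignment attempt (illustrative, not a kernel type). */
struct assignment {
	uint64_t pf_bar;	/* address chosen for the PF BAR */
	uint64_t vf_bar;	/* start of the VF aperture, if assigned */
	bool vfs_assigned;	/* false when the VFs did not fit */
};

/* Round v up to a multiple of a; a must be a non-zero power of two. */
static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/*
 * Carve size bytes out of the window at the given alignment.
 * On failure the window and *addr are left unchanged.
 */
static bool window_alloc(struct window *w, uint64_t size, uint64_t align,
			 uint64_t *addr)
{
	uint64_t start = align_up(w->base + w->used, align);

	if (start + size > w->base + w->size)
		return false;

	*addr = start;
	w->used = start + size - w->base;
	return true;
}

/*
 * Place the PF first, then try the VF aperture (vf_size * total_vfs).
 * Returns false only if the PF itself cannot be placed. If the VFs do
 * not fit, the window is restored to the PF-only state and the PF
 * assignment is kept.
 */
static bool assign_pf_then_vfs(struct window *w, uint64_t pf_size,
			       uint64_t vf_size, unsigned int total_vfs,
			       struct assignment *out)
{
	struct window snapshot;

	out->pf_bar = 0;
	out->vf_bar = 0;
	out->vfs_assigned = false;

	if (!window_alloc(w, pf_size, pf_size, &out->pf_bar))
		return false;

	snapshot = *w;	/* state with only the PF placed */
	if (window_alloc(w, vf_size * total_vfs, vf_size, &out->vf_bar)) {
		out->vfs_assigned = true;
	} else {
		*w = snapshot;	/* roll back; the PF stays usable */
		out->vf_bar = 0;
	}
	return true;
}
```

With the figures quoted in this thread (a 256MB frame buffer BAR and 16 VFs), a 1GB window below 4GB would end up with the PF assigned and vfs_assigned left false, while an 8GB window above 4GB would fit both.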

Cheng, Collins May 15, 2017, 8:19 a.m. UTC | #7
Hi Williamson,

We cannot assume the BIOS supports SR-IOV; in fact, only newer server motherboard BIOSes support it. Normal desktop motherboard BIOSes and older server motherboard BIOSes do not. This issue occurs when a user plugs our AMD SR-IOV capable GPU card into a normal desktop motherboard.

I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope the kernel can allocate PF resources before VF resources, and keep the PF resources valid and functional if it fails to allocate VF resources.

I will send out the dmesg log and lspci info tomorrow. Thanks.


-Collins Cheng

Alex Williamson May 15, 2017, 5:53 p.m. UTC | #8
On Mon, 15 May 2017 08:19:28 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Williamson,
> 
> We cannot assume the BIOS supports SR-IOV; in fact, only newer server motherboard BIOSes support it. Normal desktop motherboard BIOSes and older server motherboard BIOSes do not. This issue occurs when a user plugs our AMD SR-IOV capable GPU card into a normal desktop motherboard.

Servers should have been supporting SR-IOV for a long time now.  What
really is there to a BIOS supporting SR-IOV anyway?  It's simply
reserving sufficient bus number and MMIO resources such that we can
enable the VFs.  This process isn't exclusively reserved for the BIOS.
Some platforms may choose to only initialize boot devices, leaving the
rest for the OS to program.  The initial proposal here, to disable
SR-IOV if not programmed at OS hand-off, disables even the possibility
of the OS reallocating resources for this device.
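
[Annotation, not part of the original message.] As a rough illustration of what "reserving sufficient bus number and MMIO resources" amounts to, the sketch below computes both quantities from values a platform or OS would read out of the SR-IOV capability. struct sriov_req and the helper names are hypothetical and exist only for this example. The routing-ID arithmetic follows the SR-IOV rule that VF n (counting from 1) sits at PF routing ID + First VF Offset + (n - 1) * VF Stride.

```c
#include <stdint.h>

/* Illustrative inputs; the field names are assumptions for this sketch. */
struct sriov_req {
	uint8_t pf_bus;			/* bus number of the PF */
	uint8_t pf_devfn;		/* device/function number of the PF */
	uint16_t total_vfs;		/* TotalVFs from the capability */
	uint16_t first_vf_offset;	/* First VF Offset */
	uint16_t vf_stride;		/* VF Stride */
	uint64_t vf_bar_size;		/* size of one VF's BAR, in bytes */
};

/* MMIO needed for one VF BAR across all VFs. */
static uint64_t vf_mmio_bytes(const struct sriov_req *r)
{
	return (uint64_t)r->total_vfs * r->vf_bar_size;
}

/*
 * Highest bus number used by any VF. The low 8 bits of the computed
 * value are the devfn; whatever carries above bit 7 spills onto
 * following buses, which must be reserved behind the upstream bridge.
 */
static unsigned int vf_last_bus(const struct sriov_req *r)
{
	unsigned int rid;

	if (r->total_vfs == 0)
		return r->pf_bus;

	rid = r->pf_devfn + r->first_vf_offset +
	      (unsigned int)r->vf_stride * (r->total_vfs - 1u);
	return r->pf_bus + (rid >> 8);
}
```

For the GPU described in this thread (16 VFs with a 256MB frame buffer BAR each), vf_mmio_bytes() yields 4GB, which matches the figure Collins gave. Whether extra bus numbers are needed depends on the device's offset and stride, which this thread does not state.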

> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope the kernel can allocate PF resources before VF resources, and keep the PF resources valid and functional if it fails to allocate VF resources.
> 
> I will send out the dmesg log and lspci info tomorrow. Thanks.

Thanks,
Alex

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com] 
> Sent: Friday, May 12, 2017 10:43 PM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
> 
> On Fri, 12 May 2017 04:51:43 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Williamson,
> > 
> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV NIC. I don't think it is redundant to check the VF BAR valid before call sriov_init(), it is safe and saving boot time, also there is no a better method to know if system BIOS has correctly initialized the SR-IOV capability or not.  
> 
> It also masks an underlying bug and creates a maintenance issue that we won't know when it's safe to remove this workaround.  I don't think faster boot is valid rationale, in one case SR-IOV is completely disabled, the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case, the BIOS should typically support SR-IOV, therefore this situation should be an exception.
>  
> > I did not try to fix the issue from the kernel resource allocation perspective, it is because:
> > 	1. I am not very familiar with the PCI resource allocation scheme in kernel. For example, in sriov_init(), kernel will re-assign the PCI resource for both VF and PF. I don't understand why kernel allocates resource for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.
> > 	2. I am not sure if kernel has error handler if PCI resource allocation failed. In this case, kernel cannot allocate enough resource to PF. It should trigger some error handler to either just keep original BAR values set by system BIOS, or disable this device and log errors.  
> 
> I think these are the issues we should be trying to solve and I'm sure folks on the linux-pci list can help us identify the bug.  Minimally, failure to allocate VF resources should leave the device in no worse condition than before it tried.  Perhaps you could post more details about the issue, boot with pci=earlydump, post dmesg of a boot where the PF resources are incorrectly re-allocated, and include lspci -vvv for the SR-IOV device.  Also, please test with the latest upstream kernel, upstream only patches old kernels through stable backports of commits to the latest kernel.  Adding Yinghai as a resource allocation expert. Thanks,
> 
> Alex
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, May 12, 2017 12:01 PM
> > To: Cheng, Collins <Collins.Cheng@amd.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> > linux-kernel@vger.kernel.org; Deucher, Alexander 
> > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> > SR-IOV incapable platform
> > 
> > On Fri, 12 May 2017 03:42:46 +0000
> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >   
> > > Hi Williamson,
> > > 
> > > GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
> > > 
> > > If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.    
> > 
> > So the problem is that something bad happens when the kernel is trying 
> > to reallocate resources in order to fulfill the requirements of the 
> > VFs, leaving the PF resources incorrectly programmed?  Why not just 
> > fix that bug rather than creating special handling for this 
> > vendor/class of device which disables any attempt to fixup resources 
> > for SR-IOV?  IOW, this patch just avoids the problem for your devices 
> > rather than fixing the bug.  I'd suggest fixing the bug such that the 
> > PF is left in a functional state if the kernel is unable to allocate 
> > sufficient resources for the VFs.  Thanks,
> > 
> > Alex
> >   
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Friday, May 12, 2017 11:21 AM
> > > To: Cheng, Collins <Collins.Cheng@amd.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> > > linux-kernel@vger.kernel.org; Deucher, Alexander 
> > > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> > > SR-IOV incapable platform
> > > 
> > > On Fri, 12 May 2017 02:50:32 +0000
> > > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> > >     
> > > > Hi Helgaas,
> > > > 
> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 
> > > > platform. It would cause a problem on PCI resource allocation in 
> > > > current Linux kernel.
> > > > 
> > > > Therefore in order to allow the PF (Physical Function) device of 
> > > > SR-IOV capable GPU to work on the SR-IOV incapable platform, it is 
> > > > required to verify conditions for initializing BAR resources on 
> > > > AMD SR-IOV capable GPUs.
> > > > 
> > > > If the device is an AMD graphics device and it supports SR-IOV it 
> > > > will require a large amount of resources.
> > > > Before calling sriov_init() must ensure that the system BIOS also 
> > > > supports SR-IOV and that system BIOS has been able to allocate 
> > > > enough resources.
> > > > If the VF BARs are zero then the system BIOS does not support 
> > > > SR-IOV or it could not allocate the resources and this platform 
> > > > will not support AMD graphics SR-IOV.
> > > > Therefore do not call sriov_init().
> > > > If the system BIOS does support SR-IOV then the VF BARs will be 
> > > > properly initialized to non-zero values.
> > > > 
> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
> > > > 
> > > > I checked the drivers/pci/quirks.c, it looks the workarounds/fixes 
> > > > in quirks.c are for specific devices and one or more device ID are 
> > > > defined for the specific devices. However my patch is for all AMD 
> > > > SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
> > > > So it doesn't seem like a good fit to put the fix in quirks.c.      
> > > 
> > > 
> > > Why is an AMD graphics card unique here?  Doesn't sriov_init() 
> > > always need to be able to deal with devices of any type where the 
> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV devices 
> > > can fit their VFs within a minimum bridge aperture, most cannot.  I 
> > > don't understand why the VF resource requirements being 
> > > exceptionally large dictates that they receive special handling.  
> > > Thanks,
> > > 
> > > Alex
> > >     
> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> > > > ---
> > > > >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 60 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > > > index e30f05c..e4f1405 100644
> > > > --- a/drivers/pci/iov.c
> > > > +++ b/drivers/pci/iov.c
> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> > > >  		msleep(100);
> > > >  }
> > > >  
> > > > +/*
> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
> > > > + * @dev: the PCI device
> > > > + * @pos: register offset of SR-IOV capability in PCI config space
> > > > + * Returns true any VF BAR has resource allocated, false
> > > > + * if all VF BARs are empty.
> > > > + */
> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
> > > > +	int i;
> > > > +	u32 bar_value;
> > > > +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> > > > +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> > > > +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> > > > +
> > > > +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > > > +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> > > > +		if (bar_value & bar_size_mask)
> > > > +			return true;
> > > > +	}
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > > +/*
> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> > > > + * @dev: the PCI device
> > > > + *
> > > > + * Returns true if device is an AMD/ATI display adapter,
> > > > + * otherwise return false.
> > > > + */
> > > > +
> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
> > > > +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> > > > +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> > > > +		dev->vendor == PCI_VENDOR_ID_AMD)); }
> > > > +
> > > >  /**
> > > >   * pci_iov_init - initialize the IOV capability
> > > >   * @dev: the PCI device
> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> > > >  		return -ENODEV;
> > > >  
> > > >  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> > > > -	if (pos)
> > > > -		return sriov_init(dev, pos);
> > > > -
> > > > +	if (pos) {
> > > > +	/*
> > > > +	 * If the device is an AMD graphics device and it supports
> > > > +	 * SR-IOV it will require a large amount of resources.
> > > > +	 * Before calling sriov_init() must ensure that the system
> > > > +	 * BIOS also supports SR-IOV and that system BIOS has been
> > > > +	 * able to allocate enough resources.
> > > > +	 * If the VF BARs are zero then the system BIOS does not
> > > > +	 * support SR-IOV or it could not allocate the resources
> > > > +	 * and this platform will not support AMD graphics SR-IOV.
> > > > +	 * Therefore do not call sriov_init().
> > > > +	 * If the system BIOS does support SR-IOV then the VF BARs
> > > > +	 * will be properly initialized to non-zero values.
> > > > +	 */
> > > > +		if (is_amd_display_adapter(dev)) {
> > > > +			if (pci_vf_bar_valid(dev, pos))
> > > > +				return sriov_init(dev, pos);
> > > > +		} else {
> > > > +			return sriov_init(dev, pos);
> > > > +		}
> > > > +	}
> > > >  	return -ENODEV;
> > > >  }
> > > >        
> > >     
> >   
>
Cheng, Collins May 16, 2017, 8:45 a.m. UTC | #9
Hi Williamson,

Sorry, I am busy with another task. I will see whether I can get the dmesg log tomorrow.

The patch I submitted tries to prevent the OS from reallocating resources for the VF devices in sriov_init() if the VF BARs are empty at BIOS/OS hand-off.
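
For readers following along, the "VF BAR is empty" test can be sketched in userspace C. This is only an illustration, not the submitted patch: the helper name vf_bars_programmed() and the SKETCH_* macros are made up here, and the six BAR dwords are passed in as an array, whereas the patch reads them with pci_read_config_dword() at pos + PCI_SRIOV_BAR + i * 4. The flag values are assumed to mirror the kernel's pci_regs.h definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed to mirror PCI_BASE_ADDRESS_SPACE, PCI_BASE_ADDRESS_MEM_TYPE_64,
 * PCI_BASE_ADDRESS_MEM_PREFETCH and PCI_SRIOV_NUM_BARS from the kernel
 * headers; redefined so the sketch builds outside the kernel.
 */
#define SKETCH_BAR_SPACE_IO     0x01u
#define SKETCH_BAR_MEM_TYPE_64  0x04u
#define SKETCH_BAR_MEM_PREFETCH 0x08u
#define SKETCH_SRIOV_NUM_BARS   6

/*
 * Hypothetical helper: returns true if any VF BAR dword carries bits
 * other than the three flag bits above, i.e. firmware wrote an address.
 */
static bool vf_bars_programmed(const uint32_t vf_bar[SKETCH_SRIOV_NUM_BARS])
{
	const uint32_t flag_bits = SKETCH_BAR_SPACE_IO |
				   SKETCH_BAR_MEM_TYPE_64 |
				   SKETCH_BAR_MEM_PREFETCH;

	assert(vf_bar != NULL);

	for (size_t i = 0; i < SKETCH_SRIOV_NUM_BARS; i++) {
		if (vf_bar[i] & ~flag_bits)
			return true;
	}

	return false;
}
```

A BAR that reads back only its type flags (for example 0x0000000c, 64-bit prefetchable) counts as empty under this test, which is the case the patch treats as "BIOS did not set up SR-IOV".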

-Collins Cheng


-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Tuesday, May 16, 2017 1:54 AM
To: Cheng, Collins <Collins.Cheng@amd.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

On Mon, 15 May 2017 08:19:28 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Williamson,
> 
> We cannot assume BIOS supports SR-IOV, actually only newer server motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or older server motherboard BIOS doesn't support SR-IOV. This issue would happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop motherboard.

Servers should be supporting SR-IOV for a long time now.  What really is there to a BIOS supporting SR-IOV anyway, it's simply reserving sufficient bus number and MMIO resources such that we can enable the VFs.  This process isn't exclusively reserved for the BIOS.  Some platforms may choose to only initialize boot devices, leaving the rest for the OS to program.  The initial proposal here to disable SR-IOV if not programmed at OS hand-off disables even the possibility of the OS reallocating resources for this device.

> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope kernel could allocate PF device resource before allocating VF device resource, and keep PF device resource valid and functional if failed to allocate VF device resource.
> 
> I will send out dmesg log lspci info tomorrow. Thanks.

Thanks,
Alex

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, May 12, 2017 10:43 PM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; 
> Yinghai Lu <yinghai@kernel.org>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> SR-IOV incapable platform
> 
> On Fri, 12 May 2017 04:51:43 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Williamson,
> > 
> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV NIC. I don't think it is redundant to check the VF BAR valid before call sriov_init(), it is safe and saving boot time, also there is no a better method to know if system BIOS has correctly initialized the SR-IOV capability or not.  
> 
> It also masks an underlying bug and creates a maintenance issue that we won't know when it's safe to remove this workaround.  I don't think faster boot is valid rationale, in one case SR-IOV is completely disabled, the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case, the BIOS should typically support SR-IOV, therefore this situation should be an exception.
>  
> > I did not try to fix the issue from the kernel resource allocation perspective, it is because:
> > 	1. I am not very familiar with the PCI resource allocation scheme in kernel. For example, in sriov_init(), kernel will re-assign the PCI resource for both VF and PF. I don't understand why kernel allocates resource for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.
> > 	2. I am not sure if kernel has error handler if PCI resource allocation failed. In this case, kernel cannot allocate enough resource to PF. It should trigger some error handler to either just keep original BAR values set by system BIOS, or disable this device and log errors.  
> 
> I think these are the issues we should be trying to solve and I'm sure 
> folks on the linux-pci list can help us identify the bug.  Minimally, 
> failure to allocate VF resources should leave the device in no worse 
> condition than before it tried.  Perhaps you could post more details 
> about the issue, boot with pci=earlydump, post dmesg of a boot where 
> the PF resources are incorrectly re-allocated, and include lspci -vvv 
> for the SR-IOV device.  Also, please test with the latest upstream 
> kernel, upstream only patches old kernels through stable backports of 
> commits to the latest kernel.  Adding Yinghai as a resource allocation 
> expert. Thanks,
> 
> Alex
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, May 12, 2017 12:01 PM
> > To: Cheng, Collins <Collins.Cheng@amd.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> > linux-kernel@vger.kernel.org; Deucher, Alexander 
> > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> > SR-IOV incapable platform
> > 
> > On Fri, 12 May 2017 03:42:46 +0000
> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >   
> > > Hi Williamson,
> > > 
> > > GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
> > > 
> > > If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.    
> > 
> > So the problem is that something bad happens when the kernel is 
> > trying to reallocate resources in order to fulfill the requirements 
> > of the VFs, leaving the PF resources incorrectly programmed?  Why 
> > not just fix that bug rather than creating special handling for this 
> > vendor/class of device which disables any attempt to fixup resources 
> > for SR-IOV?  IOW, this patch just avoids the problem for your 
> > devices rather than fixing the bug.  I'd suggest fixing the bug such 
> > that the PF is left in a functional state if the kernel is unable to 
> > allocate sufficient resources for the VFs.  Thanks,
> > 
> > Alex
> >   
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Friday, May 12, 2017 11:21 AM
> > > To: Cheng, Collins <Collins.Cheng@amd.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; 
> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, 
> > > Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly 
> > > <Kelly.Zytaruk@amd.com>
> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> > > SR-IOV incapable platform
> > > 
> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins" 
> > > <Collins.Cheng@amd.com> wrote:
> > >     
> > > > Hi Helgaas,
> > > > 
> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 
> > > > platform. It would cause a problem on PCI resource allocation in 
> > > > current Linux kernel.
> > > > 
> > > > Therefore in order to allow the PF (Physical Function) device of 
> > > > SR-IOV capable GPU to work on the SR-IOV incapable platform, it 
> > > > is required to verify conditions for initializing BAR resources 
> > > > on AMD SR-IOV capable GPUs.
> > > > 
> > > > If the device is an AMD graphics device and it supports SR-IOV 
> > > > it will require a large amount of resources.
> > > > Before calling sriov_init() must ensure that the system BIOS 
> > > > also supports SR-IOV and that system BIOS has been able to 
> > > > allocate enough resources.
> > > > If the VF BARs are zero then the system BIOS does not support 
> > > > SR-IOV or it could not allocate the resources and this platform 
> > > > will not support AMD graphics SR-IOV.
> > > > Therefore do not call sriov_init().
> > > > If the system BIOS does support SR-IOV then the VF BARs will be 
> > > > properly initialized to non-zero values.
> > > > 
> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
> > > > 
> > > > I checked the drivers/pci/quirks.c, it looks the 
> > > > workarounds/fixes in quirks.c are for specific devices and one 
> > > > or more device ID are defined for the specific devices. However 
> > > > my patch is for all AMD SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
> > > > So it doesn't seem like a good fit to put the fix in quirks.c.      
> > > 
> > > 
> > > Why is an AMD graphics card unique here?  Doesn't sriov_init() 
> > > always need to be able to deal with devices of any type where the 
> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV 
> > > devices can fit their VFs within a minimum bridge aperture, most 
> > > cannot.  I don't understand why the VF resource requirements being 
> > > exceptionally large dictates that they receive special handling.
> > > Thanks,
> > > 
> > > Alex
> > >     
> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> > > > ---
> > > > >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 60 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > > > index e30f05c..e4f1405 100644
> > > > --- a/drivers/pci/iov.c
> > > > +++ b/drivers/pci/iov.c
> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> > > >  		msleep(100);
> > > >  }
> > > >  
> > > > +/*
> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
> > > > + * @dev: the PCI device
> > > > > + * @pos: register offset of SR-IOV capability in PCI config space
> > > > + * Returns true any VF BAR has resource allocated, false
> > > > + * if all VF BARs are empty.
> > > > + */
> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
> > > > +	int i;
> > > > +	u32 bar_value;
> > > > +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> > > > +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> > > > +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> > > > +
> > > > +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> > > > +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> > > > +		if (bar_value & bar_size_mask)
> > > > +			return true;
> > > > +	}
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > > +/*
> > > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> > > > + * @dev: the PCI device
> > > > + *
> > > > + * Returns true if device is an AMD/ATI display adapter,
> > > > + * otherwise return false.
> > > > + */
> > > > +
> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
> > > > +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> > > > +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> > > > +		dev->vendor == PCI_VENDOR_ID_AMD)); }
> > > > +
> > > >  /**
> > > >   * pci_iov_init - initialize the IOV capability
> > > >   * @dev: the PCI device
> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> > > >  		return -ENODEV;
> > > >  
> > > >  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> > > > -	if (pos)
> > > > -		return sriov_init(dev, pos);
> > > > -
> > > > +	if (pos) {
> > > > +	/*
> > > > +	 * If the device is an AMD graphics device and it supports
> > > > +	 * SR-IOV it will require a large amount of resources.
> > > > +	 * Before calling sriov_init() must ensure that the system
> > > > +	 * BIOS also supports SR-IOV and that system BIOS has been
> > > > +	 * able to allocate enough resources.
> > > > +	 * If the VF BARs are zero then the system BIOS does not
> > > > +	 * support SR-IOV or it could not allocate the resources
> > > > +	 * and this platform will not support AMD graphics SR-IOV.
> > > > +	 * Therefore do not call sriov_init().
> > > > +	 * If the system BIOS does support SR-IOV then the VF BARs
> > > > +	 * will be properly initialized to non-zero values.
> > > > +	 */
> > > > +		if (is_amd_display_adapter(dev)) {
> > > > +			if (pci_vf_bar_valid(dev, pos))
> > > > +				return sriov_init(dev, pos);
> > > > +		} else {
> > > > +			return sriov_init(dev, pos);
> > > > +		}
> > > > +	}
> > > >  	return -ENODEV;
> > > >  }
> > > >        
> > >     
> >   
>
Alexander H Duyck May 19, 2017, 3:43 p.m. UTC | #10
On Mon, May 15, 2017 at 10:53 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 15 May 2017 08:19:28 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>
>> Hi Williamson,
>>
>> We cannot assume BIOS supports SR-IOV, actually only newer server motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or older server motherboard BIOS doesn't support SR-IOV. This issue would happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop motherboard.
>
> Servers should be supporting SR-IOV for a long time now.  What really
> is there to a BIOS supporting SR-IOV anyway, it's simply reserving
> sufficient bus number and MMIO resources such that we can enable the
> VFs.  This process isn't exclusively reserved for the BIOS.  Some
> platforms may choose to only initialize boot devices, leaving the rest
> for the OS to program.  The initial proposal here to disable SR-IOV if
> not programmed at OS hand-off disables even the possibility of the OS
> reallocating resources for this device.

There are differences between supporting SR-IOV and supporting SR-IOV
on devices with massive resources. I know I have seen NICs that will
keep a system from completing POST if SR-IOV is enabled, and MMIO
beyond 4G is not. My guess would be that the issues being seen are
probably that they disable SR-IOV in the BIOS in such a setup and end
up running into issues when they try to boot into the Linux kernel as
it goes through and tries to allocate resources for SR-IOV even though
it was disabled in the BIOS.
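
To put rough numbers on "massive resources", here is a small userspace C sketch using the figures quoted earlier in this thread (256 MB per VF x 16 VFs for the GPU, 512 KB in total for an Intel NIC). The helper names are hypothetical, and the window size a caller passes in is an assumption about how much MMIO space a given platform leaves free; it is not something the kernel reports in this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)
#define GiB (1024ull * MiB)

/* Hypothetical helper: total MMIO needed for one VF BAR across all VFs. */
static uint64_t vf_mmio_demand(uint64_t per_vf_bar_bytes, unsigned int total_vfs)
{
	/* Guard against 64-bit overflow in the multiplication. */
	assert(total_vfs == 0 || per_vf_bar_bytes <= UINT64_MAX / total_vfs);

	return per_vf_bar_bytes * total_vfs;
}

/* Hypothetical helper: does the demand fit in the free MMIO window? */
static bool fits_in_window(uint64_t demand_bytes, uint64_t window_bytes)
{
	return demand_bytes <= window_bytes;
}
```

With these helpers, vf_mmio_demand(256 * MiB, 16) comes to 4 * GiB, which cannot fit in any window below the 4G boundary, while the NIC's 512 KB fits easily. That is the gap between "supports SR-IOV" and "supports SR-IOV on devices with massive resources".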

It might make sense to add a kernel parameter, something like
"pci=nosriov", that would allow for disabling SR-IOV and related
resource allocation if that is what we are talking about. That way you
could plug these types of devices into a system with a legacy BIOS,
or one that doesn't want to allocate addresses above 32 bits for MMIO,
and this parameter would be all that is needed to disable SR-IOV so you
could plug in a NIC that has SR-IOV associated with it.
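
As a rough illustration of the idea, the following userspace C sketch scans the comma-separated value of a "pci=" option for a "nosriov" token. Note that "nosriov" is only the name proposed in this message, not an existing kernel option, and pci_opts_request_nosriov() is a hypothetical helper; the real parsing would live in the kernel's own "pci=" option handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the comma-separated option
 * string (the part after "pci=") contains the exact token "nosriov".
 */
static bool pci_opts_request_nosriov(const char *opts)
{
	static const char token[] = "nosriov";
	const size_t token_len = sizeof(token) - 1;

	assert(opts != NULL);

	while (*opts) {
		/* Length of the current token, up to the next comma or end. */
		size_t len = strcspn(opts, ",");

		if (len == token_len && strncmp(opts, token, token_len) == 0)
			return true;

		opts += len;
		if (*opts == ',')
			opts++;
	}

	return false;
}
```

A flag set this way would then be checked before any VF resource sizing, so booting with something like "pci=earlydump,nosriov" would skip SR-IOV setup entirely.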

>> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope kernel could allocate PF device resource before allocating VF device resource, and keep PF device resource valid and functional if failed to allocate VF device resource.
>>
>> I will send out dmesg log lspci info tomorrow. Thanks.
>
> Thanks,
> Alex
>
>> -----Original Message-----
>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> Sent: Friday, May 12, 2017 10:43 PM
>> To: Cheng, Collins <Collins.Cheng@amd.com>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
>>
>> On Fri, 12 May 2017 04:51:43 +0000
>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>>
>> > Hi Williamson,
>> >
>> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV NIC. I don't think it is redundant to check the VF BAR valid before call sriov_init(), it is safe and saving boot time, also there is no a better method to know if system BIOS has correctly initialized the SR-IOV capability or not.
>>
>> It also masks an underlying bug and creates a maintenance issue that we won't know when it's safe to remove this workaround.  I don't think faster boot is valid rationale, in one case SR-IOV is completely disabled, the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case, the BIOS should typically support SR-IOV, therefore this situation should be an exception.
>>
>> > I did not try to fix the issue from the kernel resource allocation perspective, it is because:
>> >     1. I am not very familiar with the PCI resource allocation scheme in kernel. For example, in sriov_init(), kernel will re-assign the PCI resource for both VF and PF. I don't understand why kernel allocates resource for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.
>> >     2. I am not sure if kernel has error handler if PCI resource allocation failed. In this case, kernel cannot allocate enough resource to PF. It should trigger some error handler to either just keep original BAR values set by system BIOS, or disable this device and log errors.
>>
>> I think these are the issues we should be trying to solve and I'm sure folks on the linux-pci list can help us identify the bug.  Minimally, failure to allocate VF resources should leave the device in no worse condition than before it tried.  Perhaps you could post more details about the issue, boot with pci=earlydump, post dmesg of a boot where the PF resources are incorrectly re-allocated, and include lspci -vvv for the SR-IOV device.  Also, please test with the latest upstream kernel, upstream only patches old kernels through stable backports of commits to the latest kernel.  Adding Yinghai as a resource allocation expert. Thanks,
>>
>> Alex
>>
>> > -----Original Message-----
>> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> > Sent: Friday, May 12, 2017 12:01 PM
>> > To: Cheng, Collins <Collins.Cheng@amd.com>
>> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
>> > linux-kernel@vger.kernel.org; Deucher, Alexander
>> > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>> > SR-IOV incapable platform
>> >
>> > On Fri, 12 May 2017 03:42:46 +0000
>> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>> >
>> > > Hi Williamson,
>> > >
>> > > GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
>> > >
>> > > If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.
>> >
>> > So the problem is that something bad happens when the kernel is trying
>> > to reallocate resources in order to fulfill the requirements of the
>> > VFs, leaving the PF resources incorrectly programmed?  Why not just
>> > fix that bug rather than creating special handling for this
>> > vendor/class of device which disables any attempt to fixup resources
>> > for SR-IOV?  IOW, this patch just avoids the problem for your devices
>> > rather than fixing the bug.  I'd suggest fixing the bug such that the
>> > PF is left in a functional state if the kernel is unable to allocate
>> > sufficient resources for the VFs.  Thanks,
>> >
>> > Alex
>> >
>> > > -----Original Message-----
>> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> > > Sent: Friday, May 12, 2017 11:21 AM
>> > > To: Cheng, Collins <Collins.Cheng@amd.com>
>> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
>> > > linux-kernel@vger.kernel.org; Deucher, Alexander
>> > > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>> > > SR-IOV incapable platform
>> > >
>> > > On Fri, 12 May 2017 02:50:32 +0000
>> > > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>> > >
>> > > > Hi Helgaas,
>> > > >
>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable
>> > > > platform. It would cause a problem on PCI resource allocation in
>> > > > current Linux kernel.
>> > > >
>> > > > Therefore in order to allow the PF (Physical Function) device of
>> > > > SR-IOV capable GPU to work on the SR-IOV incapable platform, it is
>> > > > required to verify conditions for initializing BAR resources on
>> > > > AMD SR-IOV capable GPUs.
>> > > >
>> > > > If the device is an AMD graphics device and it supports SR-IOV it
>> > > > will require a large amount of resources.
>> > > > Before calling sriov_init() must ensure that the system BIOS also
>> > > > supports SR-IOV and that system BIOS has been able to allocate
>> > > > enough resources.
>> > > > If the VF BARs are zero then the system BIOS does not support
>> > > > SR-IOV or it could not allocate the resources and this platform
>> > > > will not support AMD graphics SR-IOV.
>> > > > Therefore do not call sriov_init().
>> > > > If the system BIOS does support SR-IOV then the VF BARs will be
>> > > > properly initialized to non-zero values.
>> > > >
>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
>> > > >
>> > > > I checked the drivers/pci/quirks.c, it looks the workarounds/fixes
>> > > > in quirks.c are for specific devices and one or more device ID are
>> > > > defined for the specific devices. However my patch is for all AMD
>> > > > SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.
>> > >
>> > >
>> > > Why is an AMD graphics card unique here?  Doesn't sriov_init()
>> > > always need to be able to deal with devices of any type where the
>> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV devices
>> > > can fit their VFs within a minimum bridge aperture, most cannot.  I
>> > > don't understand why the VF resource requirements being
>> > > exceptionally large dictates that they receive special handling.
>> > > Thanks,
>> > >
>> > > Alex
>> > >
>> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
>> > > > ---
>> > > >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>> > > >  1 file changed, 60 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> > > > index e30f05c..e4f1405 100644
>> > > > --- a/drivers/pci/iov.c
>> > > > +++ b/drivers/pci/iov.c
>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>> > > >                 msleep(100);
>> > > >  }
>> > > >
>> > > > +/*
>> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
>> > > > + * @dev: the PCI device
>> > > > + * @pos: register offset of SR-IOV capability in PCI config space
>> > > > + * Returns true any VF BAR has resource allocated, false
>> > > > + * if all VF BARs are empty.
>> > > > + */
>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
>> > > > +       int i;
>> > > > +       u32 bar_value;
>> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |
>> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);
>> > > > +
>> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
>> > > > +               if (bar_value & bar_size_mask)
>> > > > +                       return true;
>> > > > +       }
>> > > > +
>> > > > +       return false;
>> > > > +}
>> > > > +
>> > > > +/*
>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
>> > > > + * @dev: the PCI device
>> > > > + *
>> > > > + * Returns true if device is an AMD/ATI display adapter,
>> > > > + * otherwise return false.
>> > > > + */
>> > > > +
>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
>> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||
>> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }
>> > > > +
>> > > >  /**
>> > > >   * pci_iov_init - initialize the IOV capability
>> > > >   * @dev: the PCI device
>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>> > > >                 return -ENODEV;
>> > > >
>> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> > > > -       if (pos)
>> > > > -               return sriov_init(dev, pos);
>> > > > -
>> > > > +       if (pos) {
>> > > > +       /*
>> > > > +        * If the device is an AMD graphics device and it supports
>> > > > +        * SR-IOV it will require a large amount of resources.
>> > > > +        * Before calling sriov_init() must ensure that the system
>> > > > +        * BIOS also supports SR-IOV and that system BIOS has been
>> > > > +        * able to allocate enough resources.
>> > > > +        * If the VF BARs are zero then the system BIOS does not
>> > > > +        * support SR-IOV or it could not allocate the resources
>> > > > +        * and this platform will not support AMD graphics SR-IOV.
>> > > > +        * Therefore do not call sriov_init().
>> > > > +        * If the system BIOS does support SR-IOV then the VF BARs
>> > > > +        * will be properly initialized to non-zero values.
>> > > > +        */
>> > > > +               if (is_amd_display_adapter(dev)) {
>> > > > +                       if (pci_vf_bar_valid(dev, pos))
>> > > > +                               return sriov_init(dev, pos);
>> > > > +               } else {
>> > > > +                       return sriov_init(dev, pos);
>> > > > +               }
>> > > > +       }
>> > > >         return -ENODEV;
>> > > >  }
>> > > >
>> > >
>> >
>>
>
Cheng, Collins May 20, 2017, 4:53 a.m. UTC | #11
Hi Alex,

Yes, I hope the kernel can disable SR-IOV and the related VF resource allocation if the system BIOS is not SR-IOV capable.

Adding a "pci=nosriov" parameter sounds like a workable solution, but the user would need to add it manually, right? I think automatic detection would be better. My patch tries to detect this case automatically and bypass VF resource allocation.


-Collins Cheng


-----Original Message-----
From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
Sent: Friday, May 19, 2017 11:44 PM
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cheng, Collins <Collins.Cheng@amd.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

On Mon, May 15, 2017 at 10:53 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
> On Mon, 15 May 2017 08:19:28 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>
>> Hi Williamson,
>>
>> We cannot assume BIOS supports SR-IOV, actually only newer server motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or older server motherboard BIOS doesn't support SR-IOV. This issue would happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop motherboard.
>
> Servers should be supporting SR-IOV for a long time now.  What really
> is there to a BIOS supporting SR-IOV anyway, it's simply reserving
> sufficient bus number and MMIO resources such that we can enable the
> VFs.  This process isn't exclusively reserved for the BIOS.  Some
> platforms may choose to only initialize boot devices, leaving the rest
> for the OS to program.  The initial proposal here to disable SR-IOV if
> not programmed at OS hand-off disables even the possibility of the OS
> reallocating resources for this device.

There are differences between supporting SR-IOV and supporting SR-IOV on devices with massive resources. I know I have seen NICs that will keep a system from completing POST if SR-IOV is enabled and MMIO beyond 4G is not. My guess is that in such a setup they disable SR-IOV in the BIOS and then run into issues when booting the Linux kernel, because it goes through and tries to allocate resources for SR-IOV even though SR-IOV was disabled in the BIOS.

It might make sense to add a kernel parameter, something like "pci=nosriov", that would allow for disabling SR-IOV and the related resource allocation, if that is what we are talking about. That way you could plug these types of devices into a system with a legacy BIOS, or one that doesn't want to allocate addresses above 32 bits for MMIO, and this parameter would be all that is needed to disable SR-IOV so you could plug in a NIC that has SR-IOV associated with it.
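
To make the suggestion concrete, here is a minimal user-space sketch of how a comma-separated "pci=" option string could be scanned for such a token. This is an assumption-laden illustration, not kernel code: "nosriov" is only the name proposed in this thread, and pci_opts_has_nosriov() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated option list given after "pci="
 * contains the token "nosriov".  The token is hypothetical; it is the
 * name proposed in this thread.
 */
static bool pci_opts_has_nosriov(const char *opts)
{
	static const char token[] = "nosriov";
	const size_t token_len = sizeof(token) - 1;

	assert(opts != NULL);

	while (*opts) {
		/* Length of the current token, up to the next comma or the end. */
		size_t len = strcspn(opts, ",");

		if (len == token_len && strncmp(opts, token, len) == 0)
			return true;

		opts += len;
		if (*opts == ',')
			opts++;		/* skip the separator */
	}
	return false;
}
```

A flag set this way would be global: it would apply to every SR-IOV device in the system, not only to the one whose resources cannot be allocated.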

>> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope kernel could allocate PF device resource before allocating VF device resource, and keep PF device resource valid and functional if failed to allocate VF device resource.
>>
>> I will send out dmesg log lspci info tomorrow. Thanks.
>
> Thanks,
> Alex
>

>> -----Original Message-----

>> From: Alex Williamson [mailto:alex.williamson@redhat.com]

>> Sent: Friday, May 12, 2017 10:43 PM

>> To: Cheng, Collins <Collins.Cheng@amd.com>

>> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 

>> linux-kernel@vger.kernel.org; Deucher, Alexander 

>> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; 

>> Yinghai Lu <yinghai@kernel.org>

>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>> SR-IOV incapable platform

>>

>> On Fri, 12 May 2017 04:51:43 +0000

>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:

>>

>> > Hi Williamson,

>> >

>> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV NIC. I don't think it is redundant to check the VF BAR valid before call sriov_init(), it is safe and saving boot time, also there is no a better method to know if system BIOS has correctly initialized the SR-IOV capability or not.

>>

>> It also masks an underlying bug and creates a maintenance issue that we won't know when it's safe to remove this workaround.  I don't think faster boot is valid rationale, in one case SR-IOV is completely disabled, the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case, the BIOS should typically support SR-IOV, therefore this situation should be an exception.

>>

>> > I did not try to fix the issue from the kernel resource allocation perspective, it is because:

>> >     1. I am not very familiar with the PCI resource allocation scheme in kernel. For example, in sriov_init(), kernel will re-assign the PCI resource for both VF and PF. I don't understand why kernel allocates resource for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.

>> >     2. I am not sure if kernel has error handler if PCI resource allocation failed. In this case, kernel cannot allocate enough resource to PF. It should trigger some error handler to either just keep original BAR values set by system BIOS, or disable this device and log errors.

>>

>> I think these are the issues we should be trying to solve and I'm 

>> sure folks on the linux-pci list can help us identify the bug.  

>> Minimally, failure to allocate VF resources should leave the device 

>> in no worse condition than before it tried.  Perhaps you could post 

>> more details about the issue, boot with pci=earlydump, post dmesg of 

>> a boot where the PF resources are incorrectly re-allocated, and 

>> include lspci -vvv for the SR-IOV device.  Also, please test with the 

>> latest upstream kernel, upstream only patches old kernels through 

>> stable backports of commits to the latest kernel.  Adding Yinghai as 

>> a resource allocation expert. Thanks,

>>

>> Alex

>>

>> > -----Original Message-----

>> > From: Alex Williamson [mailto:alex.williamson@redhat.com]

>> > Sent: Friday, May 12, 2017 12:01 PM

>> > To: Cheng, Collins <Collins.Cheng@amd.com>

>> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 

>> > linux-kernel@vger.kernel.org; Deucher, Alexander 

>> > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>

>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>> > SR-IOV incapable platform

>> >

>> > On Fri, 12 May 2017 03:42:46 +0000

>> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:

>> >

>> > > Hi Williamson,

>> > >

>> > > GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.

>> > >

>> > > If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.

>> >

>> > So the problem is that something bad happens when the kernel is 

>> > trying to reallocate resources in order to fulfill the requirements 

>> > of the VFs, leaving the PF resources incorrectly programmed?  Why 

>> > not just fix that bug rather than creating special handling for 

>> > this vendor/class of device which disables any attempt to fixup 

>> > resources for SR-IOV?  IOW, this patch just avoids the problem for 

>> > your devices rather than fixing the bug.  I'd suggest fixing the 

>> > bug such that the PF is left in a functional state if the kernel is 

>> > unable to allocate sufficient resources for the VFs.  Thanks,

>> >

>> > Alex

>> >

>> > > -----Original Message-----

>> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]

>> > > Sent: Friday, May 12, 2017 11:21 AM

>> > > To: Cheng, Collins <Collins.Cheng@amd.com>

>> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; 

>> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, 

>> > > Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly 

>> > > <Kelly.Zytaruk@amd.com>

>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>> > > SR-IOV incapable platform

>> > >

>> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins" 

>> > > <Collins.Cheng@amd.com> wrote:

>> > >

>> > > > Hi Helgaas,

>> > > >

>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.

>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 

>> > > > platform. It would cause a problem on PCI resource allocation 

>> > > > in current Linux kernel.

>> > > >

>> > > > Therefore in order to allow the PF (Physical Function) device 

>> > > > of SR-IOV capable GPU to work on the SR-IOV incapable platform, 

>> > > > it is required to verify conditions for initializing BAR 

>> > > > resources on AMD SR-IOV capable GPUs.

>> > > >

>> > > > If the device is an AMD graphics device and it supports SR-IOV 

>> > > > it will require a large amount of resources.

>> > > > Before calling sriov_init() must ensure that the system BIOS 

>> > > > also supports SR-IOV and that system BIOS has been able to 

>> > > > allocate enough resources.

>> > > > If the VF BARs are zero then the system BIOS does not support 

>> > > > SR-IOV or it could not allocate the resources and this platform 

>> > > > will not support AMD graphics SR-IOV.

>> > > > Therefore do not call sriov_init().

>> > > > If the system BIOS does support SR-IOV then the VF BARs will be 

>> > > > properly initialized to non-zero values.

>> > > >

>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.

>> > > >

>> > > > I checked the drivers/pci/quirks.c, it looks the 

>> > > > workarounds/fixes in quirks.c are for specific devices and one 

>> > > > or more device ID are defined for the specific devices. However 

>> > > > my patch is for all AMD SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.

>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.

>> > >

>> > >

>> > > Why is an AMD graphics card unique here?  Doesn't sriov_init() 

>> > > always need to be able to deal with devices of any type where the 

>> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV 

>> > > devices can fit their VFs within a minimum bridge aperture, most 

>> > > cannot.  I don't understand why the VF resource requirements 

>> > > being exceptionally large dictates that they receive special handling.

>> > > Thanks,

>> > >

>> > > Alex

>> > >

>> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>

>> > > > ---

>> > > >  drivers/pci/iov.c | 63

>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++---

>> > > >  1 file changed, 60 insertions(+), 3 deletions(-)

>> > > >

>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index

>> > > > e30f05c..e4f1405 100644

>> > > > --- a/drivers/pci/iov.c

>> > > > +++ b/drivers/pci/iov.c

>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)

>> > > >                 msleep(100);

>> > > >  }

>> > > >

>> > > > +/*

>> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated

>> > > > + * @dev: the PCI device

>> > > > + * @pos: register offset of SR-IOV capability in PCI config 

>> > > > +space

>> > > > + * Returns true any VF BAR has resource allocated, false

>> > > > + * if all VF BARs are empty.

>> > > > + */

>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {

>> > > > +       int i;

>> > > > +       u32 bar_value;

>> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |

>> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |

>> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);

>> > > > +

>> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {

>> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);

>> > > > +               if (bar_value & bar_size_mask)

>> > > > +                       return true;

>> > > > +       }

>> > > > +

>> > > > +       return false;

>> > > > +}

>> > > > +

>> > > > +/*

>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU 

>> > > > +device

>> > > > + * @dev: the PCI device

>> > > > + *

>> > > > + * Returns true if device is an AMD/ATI display adapter,

>> > > > + * otherwise return false.

>> > > > + */

>> > > > +

>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {

>> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&

>> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||

>> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }

>> > > > +

>> > > >  /**

>> > > >   * pci_iov_init - initialize the IOV capability

>> > > >   * @dev: the PCI device

>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)

>> > > >                 return -ENODEV;

>> > > >

>> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);

>> > > > -       if (pos)

>> > > > -               return sriov_init(dev, pos);

>> > > > -

>> > > > +       if (pos) {

>> > > > +       /*

>> > > > +        * If the device is an AMD graphics device and it supports

>> > > > +        * SR-IOV it will require a large amount of resources.

>> > > > +        * Before calling sriov_init() must ensure that the system

>> > > > +        * BIOS also supports SR-IOV and that system BIOS has been

>> > > > +        * able to allocate enough resources.

>> > > > +        * If the VF BARs are zero then the system BIOS does not

>> > > > +        * support SR-IOV or it could not allocate the resources

>> > > > +        * and this platform will not support AMD graphics SR-IOV.

>> > > > +        * Therefore do not call sriov_init().

>> > > > +        * If the system BIOS does support SR-IOV then the VF BARs

>> > > > +        * will be properly initialized to non-zero values.

>> > > > +        */

>> > > > +               if (is_amd_display_adapter(dev)) {

>> > > > +                       if (pci_vf_bar_valid(dev, pos))

>> > > > +                               return sriov_init(dev, pos);

>> > > > +               } else {

>> > > > +                       return sriov_init(dev, pos);

>> > > > +               }

>> > > > +       }

>> > > >         return -ENODEV;

>> > > >  }

>> > > >

>> > >

>> >

>>

>
Zytaruk, Kelly May 20, 2017, 10:27 a.m. UTC | #12
>-----Original Message-----
>From: Cheng, Collins
>Sent: Saturday, May 20, 2017 12:53 AM
>To: Alexander Duyck; Alex Williamson
>Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
>Deucher, Alexander; Zytaruk, Kelly; Yinghai Lu
>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>incapable platform
>
>Hi Alex,
>
>Yes, I hope kernel can disable SR-IOV and related VF resource allocation if the
>system BIOS is not SR-IOV capable.
>
>Adding the parameter "pci=nosriov" sounds a doable solution, but it would need
>user to add this parameter manually, right? I think an automatic detection would
>be better. My patch is trying to auto detect and bypass VF resource allocation.
>
>
>-Collins Cheng
>

Collins, be careful about this.  I don't think that this is what we want.  If you add "pci=nosriov" then you are globally disabling SR-IOV for all devices.  This is not the solution that we are looking for.
Remember that there are three types of SBIOS:
"not SR-IOV capable",
"SR-IOV capable but does not support large resources",
"Complete SR-IOV support".

The problem is that we are trying to find a fix for a "broken" SBIOS that does support SR-IOV but does not support the full SR-IOV capabilities that devices with large resources require.
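
The "large resources" case can be illustrated with the figures quoted earlier in this thread (256MB per VF x 16 VFs for the GPU frame buffer BAR, versus 512KB for all VFs of an Intel NIC). The sketch below is only arithmetic under assumptions: vf_aperture_bytes() and fits_in_window() are hypothetical helpers, and the size of the MMIO window a given SBIOS can hand out is a made-up parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Total MMIO needed for one VF BAR across all VFs. */
static uint64_t vf_aperture_bytes(uint64_t per_vf_bytes, unsigned int num_vfs)
{
	return per_vf_bytes * num_vfs;
}

/* True if the request fits in the MMIO window the firmware can hand out. */
static bool fits_in_window(uint64_t need_bytes, uint64_t window_bytes)
{
	assert(window_bytes > 0);
	return need_bytes <= window_bytes;
}
```

For example, vf_aperture_bytes(256 * MiB, 16) comes to 4 GiB, which cannot fit in any window that lies entirely below the 4G boundary, while a 512 KiB request fits easily. An SBIOS can therefore handle SR-IOV for the NIC and still fail for the GPU.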

Thanks,
Kelly

Cheng, Collins May 20, 2017, 10:37 a.m. UTC | #13
Hi Kelly,

This issue also happens with a "not SR-IOV capable" SBIOS. It seems that some "not SR-IOV capable" SBIOSes report an error directly during the system BIOS boot stage and do not boot to the OS, while others report no error and boot to Linux.

-Collins Cheng


-----Original Message-----
From: Zytaruk, Kelly
Sent: Saturday, May 20, 2017 6:28 PM
To: Cheng, Collins <Collins.Cheng@amd.com>; Alexander Duyck <alexander.duyck@gmail.com>; Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform



>-----Original Message-----
>From: Cheng, Collins
>Sent: Saturday, May 20, 2017 12:53 AM
>To: Alexander Duyck; Alex Williamson
>Cc: Bjorn Helgaas; linux-pci@vger.kernel.org;
>linux-kernel@vger.kernel.org; Deucher, Alexander; Zytaruk, Kelly;
>Yinghai Lu
>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>incapable platform
>
>Hi Alex,
>
>Yes, I hope kernel can disable SR-IOV and related VF resource
>allocation if the system BIOS is not SR-IOV capable.
>
>Adding the parameter "pci=nosriov" sounds a doable solution, but it
>would need user to add this parameter manually, right? I think an
>automatic detection would be better. My patch is trying to auto detect and bypass VF resource allocation.
>
>
>-Collins Cheng
>


Collins, be careful about this.  I don't think that this is what we want.  If you add "pci=nosriov" then you are globally disabling SR-IOV for all devices.  This is not the solution that we are looking for.
Remember that there are three types of SBIOS: "not SR-IOV capable", "SR-IOV capable but does not support large resources", and "Complete SR-IOV support".

The problem is that we are trying to find a fix for a "broken" SBIOS that does support SR-IOV but does not support the full SR-IOV capabilities that devices with large resources require.

Thanks,
Kelly
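
For reference, the check in the originally posted patch treats a VF BAR that holds only flag bits as "not programmed by the SBIOS". Below is a minimal user-space sketch of that test, not the kernel code itself. The helper name vf_bars_programmed() and the BAR_FLAG_* / SRIOV_NUM_BARS names are made up for this illustration; their values are assumed to mirror the kernel's PCI_BASE_ADDRESS_* and PCI_SRIOV_NUM_BARS definitions. The raw BAR values are passed in as an array instead of being read from config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in the low nibble of a memory BAR. The values mirror the
 * kernel's PCI_BASE_ADDRESS_SPACE, PCI_BASE_ADDRESS_MEM_TYPE_64 and
 * PCI_BASE_ADDRESS_MEM_PREFETCH; the names are local to this sketch. */
#define BAR_FLAG_SPACE     0x01u
#define BAR_FLAG_MEM_64    0x04u
#define BAR_FLAG_PREFETCH  0x08u

#define SRIOV_NUM_BARS 6 /* mirrors PCI_SRIOV_NUM_BARS */

/* Hypothetical helper: returns true if any raw VF BAR value carries
 * bits other than the three flag bits masked out above. */
bool vf_bars_programmed(const uint32_t *bars, size_t nbars)
{
    const uint32_t addr_mask =
        ~(BAR_FLAG_SPACE | BAR_FLAG_MEM_64 | BAR_FLAG_PREFETCH);
    size_t i;

    assert(bars != NULL && nbars <= SRIOV_NUM_BARS);

    for (i = 0; i < nbars; i++) {
        if (bars[i] & addr_mask)
            return true;
    }
    return false;
}
```

Per the patch description, all-zero VF BARs mean either that the SBIOS does not support SR-IOV or that it could not allocate the resources, so a check of this kind cannot tell the first two SBIOS types apart.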

>

>-----Original Message-----

>From: Alexander Duyck [mailto:alexander.duyck@gmail.com]

>Sent: Friday, May 19, 2017 11:44 PM

>To: Alex Williamson <alex.williamson@redhat.com>

>Cc: Cheng, Collins <Collins.Cheng@amd.com>; Bjorn Helgaas 

><bhelgaas@google.com>; linux-pci@vger.kernel.org; linux- 

>kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; 

>Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>

>Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV 

>incapable platform

>

>On Mon, May 15, 2017 at 10:53 AM, Alex Williamson 

><alex.williamson@redhat.com> wrote:

>> On Mon, 15 May 2017 08:19:28 +0000

>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:

>>

>>> Hi Williamson,

>>>

>>> We cannot assume BIOS supports SR-IOV, actually only newer server

>motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or 

>older server motherboard BIOS doesn't support SR-IOV. This issue would 

>happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop motherboard.

>>

>> Servers should be supporting SR-IOV for a long time now.  What really 

>> is there to a BIOS supporting SR-IOV anyway, it's simply reserving 

>> sufficient bus number and MMIO resources such that we can enable the 

>> VFs.  This process isn't exclusively reserved for the BIOS.  Some 

>> platforms may choose to only initialize boot devices, leaving the 

>> rest for the OS to program.  The initial proposal here to disable 

>> SR-IOV if not programmed at OS hand-off disables even the possibility 

>> of the OS reallocating resources for this device.

>

>There are differences between supporting SR-IOV and supporting SR-IOV 

>on devices with massive resources. I know I have seen NICs that will 

>keep a system from completing POST if SR-IOV is enabled, and MMIO 

>beyond 4G is not. My guess would be that the issues being seen are 

>probably that they disable SR-IOV in the BIOS in such a setup and end 

>up running into issues when they try to boot into the Linux kernel as 

>it goes through and tries to allocate resources for SR-IOV even though it was disabled in the BIOS.

>

>It might make sense to add a kernel parameter something like a "pci=nosriov"

>that would allow for disabling SR-IOV and related resource allocation 

>if that is what we are talking about. That way you could plug in these 

>types of devices into a system with a legacy bios or that doesn't wan 

>to allocate addresses above 32b for MMIO, and this parameter would be 

>all that is needed to disable SR-IOV so you could plug in a NIC that has SR-IOV associated with it.

>

>>> I agree that failure to allocate VF resources should leave the 

>>> device in no

>worse condition than before it tried. I hope kernel could allocate PF 

>device resource before allocating VF device resource, and keep PF 

>device resource valid and functional if failed to allocate VF device resource.

>>>

>>> I will send out dmesg log lspci info tomorrow. Thanks.

>>

>> Thanks,

>> Alex

>>

>>> -----Original Message-----

>>> From: Alex Williamson [mailto:alex.williamson@redhat.com]

>>> Sent: Friday, May 12, 2017 10:43 PM

>>> To: Cheng, Collins <Collins.Cheng@amd.com>

>>> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; 

>>> linux-kernel@vger.kernel.org; Deucher, Alexander 

>>> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; 

>>> Yinghai Lu <yinghai@kernel.org>

>>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>>> SR-IOV incapable platform

>>>

>>> On Fri, 12 May 2017 04:51:43 +0000

>>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:

>>>

>>> > Hi Williamson,

>>> >

>>> > I verified the patch is working for both AMD SR-IOV GPU and Intel 

>>> > SR-IOV

>NIC. I don't think it is redundant to check the VF BAR valid before 

>call sriov_init(), it is safe and saving boot time, also there is no a 

>better method to know if system BIOS has correctly initialized the SR-IOV capability or not.

>>>

>>> It also masks an underlying bug and creates a maintenance issue that 

>>> we won't

>know when it's safe to remove this workaround.  I don't think faster 

>boot is valid rationale, in one case SR-IOV is completely disabled, the 

>other we attempt to allocate the resources the BIOS failed to provide.  

>I expect this is also a corner case, the BIOS should typically support 

>SR-IOV, therefore this situation should be an exception.

>>>

>>> > I did not try to fix the issue from the kernel resource allocation 

>>> > perspective,

>it is because:

>>> >     1. I am not very familiar with the PCI resource allocation scheme in kernel.

>For example, in sriov_init(), kernel will re-assign the PCI resource 

>for both VF and PF. I don't understand why kernel allocates resource 

>for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.

>>> >     2. I am not sure if kernel has error handler if PCI resource allocation failed.

>In this case, kernel cannot allocate enough resource to PF. It should 

>trigger some error handler to either just keep original BAR values set 

>by system BIOS, or disable this device and log errors.

>>>

>>> I think these are the issues we should be trying to solve and I'm 

>>> sure folks on the linux-pci list can help us identify the bug.

>>> Minimally, failure to allocate VF resources should leave the device 

>>> in no worse condition than before it tried.  Perhaps you could post 

>>> more details about the issue, boot with pci=earlydump, post dmesg of 

>>> a boot where the PF resources are incorrectly re-allocated, and 

>>> include lspci -vvv for the SR-IOV device.  Also, please test with 

>>> the latest upstream kernel, upstream only patches old kernels 

>>> through stable backports of commits to the latest kernel.  Adding 

>>> Yinghai as a resource allocation expert. Thanks,

>>>

>>> Alex

>>>

>>> > -----Original Message-----

>>> > From: Alex Williamson [mailto:alex.williamson@redhat.com]

>>> > Sent: Friday, May 12, 2017 12:01 PM

>>> > To: Cheng, Collins <Collins.Cheng@amd.com>

>>> > Cc: Bjorn Helgaas <bhelgaas@google.com>; 

>>> > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, 

>>> > Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly 

>>> > <Kelly.Zytaruk@amd.com>

>>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>>> > SR-IOV incapable platform

>>> >

>>> > On Fri, 12 May 2017 03:42:46 +0000 "Cheng, Collins" 

>>> > <Collins.Cheng@amd.com> wrote:

>>> >

>>> > > Hi Williamson,

>>> > >

>>> > > GPU card needs more BAR aperture resource than other PCI 

>>> > > devices. For

>example, Intel SR-IOV network card only require 512KB memory resource 

>for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory 

>resource for frame buffer BAR aperture.

>>> > >

>>> > > If the system BIOS supports SR-IOV, it will reserve enough 

>>> > > resource for all

>VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate 

>the enough resource for VF BARs, only PF BAR will be assigned and VF 

>BARs are empty. Then system boots to Linux kernel and kernel doesn't 

>check if the VF BARs are empty or valid. Kernel will re-assign the BAR 

>resources for PF and all VFs. The problem I saw is that kernel will 

>fail to allocate PF BAR resource because some resources are assigned to 

>VF, this is not expected. So kernel might need to do some check before 

>re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.

>>> >

>>> > So the problem is that something bad happens when the kernel is 

>>> > trying to reallocate resources in order to fulfill the 

>>> > requirements of the VFs, leaving the PF resources incorrectly 

>>> > programmed?  Why not just fix that bug rather than creating 

>>> > special handling for this vendor/class of device which disables 

>>> > any attempt to fixup resources for SR-IOV?  IOW, this patch just 

>>> > avoids the problem for your devices rather than fixing the bug.  

>>> > I'd suggest fixing the bug such that the PF is left in a 

>>> > functional state if the kernel is unable to allocate sufficient 

>>> > resources for the VFs.  Thanks,

>>> >

>>> > Alex

>>> >

>>> > > -----Original Message-----

>>> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]

>>> > > Sent: Friday, May 12, 2017 11:21 AM

>>> > > To: Cheng, Collins <Collins.Cheng@amd.com>

>>> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; 

>>> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; 

>>> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly 

>>> > > <Kelly.Zytaruk@amd.com>

>>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 

>>> > > SR-IOV incapable platform

>>> > >

>>> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins"

>>> > > <Collins.Cheng@amd.com> wrote:

>>> > >

>>> > > > Hi Helgaas,

>>> > > >

>>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.

>>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable 

>>> > > > platform. It would cause a problem on PCI resource allocation 

>>> > > > in current Linux kernel.

>>> > > >

>>> > > > Therefore in order to allow the PF (Physical Function) device 

>>> > > > of SR-IOV capable GPU to work on the SR-IOV incapable 

>>> > > > platform, it is required to verify conditions for initializing 

>>> > > > BAR resources on AMD SR-IOV capable GPUs.

>>> > > >

>>> > > > If the device is an AMD graphics device and it supports SR-IOV 

>>> > > > it will require a large amount of resources.

>>> > > > Before calling sriov_init() must ensure that the system BIOS 

>>> > > > also supports SR-IOV and that system BIOS has been able to 

>>> > > > allocate enough resources.

>>> > > > If the VF BARs are zero then the system BIOS does not support 

>>> > > > SR-IOV or it could not allocate the resources and this 

>>> > > > platform will not support AMD graphics SR-IOV.

>>> > > > Therefore do not call sriov_init().

>>> > > > If the system BIOS does support SR-IOV then the VF BARs will 

>>> > > > be properly initialized to non-zero values.

>>> > > >

>>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.

>>> > > >

>>> > > > I checked the drivers/pci/quirks.c, it looks the 

>>> > > > workarounds/fixes in quirks.c are for specific devices and one 

>>> > > > or more device ID are defined for the specific devices. 

>>> > > > However my patch is for all AMD SR-IOV capable GPUs, that 

>>> > > > includes all existing

>and future AMD server GPUs.

>>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.

>>> > >

>>> > >

>>> > > Why is an AMD graphics card unique here?  Doesn't sriov_init() 

>>> > > always need to be able to deal with devices of any type where 

>>> > > the BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV 

>>> > > devices can fit their VFs within a minimum bridge aperture, most 

>>> > > cannot.  I don't understand why the VF resource requirements 

>>> > > being exceptionally large dictates that they receive special handling.

>>> > > Thanks,

>>> > >

>>> > > Alex

>>> > >

>>> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>

>>> > > > ---

>>> > > >  drivers/pci/iov.c | 63

>>> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++---

>>> > > >  1 file changed, 60 insertions(+), 3 deletions(-)

>>> > > >

>>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index

>>> > > > e30f05c..e4f1405 100644

>>> > > > --- a/drivers/pci/iov.c

>>> > > > +++ b/drivers/pci/iov.c

>>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct 

>>> > > > pci_dev

>*dev)

>>> > > >                 msleep(100);

>>> > > >  }

>>> > > >

>>> > > > +/*

>>> > > > + * pci_vf_bar_valid - check if VF BARs have resource 

>>> > > > +allocated

>>> > > > + * @dev: the PCI device

>>> > > > + * @pos: register offset of SR-IOV capability in PCI config 

>>> > > > +space

>>> > > > + * Returns true any VF BAR has resource allocated, false

>>> > > > + * if all VF BARs are empty.

>>> > > > + */

>>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {

>>> > > > +       int i;

>>> > > > +       u32 bar_value;

>>> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |

>>> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |

>>> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);

>>> > > > +

>>> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {

>>> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR 

>>> > > > + + i * 4,

>&bar_value);

>>> > > > +               if (bar_value & bar_size_mask)

>>> > > > +                       return true;

>>> > > > +       }

>>> > > > +

>>> > > > +       return false;

>>> > > > +}

>>> > > > +

>>> > > > +/*

>>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU 

>>> > > > +device

>>> > > > + * @dev: the PCI device

>>> > > > + *

>>> > > > + * Returns true if device is an AMD/ATI display adapter,

>>> > > > + * otherwise return false.

>>> > > > + */

>>> > > > +

>>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {

>>> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&

>>> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||

>>> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }

>>> > > > +

>>> > > >  /**

>>> > > >   * pci_iov_init - initialize the IOV capability

>>> > > >   * @dev: the PCI device

>>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)

>>> > > >                 return -ENODEV;

>>> > > >

>>> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);

>>> > > > -       if (pos)

>>> > > > -               return sriov_init(dev, pos);

>>> > > > -

>>> > > > +       if (pos) {

>>> > > > +       /*

>>> > > > +        * If the device is an AMD graphics device and it supports

>>> > > > +        * SR-IOV it will require a large amount of resources.

>>> > > > +        * Before calling sriov_init() must ensure that the system

>>> > > > +        * BIOS also supports SR-IOV and that system BIOS has been

>>> > > > +        * able to allocate enough resources.

>>> > > > +        * If the VF BARs are zero then the system BIOS does not

>>> > > > +        * support SR-IOV or it could not allocate the resources

>>> > > > +        * and this platform will not support AMD graphics SR-IOV.

>>> > > > +        * Therefore do not call sriov_init().

>>> > > > +        * If the system BIOS does support SR-IOV then the VF BARs

>>> > > > +        * will be properly initialized to non-zero values.

>>> > > > +        */

>>> > > > +               if (is_amd_display_adapter(dev)) {

>>> > > > +                       if (pci_vf_bar_valid(dev, pos))

>>> > > > +                               return sriov_init(dev, pos);

>>> > > > +               } else {

>>> > > > +                       return sriov_init(dev, pos);

>>> > > > +               }

>>> > > > +       }

>>> > > >         return -ENODEV;

>>> > > >  }

>>> > > >

>>> > >

>>> >

>>>

>>
Zytaruk, Kelly May 20, 2017, 2:29 p.m. UTC | #14
Collins,

Okay, good to know.
Is there a common solution that can handle all cases?

Thanks,
Kelly

Alexander H Duyck May 21, 2017, 12:03 a.m. UTC | #15
I'd say the common solution is probably the parameter that allows the
user to disable SR-IOV in the kernel on boot.

The problem with trying to do this automatically is that there are too
many scenarios to know what it was that the BIOS was trying to do.

Another alternative would be to provide a means of changing how the
SR-IOV code tries to fix broken setups. Right now it defaults to trying
to allocate the resources, since it assumes SR-IOV is going to be
enabled on every device that has SR-IOV support. Instead, the kernel
parameter could support multiple modes: nosriov as one, a second that
only enables SR-IOV on devices that are fully configured and disables
it otherwise, and a third, our current default, which tries to enable
SR-IOV on any device that could support it. The default could then also
be made a kernel configuration option, so you could build a kernel that
defaults to the middle mode, which leaves SR-IOV enabled on correctly
configured devices and disables it otherwise.
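
To make the three-way option a bit more concrete, here is a rough
user-space sketch of what the policy decision could look like. All of
it is hypothetical: the enum, the option strings, and the
sriov_policy_parse() and sriov_should_init() helpers are made up for
illustration and are not existing kernel code or an existing kernel
parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy modes; none of these names exist in the kernel. */
enum sriov_policy {
	SRIOV_POLICY_OFF,        /* "nosriov": never set up SR-IOV */
	SRIOV_POLICY_CONFIGURED, /* only devices whose VF BARs firmware programmed */
	SRIOV_POLICY_ALWAYS,     /* current default: try on every capable device */
};

/*
 * Map a (hypothetical) option string to a policy. Unknown or missing
 * strings fall back to the current default behaviour.
 */
static enum sriov_policy sriov_policy_parse(const char *opt)
{
	if (opt && strcmp(opt, "nosriov") == 0)
		return SRIOV_POLICY_OFF;
	if (opt && strcmp(opt, "sriov=configured") == 0)
		return SRIOV_POLICY_CONFIGURED;
	return SRIOV_POLICY_ALWAYS;
}

/*
 * Decide whether SR-IOV setup should be attempted for one device.
 * has_cap: the device exposes the SR-IOV extended capability.
 * vf_bars_programmed: firmware left at least one VF BAR non-zero.
 */
static bool sriov_should_init(enum sriov_policy policy, bool has_cap,
			      bool vf_bars_programmed)
{
	if (!has_cap)
		return false;

	switch (policy) {
	case SRIOV_POLICY_OFF:
		return false;
	case SRIOV_POLICY_CONFIGURED:
		return vf_bars_programmed;
	case SRIOV_POLICY_ALWAYS:
		return true;
	}

	/* Not reached for the three modes above. */
	assert(!"unknown SR-IOV policy");
	return false;
}
```

With the middle mode, a device whose VF BARs were left at zero would
simply be skipped, while the same device on a platform that programmed
the VF BARs would still get SR-IOV set up.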

- Alex

On Sat, May 20, 2017 at 7:29 AM, Zytaruk, Kelly <Kelly.Zytaruk@amd.com> wrote:
> Collins,
>
> Okay, good to know.
> Is there a common solution that can handle all cases?
>
> Thanks,
> Kelly
>
>>-----Original Message-----
>>From: Cheng, Collins
>>Sent: Saturday, May 20, 2017 6:38 AM
>>To: Zytaruk, Kelly; Alexander Duyck; Alex Williamson
>>Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
>>Deucher, Alexander; Yinghai Lu
>>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>incapable platform
>>
>>Hi Kelly,
>>
>>This issue also happens with "not SR-IOV capable" SBIOS. It seems some "not SR-IOV
>>capable" SBIOS report an error directly at the system BIOS boot stage and do not
>>boot to the OS, while other "not SR-IOV capable" SBIOS do not report an error and
>>boot to Linux.
>>
>>-Collins Cheng
>>
>>
>>-----Original Message-----
>>From: Zytaruk, Kelly
>>Sent: Saturday, May 20, 2017 6:28 PM
>>To: Cheng, Collins <Collins.Cheng@amd.com>; Alexander Duyck
>><alexander.duyck@gmail.com>; Alex Williamson <alex.williamson@redhat.com>
>>Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-
>>kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>;
>>Yinghai Lu <yinghai@kernel.org>
>>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>incapable platform
>>
>>
>>
>>>-----Original Message-----
>>>From: Cheng, Collins
>>>Sent: Saturday, May 20, 2017 12:53 AM
>>>To: Alexander Duyck; Alex Williamson
>>>Cc: Bjorn Helgaas; linux-pci@vger.kernel.org;
>>>linux-kernel@vger.kernel.org; Deucher, Alexander; Zytaruk, Kelly;
>>>Yinghai Lu
>>>Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>>incapable platform
>>>
>>>Hi Alex,
>>>
>>>Yes, I hope the kernel can disable SR-IOV and the related VF resource
>>>allocation if the system BIOS is not SR-IOV capable.
>>>
>>>Adding the parameter "pci=nosriov" sounds like a doable solution, but it
>>>would need the user to add this parameter manually, right? I think
>>>automatic detection would be better. My patch tries to auto-detect and
>>>bypass VF resource allocation.
>>>
>>>
>>>-Collins Cheng
>>>
>>
>>Collins, be careful about this.  I don't think that this is what we want.  If you add
>>"pci=nosriov" then you are globally disabling SRIOV for all devices.  This is not the
>>solution that we are looking for.
>>Remember that there are 3 types of SBIOS; "not SR-IOV capable", "SR-IOV
>>capable but does not support large resources", "Complete SR-IOV support".
>>
>>The problem is that we are trying to find a fix for "broken" SBIOS that does
>>support SR-IOV but does not support the full SR-IOV capabilities that devices with
>>large resources require.
>>
>>Thanks,
>>Kelly
>>
>>>
>>>-----Original Message-----
>>>From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
>>>Sent: Friday, May 19, 2017 11:44 PM
>>>To: Alex Williamson <alex.williamson@redhat.com>
>>>Cc: Cheng, Collins <Collins.Cheng@amd.com>; Bjorn Helgaas
>>><bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-
>>>kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>;
>>>Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
>>>Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
>>>incapable platform
>>>
>>>On Mon, May 15, 2017 at 10:53 AM, Alex Williamson
>>><alex.williamson@redhat.com> wrote:
>>>> On Mon, 15 May 2017 08:19:28 +0000
>>>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>>>>
>>>>> Hi Williamson,
>>>>>
>>>>> We cannot assume the BIOS supports SR-IOV; actually only newer server
>>>>> motherboard BIOSes support SR-IOV. Normal desktop motherboard BIOSes or
>>>>> older server motherboard BIOSes don't support SR-IOV. This issue would
>>>>> happen if a user plugs our AMD SR-IOV capable GPU card into a normal
>>>>> desktop motherboard.
>>>>
>>>> Servers should be supporting SR-IOV for a long time now.  What really
>>>> is there to a BIOS supporting SR-IOV anyway, it's simply reserving
>>>> sufficient bus number and MMIO resources such that we can enable the
>>>> VFs.  This process isn't exclusively reserved for the BIOS.  Some
>>>> platforms may choose to only initialize boot devices, leaving the
>>>> rest for the OS to program.  The initial proposal here to disable
>>>> SR-IOV if not programmed at OS hand-off disables even the possibility
>>>> of the OS reallocating resources for this device.
>>>
>>>There are differences between supporting SR-IOV and supporting SR-IOV
>>>on devices with massive resources. I know I have seen NICs that will
>>>keep a system from completing POST if SR-IOV is enabled, and MMIO
>>>beyond 4G is not. My guess would be that the issues being seen are
>>>probably that they disable SR-IOV in the BIOS in such a setup and end
>>>up running into issues when they try to boot into the Linux kernel as
>>>it goes through and tries to allocate resources for SR-IOV even though
>>>it was disabled in the BIOS.
>>>
>>>It might make sense to add a kernel parameter, something like
>>>"pci=nosriov", that would allow for disabling SR-IOV and related
>>>resource allocation if that is what we are talking about. That way you
>>>could plug these types of devices into a system with a legacy BIOS or
>>>one that doesn't want to allocate addresses above 32b for MMIO, and
>>>this parameter would be all that is needed to disable SR-IOV so you
>>>could plug in a NIC that has SR-IOV associated with it.
>>>
>>>>> I agree that failure to allocate VF resources should leave the
>>>>> device in no worse condition than before it tried. I hope the kernel
>>>>> could allocate PF device resources before allocating VF device
>>>>> resources, and keep the PF device resources valid and functional if
>>>>> it fails to allocate VF device resources.
>>>>>
>>>>> I will send out dmesg log lspci info tomorrow. Thanks.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>>> -----Original Message-----
>>>>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>>>> Sent: Friday, May 12, 2017 10:43 PM
>>>>> To: Cheng, Collins <Collins.Cheng@amd.com>
>>>>> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
>>>>> linux-kernel@vger.kernel.org; Deucher, Alexander
>>>>> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>;
>>>>> Yinghai Lu <yinghai@kernel.org>
>>>>> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>>> SR-IOV incapable platform
>>>>>
>>>>> On Fri, 12 May 2017 04:51:43 +0000
>>>>> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>>>>>
>>>>> > Hi Williamson,
>>>>> >
>>>>> > I verified the patch works for both the AMD SR-IOV GPU and an Intel
>>>>> > SR-IOV NIC. I don't think it is redundant to check that the VF BARs
>>>>> > are valid before calling sriov_init(); it is safe and saves boot
>>>>> > time, and there is no better method to know whether the system BIOS
>>>>> > has correctly initialized the SR-IOV capability.
>>>>>
>>>>> It also masks an underlying bug and creates a maintenance issue, in
>>>>> that we won't know when it's safe to remove this workaround.  I don't
>>>>> think faster boot is a valid rationale: in one case SR-IOV is
>>>>> completely disabled, in the other we attempt to allocate the
>>>>> resources the BIOS failed to provide.  I expect this is also a corner
>>>>> case; the BIOS should typically support SR-IOV, therefore this
>>>>> situation should be an exception.
>>>>>
>>>>> > I did not try to fix the issue from the kernel resource allocation
>>>>> > perspective because:
>>>>> >     1. I am not very familiar with the PCI resource allocation
>>>>> >     scheme in the kernel. For example, in sriov_init(), the kernel
>>>>> >     will re-assign the PCI resources for both VF and PF. I don't
>>>>> >     understand why the kernel allocates resources for the VFs first,
>>>>> >     then the PF. If the PF went first, this issue could be avoided.
>>>>> >     2. I am not sure whether the kernel has an error handler for a
>>>>> >     failed PCI resource allocation. In this case, the kernel cannot
>>>>> >     allocate enough resources to the PF. It should trigger some error
>>>>> >     handler to either keep the original BAR values set by the system
>>>>> >     BIOS, or disable this device and log errors.
>>>>>
>>>>> I think these are the issues we should be trying to solve and I'm
>>>>> sure folks on the linux-pci list can help us identify the bug.
>>>>> Minimally, failure to allocate VF resources should leave the device
>>>>> in no worse condition than before it tried.  Perhaps you could post
>>>>> more details about the issue, boot with pci=earlydump, post dmesg of
>>>>> a boot where the PF resources are incorrectly re-allocated, and
>>>>> include lspci -vvv for the SR-IOV device.  Also, please test with
>>>>> the latest upstream kernel, upstream only patches old kernels
>>>>> through stable backports of commits to the latest kernel.  Adding
>>>>> Yinghai as a resource allocation expert. Thanks,
>>>>>
>>>>> Alex
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>>>> > Sent: Friday, May 12, 2017 12:01 PM
>>>>> > To: Cheng, Collins <Collins.Cheng@amd.com>
>>>>> > Cc: Bjorn Helgaas <bhelgaas@google.com>;
>>>>> > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher,
>>>>> > Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly
>>>>> > <Kelly.Zytaruk@amd.com>
>>>>> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>>> > SR-IOV incapable platform
>>>>> >
>>>>> > On Fri, 12 May 2017 03:42:46 +0000 "Cheng, Collins"
>>>>> > <Collins.Cheng@amd.com> wrote:
>>>>> >
>>>>> > > Hi Williamson,
>>>>> > >
>>>>> > > A GPU card needs more BAR aperture resources than other PCI
>>>>> > > devices. For example, an Intel SR-IOV network card only requires
>>>>> > > 512KB of memory resources for all VFs. An AMD SR-IOV GPU card needs
>>>>> > > 256MB x 16 VFs = 4GB of memory resources for the frame buffer BAR
>>>>> > > aperture.
>>>>> > >
>>>>> > > If the system BIOS supports SR-IOV, it will reserve enough
>>>>> > > resources for all VF BARs. If the system BIOS doesn't support
>>>>> > > SR-IOV or cannot allocate enough resources for the VF BARs, only
>>>>> > > the PF BARs will be assigned and the VF BARs are empty. The system
>>>>> > > then boots to the Linux kernel, and the kernel doesn't check
>>>>> > > whether the VF BARs are empty or valid. The kernel will re-assign
>>>>> > > the BAR resources for the PF and all VFs. The problem I saw is that
>>>>> > > the kernel fails to allocate the PF BAR resources because some
>>>>> > > resources are assigned to the VFs; this is not expected. So the
>>>>> > > kernel might need to do some checks before re-assigning the PF/VF
>>>>> > > resources, so that the PF device is correctly assigned BAR
>>>>> > > resources and the user can use the PF device.
>>>>> >
>>>>> > So the problem is that something bad happens when the kernel is
>>>>> > trying to reallocate resources in order to fulfill the
>>>>> > requirements of the VFs, leaving the PF resources incorrectly
>>>>> > programmed?  Why not just fix that bug rather than creating
>>>>> > special handling for this vendor/class of device which disables
>>>>> > any attempt to fixup resources for SR-IOV?  IOW, this patch just
>>>>> > avoids the problem for your devices rather than fixing the bug.
>>>>> > I'd suggest fixing the bug such that the PF is left in a
>>>>> > functional state if the kernel is unable to allocate sufficient
>>>>> > resources for the VFs.  Thanks,
>>>>> >
>>>>> > Alex
>>>>> >
>>>>> > > -----Original Message-----
>>>>> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>>>> > > Sent: Friday, May 12, 2017 11:21 AM
>>>>> > > To: Cheng, Collins <Collins.Cheng@amd.com>
>>>>> > > Cc: Bjorn Helgaas <bhelgaas@google.com>;
>>>>> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>>> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly
>>>>> > > <Kelly.Zytaruk@amd.com>
>>>>> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>>>>> > > SR-IOV incapable platform
>>>>> > >
>>>>> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins"
>>>>> > > <Collins.Cheng@amd.com> wrote:
>>>>> > >
>>>>> > > > Hi Helgaas,
>>>>> > > >
>>>>> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
>>>>> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable
>>>>> > > > platform. It would cause a problem on PCI resource allocation
>>>>> > > > in current Linux kernel.
>>>>> > > >
>>>>> > > > Therefore in order to allow the PF (Physical Function) device
>>>>> > > > of SR-IOV capable GPU to work on the SR-IOV incapable
>>>>> > > > platform, it is required to verify conditions for initializing
>>>>> > > > BAR resources on AMD SR-IOV capable GPUs.
>>>>> > > >
>>>>> > > > If the device is an AMD graphics device and it supports SR-IOV
>>>>> > > > it will require a large amount of resources.
>>>>> > > > Before calling sriov_init() must ensure that the system BIOS
>>>>> > > > also supports SR-IOV and that system BIOS has been able to
>>>>> > > > allocate enough resources.
>>>>> > > > If the VF BARs are zero then the system BIOS does not support
>>>>> > > > SR-IOV or it could not allocate the resources and this
>>>>> > > > platform will not support AMD graphics SR-IOV.
>>>>> > > > Therefore do not call sriov_init().
>>>>> > > > If the system BIOS does support SR-IOV then the VF BARs will
>>>>> > > > be properly initialized to non-zero values.
>>>>> > > >
>>>>> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
>>>>> > > >
>>>>> > > > I checked the drivers/pci/quirks.c, it looks the
>>>>> > > > workarounds/fixes in quirks.c are for specific devices and one
>>>>> > > > or more device ID are defined for the specific devices.
>>>>> > > > However my patch is for all AMD SR-IOV capable GPUs, that
>>>>> > > > includes all existing and future AMD server GPUs.
>>>>> > > > So it doesn't seem like a good fit to put the fix in quirks.c.
>>>>> > >
>>>>> > >
>>>>> > > Why is an AMD graphics card unique here?  Doesn't sriov_init()
>>>>> > > always need to be able to deal with devices of any type where
>>>>> > > the BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV
>>>>> > > devices can fit their VFs within a minimum bridge aperture, most
>>>>> > > cannot.  I don't understand why the VF resource requirements
>>>>> > > being exceptionally large dictates that they receive special handling.
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Alex
>>>>> > >
>>>>> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
>>>>> > > > ---
>>>>> > > >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>> > > >  1 file changed, 60 insertions(+), 3 deletions(-)
>>>>> > > >
>>>>> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>>>> > > > index e30f05c..e4f1405 100644
>>>>> > > > --- a/drivers/pci/iov.c
>>>>> > > > +++ b/drivers/pci/iov.c
>>>>> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>>>>> > > >                 msleep(100);
>>>>> > > >  }
>>>>> > > >
>>>>> > > > +/*
>>>>> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
>>>>> > > > + * @dev: the PCI device
>>>>> > > > + * @pos: register offset of SR-IOV capability in PCI config space
>>>>> > > > + * Returns true any VF BAR has resource allocated, false
>>>>> > > > + * if all VF BARs are empty.
>>>>> > > > + */
>>>>> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
>>>>> > > > +       int i;
>>>>> > > > +       u32 bar_value;
>>>>> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>>>>> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |
>>>>> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);
>>>>> > > > +
>>>>> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>>>>> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
>>>>> > > > +               if (bar_value & bar_size_mask)
>>>>> > > > +                       return true;
>>>>> > > > +       }
>>>>> > > > +
>>>>> > > > +       return false;
>>>>> > > > +}
>>>>> > > > +
>>>>> > > > +/*
>>>>> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
>>>>> > > > + * @dev: the PCI device
>>>>> > > > + *
>>>>> > > > + * Returns true if device is an AMD/ATI display adapter,
>>>>> > > > + * otherwise return false.
>>>>> > > > + */
>>>>> > > > +
>>>>> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
>>>>> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>>>>> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||
>>>>> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }
>>>>> > > > +
>>>>> > > >  /**
>>>>> > > >   * pci_iov_init - initialize the IOV capability
>>>>> > > >   * @dev: the PCI device
>>>>> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>>>>> > > >                 return -ENODEV;
>>>>> > > >
>>>>> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>>>>> > > > -       if (pos)
>>>>> > > > -               return sriov_init(dev, pos);
>>>>> > > > -
>>>>> > > > +       if (pos) {
>>>>> > > > +       /*
>>>>> > > > +        * If the device is an AMD graphics device and it supports
>>>>> > > > +        * SR-IOV it will require a large amount of resources.
>>>>> > > > +        * Before calling sriov_init() must ensure that the system
>>>>> > > > +        * BIOS also supports SR-IOV and that system BIOS has been
>>>>> > > > +        * able to allocate enough resources.
>>>>> > > > +        * If the VF BARs are zero then the system BIOS does not
>>>>> > > > +        * support SR-IOV or it could not allocate the resources
>>>>> > > > +        * and this platform will not support AMD graphics SR-IOV.
>>>>> > > > +        * Therefore do not call sriov_init().
>>>>> > > > +        * If the system BIOS does support SR-IOV then the VF BARs
>>>>> > > > +        * will be properly initialized to non-zero values.
>>>>> > > > +        */
>>>>> > > > +               if (is_amd_display_adapter(dev)) {
>>>>> > > > +                       if (pci_vf_bar_valid(dev, pos))
>>>>> > > > +                               return sriov_init(dev, pos);
>>>>> > > > +               } else {
>>>>> > > > +                       return sriov_init(dev, pos);
>>>>> > > > +               }
>>>>> > > > +       }
>>>>> > > >         return -ENODEV;
>>>>> > > >  }
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>
Alex Williamson May 22, 2017, 3:44 p.m. UTC | #16
On Fri, 19 May 2017 08:43:38 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Mon, May 15, 2017 at 10:53 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 15 May 2017 08:19:28 +0000
> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >  
> >> Hi Williamson,
> >>
> >> We cannot assume the BIOS supports SR-IOV; actually only newer server motherboard BIOSes support SR-IOV. Normal desktop motherboard BIOSes or older server motherboard BIOSes don't support SR-IOV. This issue would happen if a user plugs our AMD SR-IOV capable GPU card into a normal desktop motherboard.
> >
> > Servers should be supporting SR-IOV for a long time now.  What really
> > is there to a BIOS supporting SR-IOV anyway, it's simply reserving
> > sufficient bus number and MMIO resources such that we can enable the
> > VFs.  This process isn't exclusively reserved for the BIOS.  Some
> > platforms may choose to only initialize boot devices, leaving the rest
> > for the OS to program.  The initial proposal here to disable SR-IOV if
> > not programmed at OS hand-off disables even the possibility of the OS
> > reallocating resources for this device.  
> 
> There are differences between supporting SR-IOV and supporting SR-IOV
> on devices with massive resources. I know I have seen NICs that will
> keep a system from completing POST if SR-IOV is enabled, and MMIO
> beyond 4G is not. My guess would be that the issues being seen are
> probably that they disable SR-IOV in the BIOS in such a setup and end
> up running into issues when they try to boot into the Linux kernel as
> it goes through and tries to allocate resources for SR-IOV even though
> it was disabled in the BIOS.
> 
> It might make sense to add a kernel parameter, something like
> "pci=nosriov", that would allow for disabling SR-IOV and related
> resource allocation if that is what we are talking about. That way you
> could plug these types of devices into a system with a legacy BIOS or
> one that doesn't want to allocate addresses above 32b for MMIO, and
> this parameter would be all that is needed to disable SR-IOV so you
> could plug in a NIC that has SR-IOV associated with it.

Hi,

a) I think we're still ignoring the real bug, which is that something
goes wrong with the reallocation, leaving the PF without resources.

b) Why does an option to avoid re-allocation need to be SR-IOV
specific?  Shouldn't pci=realloc=off cover this?
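
The behaviour asked for in (a) can be modelled in a few lines of
user-space C. This is only a toy sketch under stated assumptions:
struct window, claim() and assign_pf_then_vfs() are invented names,
sizes are plain byte counts with no alignment or BAR typing, and none
of it reflects how the kernel's resource code is actually structured.
It only shows the property that a failed VF allocation is rolled back
and leaves the PF assignment untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a bridge window: a fixed amount of MMIO space. */
struct window {
	uint64_t size; /* total bytes available behind the bridge */
	uint64_t used; /* bytes already handed out */
};

/* Result of one assignment attempt. */
struct assignment {
	bool pf_ok; /* PF BAR space was assigned */
	bool vf_ok; /* space for all VFs was assigned */
};

/* Take 'len' bytes from the window, or fail without side effects. */
static bool claim(struct window *w, uint64_t len)
{
	assert(w != NULL);
	if (len > w->size - w->used)
		return false;
	w->used += len;
	return true;
}

/*
 * Assign the PF first, then try the VFs one by one. If any VF does
 * not fit, undo the partial VF claims so the PF is no worse off than
 * before the attempt.
 */
static struct assignment assign_pf_then_vfs(struct window *w,
					    uint64_t pf_len,
					    uint64_t vf_len,
					    unsigned int num_vfs)
{
	struct assignment a = { false, false };
	uint64_t used_after_pf;
	unsigned int i;

	a.pf_ok = claim(w, pf_len);
	if (!a.pf_ok)
		return a;

	used_after_pf = w->used;
	a.vf_ok = true;
	for (i = 0; i < num_vfs; i++) {
		if (!claim(w, vf_len)) {
			w->used = used_after_pf; /* roll back VF claims only */
			a.vf_ok = false;
			break;
		}
	}
	return a;
}
```

Using the 256MB x 16 VF figure from earlier in the thread and an
assumed 256MB PF BAR, a 2GB window ends up with the PF assigned and
SR-IOV unavailable, rather than with a PF that has lost its resources.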

Thanks,
Alex

> >> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope the kernel could allocate PF device resources before allocating VF device resources, and keep the PF device resources valid and functional if it fails to allocate VF device resources.
> >>
> >> I will send out dmesg log lspci info tomorrow. Thanks.  
> >
> > Thanks,
> > Alex
> >  
> >> -----Original Message-----
> >> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> >> Sent: Friday, May 12, 2017 10:43 PM
> >> To: Cheng, Collins <Collins.Cheng@amd.com>
> >> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
> >> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
> >>
> >> On Fri, 12 May 2017 04:51:43 +0000
> >> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >>  
> >> > Hi Williamson,
> >> >
> >> > I verified the patch works for both the AMD SR-IOV GPU and an Intel SR-IOV NIC. I don't think it is redundant to check that the VF BARs are valid before calling sriov_init(); it is safe and saves boot time, and there is no better method to know whether the system BIOS has correctly initialized the SR-IOV capability.
> >>
> >> It also masks an underlying bug and creates a maintenance issue, in that we won't know when it's safe to remove this workaround.  I don't think faster boot is a valid rationale: in one case SR-IOV is completely disabled, in the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case; the BIOS should typically support SR-IOV, therefore this situation should be an exception.
> >>  
> >> > I did not try to fix the issue from the kernel resource allocation perspective because:
> >> >     1. I am not very familiar with the PCI resource allocation scheme in the kernel. For example, in sriov_init(), the kernel will re-assign the PCI resources for both VF and PF. I don't understand why the kernel allocates resources for the VFs first, then the PF. If the PF went first, this issue could be avoided.
> >> >     2. I am not sure whether the kernel has an error handler for a failed PCI resource allocation. In this case, the kernel cannot allocate enough resources to the PF. It should trigger some error handler to either keep the original BAR values set by the system BIOS, or disable this device and log errors.
> >>
> >> I think these are the issues we should be trying to solve and I'm sure folks on the linux-pci list can help us identify the bug.  Minimally, failure to allocate VF resources should leave the device in no worse condition than before it tried.  Perhaps you could post more details about the issue, boot with pci=earlydump, post dmesg of a boot where the PF resources are incorrectly re-allocated, and include lspci -vvv for the SR-IOV device.  Also, please test with the latest upstream kernel, upstream only patches old kernels through stable backports of commits to the latest kernel.  Adding Yinghai as a resource allocation expert. Thanks,
> >>
> >> Alex
> >>  
> >> > -----Original Message-----
> >> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> >> > Sent: Friday, May 12, 2017 12:01 PM
> >> > To: Cheng, Collins <Collins.Cheng@amd.com>
> >> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> >> > linux-kernel@vger.kernel.org; Deucher, Alexander
> >> > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> >> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
> >> > SR-IOV incapable platform
> >> >
> >> > On Fri, 12 May 2017 03:42:46 +0000
> >> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >> >  
> >> > > Hi Williamson,
> >> > >
> >> > > A GPU card needs more BAR aperture resources than other PCI devices. For example, an Intel SR-IOV network card only requires 512KB of memory resources for all VFs. An AMD SR-IOV GPU card needs 256MB x 16 VFs = 4GB of memory resources for the frame buffer BAR aperture.
> >> > >
> >> > > If the system BIOS supports SR-IOV, it will reserve enough resources for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate enough resources for the VF BARs, only the PF BARs will be assigned and the VF BARs are empty. The system then boots to the Linux kernel, and the kernel doesn't check whether the VF BARs are empty or valid. The kernel will re-assign the BAR resources for the PF and all VFs. The problem I saw is that the kernel fails to allocate the PF BAR resources because some resources are assigned to the VFs; this is not expected. So the kernel might need to do some checks before re-assigning the PF/VF resources, so that the PF device is correctly assigned BAR resources and the user can use the PF device.
> >> >
> >> > So the problem is that something bad happens when the kernel is trying
> >> > to reallocate resources in order to fulfill the requirements of the
> >> > VFs, leaving the PF resources incorrectly programmed?  Why not just
> >> > fix that bug rather than creating special handling for this
> >> > vendor/class of device which disables any attempt to fixup resources
> >> > for SR-IOV?  IOW, this patch just avoids the problem for your devices
> >> > rather than fixing the bug.  I'd suggest fixing the bug such that the
> >> > PF is left in a functional state if the kernel is unable to allocate
> >> > sufficient resources for the VFs.  Thanks,
> >> >
> >> > Alex
> >> >  
> >> > > -----Original Message-----
> >> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> >> > > Sent: Friday, May 12, 2017 11:21 AM
> >> > > To: Cheng, Collins <Collins.Cheng@amd.com>
> >> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> >> > > linux-kernel@vger.kernel.org; Deucher, Alexander
> >> > > <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>
> >> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
> >> > > SR-IOV incapable platform
> >> > >
> >> > > On Fri, 12 May 2017 02:50:32 +0000
> >> > > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> >> > >  
> >> > > > Hi Helgaas,
> >> > > >
> >> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
> >> > > > If the SR-IOV capable GPU is plugged into the SR-IOV incapable
> >> > > > platform. It would cause a problem on PCI resource allocation in
> >> > > > current Linux kernel.
> >> > > >
> >> > > > Therefore in order to allow the PF (Physical Function) device of
> >> > > > SR-IOV capable GPU to work on the SR-IOV incapable platform, it is
> >> > > > required to verify conditions for initializing BAR resources on
> >> > > > AMD SR-IOV capable GPUs.
> >> > > >
> >> > > > If the device is an AMD graphics device and it supports SR-IOV it
> >> > > > will require a large amount of resources.
> >> > > > Before calling sriov_init() must ensure that the system BIOS also
> >> > > > supports SR-IOV and that system BIOS has been able to allocate
> >> > > > enough resources.
> >> > > > If the VF BARs are zero then the system BIOS does not support
> >> > > > SR-IOV or it could not allocate the resources and this platform
> >> > > > will not support AMD graphics SR-IOV.
> >> > > > Therefore do not call sriov_init().
> >> > > > If the system BIOS does support SR-IOV then the VF BARs will be
> >> > > > properly initialized to non-zero values.
> >> > > >
> >> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
> >> > > >
> >> > > > I checked the drivers/pci/quirks.c, it looks the workarounds/fixes
> >> > > > in quirks.c are for specific devices and one or more device ID are
> >> > > > defined for the specific devices. However my patch is for all AMD
> >> > > > SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
> >> > > > So it doesn't seem like a good fit to put the fix in quirks.c.  
> >> > >
> >> > >
> >> > > Why is an AMD graphics card unique here?  Doesn't sriov_init()
> >> > > always need to be able to deal with devices of any type where the
> >> > > BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV devices
> >> > > can fit their VFs within a minimum bridge aperture, most cannot.  I
> >> > > don't understand why the VF resource requirements being
> >> > > exceptionally large dictates that they receive special handling.
> >> > > Thanks,
> >> > >
> >> > > Alex
> >> > >  
> >> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> >> > > > ---
> >> > > >  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >> > > >  1 file changed, 60 insertions(+), 3 deletions(-)
> >> > > >
> >> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >> > > > index e30f05c..e4f1405 100644
> >> > > > --- a/drivers/pci/iov.c
> >> > > > +++ b/drivers/pci/iov.c
> >> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> >> > > >                 msleep(100);
> >> > > >  }
> >> > > >
> >> > > > +/*
> >> > > > + * pci_vf_bar_valid - check if VF BARs have resource allocated
> >> > > > + * @dev: the PCI device
> >> > > > + * @pos: register offset of SR-IOV capability in PCI config space
> >> > > > + * Returns true any VF BAR has resource allocated, false
> >> > > > + * if all VF BARs are empty.
> >> > > > + */
> >> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
> >> > > > +       int i;
> >> > > > +       u32 bar_value;
> >> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> >> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |
> >> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);
> >> > > > +
> >> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> >> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> >> > > > +               if (bar_value & bar_size_mask)
> >> > > > +                       return true;
> >> > > > +       }
> >> > > > +
> >> > > > +       return false;
> >> > > > +}
> >> > > > +
> >> > > > +/*
> >> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> >> > > > + * @dev: the PCI device
> >> > > > + *
> >> > > > + * Returns true if device is an AMD/ATI display adapter,
> >> > > > + * otherwise return false.
> >> > > > + */
> >> > > > +
> >> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
> >> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> >> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||
> >> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }
> >> > > > +
> >> > > >  /**
> >> > > >   * pci_iov_init - initialize the IOV capability
> >> > > >   * @dev: the PCI device
> >> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> >> > > >                 return -ENODEV;
> >> > > >
> >> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> >> > > > -       if (pos)
> >> > > > -               return sriov_init(dev, pos);
> >> > > > -
> >> > > > +       if (pos) {
> >> > > > +       /*
> >> > > > +        * If the device is an AMD graphics device and it supports
> >> > > > +        * SR-IOV it will require a large amount of resources.
> >> > > > +        * Before calling sriov_init() must ensure that the system
> >> > > > +        * BIOS also supports SR-IOV and that system BIOS has been
> >> > > > +        * able to allocate enough resources.
> >> > > > +        * If the VF BARs are zero then the system BIOS does not
> >> > > > +        * support SR-IOV or it could not allocate the resources
> >> > > > +        * and this platform will not support AMD graphics SR-IOV.
> >> > > > +        * Therefore do not call sriov_init().
> >> > > > +        * If the system BIOS does support SR-IOV then the VF BARs
> >> > > > +        * will be properly initialized to non-zero values.
> >> > > > +        */
> >> > > > +               if (is_amd_display_adapter(dev)) {
> >> > > > +                       if (pci_vf_bar_valid(dev, pos))
> >> > > > +                               return sriov_init(dev, pos);
> >> > > > +               } else {
> >> > > > +                       return sriov_init(dev, pos);
> >> > > > +               }
> >> > > > +       }
> >> > > >         return -ENODEV;
> >> > > >  }
> >> > > >  
> >> > >  
> >> >  
> >>  
> >
Deucher, Alexander May 23, 2017, 12:15 p.m. UTC | #17
> -----Original Message-----
> From: Cheng, Collins
> Sent: Thursday, May 11, 2017 10:51 PM
> To: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Deucher, Alexander; Zytaruk, Kelly
> Subject: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
> incapable platform
> 
> Hi Helgaas,
> 
> Some AMD GPUs have hardware support for graphics SR-IOV.
> If the SR-IOV capable GPU is plugged into the SR-IOV incapable
> platform. It would cause a problem on PCI resource allocation in
> current Linux kernel.
> 
> Therefore in order to allow the PF (Physical Function) device of
> SR-IOV capable GPU to work on the SR-IOV incapable platform,
> it is required to verify conditions for initializing BAR resources
> on AMD SR-IOV capable GPUs.
> 
> If the device is an AMD graphics device and it supports
> SR-IOV it will require a large amount of resources.
> Before calling sriov_init() must ensure that the system
> BIOS also supports SR-IOV and that system BIOS has been
> able to allocate enough resources.
> If the VF BARs are zero then the system BIOS does not
> support SR-IOV or it could not allocate the resources
> and this platform will not support AMD graphics SR-IOV.
> Therefore do not call sriov_init().
> If the system BIOS does support SR-IOV then the VF BARs
> will be properly initialized to non-zero values.
> 
> Below is the patch against to Kernel 4.8 & 4.9. Please review.

For upstream, the patch should be against Linus' master or Bjorn's pci-next tree.

Alex

> 
> I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in
> quirks.c are for specific devices and one or more device ID are defined
> for the specific devices. However my patch is for all AMD SR-IOV
> capable GPUs, that includes all existing and future AMD server GPUs.
> So it doesn't seem like a good fit to put the fix in quirks.c.
> 
> 
> 
> Signed-off-by: Collins Cheng <collins.cheng@amd.com>
> ---
>  drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index e30f05c..e4f1405 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>  		msleep(100);
>  }
> 
> +/*
> + * pci_vf_bar_valid - check if VF BARs have resource allocated
> + * @dev: the PCI device
> + * @pos: register offset of SR-IOV capability in PCI config space
> + * Returns true any VF BAR has resource allocated, false
> + * if all VF BARs are empty.
> + */
> +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
> +{
> +	int i;
> +	u32 bar_value;
> +	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> +			PCI_BASE_ADDRESS_MEM_TYPE_64 |
> +			PCI_BASE_ADDRESS_MEM_PREFETCH);
> +
> +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> +		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> +		if (bar_value & bar_size_mask)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> + * @dev: the PCI device
> + *
> + * Returns true if device is an AMD/ATI display adapter,
> + * otherwise return false.
> + */
> +
> +static bool is_amd_display_adapter(struct pci_dev *dev)
> +{
> +	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> +		(dev->vendor == PCI_VENDOR_ID_ATI ||
> +		dev->vendor == PCI_VENDOR_ID_AMD));
> +}
> +
>  /**
>   * pci_iov_init - initialize the IOV capability
>   * @dev: the PCI device
> @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>  		return -ENODEV;
> 
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> -	if (pos)
> -		return sriov_init(dev, pos);
> -
> +	if (pos) {
> +	/*
> +	 * If the device is an AMD graphics device and it supports
> +	 * SR-IOV it will require a large amount of resources.
> +	 * Before calling sriov_init() must ensure that the system
> +	 * BIOS also supports SR-IOV and that system BIOS has been
> +	 * able to allocate enough resources.
> +	 * If the VF BARs are zero then the system BIOS does not
> +	 * support SR-IOV or it could not allocate the resources
> +	 * and this platform will not support AMD graphics SR-IOV.
> +	 * Therefore do not call sriov_init().
> +	 * If the system BIOS does support SR-IOV then the VF BARs
> +	 * will be properly initialized to non-zero values.
> +	 */
> +		if (is_amd_display_adapter(dev)) {
> +			if (pci_vf_bar_valid(dev, pos))
> +				return sriov_init(dev, pos);
> +		} else {
> +			return sriov_init(dev, pos);
> +		}
> +	}
>  	return -ENODEV;
>  }
> 
> --
> 1.9.1
> 
> 
> 
> -Collins Cheng
Alexander H Duyck May 23, 2017, 4:02 p.m. UTC | #18
So Alex Williamson brought up an interesting point. What happens if
you boot with "pci=realloc=off"? Do you see the same issue with it
attempting to reallocate resources? I'm just wondering what the state
of things is if we don't attempt to reallocate resources after the
BIOS has configured them?

- Alex

On Mon, May 22, 2017 at 8:41 PM, Cheng, Collins <Collins.Cheng@amd.com> wrote:
> Hi Alex,
>
> I owe you a dmesg log. Attachment are two log files. 1.txt is without "pci=earlydump", 2.txt is with "pci=earlydump". The platform is an ASUS Z170-A motherboard that doesn't support SR-IOV. The graphics card is AMD FirePro S7150 card which enabled SR-IOV.
>
> You could find the error info like below in both logs. From the log, kernel failed to reallocate resource for BAR0 which is PF's Frame Buffer BAR (256MB needed), but kernel reallocated resource for BAR9 which is for VF. You are right, the real bug that is something goes wrong with the reallocation leaving the PF without resources.
>
> [    0.992976] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.992976] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.992977] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.992978] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.992979] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.992986] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.992986] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.992988] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.992994] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> [    0.992995] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> [    0.992996] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> [    0.992997] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
>
> -Collins Cheng
>
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Monday, May 22, 2017 11:44 PM
> To: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Cheng, Collins <Collins.Cheng@amd.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
>
> On Fri, 19 May 2017 08:43:38 -0700
> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
>> On Mon, May 15, 2017 at 10:53 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Mon, 15 May 2017 08:19:28 +0000
>> > "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>> >
>> >> Hi Williamson,
>> >>
>> >> We cannot assume BIOS supports SR-IOV, actually only newer server motherboard BIOS supports SR-IOV. Normal desktop motherboard BIOS or older server motherboard BIOS doesn't support SR-IOV. This issue would happen if an user plugs our AMD SR-IOV capable GPU card to a normal desktop motherboard.
>> >
>> > Servers should be supporting SR-IOV for a long time now.  What
>> > really is there to a BIOS supporting SR-IOV anyway, it's simply
>> > reserving sufficient bus number and MMIO resources such that we can
>> > enable the VFs.  This process isn't exclusively reserved for the
>> > BIOS.  Some platforms may choose to only initialize boot devices,
>> > leaving the rest for the OS to program.  The initial proposal here
>> > to disable SR-IOV if not programmed at OS hand-off disables even the
>> > possibility of the OS reallocating resources for this device.
>>
>> There are differences between supporting SR-IOV and supporting SR-IOV
>> on devices with massive resources. I know I have seen NICs that will
>> keep a system from completing POST if SR-IOV is enabled, and MMIO
>> beyond 4G is not. My guess would be that the issues being seen are
>> probably that they disable SR-IOV in the BIOS in such a setup and end
>> up running into issues when they try to boot into the Linux kernel as
>> it goes through and tries to allocate resources for SR-IOV even though
>> it was disabled in the BIOS.
>>
>> It might make sense to add a kernel parameter something like a
>> "pci=nosriov" that would allow for disabling SR-IOV and related
>> resource allocation if that is what we are talking about. That way you
>> could plug in these types of devices into a system with a legacy bios
>> or that doesn't wan to allocate addresses above 32b for MMIO, and this
>> parameter would be all that is needed to disable SR-IOV so you could
>> plug in a NIC that has SR-IOV associated with it.
>
> Hi,
>
> a) I think we're still ignoring the real bug that is something goes wrong with the reallocation leaving the PF without resources.
>
> b) Why does an option to avoid re-allocation need to be sr-iov specific?  Shouldn't pci=realloc=off cover this?
>
> Thanks,
> Alex
>
>> >> I agree that failure to allocate VF resources should leave the device in no worse condition than before it tried. I hope kernel could allocate PF device resource before allocating VF device resource, and keep PF device resource valid and functional if failed to allocate VF device resource.
>> >>
>> >> I will send out dmesg log lspci info tomorrow. Thanks.
>> >
>> > Thanks,
>> > Alex
>> >
>> >> -----Original Message-----
>> >> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> >> Sent: Friday, May 12, 2017 10:43 PM
>> >> To: Cheng, Collins <Collins.Cheng@amd.com>
>> >> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
>> >> linux-kernel@vger.kernel.org; Deucher, Alexander
>> >> <Alexander.Deucher@amd.com>; Zytaruk, Kelly
>> >> <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
>> >> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>> >> SR-IOV incapable platform
>> >>
>> >> On Fri, 12 May 2017 04:51:43 +0000
>> >> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
>> >>
>> >> > Hi Williamson,
>> >> >
>> >> > I verified the patch is working for both AMD SR-IOV GPU and Intel SR-IOV NIC. I don't think it is redundant to check the VF BAR valid before call sriov_init(), it is safe and saving boot time, also there is no a better method to know if system BIOS has correctly initialized the SR-IOV capability or not.
>> >>
>> >> It also masks an underlying bug and creates a maintenance issue that we won't know when it's safe to remove this workaround.  I don't think faster boot is valid rationale, in one case SR-IOV is completely disabled, the other we attempt to allocate the resources the BIOS failed to provide.  I expect this is also a corner case, the BIOS should typically support SR-IOV, therefore this situation should be an exception.
>> >>
>> >> > I did not try to fix the issue from the kernel resource allocation perspective, it is because:
>> >> >     1. I am not very familiar with the PCI resource allocation scheme in kernel. For example, in sriov_init(), kernel will re-assign the PCI resource for both VF and PF. I don't understand why kernel allocates resource for VF firstly, then PF. If it is PF firstly, then this issue could be avoided.
>> >> >     2. I am not sure if kernel has error handler if PCI resource allocation failed. In this case, kernel cannot allocate enough resource to PF. It should trigger some error handler to either just keep original BAR values set by system BIOS, or disable this device and log errors.
>> >>
>> >> I think these are the issues we should be trying to solve and I'm
>> >> sure folks on the linux-pci list can help us identify the bug.
>> >> Minimally, failure to allocate VF resources should leave the device
>> >> in no worse condition than before it tried.  Perhaps you could post
>> >> more details about the issue, boot with pci=earlydump, post dmesg
>> >> of a boot where the PF resources are incorrectly re-allocated, and
>> >> include lspci -vvv for the SR-IOV device.  Also, please test with
>> >> the latest upstream kernel, upstream only patches old kernels
>> >> through stable backports of commits to the latest kernel.  Adding
>> >> Yinghai as a resource allocation expert. Thanks,
>> >>
>> >> Alex
>> >>
>> >> > -----Original Message-----
>> >> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> >> > Sent: Friday, May 12, 2017 12:01 PM
>> >> > To: Cheng, Collins <Collins.Cheng@amd.com>
>> >> > Cc: Bjorn Helgaas <bhelgaas@google.com>;
>> >> > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher,
>> >> > Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly
>> >> > <Kelly.Zytaruk@amd.com>
>> >> > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the
>> >> > SR-IOV incapable platform
>> >> >
>> >> > On Fri, 12 May 2017 03:42:46 +0000 "Cheng, Collins"
>> >> > <Collins.Cheng@amd.com> wrote:
>> >> >
>> >> > > Hi Williamson,
>> >> > >
>> >> > > GPU card needs more BAR aperture resource than other PCI devices. For example, Intel SR-IOV network card only require 512KB memory resource for all VFs. AMD SR-IOV GPU card needs 256MB x16 VF = 4GB memory resource for frame buffer BAR aperture.
>> >> > >
>> >> > > If the system BIOS supports SR-IOV, it will reserve enough resource for all VF BARs. If the system BIOS doesn't support SR-IOV or cannot allocate the enough resource for VF BARs, only PF BAR will be assigned and VF BARs are empty. Then system boots to Linux kernel and kernel doesn't check if the VF BARs are empty or valid. Kernel will re-assign the BAR resources for PF and all VFs. The problem I saw is that kernel will fail to allocate PF BAR resource because some resources are assigned to VF, this is not expected. So kernel might need to do some check before re-assign the PF/VF resource, so that PF device will be correctly assigned BAR resource and user can use PF device.
>> >> >
>> >> > So the problem is that something bad happens when the kernel is
>> >> > trying to reallocate resources in order to fulfill the
>> >> > requirements of the VFs, leaving the PF resources incorrectly
>> >> > programmed?  Why not just fix that bug rather than creating
>> >> > special handling for this vendor/class of device which disables
>> >> > any attempt to fixup resources for SR-IOV?  IOW, this patch just
>> >> > avoids the problem for your devices rather than fixing the bug.
>> >> > I'd suggest fixing the bug such that the PF is left in a
>> >> > functional state if the kernel is unable to allocate sufficient
>> >> > resources for the VFs.  Thanks,
>> >> >
>> >> > Alex
>> >> >
>> >> > > -----Original Message-----
>> >> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
>> >> > > Sent: Friday, May 12, 2017 11:21 AM
>> >> > > To: Cheng, Collins <Collins.Cheng@amd.com>
>> >> > > Cc: Bjorn Helgaas <bhelgaas@google.com>;
>> >> > > linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
>> >> > > Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly
>> >> > > <Kelly.Zytaruk@amd.com>
>> >> > > Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on
>> >> > > the SR-IOV incapable platform
>> >> > >
>> >> > > On Fri, 12 May 2017 02:50:32 +0000 "Cheng, Collins"
>> >> > > <Collins.Cheng@amd.com> wrote:
>> >> > >
>> >> > > > Hi Helgaas,
>> >> > > >
>> >> > > > Some AMD GPUs have hardware support for graphics SR-IOV.
>> >> > > > If the SR-IOV capable GPU is plugged into the SR-IOV
>> >> > > > incapable platform. It would cause a problem on PCI resource
>> >> > > > allocation in current Linux kernel.
>> >> > > >
>> >> > > > Therefore in order to allow the PF (Physical Function) device
>> >> > > > of SR-IOV capable GPU to work on the SR-IOV incapable
>> >> > > > platform, it is required to verify conditions for
>> >> > > > initializing BAR resources on AMD SR-IOV capable GPUs.
>> >> > > >
>> >> > > > If the device is an AMD graphics device and it supports
>> >> > > > SR-IOV it will require a large amount of resources.
>> >> > > > Before calling sriov_init() must ensure that the system BIOS
>> >> > > > also supports SR-IOV and that system BIOS has been able to
>> >> > > > allocate enough resources.
>> >> > > > If the VF BARs are zero then the system BIOS does not support
>> >> > > > SR-IOV or it could not allocate the resources and this
>> >> > > > platform will not support AMD graphics SR-IOV.
>> >> > > > Therefore do not call sriov_init().
>> >> > > > If the system BIOS does support SR-IOV then the VF BARs will
>> >> > > > be properly initialized to non-zero values.
>> >> > > >
>> >> > > > Below is the patch against to Kernel 4.8 & 4.9. Please review.
>> >> > > >
>> >> > > > I checked the drivers/pci/quirks.c, it looks the
>> >> > > > workarounds/fixes in quirks.c are for specific devices and
>> >> > > > one or more device ID are defined for the specific devices.
>> >> > > > However my patch is for all AMD SR-IOV capable GPUs, that includes all existing and future AMD server GPUs.
>> >> > > > So it doesn't seem like a good fit to put the fix in quirks.c.
>> >> > >
>> >> > >
>> >> > > Why is an AMD graphics card unique here?  Doesn't sriov_init()
>> >> > > always need to be able to deal with devices of any type where
>> >> > > the BIOS hasn't initialized the SR-IOV capability?  Some SR-IOV
>> >> > > devices can fit their VFs within a minimum bridge aperture,
>> >> > > most cannot.  I don't understand why the VF resource
>> >> > > requirements being exceptionally large dictates that they receive special handling.
>> >> > > Thanks,
>> >> > >
>> >> > > Alex
>> >> > >
>> >> > > > Signed-off-by: Collins Cheng <collins.cheng@amd.com>
>> >> > > > ---
>> >> > > >  drivers/pci/iov.c | 63
>> >> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>> >> > > >  1 file changed, 60 insertions(+), 3 deletions(-)
>> >> > > >
>> >> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index
>> >> > > > e30f05c..e4f1405 100644
>> >> > > > --- a/drivers/pci/iov.c
>> >> > > > +++ b/drivers/pci/iov.c
>> >> > > > @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>> >> > > >                 msleep(100);
>> >> > > >  }
>> >> > > >
>> >> > > > +/*
>> >> > > > + * pci_vf_bar_valid - check if VF BARs have resource
>> >> > > > +allocated
>> >> > > > + * @dev: the PCI device
>> >> > > > + * @pos: register offset of SR-IOV capability in PCI config
>> >> > > > +space
>> >> > > > + * Returns true any VF BAR has resource allocated, false
>> >> > > > + * if all VF BARs are empty.
>> >> > > > + */
>> >> > > > +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos) {
>> >> > > > +       int i;
>> >> > > > +       u32 bar_value;
>> >> > > > +       u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
>> >> > > > +                       PCI_BASE_ADDRESS_MEM_TYPE_64 |
>> >> > > > +                       PCI_BASE_ADDRESS_MEM_PREFETCH);
>> >> > > > +
>> >> > > > +       for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
>> >> > > > +               pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
>> >> > > > +               if (bar_value & bar_size_mask)
>> >> > > > +                       return true;
>> >> > > > +       }
>> >> > > > +
>> >> > > > +       return false;
>> >> > > > +}
>> >> > > > +
>> >> > > > +/*
>> >> > > > + * is_amd_display_adapter - check if it is an AMD/ATI GPU
>> >> > > > +device
>> >> > > > + * @dev: the PCI device
>> >> > > > + *
>> >> > > > + * Returns true if device is an AMD/ATI display adapter,
>> >> > > > + * otherwise return false.
>> >> > > > + */
>> >> > > > +
>> >> > > > +static bool is_amd_display_adapter(struct pci_dev *dev) {
>> >> > > > +       return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
>> >> > > > +               (dev->vendor == PCI_VENDOR_ID_ATI ||
>> >> > > > +               dev->vendor == PCI_VENDOR_ID_AMD)); }
>> >> > > > +
>> >> > > >  /**
>> >> > > >   * pci_iov_init - initialize the IOV capability
>> >> > > >   * @dev: the PCI device
>> >> > > > @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>> >> > > >                 return -ENODEV;
>> >> > > >
>> >> > > >         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> >> > > > -       if (pos)
>> >> > > > -               return sriov_init(dev, pos);
>> >> > > > -
>> >> > > > +       if (pos) {
>> >> > > > +       /*
>> >> > > > +        * If the device is an AMD graphics device and it supports
>> >> > > > +        * SR-IOV it will require a large amount of resources.
>> >> > > > +        * Before calling sriov_init() must ensure that the system
>> >> > > > +        * BIOS also supports SR-IOV and that system BIOS has been
>> >> > > > +        * able to allocate enough resources.
>> >> > > > +        * If the VF BARs are zero then the system BIOS does not
>> >> > > > +        * support SR-IOV or it could not allocate the resources
>> >> > > > +        * and this platform will not support AMD graphics SR-IOV.
>> >> > > > +        * Therefore do not call sriov_init().
>> >> > > > +        * If the system BIOS does support SR-IOV then the VF BARs
>> >> > > > +        * will be properly initialized to non-zero values.
>> >> > > > +        */
>> >> > > > +               if (is_amd_display_adapter(dev)) {
>> >> > > > +                       if (pci_vf_bar_valid(dev, pos))
>> >> > > > +                               return sriov_init(dev, pos);
>> >> > > > +               } else {
>> >> > > > +                       return sriov_init(dev, pos);
>> >> > > > +               }
>> >> > > > +       }
>> >> > > >         return -ENODEV;
>> >> > > >  }
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>
Alex Williamson May 23, 2017, 6:20 p.m. UTC | #19
On Tue, 23 May 2017 03:41:21 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Alex,
> 
> I owe you a dmesg log. Attachment are two log files. 1.txt is without "pci=earlydump", 2.txt is with "pci=earlydump". The platform is an ASUS Z170-A motherboard that doesn't support SR-IOV. The graphics card is AMD FirePro S7150 card which enabled SR-IOV. 
> 
> You could find the error info like below in both logs. From the log, kernel failed to reallocate resource for BAR0 which is PF's Frame Buffer BAR (256MB needed), but kernel reallocated resource for BAR9 which is for VF. You are right, the real bug that is something goes wrong with the reallocation leaving the PF without resources.
> 
> [    0.992976] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.992976] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.992977] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.992978] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.992979] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.992986] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.992986] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.992988] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.992994] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> [    0.992995] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> [    0.992996] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> [    0.992997] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]

I've tried to extract more of the relevant resizing efforts below,
perhaps Yinghai or others can make more out of it.  In particular this
system offers no 64-bit MMIO and we'll never manage to allocate the
necessary SR-IOV resources without it.  AIUI, the PCI core won't try to
use anything outside the ACPI _CRS data without the option pci=nocrs.
This might present a second alternative in addition to
pci=realloc=off, which is actually suggested by the kernel below.  So I
think we have at least two potential workarounds in the code as it
exists today, one leaving SR-IOV disabled, the other (hopefully)
enabling it using 64-bit MMIO not described by the system BIOS.
Certainly it would still be an improvement to detect the impossible
reallocation problem without nocrs and abandon the process, and of
course to revert the process before leaving more BARs unprogrammed
than we started with.  Thanks,

Alex

[    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
[    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
[    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]
...
[    0.896481] pci 0000:01:00.0: [1002:6929] type 00 class 0x030000
[    0.896496] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.896506] pci 0000:01:00.0: reg 0x18: [mem 0xd0000000-0xd01fffff 64bit pref]
[    0.896513] pci 0000:01:00.0: reg 0x20: [io  0xe000-0xe0ff]
[    0.896519] pci 0000:01:00.0: reg 0x24: [mem 0xdfe00000-0xdfe3ffff]
[    0.896526] pci 0000:01:00.0: reg 0x30: [mem 0xdfe40000-0xdfe5ffff pref]
[    0.896590] pci 0000:01:00.0: supports D1 D2
[    0.896590] pci 0000:01:00.0: PME# supported from D1 D2 D3hot D3cold
[    0.896625] pci 0000:01:00.0: reg 0x354: [mem 0x00000000-0x07ffffff 64bit pref]
[    0.896626] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x3fffffff 64bit pref] (contains BAR0 for 8 VFs)
[    0.896634] pci 0000:01:00.0: reg 0x35c: [mem 0x00000000-0x003fffff 64bit pref]
[    0.896635] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x00000000-0x01ffffff 64bit pref] (contains BAR2 for 8 VFs)
[    0.896646] pci 0000:01:00.0: reg 0x368: [mem 0x00000000-0x003fffff]
[    0.896647] pci 0000:01:00.0: VF(n) BAR5 space: [mem 0x00000000-0x01ffffff] (contains BAR5 for 8 VFs)
[    0.896700] pci 0000:01:00.0: System wakeup disabled by ACPI
[    0.906527] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.906544] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.906546] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
[    0.906549] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
[    0.906550] pci 0000:00:1b.0: bridge has subordinate 01 but max busn 02
...
[    0.943584] vgaarb: setting as boot device: PCI:0000:01:00.0
[    0.943585] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
[    0.943586] vgaarb: loaded
[    0.943586] vgaarb: bridge control possible 0000:01:00.0
...
[    0.997491] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997491] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997493] pci 0000:01:00.0: BAR 9: no space for [mem size 0x02000000 64bit pref]
[    0.997493] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x02000000 64bit pref]
[    0.997495] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
[    0.997495] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
[    0.997497] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997498] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997501] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
[    0.997502] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
...
[    0.997540] pci_bus 0000:00: No. 2 try to assign unassigned res
[    0.997540] release child resource [mem 0xdfe00000-0xdfe3ffff]
[    0.997540] release child resource [mem 0xdfe40000-0xdfe5ffff pref]
[    0.997541] pci 0000:00:1b.0: resource 14 [mem 0xdfe00000-0xdfefffff] released
[    0.997542] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997543] release child resource [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.997544] release child resource [mem 0xd0000000-0xd01fffff 64bit pref]
[    0.997544] pci 0000:00:1b.0: resource 15 [mem 0xc0000000-0xd01fffff 64bit pref] released
[    0.997545] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997576] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
[    0.997577] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
[    0.997578] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8adfffff]
[    0.997583] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997583] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997585] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997585] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997587] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
[    0.997593] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
[    0.997594] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
[    0.997595] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
[    0.997602] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
[    0.997602] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
[    0.997603] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
[    0.997604] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
[    0.997606] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997607] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997609] pci 0000:00:1b.0:   bridge window [mem 0x88c00000-0x8adfffff]
...
[    0.997647] pci_bus 0000:00: No. 3 try to assign unassigned res
[    0.997648] release child resource [mem 0x88c00000-0x8abfffff 64bit pref]
[    0.997648] release child resource [mem 0x8ac00000-0x8adfffff 64bit pref]
[    0.997649] pci 0000:00:1b.0: resource 14 [mem 0x88c00000-0x8adfffff] released
[    0.997649] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997651] release child resource [mem 0xdfd00000-0xdfd07fff 64bit]
[    0.997651] pci 0000:00:1c.0: resource 14 [mem 0xdfd00000-0xdfdfffff] released
[    0.997652] pci 0000:00:1c.0: PCI bridge to [bus 02]
[    0.997654] pci 0000:00:1d.0: resource 15 [mem 0x88a00000-0x88bfffff 64bit pref] released
[    0.997654] pci 0000:00:1d.0: PCI bridge to [bus 05]
[    0.997664] pci 0000:00:1b.0: bridge window [mem 0x08000000-0x5fffffff 64bit pref] to [bus 01] add_size 48000000 add_align 8000000
[    0.997666] pci 0000:00:1b.0: bridge window [mem 0x00100000-0x022fffff] to [bus 01] add_size 2200000 add_align 400000
[    0.997687] pci 0000:00:1d.0: bridge window [mem 0x00100000-0x002fffff 64bit pref] to [bus 05] add_size 200000 add_align 100000
[    0.997692] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0x5fffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
[    0.997693] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0xa7ffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
[    0.997693] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x022fffff] res_to_dev_res add_size 2200000 min_align 400000
[    0.997694] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x044fffff] res_to_dev_res add_size 2200000 min_align 400000
[    0.997695] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[    0.997696] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x004fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[    0.997698] pci 0000:00:1b.0: BAR 15: no space for [mem size 0xa0000000 64bit pref]
[    0.997699] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0xa0000000 64bit pref]
[    0.997700] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8cffffff]
[    0.997701] pci 0000:00:1c.0: BAR 14: assigned [mem 0x88a00000-0x88afffff]
[    0.997702] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8d000000-0x8d3fffff 64bit pref]
[    0.997705] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
[    0.997706] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
[    0.997707] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88a00000-0x8abfffff]
[    0.997708] pci 0000:00:1c.0: BAR 14: assigned [mem 0x8ac00000-0x8acfffff]
[    0.997709] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8ad00000-0x8aefffff 64bit pref]
[    0.997711] pci 0000:00:1d.0: BAR 15: reassigned [mem 0x8ad00000-0x8b0fffff 64bit pref] (expanded by 0x200000)
[    0.997713] pci 0000:00:1b.0: BAR 14: reassigned [mem 0x8b400000-0x8f7fffff] (expanded by 0x2200000)
[    0.997719] pci 0000:01:00.0: res[7]=[mem size 0x00000000 64bit pref] res_to_dev_res add_size 40000000 min_align 0
[    0.997720] pci 0000:01:00.0: res[9]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 2000000 min_align 0
[    0.997721] pci 0000:01:00.0: res[12]=[mem size 0x00000000] res_to_dev_res add_size 2000000 min_align 0
[    0.997722] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997722] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997723] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997724] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997725] pci 0000:01:00.0: BAR 9: assigned [mem 0x8b400000-0x8d3fffff 64bit pref]
[    0.997731] pci 0000:01:00.0: BAR 12: assigned [mem 0x8d400000-0x8f3fffff]
[    0.997734] pci 0000:01:00.0: BAR 2: assigned [mem 0x8f400000-0x8f5fffff 64bit pref]
[    0.997740] pci 0000:01:00.0: BAR 5: assigned [mem 0x8f600000-0x8f63ffff]
[    0.997744] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997745] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997746] pci 0000:01:00.0: BAR 2: assigned [mem 0x8b400000-0x8b5fffff 64bit pref]
[    0.997753] pci 0000:01:00.0: BAR 5: assigned [mem 0x8b600000-0x8b63ffff]
[    0.997756] pci 0000:01:00.0: BAR 12: assigned [mem 0x8b800000-0x8d7fffff]
[    0.997758] pci 0000:01:00.0: BAR 9: assigned [mem 0x8d800000-0x8f7fffff 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997767] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997768] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997770] pci 0000:00:1b.0:   bridge window [mem 0x8b400000-0x8f7fffff]
...
[    0.997818] pci_bus 0000:00: Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off
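
The failures in the log above come down to simple arithmetic, which the following minimal sketch works through. All numbers are copied from the dmesg lines (the VF BAR0 register at 0x354, the largest root bus memory window, and the prefetchable request on bridge 00:1b.0); the variable and function names are made up for illustration. The 0x40000000 total also matches the size reported for "BAR 7", which appears to be the aggregate VF BAR0 space.

```python
MIB = 1 << 20


def span(start, end):
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1


# reg 0x354: [mem 0x00000000-0x07ffffff 64bit pref], "contains BAR0 for 8 VFs"
num_vfs = 8
vf_bar0_each = span(0x00000000, 0x07FFFFFF)
vf_bar0_total = vf_bar0_each * num_vfs

# Largest root bus window: [mem 0x88800000-0xdfffffff window]
root_window = span(0x88800000, 0xDFFFFFFF)

# pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
bridge_pref_request = 0x58000000

if __name__ == "__main__":
    print(f"VF BAR0 per VF:      {vf_bar0_each // MIB} MiB")
    print(f"VF BAR0 for 8 VFs:   {vf_bar0_total // MIB} MiB")
    print(f"largest root window: {root_window // MIB} MiB")
    print(f"bridge pref request: {bridge_pref_request // MIB} MiB")
    print("request fits in window:", bridge_pref_request <= root_window)
```

Even before anything else on the bus is placed, the prefetchable request is larger than the whole window below 4 GiB, so no amount of reshuffling within that window can satisfy it.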
Cheng, Collins May 24, 2017, 8:57 a.m. UTC | #20
Hi Alex,

How do you know that "this system offers no 64-bit MMIO"? Is that from the dmesg log?

-Collins Cheng


Alex Williamson May 24, 2017, 2:57 p.m. UTC | #21
On Wed, 24 May 2017 08:57:53 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Alex,
> 
> How do you know "particular this system offers no 64-bit MMIO", from dmesg log?

From this:

> [    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
> [    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> [    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]

There are only 32-bit ranges listed.  Thanks,

Alex
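
The check described above can be sketched in a few lines: pull the root bus memory windows out of a dmesg log and see whether any of them reaches 4 GiB or beyond. This is only an illustration; the regular expression and the function names are assumptions, and the sample text is the root bus portion of the log quoted above.

```python
import re

# Matches lines such as:
#   pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
ROOT_MEM_RE = re.compile(
    r"root bus resource \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)"
)

FOUR_GIB = 1 << 32


def root_mem_windows(dmesg_text):
    """Return (start, end) pairs for every root bus memory window."""
    return [
        (int(start, 16), int(end, 16))
        for start, end in ROOT_MEM_RE.findall(dmesg_text)
    ]


def has_64bit_mmio(windows):
    """True if any window ends at or above the 4 GiB boundary."""
    return any(end >= FOUR_GIB for _, end in windows)


SAMPLE = """\
[    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
[    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
[    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]
"""

if __name__ == "__main__":
    windows = root_mem_windows(SAMPLE)
    for start, end in windows:
        print(f"{start:#010x}-{end:#010x}")
    print("64-bit MMIO window present:", has_64bit_mmio(windows))
```

On a platform with "Above 4G Decoding" enabled one would expect an additional memory window with addresses above 0xffffffff in this list.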
Cheng, Collins May 26, 2017, 1:52 a.m. UTC | #22
Hi Alex W,

I don't need the kernel patch anymore. However, it looks like the kernel could be improved to handle this more gracefully when PCI resource allocation fails. Do you have a plan to improve this in the kernel PCI code?

-Collins Cheng


-----Original Message-----
From: Cheng, Collins 
Sent: Wednesday, May 24, 2017 4:56 PM
To: 'Alex Williamson' <alex.williamson@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

Hi Alex W, Alex D,

I just tried two options: one is to enable "Above 4G Decoding" in the BIOS setup menu, the other is to add "pci=realloc=off" in grub. Both fix this issue. Please see the attached log files.

Previously I thought "Above 4G Decoding" was enabled, but it was off when I looked at the CMOS setup today.

For now I think we have a solution. On a system that supports "Above 4G Decoding", the user should enable it when using an SR-IOV capable device. On a system that doesn't support "Above 4G Decoding", the user needs to add "pci=realloc=off" in grub.

Longer term, I think the kernel still needs a way to avoid this issue, such as keeping the BIOS-assigned resource values if reallocation of a device's resources fails.


-Collins Cheng
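
To confirm that the "pci=realloc=off" workaround actually reached the running kernel, one could inspect the kernel command line. The sketch below is only an illustration: the function names are made up, the example command line in the usage note is hypothetical, and it assumes that options inside a pci= parameter are separated by commas.

```python
def pci_options(cmdline):
    """Collect the comma-separated options from every pci= parameter."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(
                opt for opt in token[len("pci="):].split(",") if opt
            )
    return options


def realloc_disabled(cmdline):
    """True if the command line carries pci=realloc=off."""
    return "realloc=off" in pci_options(cmdline)


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    try:
        with open("/proc/cmdline") as f:
            running = f.read()
    except OSError:
        running = ""
    print("pci options:", pci_options(running))
    print("realloc disabled:", realloc_disabled(running))
```

For a hypothetical command line such as "ro quiet pci=realloc=off,nocrs", pci_options() would return both options, so the same helper also covers the pci=nocrs alternative mentioned in this thread.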


-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Wednesday, May 24, 2017 2:20 AM
To: Cheng, Collins <Collins.Cheng@amd.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

On Tue, 23 May 2017 03:41:21 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Alex,
> 
> I owe you a dmesg log. Attachment are two log files. 1.txt is without "pci=earlydump", 2.txt is with "pci=earlydump". The platform is an ASUS Z170-A motherboard that doesn't support SR-IOV. The graphics card is AMD FirePro S7150 card which enabled SR-IOV. 
> 
> You could find the error info like below in both logs. From the log, kernel failed to reallocate resource for BAR0 which is PF's Frame Buffer BAR (256MB needed), but kernel reallocated resource for BAR9 which is for VF. You are right, the real bug that is something goes wrong with the reallocation leaving the PF without resources.
> 
> [    0.992976] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.992976] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.992977] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.992978] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.992979] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.992986] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.992986] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.992988] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.992994] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> [    0.992995] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> [    0.992996] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> [    0.992997] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]

I've tried to extract more of the relevant resizing efforts below, perhaps Yinghai or others can make more out of it.  In particular this system offers no 64-bit MMIO and we'll never manage to allocate the necessary SR-IOV resources without it.  AIUI, the PCI core won't try to use anything outside the ACPI _CRS data without the option pci=nocrs.
This might present a second alternative in addition to the pci=realloc=off, which is actually suggested by the kernel below.  So I think we have at least two potential workarounds in the code as it exists today, one leaving SR-IOV disabled, the other (hopefully) enabling it using 64bit MMIO not described by the system BIOS.
Certainly an improvement would still be detecting the impossible reallocation problem without nocrs and abandoning the process and of course to revert the process before leaving more BARs unprogrammed than we started with.  Thanks,

Alex

[    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
[    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
[    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]
...
[    0.896481] pci 0000:01:00.0: [1002:6929] type 00 class 0x030000
[    0.896496] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.896506] pci 0000:01:00.0: reg 0x18: [mem 0xd0000000-0xd01fffff 64bit pref]
[    0.896513] pci 0000:01:00.0: reg 0x20: [io  0xe000-0xe0ff]
[    0.896519] pci 0000:01:00.0: reg 0x24: [mem 0xdfe00000-0xdfe3ffff]
[    0.896526] pci 0000:01:00.0: reg 0x30: [mem 0xdfe40000-0xdfe5ffff pref]
[    0.896590] pci 0000:01:00.0: supports D1 D2
[    0.896590] pci 0000:01:00.0: PME# supported from D1 D2 D3hot D3cold
[    0.896625] pci 0000:01:00.0: reg 0x354: [mem 0x00000000-0x07ffffff 64bit pref]
[    0.896626] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x3fffffff 64bit pref] (contains BAR0 for 8 VFs)
[    0.896634] pci 0000:01:00.0: reg 0x35c: [mem 0x00000000-0x003fffff 64bit pref]
[    0.896635] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x00000000-0x01ffffff 64bit pref] (contains BAR2 for 8 VFs)
[    0.896646] pci 0000:01:00.0: reg 0x368: [mem 0x00000000-0x003fffff]
[    0.896647] pci 0000:01:00.0: VF(n) BAR5 space: [mem 0x00000000-0x01ffffff] (contains BAR5 for 8 VFs)
[    0.896700] pci 0000:01:00.0: System wakeup disabled by ACPI
[    0.906527] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.906544] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.906546] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
[    0.906549] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
[    0.906550] pci 0000:00:1b.0: bridge has subordinate 01 but max busn 02
...
[    0.943584] vgaarb: setting as boot device: PCI:0000:01:00.0
[    0.943585] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
[    0.943586] vgaarb: loaded
[    0.943586] vgaarb: bridge control possible 0000:01:00.0
...
[    0.997491] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997491] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997493] pci 0000:01:00.0: BAR 9: no space for [mem size 0x02000000 64bit pref]
[    0.997493] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x02000000 64bit pref]
[    0.997495] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
[    0.997495] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
[    0.997497] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997498] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997501] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
[    0.997502] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
...
[    0.997540] pci_bus 0000:00: No. 2 try to assign unassigned res
[    0.997540] release child resource [mem 0xdfe00000-0xdfe3ffff]
[    0.997540] release child resource [mem 0xdfe40000-0xdfe5ffff pref]
[    0.997541] pci 0000:00:1b.0: resource 14 [mem 0xdfe00000-0xdfefffff] released
[    0.997542] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997543] release child resource [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.997544] release child resource [mem 0xd0000000-0xd01fffff 64bit pref]
[    0.997544] pci 0000:00:1b.0: resource 15 [mem 0xc0000000-0xd01fffff 64bit pref] released
[    0.997545] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997576] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
[    0.997577] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
[    0.997578] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8adfffff]
[    0.997583] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997583] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997585] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997585] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997587] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
[    0.997593] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
[    0.997594] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
[    0.997595] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
[    0.997602] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
[    0.997602] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
[    0.997603] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
[    0.997604] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
[    0.997606] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997607] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997609] pci 0000:00:1b.0:   bridge window [mem 0x88c00000-0x8adfffff]
...
[    0.997647] pci_bus 0000:00: No. 3 try to assign unassigned res
[    0.997648] release child resource [mem 0x88c00000-0x8abfffff 64bit pref]
[    0.997648] release child resource [mem 0x8ac00000-0x8adfffff 64bit pref]
[    0.997649] pci 0000:00:1b.0: resource 14 [mem 0x88c00000-0x8adfffff] released
[    0.997649] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997651] release child resource [mem 0xdfd00000-0xdfd07fff 64bit]
[    0.997651] pci 0000:00:1c.0: resource 14 [mem 0xdfd00000-0xdfdfffff] released
[    0.997652] pci 0000:00:1c.0: PCI bridge to [bus 02]
[    0.997654] pci 0000:00:1d.0: resource 15 [mem 0x88a00000-0x88bfffff 64bit pref] released
[    0.997654] pci 0000:00:1d.0: PCI bridge to [bus 05]
[    0.997664] pci 0000:00:1b.0: bridge window [mem 0x08000000-0x5fffffff 64bit pref] to [bus 01] add_size 48000000 add_align 8000000
[    0.997666] pci 0000:00:1b.0: bridge window [mem 0x00100000-0x022fffff] to [bus 01] add_size 2200000 add_align 400000
[    0.997687] pci 0000:00:1d.0: bridge window [mem 0x00100000-0x002fffff 64bit pref] to [bus 05] add_size 200000 add_align 100000
[    0.997692] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0x5fffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
[    0.997693] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0xa7ffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
[    0.997693] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x022fffff] res_to_dev_res add_size 2200000 min_align 400000
[    0.997694] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x044fffff] res_to_dev_res add_size 2200000 min_align 400000
[    0.997695] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[    0.997696] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x004fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
[    0.997698] pci 0000:00:1b.0: BAR 15: no space for [mem size 0xa0000000 64bit pref]
[    0.997699] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0xa0000000 64bit pref]
[    0.997700] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8cffffff]
[    0.997701] pci 0000:00:1c.0: BAR 14: assigned [mem 0x88a00000-0x88afffff]
[    0.997702] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8d000000-0x8d3fffff 64bit pref]
[    0.997705] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
[    0.997706] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
[    0.997707] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88a00000-0x8abfffff]
[    0.997708] pci 0000:00:1c.0: BAR 14: assigned [mem 0x8ac00000-0x8acfffff]
[    0.997709] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8ad00000-0x8aefffff 64bit pref]
[    0.997711] pci 0000:00:1d.0: BAR 15: reassigned [mem 0x8ad00000-0x8b0fffff 64bit pref] (expanded by 0x200000)
[    0.997713] pci 0000:00:1b.0: BAR 14: reassigned [mem 0x8b400000-0x8f7fffff] (expanded by 0x2200000)
[    0.997719] pci 0000:01:00.0: res[7]=[mem size 0x00000000 64bit pref] res_to_dev_res add_size 40000000 min_align 0
[    0.997720] pci 0000:01:00.0: res[9]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 2000000 min_align 0
[    0.997721] pci 0000:01:00.0: res[12]=[mem size 0x00000000] res_to_dev_res add_size 2000000 min_align 0
[    0.997722] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997722] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997723] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997724] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997725] pci 0000:01:00.0: BAR 9: assigned [mem 0x8b400000-0x8d3fffff 64bit pref]
[    0.997731] pci 0000:01:00.0: BAR 12: assigned [mem 0x8d400000-0x8f3fffff]
[    0.997734] pci 0000:01:00.0: BAR 2: assigned [mem 0x8f400000-0x8f5fffff 64bit pref]
[    0.997740] pci 0000:01:00.0: BAR 5: assigned [mem 0x8f600000-0x8f63ffff]
[    0.997744] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997745] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997746] pci 0000:01:00.0: BAR 2: assigned [mem 0x8b400000-0x8b5fffff 64bit pref]
[    0.997753] pci 0000:01:00.0: BAR 5: assigned [mem 0x8b600000-0x8b63ffff]
[    0.997756] pci 0000:01:00.0: BAR 12: assigned [mem 0x8b800000-0x8d7fffff]
[    0.997758] pci 0000:01:00.0: BAR 9: assigned [mem 0x8d800000-0x8f7fffff 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
[    0.997767] pci 0000:00:1b.0: PCI bridge to [bus 01]
[    0.997768] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
[    0.997770] pci 0000:00:1b.0:   bridge window [mem 0x8b400000-0x8f7fffff]
...
[    0.997818] pci_bus 0000:00: Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off
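
When reading a log like the one above, it can help to reduce the repeated reallocation passes to the last reported state of each BAR. The following is a minimal sketch and not part of the thread or the kernel: the names `final_bar_state` and `unassigned` are hypothetical, and the sample lines are copied from the final pass for device 0000:01:00.0 in the log above.

```python
import re

# Sample lines copied from the last reallocation pass in the log above.
LOG = """\
[    0.997744] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    0.997745] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    0.997746] pci 0000:01:00.0: BAR 2: assigned [mem 0x8b400000-0x8b5fffff 64bit pref]
[    0.997753] pci 0000:01:00.0: BAR 5: assigned [mem 0x8b600000-0x8b63ffff]
[    0.997756] pci 0000:01:00.0: BAR 12: assigned [mem 0x8b800000-0x8d7fffff]
[    0.997758] pci 0000:01:00.0: BAR 9: assigned [mem 0x8d800000-0x8f7fffff 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
[    0.997765] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
"""

# Matches only the outcome lines; "no space for" lines are skipped on purpose
# because each one is followed by a "failed to assign" line for the same BAR.
LINE_RE = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): "
    r"(?P<status>assigned|reassigned|failed to assign) \[(?P<res>[^\]]*)\]"
)


def final_bar_state(log_text):
    """Return {(device, bar): (status, resource)} using the last line seen per BAR."""
    state = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            state[(m["dev"], int(m["bar"]))] = (m["status"], m["res"])
    return state


def unassigned(state):
    """Return the sorted (device, bar) pairs whose last outcome was a failure."""
    return sorted(k for k, (status, _) in state.items() if status == "failed to assign")


if __name__ == "__main__":
    for dev, bar in unassigned(final_bar_state(LOG)):
        print(f"{dev} BAR {bar} ended up unassigned")
```

On the sample lines, BAR 0 and BAR 7 of 0000:01:00.0 are the ones left without an address, which matches the failure discussed in this thread: the PF frame buffer BAR ends up unassigned after reallocation.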
Alex Williamson May 26, 2017, 3:52 p.m. UTC | #23
On Fri, 26 May 2017 01:52:35 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Alex W,
> 
> I don't need the kernel patch anymore. However, it looks like the kernel could be improved to handle PCI resource allocation failures more gracefully. Do you have a plan to improve this in the kernel PCI code?

I don't have a device capable of reproducing and I'm currently working
on issues elsewhere.  If you don't plan to continue working on it, I'd
suggest filing a bug at bugzilla.kernel.org so that we can at least
track the problem.  Thanks,

Alex

> -----Original Message-----
> From: Cheng, Collins 
> Sent: Wednesday, May 24, 2017 4:56 PM
> To: 'Alex Williamson' <alex.williamson@redhat.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
> Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
> 
> Hi Alex W, Alex D,
> 
> I just tried two options: one is to enable "Above 4G Decoding" in the BIOS setup menu, the other is to add "pci=realloc=off" in grub. Both fix this issue. Please see the attached log files.
> 
> Previously I thought "Above 4G Decoding" was enabled, but it was off when I looked at the CMOS setup today.
> 
> For now I think we have a solution. On a system that supports "Above 4G Decoding", the user should enable it when using an SR-IOV capable device. On a system that doesn't support "Above 4G Decoding", the user needs to add "pci=realloc=off" in grub.
> 
> Potentially, I think the kernel still needs a way to avoid this issue, such as keeping the BIOS-assigned resource values if reallocating a device's resources fails.
> 
> 
> -Collins Cheng
> 
> 
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com] 
> Sent: Wednesday, May 24, 2017 2:20 AM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform
> 
> On Tue, 23 May 2017 03:41:21 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Alex,
> > 
> > I owe you a dmesg log. Attached are two log files: 1.txt is without "pci=earlydump", 2.txt is with "pci=earlydump". The platform is an ASUS Z170-A motherboard that doesn't support SR-IOV. The graphics card is an AMD FirePro S7150, which has SR-IOV enabled.
> > 
> > You can find error messages like the ones below in both logs. From the log, the kernel failed to reallocate resources for BAR0, which is the PF's frame buffer BAR (256MB needed), but it did reallocate resources for BAR9, which belongs to the VFs. You are right, the real bug is that something goes wrong with the reallocation, leaving the PF without resources.
> > 
> > [    0.992976] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> > [    0.992976] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> > [    0.992977] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> > [    0.992978] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> > [    0.992979] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> > [    0.992986] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> > [    0.992986] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> > [    0.992988] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> > [    0.992994] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> > [    0.992995] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> > [    0.992996] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> > [    0.992997] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]  
> 
> I've tried to extract more of the relevant resizing efforts below, perhaps Yinghai or others can make more out of it.  In particular this system offers no 64-bit MMIO and we'll never manage to allocate the necessary SR-IOV resources without it.  AIUI, the PCI core won't try to use anything outside the ACPI _CRS data without the option pci=nocrs.
> This might present a second alternative in addition to the pci=realloc=off, which is actually suggested by the kernel below.  So I think we have at least two potential workarounds in the code as it exists today, one leaving SR-IOV disabled, the other (hopefully) enabling it using 64bit MMIO not described by the system BIOS.
> Certainly an improvement would still be detecting the impossible reallocation problem without nocrs and abandoning the process and of course to revert the process before leaving more BARs unprogrammed than we started with.  Thanks,
> 
> Alex
> 
> [    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
> [    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> [    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]
> ...
> [    0.896481] pci 0000:01:00.0: [1002:6929] type 00 class 0x030000
> [    0.896496] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff 64bit pref]
> [    0.896506] pci 0000:01:00.0: reg 0x18: [mem 0xd0000000-0xd01fffff 64bit pref]
> [    0.896513] pci 0000:01:00.0: reg 0x20: [io  0xe000-0xe0ff]
> [    0.896519] pci 0000:01:00.0: reg 0x24: [mem 0xdfe00000-0xdfe3ffff]
> [    0.896526] pci 0000:01:00.0: reg 0x30: [mem 0xdfe40000-0xdfe5ffff pref]
> [    0.896590] pci 0000:01:00.0: supports D1 D2
> [    0.896590] pci 0000:01:00.0: PME# supported from D1 D2 D3hot D3cold
> [    0.896625] pci 0000:01:00.0: reg 0x354: [mem 0x00000000-0x07ffffff 64bit pref]
> [    0.896626] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x3fffffff 64bit pref] (contains BAR0 for 8 VFs)
> [    0.896634] pci 0000:01:00.0: reg 0x35c: [mem 0x00000000-0x003fffff 64bit pref]
> [    0.896635] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x00000000-0x01ffffff 64bit pref] (contains BAR2 for 8 VFs)
> [    0.896646] pci 0000:01:00.0: reg 0x368: [mem 0x00000000-0x003fffff]
> [    0.896647] pci 0000:01:00.0: VF(n) BAR5 space: [mem 0x00000000-0x01ffffff] (contains BAR5 for 8 VFs)
> [    0.896700] pci 0000:01:00.0: System wakeup disabled by ACPI
> [    0.906527] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.906544] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.906546] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
> [    0.906549] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
> [    0.906550] pci 0000:00:1b.0: bridge has subordinate 01 but max busn 02
> ...
> [    0.943584] vgaarb: setting as boot device: PCI:0000:01:00.0
> [    0.943585] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
> [    0.943586] vgaarb: loaded
> [    0.943586] vgaarb: bridge control possible 0000:01:00.0
> ...
> [    0.997491] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997491] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997493] pci 0000:01:00.0: BAR 9: no space for [mem size 0x02000000 64bit pref]
> [    0.997493] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x02000000 64bit pref]
> [    0.997495] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.997495] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.997497] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997498] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997501] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
> [    0.997502] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
> ...
> [    0.997540] pci_bus 0000:00: No. 2 try to assign unassigned res
> [    0.997540] release child resource [mem 0xdfe00000-0xdfe3ffff]
> [    0.997540] release child resource [mem 0xdfe40000-0xdfe5ffff pref]
> [    0.997541] pci 0000:00:1b.0: resource 14 [mem 0xdfe00000-0xdfefffff] released
> [    0.997542] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997543] release child resource [mem 0xc0000000-0xcfffffff 64bit pref]
> [    0.997544] release child resource [mem 0xd0000000-0xd01fffff 64bit pref]
> [    0.997544] pci 0000:00:1b.0: resource 15 [mem 0xc0000000-0xd01fffff 64bit pref] released
> [    0.997545] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997576] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
> [    0.997577] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
> [    0.997578] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8adfffff]
> [    0.997583] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997583] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997585] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997585] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997587] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.997593] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.997594] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.997595] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.997602] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> [    0.997602] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> [    0.997603] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> [    0.997604] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
> [    0.997606] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997607] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997609] pci 0000:00:1b.0:   bridge window [mem 0x88c00000-0x8adfffff]
> ...
> [    0.997647] pci_bus 0000:00: No. 3 try to assign unassigned res
> [    0.997648] release child resource [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.997648] release child resource [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.997649] pci 0000:00:1b.0: resource 14 [mem 0x88c00000-0x8adfffff] released
> [    0.997649] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997651] release child resource [mem 0xdfd00000-0xdfd07fff 64bit]
> [    0.997651] pci 0000:00:1c.0: resource 14 [mem 0xdfd00000-0xdfdfffff] released
> [    0.997652] pci 0000:00:1c.0: PCI bridge to [bus 02]
> [    0.997654] pci 0000:00:1d.0: resource 15 [mem 0x88a00000-0x88bfffff 64bit pref] released
> [    0.997654] pci 0000:00:1d.0: PCI bridge to [bus 05]
> [    0.997664] pci 0000:00:1b.0: bridge window [mem 0x08000000-0x5fffffff 64bit pref] to [bus 01] add_size 48000000 add_align 8000000
> [    0.997666] pci 0000:00:1b.0: bridge window [mem 0x00100000-0x022fffff] to [bus 01] add_size 2200000 add_align 400000
> [    0.997687] pci 0000:00:1d.0: bridge window [mem 0x00100000-0x002fffff 64bit pref] to [bus 05] add_size 200000 add_align 100000
> [    0.997692] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0x5fffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
> [    0.997693] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0xa7ffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
> [    0.997693] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x022fffff] res_to_dev_res add_size 2200000 min_align 400000
> [    0.997694] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x044fffff] res_to_dev_res add_size 2200000 min_align 400000
> [    0.997695] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
> [    0.997696] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x004fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
> [    0.997698] pci 0000:00:1b.0: BAR 15: no space for [mem size 0xa0000000 64bit pref]
> [    0.997699] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0xa0000000 64bit pref]
> [    0.997700] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8cffffff]
> [    0.997701] pci 0000:00:1c.0: BAR 14: assigned [mem 0x88a00000-0x88afffff]
> [    0.997702] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8d000000-0x8d3fffff 64bit pref]
> [    0.997705] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
> [    0.997706] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
> [    0.997707] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88a00000-0x8abfffff]
> [    0.997708] pci 0000:00:1c.0: BAR 14: assigned [mem 0x8ac00000-0x8acfffff]
> [    0.997709] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8ad00000-0x8aefffff 64bit pref]
> [    0.997711] pci 0000:00:1d.0: BAR 15: reassigned [mem 0x8ad00000-0x8b0fffff 64bit pref] (expanded by 0x200000)
> [    0.997713] pci 0000:00:1b.0: BAR 14: reassigned [mem 0x8b400000-0x8f7fffff] (expanded by 0x2200000)
> [    0.997719] pci 0000:01:00.0: res[7]=[mem size 0x00000000 64bit pref] res_to_dev_res add_size 40000000 min_align 0
> [    0.997720] pci 0000:01:00.0: res[9]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 2000000 min_align 0
> [    0.997721] pci 0000:01:00.0: res[12]=[mem size 0x00000000] res_to_dev_res add_size 2000000 min_align 0
> [    0.997722] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997722] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997723] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997724] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997725] pci 0000:01:00.0: BAR 9: assigned [mem 0x8b400000-0x8d3fffff 64bit pref]
> [    0.997731] pci 0000:01:00.0: BAR 12: assigned [mem 0x8d400000-0x8f3fffff]
> [    0.997734] pci 0000:01:00.0: BAR 2: assigned [mem 0x8f400000-0x8f5fffff 64bit pref]
> [    0.997740] pci 0000:01:00.0: BAR 5: assigned [mem 0x8f600000-0x8f63ffff]
> [    0.997744] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997745] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997746] pci 0000:01:00.0: BAR 2: assigned [mem 0x8b400000-0x8b5fffff 64bit pref]
> [    0.997753] pci 0000:01:00.0: BAR 5: assigned [mem 0x8b600000-0x8b63ffff]
> [    0.997756] pci 0000:01:00.0: BAR 12: assigned [mem 0x8b800000-0x8d7fffff]
> [    0.997758] pci 0000:01:00.0: BAR 9: assigned [mem 0x8d800000-0x8f7fffff 64bit pref]
> [    0.997765] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997765] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997767] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997768] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997770] pci 0000:00:1b.0:   bridge window [mem 0x8b400000-0x8f7fffff]
> ...
> [    0.997818] pci_bus 0000:00: Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off
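
The root bus windows quoted above allow a quick sanity check of the remark that the reallocation cannot succeed without 64-bit MMIO. The sketch below is only an illustration under stated assumptions: the helper names `span`, `largest_window` and `fits` are hypothetical, the window and request values are copied from the quoted log, and the comparison is a necessary condition only, since it ignores alignment and the other devices that share the window.

```python
def span(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


# Memory windows reported for pci_bus 0000:00 in the quoted log.
ROOT_MEM_WINDOWS = [
    (0x000A0000, 0x000BFFFF),
    (0x88800000, 0xDFFFFFFF),
    (0xFD000000, 0xFE7FFFFF),
]

# The two sizes the log shows failing for bridge 0000:00:1b.0 BAR 15
# (the 64-bit prefetchable window in front of the GPU).
FAILED_REQUESTS = [0x58000000, 0xA0000000]


def largest_window(windows):
    """Size of the biggest single window; a request must fit inside one window."""
    return max(span(start, end) for start, end in windows)


def fits(size, windows):
    """Necessary (not sufficient) check: could `size` fit in any one window?"""
    return size <= largest_window(windows)


if __name__ == "__main__":
    print(f"largest root window: {largest_window(ROOT_MEM_WINDOWS):#x}")
    for size in FAILED_REQUESTS:
        print(f"request {size:#x} fits: {fits(size, ROOT_MEM_WINDOWS)}")
    # Every window ends below 4 GiB, so no 64-bit MMIO space is described.
    print(all(end < 2**32 for _, end in ROOT_MEM_WINDOWS))
```

The largest window, 0x88800000-0xdfffffff, is 0x57800000 bytes, which is smaller than even the 0x58000000 request, and all reported windows end below 4 GiB. That is consistent with the two workarounds discussed in the thread: either keep the BIOS assignment (pci=realloc=off) or make address space above 4 GiB available.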
Cheng, Collins May 27, 2017, 8:17 a.m. UTC | #24
Thanks Alex. I know it is difficult to reproduce this issue on your side.

I just created a bug to track it at https://bugzilla.kernel.org/show_bug.cgi?id=195891


-Collins Cheng


-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@redhat.com] 
Sent: Friday, May 26, 2017 11:52 PM
To: Cheng, Collins <Collins.Cheng@amd.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

On Fri, 26 May 2017 01:52:35 +0000
"Cheng, Collins" <Collins.Cheng@amd.com> wrote:

> Hi Alex W,
> 
> I don't need the kernel patch anymore. However, it looks like the kernel could be improved to handle PCI resource allocation failures more gracefully. Do you have a plan to improve this in the kernel PCI code?

I don't have a device capable of reproducing and I'm currently working on issues elsewhere.  If you don't plan to continue working on it, I'd suggest filing a bug at bugzilla.kernel.org so that we can at least track the problem.  Thanks,

Alex

> -----Original Message-----
> From: Cheng, Collins
> Sent: Wednesday, May 24, 2017 4:56 PM
> To: 'Alex Williamson' <alex.williamson@redhat.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas 
> <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; 
> Yinghai Lu <yinghai@kernel.org>
> Subject: RE: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> SR-IOV incapable platform
> 
> Hi Alex W, Alex D,
> 
> I just tried two options: one is to enable "Above 4G Decoding" in the BIOS setup menu, the other is to add "pci=realloc=off" in grub. Both fix this issue. Please see the attached log files.
> 
> Previously I thought "Above 4G Decoding" was enabled, but it was off when I looked at the CMOS setup today.
> 
> For now I think we have a solution. On a system that supports "Above 4G Decoding", the user should enable it when using an SR-IOV capable device. On a system that doesn't support "Above 4G Decoding", the user needs to add "pci=realloc=off" in grub.
> 
> Potentially, I think the kernel still needs a way to avoid this issue, such as keeping the BIOS-assigned resource values if reallocating a device's resources fails.
> 
> 
> -Collins Cheng
> 
> 
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, May 24, 2017 2:20 AM
> To: Cheng, Collins <Collins.Cheng@amd.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Bjorn Helgaas 
> <bhelgaas@google.com>; linux-pci@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Zytaruk, Kelly <Kelly.Zytaruk@amd.com>; 
> Yinghai Lu <yinghai@kernel.org>
> Subject: Re: [PATCH] PCI: Make SR-IOV capable GPU working on the 
> SR-IOV incapable platform
> 
> On Tue, 23 May 2017 03:41:21 +0000
> "Cheng, Collins" <Collins.Cheng@amd.com> wrote:
> 
> > Hi Alex,
> > 
> > I owe you a dmesg log. Attached are two log files: 1.txt is without "pci=earlydump", 2.txt is with "pci=earlydump". The platform is an ASUS Z170-A motherboard that doesn't support SR-IOV. The graphics card is an AMD FirePro S7150, which has SR-IOV enabled.
> > 
> > You can find error messages like the ones below in both logs. From the log, the kernel failed to reallocate resources for BAR0, which is the PF's frame buffer BAR (256MB needed), but it did reallocate resources for BAR9, which belongs to the VFs. You are right, the real bug is that something goes wrong with the reallocation, leaving the PF without resources.
> > 
> > [    0.992976] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> > [    0.992976] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> > [    0.992977] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> > [    0.992978] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> > [    0.992979] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> > [    0.992986] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> > [    0.992986] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> > [    0.992988] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> > [    0.992994] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> > [    0.992995] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> > [    0.992996] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> > [    0.992997] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]  
> 
> I've tried to extract more of the relevant resizing efforts below, perhaps Yinghai or others can make more out of it.  In particular this system offers no 64-bit MMIO and we'll never manage to allocate the necessary SR-IOV resources without it.  AIUI, the PCI core won't try to use anything outside the ACPI _CRS data without the option pci=nocrs.
> This might present a second alternative in addition to the pci=realloc=off, which is actually suggested by the kernel below.  So I think we have at least two potential workarounds in the code as it exists today, one leaving SR-IOV disabled, the other (hopefully) enabling it using 64bit MMIO not described by the system BIOS.
> Certainly an improvement would still be detecting the impossible 
> reallocation problem without nocrs and abandoning the process and of 
> course to revert the process before leaving more BARs unprogrammed 
> than we started with.  Thanks,
> 
> Alex
> 
> [    0.891319] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.891321] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    0.891322] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [    0.891323] pci_bus 0000:00: root bus resource [mem 0x88800000-0xdfffffff window]
> [    0.891324] pci_bus 0000:00: root bus resource [mem 0xfd000000-0xfe7fffff window]
> [    0.891325] pci_bus 0000:00: root bus resource [bus 00-fe]
> ...
> [    0.896481] pci 0000:01:00.0: [1002:6929] type 00 class 0x030000
> [    0.896496] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff 64bit pref]
> [    0.896506] pci 0000:01:00.0: reg 0x18: [mem 0xd0000000-0xd01fffff 64bit pref]
> [    0.896513] pci 0000:01:00.0: reg 0x20: [io  0xe000-0xe0ff]
> [    0.896519] pci 0000:01:00.0: reg 0x24: [mem 0xdfe00000-0xdfe3ffff]
> [    0.896526] pci 0000:01:00.0: reg 0x30: [mem 0xdfe40000-0xdfe5ffff pref]
> [    0.896590] pci 0000:01:00.0: supports D1 D2
> [    0.896590] pci 0000:01:00.0: PME# supported from D1 D2 D3hot D3cold
> [    0.896625] pci 0000:01:00.0: reg 0x354: [mem 0x00000000-0x07ffffff 64bit pref]
> [    0.896626] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x3fffffff 64bit pref] (contains BAR0 for 8 VFs)
> [    0.896634] pci 0000:01:00.0: reg 0x35c: [mem 0x00000000-0x003fffff 64bit pref]
> [    0.896635] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x00000000-0x01ffffff 64bit pref] (contains BAR2 for 8 VFs)
> [    0.896646] pci 0000:01:00.0: reg 0x368: [mem 0x00000000-0x003fffff]
> [    0.896647] pci 0000:01:00.0: VF(n) BAR5 space: [mem 0x00000000-0x01ffffff] (contains BAR5 for 8 VFs)
> [    0.896700] pci 0000:01:00.0: System wakeup disabled by ACPI
> [    0.906527] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.906544] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.906546] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
> [    0.906549] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
> [    0.906550] pci 0000:00:1b.0: bridge has subordinate 01 but max busn 02
> ...
> [    0.943584] vgaarb: setting as boot device: PCI:0000:01:00.0
> [    0.943585] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
> [    0.943586] vgaarb: loaded
> [    0.943586] vgaarb: bridge control possible 0000:01:00.0
> ...
> [    0.997491] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997491] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997493] pci 0000:01:00.0: BAR 9: no space for [mem size 0x02000000 64bit pref]
> [    0.997493] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x02000000 64bit pref]
> [    0.997495] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.997495] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.997497] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997498] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997501] pci 0000:00:1b.0:   bridge window [mem 0xdfe00000-0xdfefffff]
> [    0.997502] pci 0000:00:1b.0:   bridge window [mem 0xc0000000-0xd01fffff 64bit pref]
> ...
> [    0.997540] pci_bus 0000:00: No. 2 try to assign unassigned res
> [    0.997540] release child resource [mem 0xdfe00000-0xdfe3ffff]
> [    0.997540] release child resource [mem 0xdfe40000-0xdfe5ffff pref]
> [    0.997541] pci 0000:00:1b.0: resource 14 [mem 0xdfe00000-0xdfefffff] released
> [    0.997542] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997543] release child resource [mem 0xc0000000-0xcfffffff 64bit pref]
> [    0.997544] release child resource [mem 0xd0000000-0xd01fffff 64bit pref]
> [    0.997544] pci 0000:00:1b.0: resource 15 [mem 0xc0000000-0xd01fffff 64bit pref] released
> [    0.997545] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997576] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
> [    0.997577] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
> [    0.997578] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8adfffff]
> [    0.997583] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997583] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997585] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997585] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997587] pci 0000:01:00.0: BAR 9: assigned [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.997593] pci 0000:01:00.0: BAR 12: no space for [mem size 0x02000000]
> [    0.997594] pci 0000:01:00.0: BAR 12: failed to assign [mem size 0x02000000]
> [    0.997595] pci 0000:01:00.0: BAR 2: assigned [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.997602] pci 0000:01:00.0: BAR 5: no space for [mem size 0x00040000]
> [    0.997602] pci 0000:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> [    0.997603] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> [    0.997604] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
> [    0.997606] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997607] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997609] pci 0000:00:1b.0:   bridge window [mem 0x88c00000-0x8adfffff]
> ...
> [    0.997647] pci_bus 0000:00: No. 3 try to assign unassigned res
> [    0.997648] release child resource [mem 0x88c00000-0x8abfffff 64bit pref]
> [    0.997648] release child resource [mem 0x8ac00000-0x8adfffff 64bit pref]
> [    0.997649] pci 0000:00:1b.0: resource 14 [mem 0x88c00000-0x8adfffff] released
> [    0.997649] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997651] release child resource [mem 0xdfd00000-0xdfd07fff 64bit]
> [    0.997651] pci 0000:00:1c.0: resource 14 [mem 0xdfd00000-0xdfdfffff] released
> [    0.997652] pci 0000:00:1c.0: PCI bridge to [bus 02]
> [    0.997654] pci 0000:00:1d.0: resource 15 [mem 0x88a00000-0x88bfffff 64bit pref] released
> [    0.997654] pci 0000:00:1d.0: PCI bridge to [bus 05]
> [    0.997664] pci 0000:00:1b.0: bridge window [mem 0x08000000-0x5fffffff 64bit pref] to [bus 01] add_size 48000000 add_align 8000000
> [    0.997666] pci 0000:00:1b.0: bridge window [mem 0x00100000-0x022fffff] to [bus 01] add_size 2200000 add_align 400000
> [    0.997687] pci 0000:00:1d.0: bridge window [mem 0x00100000-0x002fffff 64bit pref] to [bus 05] add_size 200000 add_align 100000
> [    0.997692] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0x5fffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
> [    0.997693] pci 0000:00:1b.0: res[15]=[mem 0x08000000-0xa7ffffff 64bit pref] res_to_dev_res add_size 48000000 min_align 8000000
> [    0.997693] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x022fffff] res_to_dev_res add_size 2200000 min_align 400000
> [    0.997694] pci 0000:00:1b.0: res[14]=[mem 0x00100000-0x044fffff] res_to_dev_res add_size 2200000 min_align 400000
> [    0.997695] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
> [    0.997696] pci 0000:00:1d.0: res[15]=[mem 0x00100000-0x004fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
> [    0.997698] pci 0000:00:1b.0: BAR 15: no space for [mem size 0xa0000000 64bit pref]
> [    0.997699] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0xa0000000 64bit pref]
> [    0.997700] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88c00000-0x8cffffff]
> [    0.997701] pci 0000:00:1c.0: BAR 14: assigned [mem 0x88a00000-0x88afffff]
> [    0.997702] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8d000000-0x8d3fffff 64bit pref]
> [    0.997705] pci 0000:00:1b.0: BAR 15: no space for [mem size 0x58000000 64bit pref]
> [    0.997706] pci 0000:00:1b.0: BAR 15: failed to assign [mem size 0x58000000 64bit pref]
> [    0.997707] pci 0000:00:1b.0: BAR 14: assigned [mem 0x88a00000-0x8abfffff]
> [    0.997708] pci 0000:00:1c.0: BAR 14: assigned [mem 0x8ac00000-0x8acfffff]
> [    0.997709] pci 0000:00:1d.0: BAR 15: assigned [mem 0x8ad00000-0x8aefffff 64bit pref]
> [    0.997711] pci 0000:00:1d.0: BAR 15: reassigned [mem 0x8ad00000-0x8b0fffff 64bit pref] (expanded by 0x200000)
> [    0.997713] pci 0000:00:1b.0: BAR 14: reassigned [mem 0x8b400000-0x8f7fffff] (expanded by 0x2200000)
> [    0.997719] pci 0000:01:00.0: res[7]=[mem size 0x00000000 64bit pref] res_to_dev_res add_size 40000000 min_align 0
> [    0.997720] pci 0000:01:00.0: res[9]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 2000000 min_align 0
> [    0.997721] pci 0000:01:00.0: res[12]=[mem size 0x00000000] res_to_dev_res add_size 2000000 min_align 0
> [    0.997722] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997722] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997723] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997724] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997725] pci 0000:01:00.0: BAR 9: assigned [mem 0x8b400000-0x8d3fffff 64bit pref]
> [    0.997731] pci 0000:01:00.0: BAR 12: assigned [mem 0x8d400000-0x8f3fffff]
> [    0.997734] pci 0000:01:00.0: BAR 2: assigned [mem 0x8f400000-0x8f5fffff 64bit pref]
> [    0.997740] pci 0000:01:00.0: BAR 5: assigned [mem 0x8f600000-0x8f63ffff]
> [    0.997744] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
> [    0.997745] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
> [    0.997746] pci 0000:01:00.0: BAR 2: assigned [mem 0x8b400000-0x8b5fffff 64bit pref]
> [    0.997753] pci 0000:01:00.0: BAR 5: assigned [mem 0x8b600000-0x8b63ffff]
> [    0.997756] pci 0000:01:00.0: BAR 12: assigned [mem 0x8b800000-0x8d7fffff]
> [    0.997758] pci 0000:01:00.0: BAR 9: assigned [mem 0x8d800000-0x8f7fffff 64bit pref]
> [    0.997765] pci 0000:01:00.0: BAR 7: no space for [mem size 0x40000000 64bit pref]
> [    0.997765] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x40000000 64bit pref]
> [    0.997767] pci 0000:00:1b.0: PCI bridge to [bus 01]
> [    0.997768] pci 0000:00:1b.0:   bridge window [io  0xe000-0xefff]
> [    0.997770] pci 0000:00:1b.0:   bridge window [mem 0x8b400000-0x8f7fffff]
> ...
> [    0.997818] pci_bus 0000:00: Automatically enabled pci realloc, if you have problem, try booting with pci=realloc=off

Patch

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index e30f05c..e4f1405 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -523,6 +523,45 @@  static void sriov_restore_state(struct pci_dev *dev)
 		msleep(100);
 }
 
+/*
+ * pci_vf_bar_valid - check if any VF BAR has a resource allocated
+ * @dev: the PCI device
+ * @pos: register offset of SR-IOV capability in PCI config space
+ * Returns true if any VF BAR has a resource allocated, false
+ * if all VF BARs are empty.
+ */
+static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
+{
+	int i;
+	u32 bar_value;
+	u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
+			PCI_BASE_ADDRESS_MEM_TYPE_64 |
+			PCI_BASE_ADDRESS_MEM_PREFETCH);
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
+		if (bar_value & bar_size_mask)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * is_amd_display_adapter - check if it is an AMD/ATI GPU device
+ * @dev: the PCI device
+ *
+ * Returns true if the device is an AMD/ATI display adapter,
+ * otherwise returns false.
+ */
+
+static bool is_amd_display_adapter(struct pci_dev *dev)
+{
+	return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
+		(dev->vendor == PCI_VENDOR_ID_ATI ||
+		dev->vendor == PCI_VENDOR_ID_AMD));
+}
+
 /**
  * pci_iov_init - initialize the IOV capability
  * @dev: the PCI device
@@ -537,9 +576,27 @@  int pci_iov_init(struct pci_dev *dev)
 		return -ENODEV;
 
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
-	if (pos)
-		return sriov_init(dev, pos);
-
+	if (pos) {
+		/*
+		 * If the device is an AMD graphics device and it supports
+		 * SR-IOV, it will require a large amount of resources.
+		 * Before calling sriov_init(), we must ensure that the
+		 * system BIOS also supports SR-IOV and has been able to
+		 * allocate enough resources.
+		 * If the VF BARs are zero, then the system BIOS does not
+		 * support SR-IOV or it could not allocate the resources,
+		 * and this platform will not support AMD graphics SR-IOV.
+		 * Therefore do not call sriov_init().
+		 * If the system BIOS does support SR-IOV, then the VF BARs
+		 * will be properly initialized to non-zero values.
+		 */
+		if (is_amd_display_adapter(dev)) {
+			if (pci_vf_bar_valid(dev, pos))
+				return sriov_init(dev, pos);
+		} else {
+			return sriov_init(dev, pos);
+		}
+	}
 	return -ENODEV;
 }
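
For readers who want to try the decision logic outside the kernel, here is a
minimal userspace sketch of the check the patch adds. It is only an
illustration, not the kernel code: the sketch_* names are hypothetical, the
VF BAR registers are passed in as a plain array instead of being read with
pci_read_config_dword(), and the three flag-bit values are copied here on the
assumption that they match the kernel's PCI_BASE_ADDRESS_SPACE,
PCI_BASE_ADDRESS_MEM_TYPE_64 and PCI_BASE_ADDRESS_MEM_PREFETCH definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits the patch masks out. */
#define SKETCH_BAR_SPACE        0x01u	/* I/O vs. memory space bit */
#define SKETCH_BAR_MEM_TYPE_64  0x04u	/* 64-bit memory BAR */
#define SKETCH_BAR_MEM_PREFETCH 0x08u	/* prefetchable memory */
#define SKETCH_NUM_VF_BARS      6	/* the SR-IOV capability has six VF BAR registers */

/*
 * Return true if any VF BAR register holds something other than
 * the masked flag bits, i.e. firmware programmed an address into it.
 */
static bool sketch_vf_bar_valid(const uint32_t *bars, size_t n)
{
	const uint32_t mask = ~(SKETCH_BAR_SPACE |
				SKETCH_BAR_MEM_TYPE_64 |
				SKETCH_BAR_MEM_PREFETCH);
	size_t i;

	assert(bars != NULL);
	for (i = 0; i < n; i++) {
		if (bars[i] & mask)
			return true;
	}
	return false;
}

/*
 * Mirror the branch added to pci_iov_init(): SR-IOV init is skipped
 * only for an AMD/ATI display device whose VF BARs are all empty.
 */
static bool sketch_should_init_sriov(bool is_amd_display,
				     const uint32_t *bars, size_t n)
{
	if (!is_amd_display)
		return true;
	return sketch_vf_bar_valid(bars, n);
}
```

Passing six zeroed registers with is_amd_display set makes
sketch_should_init_sriov() return false, which corresponds to the case where
the patch returns -ENODEV without calling sriov_init(). Any other device
takes the existing path regardless of its VF BAR contents.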