[RFC] PCI: add support for Immediate Readiness

Message ID 20180802113635.7097-1-felipe.balbi@linux.intel.com
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series
  • [RFC] PCI: add support for Immediate Readiness
Related show

Commit Message

Felipe Balbi Aug. 2, 2018, 11:36 a.m.
PCIe GEN4 defines a new bit on Status Register which tells us that, if
Set, a function is immediately ready after a Reset. This means that
all delays after a Conventional or Function Reset can be skipped.

This patch reads such bit and caches its value in a flag inside struct
pci_dev to be checked later if we should delay or can skip delays
after a reset.

Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
---
 drivers/pci/pci.c             | 9 ++++++++-
 include/linux/pci.h           | 1 +
 include/uapi/linux/pci_regs.h | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Aug. 2, 2018, 11:56 a.m. | #1
> +	unsigned int	immediate:1;	/* Supports Immediate Readiness */

That name is a bit too generic.  immediate_ready (or imm_ready if you
need it short) would be a lot more descriptive.
Felipe Balbi Aug. 2, 2018, 12:11 p.m. | #2
Hi,

Christoph Hellwig <hch@infradead.org> writes:
>> +	unsigned int	immediate:1;	/* Supports Immediate Readiness */
>
> That name is a bit too generic.  immediate_ready (or imm_ready if you
> need it short) would be a lot more descriptive.

That can be easily patched. I'll until tomorrow for more comments then
send a v2
Sinan Kaya Aug. 3, 2018, 6:14 a.m. | #3
On 8/2/2018 7:36 AM, Felipe Balbi wrote:
> PCIe GEN4 defines a new bit on Status Register which tells us that, if
> Set, a function is immediately ready after a Reset. This means that
> all delays after a Conventional or Function Reset can be skipped.

Can you give a reference to the section of the specification? or
a pointer to the ECN?
Felipe Balbi Aug. 3, 2018, 6:26 a.m. | #4
Sinan Kaya <okaya@kernel.org> writes:

> On 8/2/2018 7:36 AM, Felipe Balbi wrote:
>> PCIe GEN4 defines a new bit on Status Register which tells us that, if
>> Set, a function is immediately ready after a Reset. This means that
>> all delays after a Conventional or Function Reset can be skipped.
>
> Can you give a reference to the section of the specification? or
> a pointer to the ECN?

Section 7.5.1.1.4 of PCIe GEN4 spec. Table 7-4:

Immediate Readiness – This optional bit, when Set, indicates the
Function is guaranteed to be ready to successfully complete valid
configuration accesses at any time following any reset that the host is
capable of issuing Configuration Requests to this Function.

When this bit is Set, for accesses to this Function, software is exempt
from all requirements to delay configuration accesses following any type
of reset, including but not limited to the timing requirements defined
in Section 6.6.  How this guarantee is established is beyond the scope
of this document.

It is permitted that system software/firmware provide mechanisms that
supersede the indication provided by this bit, however such
software/firmware mechanisms are outside the scope of this
specification.
Sinan Kaya Aug. 3, 2018, 5:25 p.m. | #5
On 8/3/2018 2:26 AM, Felipe Balbi wrote:
> Sinan Kaya <okaya@kernel.org> writes:
> 
>> On 8/2/2018 7:36 AM, Felipe Balbi wrote:
>>> PCIe GEN4 defines a new bit on Status Register which tells us that, if
>>> Set, a function is immediately ready after a Reset. This means that
>>> all delays after a Conventional or Function Reset can be skipped.
>>
>> Can you give a reference to the section of the specification? or
>> a pointer to the ECN?
> 
> Section 7.5.1.1.4 of PCIe GEN4 spec. Table 7-4:
> 
> Immediate Readiness – This optional bit, when Set, indicates the
> Function is guaranteed to be ready to successfully complete valid
> configuration accesses at any time following any reset that the host is
> capable of issuing Configuration Requests to this Function.
> 
> When this bit is Set, for accesses to this Function, software is exempt
> from all requirements to delay configuration accesses following any type
> of reset, including but not limited to the timing requirements defined
> in Section 6.6.  How this guarantee is established is beyond the scope
> of this document.
> 
> It is permitted that system software/firmware provide mechanisms that
> supersede the indication provided by this bit, however such
> software/firmware mechanisms are outside the scope of this
> specification.
> 

Thanks for the spec reference.

I think the patch is touching the wrong places. pci_dev_wait() is there
to wait for CRS response to finish following reset.

Typical sequences are:
1. Do some kind of reset in another routine
2. Wait reset specific wait time (1sec for secondary bus reset as an
example and 100ms for d3-d0 transition)
3. call pci_dev_wait() after reset to see if device can accept config
transactions.

Since this applies to all resets, I think you also need to get rid of
waits following different reset types in step #2 and return immediately.
I suggest you review callers of pci_dev_wait() and tap in there.

Another thing is that this is a common functionality. Initializing
the flag in pm_init() would not be the best place.
Bjorn Helgaas Sept. 4, 2018, 6:11 p.m. | #6
On Fri, Aug 03, 2018 at 01:25:58PM -0400, Sinan Kaya wrote:
> On 8/3/2018 2:26 AM, Felipe Balbi wrote:
> > Sinan Kaya <okaya@kernel.org> writes:
> > 
> > > On 8/2/2018 7:36 AM, Felipe Balbi wrote:
> > > > PCIe GEN4 defines a new bit on Status Register which tells us that, if
> > > > Set, a function is immediately ready after a Reset. This means that
> > > > all delays after a Conventional or Function Reset can be skipped.
> > > 
> > > Can you give a reference to the section of the specification? or
> > > a pointer to the ECN?
> > 
> > Section 7.5.1.1.4 of PCIe GEN4 spec. Table 7-4:
> > 
> > Immediate Readiness – This optional bit, when Set, indicates the
> > Function is guaranteed to be ready to successfully complete valid
> > configuration accesses at any time following any reset that the host is
> > capable of issuing Configuration Requests to this Function.
> > 
> > When this bit is Set, for accesses to this Function, software is exempt
> > from all requirements to delay configuration accesses following any type
> > of reset, including but not limited to the timing requirements defined
> > in Section 6.6.  How this guarantee is established is beyond the scope
> > of this document.
> > 
> > It is permitted that system software/firmware provide mechanisms that
> > supersede the indication provided by this bit, however such
> > software/firmware mechanisms are outside the scope of this
> > specification.
> > 
> 
> Thanks for the spec reference.

Yes.  Please include the reference in the changelog of v2.

> I think the patch is touching the wrong places. pci_dev_wait() is there
> to wait for CRS response to finish following reset.
> 
> Typical sequences are:
> 1. Do some kind of reset in another routine
> 2. Wait reset specific wait time (1sec for secondary bus reset as an
> example and 100ms for d3-d0 transition)
> 3. call pci_dev_wait() after reset to see if device can accept config
> transactions.
> 
> Since this applies to all resets, I think you also need to get rid of
> waits following different reset types in step #2 and return immediately.
> I suggest you review callers of pci_dev_wait() and tap in there.

I agree; I think we should be able to skip the delays in pcie_flr(),
pci_af_flr(), etc.

> Another thing is that this is a common functionality. Initializing
> the flag in pm_init() would not be the best place.

The immediate readiness thing is not directly related to PM, but
pci_pm_init() does deal with similar delays, and I don't have a better
suggestion off the top of my head.

Bjorn
Felipe Balbi Sept. 5, 2018, 5:18 a.m. | #7
Hi,

Bjorn Helgaas <helgaas@kernel.org> writes:
> On Fri, Aug 03, 2018 at 01:25:58PM -0400, Sinan Kaya wrote:
>> On 8/3/2018 2:26 AM, Felipe Balbi wrote:
>> > Sinan Kaya <okaya@kernel.org> writes:
>> > 
>> > > On 8/2/2018 7:36 AM, Felipe Balbi wrote:
>> > > > PCIe GEN4 defines a new bit on Status Register which tells us that, if
>> > > > Set, a function is immediately ready after a Reset. This means that
>> > > > all delays after a Conventional or Function Reset can be skipped.
>> > > 
>> > > Can you give a reference to the section of the specification? or
>> > > a pointer to the ECN?
>> > 
>> > Section 7.5.1.1.4 of PCIe GEN4 spec. Table 7-4:
>> > 
>> > Immediate Readiness – This optional bit, when Set, indicates the
>> > Function is guaranteed to be ready to successfully complete valid
>> > configuration accesses at any time following any reset that the host is
>> > capable of issuing Configuration Requests to this Function.
>> > 
>> > When this bit is Set, for accesses to this Function, software is exempt
>> > from all requirements to delay configuration accesses following any type
>> > of reset, including but not limited to the timing requirements defined
>> > in Section 6.6.  How this guarantee is established is beyond the scope
>> > of this document.
>> > 
>> > It is permitted that system software/firmware provide mechanisms that
>> > supersede the indication provided by this bit, however such
>> > software/firmware mechanisms are outside the scope of this
>> > specification.
>> > 
>> 
>> Thanks for the spec reference.
>
> Yes.  Please include the reference in the changelog of v2.

will do

>> I think the patch is touching the wrong places. pci_dev_wait() is there
>> to wait for CRS response to finish following reset.
>> 
>> Typical sequences are:
>> 1. Do some kind of reset in another routine
>> 2. Wait reset specific wait time (1sec for secondary bus reset as an
>> example and 100ms for d3-d0 transition)
>> 3. call pci_dev_wait() after reset to see if device can accept config
>> transactions.
>> 
>> Since this applies to all resets, I think you also need to get rid of
>> waits following different reset types in step #2 and return immediately.
>> I suggest you review callers of pci_dev_wait() and tap in there.
>
> I agree; I think we should be able to skip the delays in pcie_flr(),
> pci_af_flr(), etc.

but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
call in pci_af_flr() can also be removed if (dev->imm_ready)?
Felipe Balbi Sept. 5, 2018, 5:23 a.m. | #8
Hi,

Felipe Balbi <felipe.balbi@linux.intel.com> writes:
>> Since this applies to all resets, I think you also need to get rid of
>>> waits following different reset types in step #2 and return immediately.
>>> I suggest you review callers of pci_dev_wait() and tap in there.
>>
>> I agree; I think we should be able to skip the delays in pcie_flr(),
>> pci_af_flr(), etc.
>
> but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
> pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
> call in pci_af_flr() can also be removed if (dev->imm_ready)?

What about moving that msleep() to pci_dev_wait() itself?

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..7d4da2196618 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -999,7 +999,7 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
 		 * because have already delayed for the bridge.
 		 */
 		if (dev->runtime_d3cold) {
-			if (dev->d3cold_delay)
+			if (dev->d3cold_delay && !dev->imm_ready)
 				msleep(dev->d3cold_delay);
 			/*
 			 * When powering on a bridge from D3cold, the
@@ -2644,6 +2644,7 @@ EXPORT_SYMBOL_GPL(pci_d3cold_disable);
 void pci_pm_init(struct pci_dev *dev)
 {
 	int pm;
+	u16 status;
 	u16 pmc;
 
 	pm_runtime_forbid(&dev->dev);
@@ -2706,6 +2707,9 @@ void pci_pm_init(struct pci_dev *dev)
 		/* Disable the PME# generation functionality */
 		pci_pme_active(dev, false);
 	}
+
+	pci_read_config_word(dev, PCI_STATUS, &status);
+	dev->imm_ready = status & PCI_STATUS_IMMEDIATE;
 }
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
@@ -4306,6 +4310,17 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	int delay = 1;
 	u32 id;
 
+	if (dev->imm_ready)
+		return 0;
+
+	/*
+	 * Per Advanced Capabilities for Conventional PCI ECN, 13 April 2006,
+	 * updated 27 July 2006; a device must complete an FLR within
+	 * 100ms, but may silently discard requests while the FLR is in
+	 * progress.  Wait 100ms before trying to access the device.
+	 */
+	msleep(100);
+
 	/*
 	 * After reset, the device should not silently discard config
 	 * requests, but it may still indicate that it needs more time by
@@ -4376,13 +4391,6 @@ int pcie_flr(struct pci_dev *dev)
 
 	pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
 
-	/*
-	 * Per PCIe r4.0, sec 6.6.2, a device must complete an FLR within
-	 * 100ms, but may silently discard requests while the FLR is in
-	 * progress.  Wait 100ms before trying to access the device.
-	 */
-	msleep(100);
-
 	return pci_dev_wait(dev, "FLR", PCIE_RESET_READY_POLL_MS);
 }
 EXPORT_SYMBOL_GPL(pcie_flr);
@@ -4416,15 +4424,6 @@ static int pci_af_flr(struct pci_dev *dev, int probe)
 		pci_err(dev, "timed out waiting for pending transaction; performing AF function level reset anyway\n");
 
 	pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
-
-	/*
-	 * Per Advanced Capabilities for Conventional PCI ECN, 13 April 2006,
-	 * updated 27 July 2006; a device must complete an FLR within
-	 * 100ms, but may silently discard requests while the FLR is in
-	 * progress.  Wait 100ms before trying to access the device.
-	 */
-	msleep(100);
-
 	return pci_dev_wait(dev, "AF_FLR", PCIE_RESET_READY_POLL_MS);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..7eed464e844a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -325,6 +325,7 @@ struct pci_dev {
 	pci_power_t	current_state;	/* Current operating state. In ACPI,
 					   this is D0-D3, D0 being fully
 					   functional, and D3 being off. */
+	unsigned int	imm_ready:1;	/* Supports Immediate Readiness */
 	u8		pm_cap;		/* PM capability offset */
 	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
 					   can be generated */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index ee556ccc93f4..b5cf51a06cae 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -52,6 +52,7 @@
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
+#define  PCI_STATUS_IMMEDIATE	0x01	/* Immediate Readiness */
 #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 MHz PCI 2.1 bus */
Felipe Balbi Sept. 5, 2018, 5:29 a.m. | #9
Hi,

Felipe Balbi <felipe.balbi@linux.intel.com> writes:
> Felipe Balbi <felipe.balbi@linux.intel.com> writes:
>>> Since this applies to all resets, I think you also need to get rid of
>>>> waits following different reset types in step #2 and return immediately.
>>>> I suggest you review callers of pci_dev_wait() and tap in there.
>>>
>>> I agree; I think we should be able to skip the delays in pcie_flr(),
>>> pci_af_flr(), etc.
>>
>> but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
>> pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
>> call in pci_af_flr() can also be removed if (dev->imm_ready)?
>
> What about moving that msleep() to pci_dev_wait() itself?

this will incur and extra 100ms delay on D3->D0 transitions and bus
reset. Is that acceptable or would you prefer to add another check on
pcie_flr() and pci_af_flr()?

- 
balbi
Bjorn Helgaas Sept. 5, 2018, 4:49 p.m. | #10
On Wed, Sep 05, 2018 at 08:18:48AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Bjorn Helgaas <helgaas@kernel.org> writes:
> > On Fri, Aug 03, 2018 at 01:25:58PM -0400, Sinan Kaya wrote:
> >> On 8/3/2018 2:26 AM, Felipe Balbi wrote:
> >> > Sinan Kaya <okaya@kernel.org> writes:
> >> > 
> >> > > On 8/2/2018 7:36 AM, Felipe Balbi wrote:
> >> > > > PCIe GEN4 defines a new bit on Status Register which tells us that, if
> >> > > > Set, a function is immediately ready after a Reset. This means that
> >> > > > all delays after a Conventional or Function Reset can be skipped.
> >> > > 
> >> > > Can you give a reference to the section of the specification? or
> >> > > a pointer to the ECN?
> >> > 
> >> > Section 7.5.1.1.4 of PCIe GEN4 spec. Table 7-4:
> >> > 
> >> > Immediate Readiness – This optional bit, when Set, indicates the
> >> > Function is guaranteed to be ready to successfully complete valid
> >> > configuration accesses at any time following any reset that the host is
> >> > capable of issuing Configuration Requests to this Function.
> >> > 
> >> > When this bit is Set, for accesses to this Function, software is exempt
> >> > from all requirements to delay configuration accesses following any type
> >> > of reset, including but not limited to the timing requirements defined
> >> > in Section 6.6.  How this guarantee is established is beyond the scope
> >> > of this document.
> >> > 
> >> > It is permitted that system software/firmware provide mechanisms that
> >> > supersede the indication provided by this bit, however such
> >> > software/firmware mechanisms are outside the scope of this
> >> > specification.
> >> > 
> >> 
> >> Thanks for the spec reference.
> >
> > Yes.  Please include the reference in the changelog of v2.
> 
> will do
> 
> >> I think the patch is touching the wrong places. pci_dev_wait() is there
> >> to wait for CRS response to finish following reset.
> >> 
> >> Typical sequences are:
> >> 1. Do some kind of reset in another routine
> >> 2. Wait reset specific wait time (1sec for secondary bus reset as an
> >> example and 100ms for d3-d0 transition)
> >> 3. call pci_dev_wait() after reset to see if device can accept config
> >> transactions.
> >> 
> >> Since this applies to all resets, I think you also need to get rid of
> >> waits following different reset types in step #2 and return immediately.
> >> I suggest you review callers of pci_dev_wait() and tap in there.
> >
> > I agree; I think we should be able to skip the delays in pcie_flr(),
> > pci_af_flr(), etc.
> 
> but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
> pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
> call in pci_af_flr() can also be removed if (dev->imm_ready)?

Yes.  The spec says (PCIe r4.0, sec 7.5.1.1.4):

  When this bit is Set, for accesses to this Function, software is
  exempt from all requirements to delay configuration accesses
  following any type of reset, including but not limited to the timing
  requirements defined in Section 6.6.

Sec 6.6 contains various references to waiting 100ms after different
types of resets, and I think those correspond to the msleep(100)
calls.  So I think we can skip those delays if the device supports
Immediate Readiness.

pci_dev_wait() is connected with CRS.  There's some reason that
function doesn't actually looks for CRS responses, which I can't
remember right now.  But you can look it up in the changelogs.

So I don't think we should put the Immediate Readiness check in
pci_dev_wait().

Bjorn
Sinan Kaya Sept. 5, 2018, 4:54 p.m. | #11
On 9/5/2018 9:49 AM, Bjorn Helgaas wrote:
> pci_dev_wait() is connected with CRS.  There's some reason that
> function doesn't actually looks for CRS responses, which I can't
> remember right now.  But you can look it up in the changelogs.
> 

We wanted to be able to support virtual functions (that can't read
vendor id) as well as systems without CRS visibility support. That
was the reason why we are polling a well known register such as
command status against ~0 rather than the vendor id register.

> So I don't think we should put the Immediate Readiness check in
> pci_dev_wait().

I agree, it should be outside of pci_dev_wait()
Felipe Balbi Sept. 6, 2018, 6:02 a.m. | #12
Hi,

Bjorn Helgaas <helgaas@kernel.org> writes:
>> >> I think the patch is touching the wrong places. pci_dev_wait() is there
>> >> to wait for CRS response to finish following reset.
>> >> 
>> >> Typical sequences are:
>> >> 1. Do some kind of reset in another routine
>> >> 2. Wait reset specific wait time (1sec for secondary bus reset as an
>> >> example and 100ms for d3-d0 transition)
>> >> 3. call pci_dev_wait() after reset to see if device can accept config
>> >> transactions.
>> >> 
>> >> Since this applies to all resets, I think you also need to get rid of
>> >> waits following different reset types in step #2 and return immediately.
>> >> I suggest you review callers of pci_dev_wait() and tap in there.
>> >
>> > I agree; I think we should be able to skip the delays in pcie_flr(),
>> > pci_af_flr(), etc.
>> 
>> but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
>> pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
>> call in pci_af_flr() can also be removed if (dev->imm_ready)?
>
> Yes.  The spec says (PCIe r4.0, sec 7.5.1.1.4):
>
>   When this bit is Set, for accesses to this Function, software is
>   exempt from all requirements to delay configuration accesses
>   following any type of reset, including but not limited to the timing
>   requirements defined in Section 6.6.
>
> Sec 6.6 contains various references to waiting 100ms after different
> types of resets, and I think those correspond to the msleep(100)
> calls.  So I think we can skip those delays if the device supports
> Immediate Readiness.
>
> pci_dev_wait() is connected with CRS.  There's some reason that
> function doesn't actually looks for CRS responses, which I can't
> remember right now.  But you can look it up in the changelogs.
>
> So I don't think we should put the Immediate Readiness check in
> pci_dev_wait().

IOW, you want something like below? If that's the case, I can send
another version shortly.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..61be196db92b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -999,7 +999,7 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
 		 * because have already delayed for the bridge.
 		 */
 		if (dev->runtime_d3cold) {
-			if (dev->d3cold_delay)
+			if (dev->d3cold_delay && !dev->imm_ready)
 				msleep(dev->d3cold_delay);
 			/*
 			 * When powering on a bridge from D3cold, the
@@ -2644,6 +2644,7 @@ EXPORT_SYMBOL_GPL(pci_d3cold_disable);
 void pci_pm_init(struct pci_dev *dev)
 {
 	int pm;
+	u16 status;
 	u16 pmc;
 
 	pm_runtime_forbid(&dev->dev);
@@ -2706,6 +2707,9 @@ void pci_pm_init(struct pci_dev *dev)
 		/* Disable the PME# generation functionality */
 		pci_pme_active(dev, false);
 	}
+
+	pci_read_config_word(dev, PCI_STATUS, &status);
+	dev->imm_ready = status & PCI_STATUS_IMMEDIATE;
 }
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
@@ -4376,6 +4380,9 @@ int pcie_flr(struct pci_dev *dev)
 
 	pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
 
+	if (dev->imm_ready)
+		return 0;
+
 	/*
 	 * Per PCIe r4.0, sec 6.6.2, a device must complete an FLR within
 	 * 100ms, but may silently discard requests while the FLR is in
@@ -4417,6 +4424,9 @@ static int pci_af_flr(struct pci_dev *dev, int probe)
 
 	pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
 
+	if (dev->imm_ready)
+		return 0;
+
 	/*
 	 * Per Advanced Capabilities for Conventional PCI ECN, 13 April 2006,
 	 * updated 27 July 2006; a device must complete an FLR within
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..7eed464e844a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -325,6 +325,7 @@ struct pci_dev {
 	pci_power_t	current_state;	/* Current operating state. In ACPI,
 					   this is D0-D3, D0 being fully
 					   functional, and D3 being off. */
+	unsigned int	imm_ready:1;	/* Supports Immediate Readiness */
 	u8		pm_cap;		/* PM capability offset */
 	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
 					   can be generated */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index ee556ccc93f4..b5cf51a06cae 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -52,6 +52,7 @@
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
+#define  PCI_STATUS_IMMEDIATE	0x01	/* Immediate Readiness */
 #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 MHz PCI 2.1 bus */
Bjorn Helgaas Sept. 6, 2018, 2:13 p.m. | #13
On Thu, Sep 06, 2018 at 09:02:56AM +0300, Felipe Balbi wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
> >> >> I think the patch is touching the wrong places. pci_dev_wait() is there
> >> >> to wait for CRS response to finish following reset.
> >> >> 
> >> >> Typical sequences are:
> >> >> 1. Do some kind of reset in another routine
> >> >> 2. Wait reset specific wait time (1sec for secondary bus reset as an
> >> >> example and 100ms for d3-d0 transition)
> >> >> 3. call pci_dev_wait() after reset to see if device can accept config
> >> >> transactions.
> >> >> 
> >> >> Since this applies to all resets, I think you also need to get rid of
> >> >> waits following different reset types in step #2 and return immediately.
> >> >> I suggest you review callers of pci_dev_wait() and tap in there.
> >> >
> >> > I agree; I think we should be able to skip the delays in pcie_flr(),
> >> > pci_af_flr(), etc.
> >> 
> >> but that's why I put the code in pci_dev_wait(). Both pcie_flr() and
> >> pci_af_flr() call pci_dev_wait(); or are you saying that the msleep(100)
> >> call in pci_af_flr() can also be removed if (dev->imm_ready)?
> >
> > Yes.  The spec says (PCIe r4.0, sec 7.5.1.1.4):
> >
> >   When this bit is Set, for accesses to this Function, software is
> >   exempt from all requirements to delay configuration accesses
> >   following any type of reset, including but not limited to the timing
> >   requirements defined in Section 6.6.
> >
> > Sec 6.6 contains various references to waiting 100ms after different
> > types of resets, and I think those correspond to the msleep(100)
> > calls.  So I think we can skip those delays if the device supports
> > Immediate Readiness.
> >
> > pci_dev_wait() is connected with CRS.  There's some reason that
> > function doesn't actually looks for CRS responses, which I can't
> > remember right now.  But you can look it up in the changelogs.
> >
> > So I don't think we should put the Immediate Readiness check in
> > pci_dev_wait().
> 
> IOW, you want something like below? If that's the case, I can send
> another version shortly.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 29ff9619b5fa..61be196db92b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -999,7 +999,7 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
>  		 * because have already delayed for the bridge.
>  		 */
>  		if (dev->runtime_d3cold) {
> -			if (dev->d3cold_delay)
> +			if (dev->d3cold_delay && !dev->imm_ready)
>  				msleep(dev->d3cold_delay);
>  			/*
>  			 * When powering on a bridge from D3cold, the
> @@ -2644,6 +2644,7 @@ EXPORT_SYMBOL_GPL(pci_d3cold_disable);
>  void pci_pm_init(struct pci_dev *dev)
>  {
>  	int pm;
> +	u16 status;
>  	u16 pmc;
>  
>  	pm_runtime_forbid(&dev->dev);
> @@ -2706,6 +2707,9 @@ void pci_pm_init(struct pci_dev *dev)
>  		/* Disable the PME# generation functionality */
>  		pci_pme_active(dev, false);
>  	}
> +
> +	pci_read_config_word(dev, PCI_STATUS, &status);
> +	dev->imm_ready = status & PCI_STATUS_IMMEDIATE;

This needs something like

  if (status & PCI_STATUS_IMMEDIATE)
    dev->imm_ready = 1;

It's true that PCI_STATUS_IMMEDIATE is the least-significant bit,
but the code here shouldn't depend on that.

But otherwise this looks good.

>  }
>  
>  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> @@ -4376,6 +4380,9 @@ int pcie_flr(struct pci_dev *dev)
>  
>  	pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
>  
> +	if (dev->imm_ready)
> +		return 0;
> +
>  	/*
>  	 * Per PCIe r4.0, sec 6.6.2, a device must complete an FLR within
>  	 * 100ms, but may silently discard requests while the FLR is in
> @@ -4417,6 +4424,9 @@ static int pci_af_flr(struct pci_dev *dev, int probe)
>  
>  	pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
>  
> +	if (dev->imm_ready)
> +		return 0;
> +
>  	/*
>  	 * Per Advanced Capabilities for Conventional PCI ECN, 13 April 2006,
>  	 * updated 27 July 2006; a device must complete an FLR within
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..7eed464e844a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -325,6 +325,7 @@ struct pci_dev {
>  	pci_power_t	current_state;	/* Current operating state. In ACPI,
>  					   this is D0-D3, D0 being fully
>  					   functional, and D3 being off. */
> +	unsigned int	imm_ready:1;	/* Supports Immediate Readiness */
>  	u8		pm_cap;		/* PM capability offset */
>  	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
>  					   can be generated */
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index ee556ccc93f4..b5cf51a06cae 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -52,6 +52,7 @@
>  #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
>  
>  #define PCI_STATUS		0x06	/* 16 bits */
> +#define  PCI_STATUS_IMMEDIATE	0x01	/* Immediate Readiness */
>  #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
>  #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
>  #define  PCI_STATUS_66MHZ	0x20	/* Support 66 MHz PCI 2.1 bus */
> 
> 
> -- 
> balbi

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..cf5239edd46a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -835,7 +835,7 @@  static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
 		 * because have already delayed for the bridge.
 		 */
 		if (dev->runtime_d3cold) {
-			if (dev->d3cold_delay)
+			if (dev->d3cold_delay && !dev->immediate)
 				msleep(dev->d3cold_delay);
 			/*
 			 * When powering on a bridge from D3cold, the
@@ -2442,6 +2442,7 @@  EXPORT_SYMBOL_GPL(pci_d3cold_disable);
 void pci_pm_init(struct pci_dev *dev)
 {
 	int pm;
+	u16 status;
 	u16 pmc;
 
 	pm_runtime_forbid(&dev->dev);
@@ -2504,6 +2505,9 @@  void pci_pm_init(struct pci_dev *dev)
 		/* Disable the PME# generation functionality */
 		pci_pme_active(dev, false);
 	}
+
+	pci_read_config_word(dev, PCI_STATUS, &status);
+	dev->immediate = status & PCI_STATUS_IMMEDIATE;
 }
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
@@ -4034,6 +4038,9 @@  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	int delay = 1;
 	u32 id;
 
+	if (dev->immediate)
+		return 0;
+
 	/*
 	 * After reset, the device should not silently discard config
 	 * requests, but it may still indicate that it needs more time by
diff --git a/include/linux/pci.h b/include/linux/pci.h
index abd5d5e17aee..d8f0a382b026 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -321,6 +321,7 @@  struct pci_dev {
 	pci_power_t	current_state;	/* Current operating state. In ACPI,
 					   this is D0-D3, D0 being fully
 					   functional, and D3 being off. */
+	unsigned int	immediate:1;	/* Supports Immediate Readiness */
 	u8		pm_cap;		/* PM capability offset */
 	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
 					   can be generated */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 4da87e2ef8a8..796d12910791 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -52,6 +52,7 @@ 
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
+#define  PCI_STATUS_IMMEDIATE	0x01	/* Immediate Readiness */
 #define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 MHz PCI 2.1 bus */