diff mbox

PCI: Fix regression in pci_restore_state()

Message ID 201204152140.40747.rjw@sisk.pl
State Superseded, archived
Headers show

Commit Message

Rafael J. Wysocki April 15, 2012, 7:40 p.m. UTC
On Sunday, April 15, 2012, Linus Torvalds wrote:
> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > mdelay(10) doesn't really look good either to me in this case, though.
> 
> Oh, I agree. What kind of ass-backwards device actually needs that
> kind of crazy delays? It is almost certainly buggy.
> 
> With retries, 10ms delays are totally unacceptable. There's something wrong.
> 
> A single ms *may* be ok.
> 
> Anyway, can you also split the actual "write _one_ register with
> retry" into a function of its own? The code looks like crap with those
> multiple levels of looping, with conditionals inside them etc. With a
> simple helper function, you could change the break into return, and it
> would look much better, I bet.

Sure.  It appears cleaner this way.

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PCI: Fix regression in pci_restore_state(), v3

Commit 26f41062f28de65e11d3cf353e52d0be73442be1

    PCI: check for pci bar restore completion and retry

attempted to address problems with PCI BAR restoration on systems
where FLR had not been completed before pci_restore_state() was
called, but it did that in an utterly wrong way.

First off, instead of retrying the writes for the BAR registers
only, it did that for all of the PCI config space of the device,
including the status register (whose value after the write quite
obviously need not be the same as the written one).  Second, it
added arbitrary delay to pci_restore_state() even for systems
where the PCI config space restoration was successful at first
attempt.  Finally, the mdelay(10) it added to every iteration of the
writing loop was way too much of a delay for any reasonable device.

All of this actually caused resume failures for some devices on
the Mikko's system.

To fix the regression, make pci_restore_state() only retry the
writes for BAR registers and only wait if the first read from
the register doesn't return the written value.  Additionaly, make
it wait for 1 ms, instead of 10 ms, after every failing attempt
to write into config space.

Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/pci/pci.c |   57 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 18 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas April 23, 2012, 5:03 p.m. UTC | #1
On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Sunday, April 15, 2012, Linus Torvalds wrote:
>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> >
>> > mdelay(10) doesn't really look good either to me in this case, though.
>>
>> Oh, I agree. What kind of ass-backwards device actually needs that
>> kind of crazy delays? It is almost certainly buggy.
>>
>> With retries, 10ms delays are totally unacceptable. There's something wrong.
>>
>> A single ms *may* be ok.
>>
>> Anyway, can you also split the actual "write _one_ register with
>> retry" into a function of its own? The code looks like crap with those
>> multiple levels of looping, with conditionals inside them etc. With a
>> simple helper function, you could change the break into return, and it
>> would look much better, I bet.
>
> Sure.  It appears cleaner this way.
>
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PCI: Fix regression in pci_restore_state(), v3
>
> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>
>    PCI: check for pci bar restore completion and retry
>
> attempted to address problems with PCI BAR restoration on systems
> where FLR had not been completed before pci_restore_state() was
> called, but it did that in an utterly wrong way.
>
> First off, instead of retrying the writes for the BAR registers
> only, it did that for all of the PCI config space of the device,
> including the status register (whose value after the write quite
> obviously need not be the same as the written one).  Second, it
> added arbitrary delay to pci_restore_state() even for systems
> where the PCI config space restoration was successful at first
> attempt.  Finally, the mdelay(10) it added to every iteration of the
> writing loop was way too much of a delay for any reasonable device.
>
> All of this actually caused resume failures for some devices on
> the Mikko's system.
>
> To fix the regression, make pci_restore_state() only retry the
> writes for BAR registers and only wait if the first read from
> the register doesn't return the written value.  Additionaly, make
> it wait for 1 ms, instead of 10 ms, after every failing attempt
> to write into config space.
>
> Reported-by: Mikko Vinni <mmvinni@yahoo.com>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  drivers/pci/pci.c |   57 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 39 insertions(+), 18 deletions(-)
>
> Index: linux/drivers/pci/pci.c
> ===================================================================
> --- linux.orig/drivers/pci/pci.c
> +++ linux/drivers/pci/pci.c
> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>        return 0;
>  }
>
> +static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> +                                    u32 saved_val, int retry)
> +{
> +       u32 val;
> +
> +       pci_read_config_dword(pdev, offset, &val);
> +       if (val == saved_val)
> +               return;
> +
> +       for (;;) {
> +               dev_dbg(&pdev->dev, "restoring config space at offset "
> +                       "%#x (was %#x, writing %#x)\n", offset, val, saved_val);
> +               pci_write_config_dword(pdev, offset, saved_val);
> +               if (retry-- <= 0)
> +                       return;
> +
> +               pci_read_config_dword(pdev, offset, &val);
> +               if (val == saved_val)
> +                       return;
> +
> +               mdelay(1);
> +       }
> +}
> +
> +static void pci_restore_config_space(struct pci_dev *pdev, int start, int end,
> +                                    int retry)
> +{
> +       int index;
> +
> +       for (index = end; index >= start; index--)
> +               pci_restore_config_dword(pdev, 4 * index,
> +                                        pdev->saved_config_space[index],
> +                                        retry);
> +}
> +
>  /**
>  * pci_restore_state - Restore the saved state of a PCI device
>  * @dev: - PCI device that we're dealing with
>  */
>  void pci_restore_state(struct pci_dev *dev)
>  {
> -       int i;
> -       u32 val;
> -       int tries;
> -
>        if (!dev->state_saved)
>                return;
>
> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>        pci_restore_pcie_state(dev);
>        pci_restore_ats_state(dev);
>
> +       pci_restore_config_space(dev, 10, 15, 0);
>        /*
>         * The Base Address register should be programmed before the command
>         * register(s)
>         */
> -       for (i = 15; i >= 0; i--) {
> -               pci_read_config_dword(dev, i * 4, &val);
> -               tries = 10;
> -               while (tries && val != dev->saved_config_space[i]) {
> -                       dev_dbg(&dev->dev, "restoring config "
> -                               "space at offset %#x (was %#x, writing %#x)\n",
> -                               i, val, (int)dev->saved_config_space[i]);
> -                       pci_write_config_dword(dev,i * 4,
> -                               dev->saved_config_space[i]);
> -                       pci_read_config_dword(dev, i * 4, &val);
> -                       mdelay(10);
> -                       tries--;
> -               }
> -       }
> +       pci_restore_config_space(dev, 4, 9, 10);
> +       pci_restore_config_space(dev, 0, 3, 0);
> +
>        pci_restore_pcix_state(dev);
>        pci_restore_msi_state(dev);
>        pci_restore_iov_state(dev);

I'd feel better about this if there were a way to delay in the FLR
path instead.  If we delay in the restore path, we're only fixing one
of the many ways config space can be written.  Other paths that write
config space will just silently fail.

The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
value for Completion Timeout," but I don't see anything that looks
like that in pcie_flr() or pci_af_flr().  Are there any other direct
ways we can detect when the FLR is complete?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 23, 2012, 7:53 p.m. UTC | #2
On Monday, April 23, 2012, Bjorn Helgaas wrote:
> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Sunday, April 15, 2012, Linus Torvalds wrote:
> >> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> >
> >> > mdelay(10) doesn't really look good either to me in this case, though.
> >>
> >> Oh, I agree. What kind of ass-backwards device actually needs that
> >> kind of crazy delays? It is almost certainly buggy.
> >>
> >> With retries, 10ms delays are totally unacceptable. There's something wrong.
> >>
> >> A single ms *may* be ok.
> >>
> >> Anyway, can you also split the actual "write _one_ register with
> >> retry" into a function of its own? The code looks like crap with those
> >> multiple levels of looping, with conditionals inside them etc. With a
> >> simple helper function, you could change the break into return, and it
> >> would look much better, I bet.
> >
> > Sure.  It appears cleaner this way.
> >
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PCI: Fix regression in pci_restore_state(), v3
> >
> > Commit 26f41062f28de65e11d3cf353e52d0be73442be1
> >
> >    PCI: check for pci bar restore completion and retry
> >
> > attempted to address problems with PCI BAR restoration on systems
> > where FLR had not been completed before pci_restore_state() was
> > called, but it did that in an utterly wrong way.
> >
> > First off, instead of retrying the writes for the BAR registers
> > only, it did that for all of the PCI config space of the device,
> > including the status register (whose value after the write quite
> > obviously need not be the same as the written one).  Second, it
> > added arbitrary delay to pci_restore_state() even for systems
> > where the PCI config space restoration was successful at first
> > attempt.  Finally, the mdelay(10) it added to every iteration of the
> > writing loop was way too much of a delay for any reasonable device.
> >
> > All of this actually caused resume failures for some devices on
> > the Mikko's system.
> >
> > To fix the regression, make pci_restore_state() only retry the
> > writes for BAR registers and only wait if the first read from
> > the register doesn't return the written value.  Additionaly, make
> > it wait for 1 ms, instead of 10 ms, after every failing attempt
> > to write into config space.
> >
> > Reported-by: Mikko Vinni <mmvinni@yahoo.com>
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  drivers/pci/pci.c |   57 ++++++++++++++++++++++++++++++++++++------------------
> >  1 file changed, 39 insertions(+), 18 deletions(-)
> >
> > Index: linux/drivers/pci/pci.c
> > ===================================================================
> > --- linux.orig/drivers/pci/pci.c
> > +++ linux/drivers/pci/pci.c
> > @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
> >        return 0;
> >  }
> >
> > +static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> > +                                    u32 saved_val, int retry)
> > +{
> > +       u32 val;
> > +
> > +       pci_read_config_dword(pdev, offset, &val);
> > +       if (val == saved_val)
> > +               return;
> > +
> > +       for (;;) {
> > +               dev_dbg(&pdev->dev, "restoring config space at offset "
> > +                       "%#x (was %#x, writing %#x)\n", offset, val, saved_val);
> > +               pci_write_config_dword(pdev, offset, saved_val);
> > +               if (retry-- <= 0)
> > +                       return;
> > +
> > +               pci_read_config_dword(pdev, offset, &val);
> > +               if (val == saved_val)
> > +                       return;
> > +
> > +               mdelay(1);
> > +       }
> > +}
> > +
> > +static void pci_restore_config_space(struct pci_dev *pdev, int start, int end,
> > +                                    int retry)
> > +{
> > +       int index;
> > +
> > +       for (index = end; index >= start; index--)
> > +               pci_restore_config_dword(pdev, 4 * index,
> > +                                        pdev->saved_config_space[index],
> > +                                        retry);
> > +}
> > +
> >  /**
> >  * pci_restore_state - Restore the saved state of a PCI device
> >  * @dev: - PCI device that we're dealing with
> >  */
> >  void pci_restore_state(struct pci_dev *dev)
> >  {
> > -       int i;
> > -       u32 val;
> > -       int tries;
> > -
> >        if (!dev->state_saved)
> >                return;
> >
> > @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
> >        pci_restore_pcie_state(dev);
> >        pci_restore_ats_state(dev);
> >
> > +       pci_restore_config_space(dev, 10, 15, 0);
> >        /*
> >         * The Base Address register should be programmed before the command
> >         * register(s)
> >         */
> > -       for (i = 15; i >= 0; i--) {
> > -               pci_read_config_dword(dev, i * 4, &val);
> > -               tries = 10;
> > -               while (tries && val != dev->saved_config_space[i]) {
> > -                       dev_dbg(&dev->dev, "restoring config "
> > -                               "space at offset %#x (was %#x, writing %#x)\n",
> > -                               i, val, (int)dev->saved_config_space[i]);
> > -                       pci_write_config_dword(dev,i * 4,
> > -                               dev->saved_config_space[i]);
> > -                       pci_read_config_dword(dev, i * 4, &val);
> > -                       mdelay(10);
> > -                       tries--;
> > -               }
> > -       }
> > +       pci_restore_config_space(dev, 4, 9, 10);
> > +       pci_restore_config_space(dev, 0, 3, 0);
> > +
> >        pci_restore_pcix_state(dev);
> >        pci_restore_msi_state(dev);
> >        pci_restore_iov_state(dev);
> 
> I'd feel better about this if there were a way to delay in the FLR
> path instead.  If we delay in the restore path, we're only fixing one
> of the many ways config space can be written.  Other paths that write
> config space will just silently fail.
> 
> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
> value for Completion Timeout," but I don't see anything that looks
> like that in pcie_flr() or pci_af_flr().  Are there any other direct
> ways we can detect when the FLR is complete?

I'm not aware of any.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile April 23, 2012, 8:07 p.m. UTC | #3
On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>  wrote:
>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>  wrote:
>>>>>
>>>>> mdelay(10) doesn't really look good either to me in this case, though.
>>>>
>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>> kind of crazy delays? It is almost certainly buggy.
>>>>
>>>> With retries, 10ms delays are totally unacceptable. There's something wrong.
>>>>
>>>> A single ms *may* be ok.
>>>>
>>>> Anyway, can you also split the actual "write _one_ register with
>>>> retry" into a function of its own? The code looks like crap with those
>>>> multiple levels of looping, with conditionals inside them etc. With a
>>>> simple helper function, you could change the break into return, and it
>>>> would look much better, I bet.
>>>
>>> Sure.  It appears cleaner this way.
>>>
>>> ---
>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>
>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>
>>>     PCI: check for pci bar restore completion and retry
>>>
>>> attempted to address problems with PCI BAR restoration on systems
>>> where FLR had not been completed before pci_restore_state() was
>>> called, but it did that in an utterly wrong way.
>>>
>>> First off, instead of retrying the writes for the BAR registers
>>> only, it did that for all of the PCI config space of the device,
>>> including the status register (whose value after the write quite
>>> obviously need not be the same as the written one).  Second, it
>>> added arbitrary delay to pci_restore_state() even for systems
>>> where the PCI config space restoration was successful at first
>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>> writing loop was way too much of a delay for any reasonable device.
>>>
>>> All of this actually caused resume failures for some devices on
>>> the Mikko's system.
>>>
>>> To fix the regression, make pci_restore_state() only retry the
>>> writes for BAR registers and only wait if the first read from
>>> the register doesn't return the written value.  Additionaly, make
>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>> to write into config space.
>>>
>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>> ---
>>>   drivers/pci/pci.c |   57 ++++++++++++++++++++++++++++++++++++------------------
>>>   1 file changed, 39 insertions(+), 18 deletions(-)
>>>
>>> Index: linux/drivers/pci/pci.c
>>> ===================================================================
>>> --- linux.orig/drivers/pci/pci.c
>>> +++ linux/drivers/pci/pci.c
>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>         return 0;
>>>   }
>>>
>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>>> +                                    u32 saved_val, int retry)
>>> +{
>>> +       u32 val;
>>> +
>>> +       pci_read_config_dword(pdev, offset,&val);
>>> +       if (val == saved_val)
>>> +               return;
>>> +
>>> +       for (;;) {
>>> +               dev_dbg(&pdev->dev, "restoring config space at offset "
>>> +                       "%#x (was %#x, writing %#x)\n", offset, val, saved_val);
>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>> +               if (retry--<= 0)
>>> +                       return;
>>> +
>>> +               pci_read_config_dword(pdev, offset,&val);
>>> +               if (val == saved_val)
>>> +                       return;
>>> +
>>> +               mdelay(1);
>>> +       }
>>> +}
>>> +
>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start, int end,
>>> +                                    int retry)
>>> +{
>>> +       int index;
>>> +
>>> +       for (index = end; index>= start; index--)
>>> +               pci_restore_config_dword(pdev, 4 * index,
>>> +                                        pdev->saved_config_space[index],
>>> +                                        retry);
>>> +}
>>> +
>>>   /**
>>>   * pci_restore_state - Restore the saved state of a PCI device
>>>   * @dev: - PCI device that we're dealing with
>>>   */
>>>   void pci_restore_state(struct pci_dev *dev)
>>>   {
>>> -       int i;
>>> -       u32 val;
>>> -       int tries;
>>> -
>>>         if (!dev->state_saved)
>>>                 return;
>>>
>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>         pci_restore_pcie_state(dev);
>>>         pci_restore_ats_state(dev);
>>>
>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>         /*
>>>          * The Base Address register should be programmed before the command
>>>          * register(s)
>>>          */
>>> -       for (i = 15; i>= 0; i--) {
>>> -               pci_read_config_dword(dev, i * 4,&val);
>>> -               tries = 10;
>>> -               while (tries&&  val != dev->saved_config_space[i]) {
>>> -                       dev_dbg(&dev->dev, "restoring config "
>>> -                               "space at offset %#x (was %#x, writing %#x)\n",
>>> -                               i, val, (int)dev->saved_config_space[i]);
>>> -                       pci_write_config_dword(dev,i * 4,
>>> -                               dev->saved_config_space[i]);
>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>> -                       mdelay(10);
>>> -                       tries--;
>>> -               }
>>> -       }
>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>> +
>>>         pci_restore_pcix_state(dev);
>>>         pci_restore_msi_state(dev);
>>>         pci_restore_iov_state(dev);
>>
>> I'd feel better about this if there were a way to delay in the FLR
>> path instead.  If we delay in the restore path, we're only fixing one
>> of the many ways config space can be written.  Other paths that write
>> config space will just silently fail.
>>
>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>> value for Completion Timeout," but I don't see anything that looks
>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>> ways we can detect when the FLR is complete?
>
> I'm not aware of any.
>
> Thanks,
> Rafael
I don't think so, either.
I believe an ECN is being worked in the PCI-SIG
to add such a notification, though.
Even if adopted, need to wait for another crank of the hw before
the notification can be used.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 23, 2012, 10:33 p.m. UTC | #4
On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
>>
>> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>>>
>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>  wrote:
>>>>
>>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>>>
>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> mdelay(10) doesn't really look good either to me in this case, though.
>>>>>
>>>>>
>>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>>> kind of crazy delays? It is almost certainly buggy.
>>>>>
>>>>> With retries, 10ms delays are totally unacceptable. There's something
>>>>> wrong.
>>>>>
>>>>> A single ms *may* be ok.
>>>>>
>>>>> Anyway, can you also split the actual "write _one_ register with
>>>>> retry" into a function of its own? The code looks like crap with those
>>>>> multiple levels of looping, with conditionals inside them etc. With a
>>>>> simple helper function, you could change the break into return, and it
>>>>> would look much better, I bet.
>>>>
>>>>
>>>> Sure.  It appears cleaner this way.
>>>>
>>>> ---
>>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>>
>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>>
>>>>    PCI: check for pci bar restore completion and retry
>>>>
>>>> attempted to address problems with PCI BAR restoration on systems
>>>> where FLR had not been completed before pci_restore_state() was
>>>> called, but it did that in an utterly wrong way.
>>>>
>>>> First off, instead of retrying the writes for the BAR registers
>>>> only, it did that for all of the PCI config space of the device,
>>>> including the status register (whose value after the write quite
>>>> obviously need not be the same as the written one).  Second, it
>>>> added arbitrary delay to pci_restore_state() even for systems
>>>> where the PCI config space restoration was successful at first
>>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>>> writing loop was way too much of a delay for any reasonable device.
>>>>
>>>> All of this actually caused resume failures for some devices on
>>>> the Mikko's system.
>>>>
>>>> To fix the regression, make pci_restore_state() only retry the
>>>> writes for BAR registers and only wait if the first read from
>>>> the register doesn't return the written value.  Additionaly, make
>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>>> to write into config space.
>>>>
>>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>>> ---
>>>>  drivers/pci/pci.c |   57
>>>> ++++++++++++++++++++++++++++++++++++------------------
>>>>  1 file changed, 39 insertions(+), 18 deletions(-)
>>>>
>>>> Index: linux/drivers/pci/pci.c
>>>> ===================================================================
>>>> --- linux.orig/drivers/pci/pci.c
>>>> +++ linux/drivers/pci/pci.c
>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>>        return 0;
>>>>  }
>>>>
>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>>>> +                                    u32 saved_val, int retry)
>>>> +{
>>>> +       u32 val;
>>>> +
>>>> +       pci_read_config_dword(pdev, offset,&val);
>>>> +       if (val == saved_val)
>>>> +               return;
>>>> +
>>>> +       for (;;) {
>>>> +               dev_dbg(&pdev->dev, "restoring config space at offset "
>>>> +                       "%#x (was %#x, writing %#x)\n", offset, val,
>>>> saved_val);
>>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>>> +               if (retry--<= 0)
>>>> +                       return;
>>>> +
>>>> +               pci_read_config_dword(pdev, offset,&val);
>>>> +               if (val == saved_val)
>>>> +                       return;
>>>> +
>>>> +               mdelay(1);
>>>> +       }
>>>> +}
>>>> +
>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start,
>>>> int end,
>>>> +                                    int retry)
>>>> +{
>>>> +       int index;
>>>> +
>>>> +       for (index = end; index>= start; index--)
>>>> +               pci_restore_config_dword(pdev, 4 * index,
>>>> +
>>>>  pdev->saved_config_space[index],
>>>> +                                        retry);
>>>> +}
>>>> +
>>>>  /**
>>>>  * pci_restore_state - Restore the saved state of a PCI device
>>>>  * @dev: - PCI device that we're dealing with
>>>>  */
>>>>  void pci_restore_state(struct pci_dev *dev)
>>>>  {
>>>> -       int i;
>>>> -       u32 val;
>>>> -       int tries;
>>>> -
>>>>        if (!dev->state_saved)
>>>>                return;
>>>>
>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>>        pci_restore_pcie_state(dev);
>>>>        pci_restore_ats_state(dev);
>>>>
>>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>>        /*
>>>>         * The Base Address register should be programmed before the
>>>> command
>>>>         * register(s)
>>>>         */
>>>> -       for (i = 15; i>= 0; i--) {
>>>> -               pci_read_config_dword(dev, i * 4,&val);
>>>> -               tries = 10;
>>>> -               while (tries&&  val != dev->saved_config_space[i]) {
>>>>
>>>> -                       dev_dbg(&dev->dev, "restoring config "
>>>> -                               "space at offset %#x (was %#x, writing
>>>> %#x)\n",
>>>> -                               i, val,
>>>> (int)dev->saved_config_space[i]);
>>>> -                       pci_write_config_dword(dev,i * 4,
>>>> -                               dev->saved_config_space[i]);
>>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>>> -                       mdelay(10);
>>>> -                       tries--;
>>>> -               }
>>>> -       }
>>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>>> +
>>>>        pci_restore_pcix_state(dev);
>>>>        pci_restore_msi_state(dev);
>>>>        pci_restore_iov_state(dev);
>>>
>>>
>>> I'd feel better about this if there were a way to delay in the FLR
>>> path instead.  If we delay in the restore path, we're only fixing one
>>> of the many ways config space can be written.  Other paths that write
>>> config space will just silently fail.
>>>
>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>>> value for Completion Timeout," but I don't see anything that looks
>>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>>> ways we can detect when the FLR is complete?
>>
>>
>> I'm not aware of any.
>>
>> Thanks,
>> Rafael
>
> I don't think so, either.
> I believe an ECN is being worked in the PCI-SIG
> to add such a notification, though.
> Even if adopted, need to wait for another crank of the hw before
> the notification can be used.

I agree, we can't do something that works only on new hardware -- we
have to make the existing hardware in the field work.

What about the "waiting for as much time as the pre-FLR value for
Completion Timeout" part?

Or can we do something like asserting FLR, sleeping 100ms, then
attempting a write to something in config space and retrying until it
sticks?  It's kludgy, but I'm not sure it's any worse than putting the
retries in the restore path, and it would have the advantage that
other writers of config space wouldn't have to worry.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile April 24, 2012, 4:03 p.m. UTC | #5
On 04/23/2012 06:33 PM, Bjorn Helgaas wrote:
> On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
>>>
>>> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>>>>
>>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>    wrote:
>>>>>
>>>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>>>>
>>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>   wrote:
>>>>>>>
>>>>>>>
>>>>>>> mdelay(10) doesn't really look good either to me in this case, though.
>>>>>>
>>>>>>
>>>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>>>> kind of crazy delays? It is almost certainly buggy.
>>>>>>
>>>>>> With retries, 10ms delays are totally unacceptable. There's something
>>>>>> wrong.
>>>>>>
>>>>>> A single ms *may* be ok.
>>>>>>
>>>>>> Anyway, can you also split the actual "write _one_ register with
>>>>>> retry" into a function of its own? The code looks like crap with those
>>>>>> multiple levels of looping, with conditionals inside them etc. With a
>>>>>> simple helper function, you could change the break into return, and it
>>>>>> would look much better, I bet.
>>>>>
>>>>>
>>>>> Sure.  It appears cleaner this way.
>>>>>
>>>>> ---
>>>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>>>
>>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>>>
>>>>>     PCI: check for pci bar restore completion and retry
>>>>>
>>>>> attempted to address problems with PCI BAR restoration on systems
>>>>> where FLR had not been completed before pci_restore_state() was
>>>>> called, but it did that in an utterly wrong way.
>>>>>
>>>>> First off, instead of retrying the writes for the BAR registers
>>>>> only, it did that for all of the PCI config space of the device,
>>>>> including the status register (whose value after the write quite
>>>>> obviously need not be the same as the written one).  Second, it
>>>>> added arbitrary delay to pci_restore_state() even for systems
>>>>> where the PCI config space restoration was successful at first
>>>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>>>> writing loop was way too much of a delay for any reasonable device.
>>>>>
>>>>> All of this actually caused resume failures for some devices on
>>>>> the Mikko's system.
>>>>>
>>>>> To fix the regression, make pci_restore_state() only retry the
>>>>> writes for BAR registers and only wait if the first read from
>>>>> the register doesn't return the written value.  Additionaly, make
>>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>>>> to write into config space.
>>>>>
>>>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>>>> ---
>>>>>   drivers/pci/pci.c |   57
>>>>> ++++++++++++++++++++++++++++++++++++------------------
>>>>>   1 file changed, 39 insertions(+), 18 deletions(-)
>>>>>
>>>>> Index: linux/drivers/pci/pci.c
>>>>> ===================================================================
>>>>> --- linux.orig/drivers/pci/pci.c
>>>>> +++ linux/drivers/pci/pci.c
>>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>>>         return 0;
>>>>>   }
>>>>>
>>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>>>>> +                                    u32 saved_val, int retry)
>>>>> +{
>>>>> +       u32 val;
>>>>> +
>>>>> +       pci_read_config_dword(pdev, offset,&val);
>>>>> +       if (val == saved_val)
>>>>> +               return;
>>>>> +
>>>>> +       for (;;) {
>>>>> +               dev_dbg(&pdev->dev, "restoring config space at offset "
>>>>> +                       "%#x (was %#x, writing %#x)\n", offset, val,
>>>>> saved_val);
>>>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>>>> +               if (retry--<= 0)
>>>>> +                       return;
>>>>> +
>>>>> +               pci_read_config_dword(pdev, offset,&val);
>>>>> +               if (val == saved_val)
>>>>> +                       return;
>>>>> +
>>>>> +               mdelay(1);
>>>>> +       }
>>>>> +}
>>>>> +
>>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start,
>>>>> int end,
>>>>> +                                    int retry)
>>>>> +{
>>>>> +       int index;
>>>>> +
>>>>> +       for (index = end; index>= start; index--)
>>>>> +               pci_restore_config_dword(pdev, 4 * index,
>>>>> +
>>>>>   pdev->saved_config_space[index],
>>>>> +                                        retry);
>>>>> +}
>>>>> +
>>>>>   /**
>>>>>   * pci_restore_state - Restore the saved state of a PCI device
>>>>>   * @dev: - PCI device that we're dealing with
>>>>>   */
>>>>>   void pci_restore_state(struct pci_dev *dev)
>>>>>   {
>>>>> -       int i;
>>>>> -       u32 val;
>>>>> -       int tries;
>>>>> -
>>>>>         if (!dev->state_saved)
>>>>>                 return;
>>>>>
>>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>>>         pci_restore_pcie_state(dev);
>>>>>         pci_restore_ats_state(dev);
>>>>>
>>>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>>>         /*
>>>>>          * The Base Address register should be programmed before the
>>>>> command
>>>>>          * register(s)
>>>>>          */
>>>>> -       for (i = 15; i>= 0; i--) {
>>>>> -               pci_read_config_dword(dev, i * 4,&val);
>>>>> -               tries = 10;
>>>>> -               while (tries&&    val != dev->saved_config_space[i]) {
>>>>>
>>>>> -                       dev_dbg(&dev->dev, "restoring config "
>>>>> -                               "space at offset %#x (was %#x, writing
>>>>> %#x)\n",
>>>>> -                               i, val,
>>>>> (int)dev->saved_config_space[i]);
>>>>> -                       pci_write_config_dword(dev,i * 4,
>>>>> -                               dev->saved_config_space[i]);
>>>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>>>> -                       mdelay(10);
>>>>> -                       tries--;
>>>>> -               }
>>>>> -       }
>>>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>>>> +
>>>>>         pci_restore_pcix_state(dev);
>>>>>         pci_restore_msi_state(dev);
>>>>>         pci_restore_iov_state(dev);
>>>>
>>>>
>>>> I'd feel better about this if there were a way to delay in the FLR
>>>> path instead.  If we delay in the restore path, we're only fixing one
>>>> of the many ways config space can be written.  Other paths that write
>>>> config space will just silently fail.
>>>>
>>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>>>> value for Completion Timeout," but I don't see anything that looks
>>>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>>>> ways we can detect when the FLR is complete?
>>>
>>>
>>> I'm not aware of any.
>>>
>>> Thanks,
>>> Rafael
>>
>> I don't think so, either.
>> I believe an ECN is being worked in the PCI-SIG
>> to add such a notification, though.
>> Even if adopted, need to wait for another crank of the hw before
>> the notification can be used.
>
> I agree, we can't do something that works only on new hardware -- we
> have to make the existing hardware in the field work.
>
> What about the "waiting for as much time as the pre-FLR value for
> Completion Timeout" part?
>
> Or can we do something like asserting FLR, sleeping 100ms, then
> attempting a write to something in config space and retrying until it
> sticks?  It's kludgy, but I'm not sure it's any worse than putting the
> retries in the restore path, and it would have the advantage that
> other writers of config space wouldn't have to worry.
>
> Bjorn

Depending on system config, reading a port that is being FLR'd
can cause AERs, which if a driver is registered for the endpoint,
it will get AERs reported to the driver and potentially complicate the
FLRhandling.

This implies a hook to temp-disable AER during FLR, then turning it
back on (hw &/or sw).

- Don

ps -- and there was a deadly embrace where an AER induced during
       an FLR that was initiated by userspace (libvirt writing to device
       reset file in sysfs) would lock up the system b/c the AER
       handler did a config space access, which used the same mutex.
       Not sure if that was cleaned up finally.... very corner-case-ish,
       but it shows how subtle multiple PCIe events can become complicated.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 24, 2012, 5:01 p.m. UTC | #6
On Tue, Apr 24, 2012 at 10:03 AM, Don Dutile <ddutile@redhat.com> wrote:
> On 04/23/2012 06:33 PM, Bjorn Helgaas wrote:
>>
>> On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile<ddutile@redhat.com>  wrote:
>>>
>>> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
>>>>
>>>>
>>>> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>>>>>
>>>>>
>>>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> mdelay(10) doesn't really look good either to me in this case,
>>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>>>>> kind of crazy delays? It is almost certainly buggy.
>>>>>>>
>>>>>>> With retries, 10ms delays are totally unacceptable. There's something
>>>>>>> wrong.
>>>>>>>
>>>>>>> A single ms *may* be ok.
>>>>>>>
>>>>>>> Anyway, can you also split the actual "write _one_ register with
>>>>>>> retry" into a function of its own? The code looks like crap with
>>>>>>> those
>>>>>>> multiple levels of looping, with conditionals inside them etc. With a
>>>>>>> simple helper function, you could change the break into return, and
>>>>>>> it
>>>>>>> would look much better, I bet.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sure.  It appears cleaner this way.
>>>>>>
>>>>>> ---
>>>>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>>>>
>>>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>>>>
>>>>>>    PCI: check for pci bar restore completion and retry
>>>>>>
>>>>>> attempted to address problems with PCI BAR restoration on systems
>>>>>> where FLR had not been completed before pci_restore_state() was
>>>>>> called, but it did that in an utterly wrong way.
>>>>>>
>>>>>> First off, instead of retrying the writes for the BAR registers
>>>>>> only, it did that for all of the PCI config space of the device,
>>>>>> including the status register (whose value after the write quite
>>>>>> obviously need not be the same as the written one).  Second, it
>>>>>> added arbitrary delay to pci_restore_state() even for systems
>>>>>> where the PCI config space restoration was successful at first
>>>>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>>>>> writing loop was way too much of a delay for any reasonable device.
>>>>>>
>>>>>> All of this actually caused resume failures for some devices on
>>>>>> the Mikko's system.
>>>>>>
>>>>>> To fix the regression, make pci_restore_state() only retry the
>>>>>> writes for BAR registers and only wait if the first read from
>>>>>> the register doesn't return the written value.  Additionaly, make
>>>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>>>>> to write into config space.
>>>>>>
>>>>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>>>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>> ---
>>>>>>  drivers/pci/pci.c |   57
>>>>>> ++++++++++++++++++++++++++++++++++++------------------
>>>>>>  1 file changed, 39 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> Index: linux/drivers/pci/pci.c
>>>>>> ===================================================================
>>>>>> --- linux.orig/drivers/pci/pci.c
>>>>>> +++ linux/drivers/pci/pci.c
>>>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>>>>        return 0;
>>>>>>  }
>>>>>>
>>>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int
>>>>>> offset,
>>>>>> +                                    u32 saved_val, int retry)
>>>>>> +{
>>>>>> +       u32 val;
>>>>>> +
>>>>>> +       pci_read_config_dword(pdev, offset,&val);
>>>>>> +       if (val == saved_val)
>>>>>> +               return;
>>>>>> +
>>>>>> +       for (;;) {
>>>>>> +               dev_dbg(&pdev->dev, "restoring config space at offset
>>>>>> "
>>>>>> +                       "%#x (was %#x, writing %#x)\n", offset, val,
>>>>>> saved_val);
>>>>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>>>>> +               if (retry--<= 0)
>>>>>> +                       return;
>>>>>> +
>>>>>> +               pci_read_config_dword(pdev, offset,&val);
>>>>>> +               if (val == saved_val)
>>>>>> +                       return;
>>>>>> +
>>>>>> +               mdelay(1);
>>>>>> +       }
>>>>>> +}
>>>>>> +
>>>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start,
>>>>>> int end,
>>>>>> +                                    int retry)
>>>>>> +{
>>>>>> +       int index;
>>>>>> +
>>>>>> +       for (index = end; index>= start; index--)
>>>>>> +               pci_restore_config_dword(pdev, 4 * index,
>>>>>> +
>>>>>>  pdev->saved_config_space[index],
>>>>>> +                                        retry);
>>>>>> +}
>>>>>> +
>>>>>>  /**
>>>>>>  * pci_restore_state - Restore the saved state of a PCI device
>>>>>>  * @dev: - PCI device that we're dealing with
>>>>>>  */
>>>>>>  void pci_restore_state(struct pci_dev *dev)
>>>>>>  {
>>>>>> -       int i;
>>>>>> -       u32 val;
>>>>>> -       int tries;
>>>>>> -
>>>>>>        if (!dev->state_saved)
>>>>>>                return;
>>>>>>
>>>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>>>>        pci_restore_pcie_state(dev);
>>>>>>        pci_restore_ats_state(dev);
>>>>>>
>>>>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>>>>        /*
>>>>>>         * The Base Address register should be programmed before the
>>>>>> command
>>>>>>         * register(s)
>>>>>>         */
>>>>>> -       for (i = 15; i>= 0; i--) {
>>>>>> -               pci_read_config_dword(dev, i * 4,&val);
>>>>>> -               tries = 10;
>>>>>> -               while (tries&&    val != dev->saved_config_space[i]) {
>>>>>>
>>>>>> -                       dev_dbg(&dev->dev, "restoring config "
>>>>>> -                               "space at offset %#x (was %#x, writing
>>>>>> %#x)\n",
>>>>>> -                               i, val,
>>>>>> (int)dev->saved_config_space[i]);
>>>>>> -                       pci_write_config_dword(dev,i * 4,
>>>>>> -                               dev->saved_config_space[i]);
>>>>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>>>>> -                       mdelay(10);
>>>>>> -                       tries--;
>>>>>> -               }
>>>>>> -       }
>>>>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>>>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>>>>> +
>>>>>>        pci_restore_pcix_state(dev);
>>>>>>        pci_restore_msi_state(dev);
>>>>>>        pci_restore_iov_state(dev);
>>>>>
>>>>>
>>>>>
>>>>> I'd feel better about this if there were a way to delay in the FLR
>>>>> path instead.  If we delay in the restore path, we're only fixing one
>>>>> of the many ways config space can be written.  Other paths that write
>>>>> config space will just silently fail.
>>>>>
>>>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>>>>> value for Completion Timeout," but I don't see anything that looks
>>>>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>>>>> ways we can detect when the FLR is complete?
>>>>
>>>>
>>>>
>>>> I'm not aware of any.
>>>>
>>>> Thanks,
>>>> Rafael
>>>
>>>
>>> I don't think so, either.
>>> I believe an ECN is being worked in the PCI-SIG
>>> to add such a notification, though.
>>> Even if adopted, need to wait for another crank of the hw before
>>> the notification can be used.
>>
>>
>> I agree, we can't do something that works only on new hardware -- we
>> have to make the existing hardware in the field work.
>>
>> What about the "waiting for as much time as the pre-FLR value for
>> Completion Timeout" part?
>>
>> Or can we do something like asserting FLR, sleeping 100ms, then
>> attempting a write to something in config space and retrying until it
>> sticks?  It's kludgy, but I'm not sure it's any worse than putting the
>> retries in the restore path, and it would have the advantage that
>> other writers of config space wouldn't have to worry.
>
> Depending on system config, reading a port that is being FLR'd
> can cause AERs, which if a driver is registered for the endpoint,
> it will get AERs reported to the driver and potentially complicate the
> FLRhandling.
>
> This implies a hook to temp-disable AER during FLR, then turning it
> back on (hw &/or sw).

Don't we have the same possibility of causing AERs if we add this
retry stuff in pci_restore_state()?  It sounds like the same
possibility exists regardless of whether the config access happens in
the FLR path or the restore path.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile April 24, 2012, 5:35 p.m. UTC | #7
On 04/24/2012 01:01 PM, Bjorn Helgaas wrote:
> On Tue, Apr 24, 2012 at 10:03 AM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 04/23/2012 06:33 PM, Bjorn Helgaas wrote:
>>>
>>> On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile<ddutile@redhat.com>    wrote:
>>>>
>>>> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
>>>>>
>>>>>
>>>>> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>>>>>>
>>>>>>
>>>>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>   wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>>   wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> mdelay(10) doesn't really look good either to me in this case,
>>>>>>>>> though.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>>>>>> kind of crazy delays? It is almost certainly buggy.
>>>>>>>>
>>>>>>>> With retries, 10ms delays are totally unacceptable. There's something
>>>>>>>> wrong.
>>>>>>>>
>>>>>>>> A single ms *may* be ok.
>>>>>>>>
>>>>>>>> Anyway, can you also split the actual "write _one_ register with
>>>>>>>> retry" into a function of its own? The code looks like crap with
>>>>>>>> those
>>>>>>>> multiple levels of looping, with conditionals inside them etc. With a
>>>>>>>> simple helper function, you could change the break into return, and
>>>>>>>> it
>>>>>>>> would look much better, I bet.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sure.  It appears cleaner this way.
>>>>>>>
>>>>>>> ---
>>>>>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>>>>>
>>>>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>>>>>
>>>>>>>     PCI: check for pci bar restore completion and retry
>>>>>>>
>>>>>>> attempted to address problems with PCI BAR restoration on systems
>>>>>>> where FLR had not been completed before pci_restore_state() was
>>>>>>> called, but it did that in an utterly wrong way.
>>>>>>>
>>>>>>> First off, instead of retrying the writes for the BAR registers
>>>>>>> only, it did that for all of the PCI config space of the device,
>>>>>>> including the status register (whose value after the write quite
>>>>>>> obviously need not be the same as the written one).  Second, it
>>>>>>> added arbitrary delay to pci_restore_state() even for systems
>>>>>>> where the PCI config space restoration was successful at first
>>>>>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>>>>>> writing loop was way too much of a delay for any reasonable device.
>>>>>>>
>>>>>>> All of this actually caused resume failures for some devices on
>>>>>>> the Mikko's system.
>>>>>>>
>>>>>>> To fix the regression, make pci_restore_state() only retry the
>>>>>>> writes for BAR registers and only wait if the first read from
>>>>>>> the register doesn't return the written value.  Additionaly, make
>>>>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>>>>>> to write into config space.
>>>>>>>
>>>>>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>>>>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>> ---
>>>>>>>   drivers/pci/pci.c |   57
>>>>>>> ++++++++++++++++++++++++++++++++++++------------------
>>>>>>>   1 file changed, 39 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> Index: linux/drivers/pci/pci.c
>>>>>>> ===================================================================
>>>>>>> --- linux.orig/drivers/pci/pci.c
>>>>>>> +++ linux/drivers/pci/pci.c
>>>>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>>>>>         return 0;
>>>>>>>   }
>>>>>>>
>>>>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int
>>>>>>> offset,
>>>>>>> +                                    u32 saved_val, int retry)
>>>>>>> +{
>>>>>>> +       u32 val;
>>>>>>> +
>>>>>>> +       pci_read_config_dword(pdev, offset,&val);
>>>>>>> +       if (val == saved_val)
>>>>>>> +               return;
>>>>>>> +
>>>>>>> +       for (;;) {
>>>>>>> +               dev_dbg(&pdev->dev, "restoring config space at offset
>>>>>>> "
>>>>>>> +                       "%#x (was %#x, writing %#x)\n", offset, val,
>>>>>>> saved_val);
>>>>>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>>>>>> +               if (retry--<= 0)
>>>>>>> +                       return;
>>>>>>> +
>>>>>>> +               pci_read_config_dword(pdev, offset,&val);
>>>>>>> +               if (val == saved_val)
>>>>>>> +                       return;
>>>>>>> +
>>>>>>> +               mdelay(1);
>>>>>>> +       }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int start,
>>>>>>> int end,
>>>>>>> +                                    int retry)
>>>>>>> +{
>>>>>>> +       int index;
>>>>>>> +
>>>>>>> +       for (index = end; index>= start; index--)
>>>>>>> +               pci_restore_config_dword(pdev, 4 * index,
>>>>>>> +
>>>>>>>   pdev->saved_config_space[index],
>>>>>>> +                                        retry);
>>>>>>> +}
>>>>>>> +
>>>>>>>   /**
>>>>>>>   * pci_restore_state - Restore the saved state of a PCI device
>>>>>>>   * @dev: - PCI device that we're dealing with
>>>>>>>   */
>>>>>>>   void pci_restore_state(struct pci_dev *dev)
>>>>>>>   {
>>>>>>> -       int i;
>>>>>>> -       u32 val;
>>>>>>> -       int tries;
>>>>>>> -
>>>>>>>         if (!dev->state_saved)
>>>>>>>                 return;
>>>>>>>
>>>>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>>>>>         pci_restore_pcie_state(dev);
>>>>>>>         pci_restore_ats_state(dev);
>>>>>>>
>>>>>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>>>>>         /*
>>>>>>>          * The Base Address register should be programmed before the
>>>>>>> command
>>>>>>>          * register(s)
>>>>>>>          */
>>>>>>> -       for (i = 15; i>= 0; i--) {
>>>>>>> -               pci_read_config_dword(dev, i * 4,&val);
>>>>>>> -               tries = 10;
>>>>>>> -               while (tries&&      val != dev->saved_config_space[i]) {
>>>>>>>
>>>>>>> -                       dev_dbg(&dev->dev, "restoring config "
>>>>>>> -                               "space at offset %#x (was %#x, writing
>>>>>>> %#x)\n",
>>>>>>> -                               i, val,
>>>>>>> (int)dev->saved_config_space[i]);
>>>>>>> -                       pci_write_config_dword(dev,i * 4,
>>>>>>> -                               dev->saved_config_space[i]);
>>>>>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>>>>>> -                       mdelay(10);
>>>>>>> -                       tries--;
>>>>>>> -               }
>>>>>>> -       }
>>>>>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>>>>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>>>>>> +
>>>>>>>         pci_restore_pcix_state(dev);
>>>>>>>         pci_restore_msi_state(dev);
>>>>>>>         pci_restore_iov_state(dev);
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'd feel better about this if there were a way to delay in the FLR
>>>>>> path instead.  If we delay in the restore path, we're only fixing one
>>>>>> of the many ways config space can be written.  Other paths that write
>>>>>> config space will just silently fail.
>>>>>>
>>>>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>>>>>> value for Completion Timeout," but I don't see anything that looks
>>>>>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>>>>>> ways we can detect when the FLR is complete?
>>>>>
>>>>>
>>>>>
>>>>> I'm not aware of any.
>>>>>
>>>>> Thanks,
>>>>> Rafael
>>>>
>>>>
>>>> I don't think so, either.
>>>> I believe an ECN is being worked in the PCI-SIG
>>>> to add such a notification, though.
>>>> Even if adopted, need to wait for another crank of the hw before
>>>> the notification can be used.
>>>
>>>
>>> I agree, we can't do something that works only on new hardware -- we
>>> have to make the existing hardware in the field work.
>>>
>>> What about the "waiting for as much time as the pre-FLR value for
>>> Completion Timeout" part?
>>>
>>> Or can we do something like asserting FLR, sleeping 100ms, then
>>> attempting a write to something in config space and retrying until it
>>> sticks?  It's kludgy, but I'm not sure it's any worse than putting the
>>> retries in the restore path, and it would have the advantage that
>>> other writers of config space wouldn't have to worry.
>>
>> Depending on system config, reading a port that is being FLR'd
>> can cause AERs, which if a driver is registered for the endpoint,
>> it will get AERs reported to the driver and potentially complicate the
>> FLRhandling.
>>
>> This implies a hook to temp-disable AER during FLR, then turning it
>> back on (hw&/or sw).
>
> Don't we have the same possibility of causing AERs if we add this
> retry stuff in pci_restore_state()?  It sounds like the same
> possibility exists regardless of whether the config access happens in
> the FLR path or the restore path.


Dont know; have to re-read thread on where/when pci_restore_state() is recalled
wrt link synch/enable state(s).

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 27, 2012, 10:20 p.m. UTC | #8
On Tue, Apr 24, 2012 at 11:35 AM, Don Dutile <ddutile@redhat.com> wrote:
> On 04/24/2012 01:01 PM, Bjorn Helgaas wrote:
>>
>> On Tue, Apr 24, 2012 at 10:03 AM, Don Dutile<ddutile@redhat.com>  wrote:
>>>
>>> On 04/23/2012 06:33 PM, Bjorn Helgaas wrote:
>>>>
>>>>
>>>> On Mon, Apr 23, 2012 at 2:07 PM, Don Dutile<ddutile@redhat.com>
>>>>  wrote:
>>>>>
>>>>>
>>>>> On 04/23/2012 03:53 PM, Rafael J. Wysocki wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Monday, April 23, 2012, Bjorn Helgaas wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 15, 2012 at 1:40 PM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sunday, April 15, 2012, Linus Torvalds wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Apr 15, 2012 at 11:47 AM, Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> mdelay(10) doesn't really look good either to me in this case,
>>>>>>>>>> though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh, I agree. What kind of ass-backwards device actually needs that
>>>>>>>>> kind of crazy delays? It is almost certainly buggy.
>>>>>>>>>
>>>>>>>>> With retries, 10ms delays are totally unacceptable. There's
>>>>>>>>> something
>>>>>>>>> wrong.
>>>>>>>>>
>>>>>>>>> A single ms *may* be ok.
>>>>>>>>>
>>>>>>>>> Anyway, can you also split the actual "write _one_ register with
>>>>>>>>> retry" into a function of its own? The code looks like crap with
>>>>>>>>> those
>>>>>>>>> multiple levels of looping, with conditionals inside them etc. With
>>>>>>>>> a
>>>>>>>>> simple helper function, you could change the break into return, and
>>>>>>>>> it
>>>>>>>>> would look much better, I bet.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sure.  It appears cleaner this way.
>>>>>>>>
>>>>>>>> ---
>>>>>>>> From: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>> Subject: PCI: Fix regression in pci_restore_state(), v3
>>>>>>>>
>>>>>>>> Commit 26f41062f28de65e11d3cf353e52d0be73442be1
>>>>>>>>
>>>>>>>>    PCI: check for pci bar restore completion and retry
>>>>>>>>
>>>>>>>> attempted to address problems with PCI BAR restoration on systems
>>>>>>>> where FLR had not been completed before pci_restore_state() was
>>>>>>>> called, but it did that in an utterly wrong way.
>>>>>>>>
>>>>>>>> First off, instead of retrying the writes for the BAR registers
>>>>>>>> only, it did that for all of the PCI config space of the device,
>>>>>>>> including the status register (whose value after the write quite
>>>>>>>> obviously need not be the same as the written one).  Second, it
>>>>>>>> added arbitrary delay to pci_restore_state() even for systems
>>>>>>>> where the PCI config space restoration was successful at first
>>>>>>>> attempt.  Finally, the mdelay(10) it added to every iteration of the
>>>>>>>> writing loop was way too much of a delay for any reasonable device.
>>>>>>>>
>>>>>>>> All of this actually caused resume failures for some devices on
>>>>>>>> the Mikko's system.
>>>>>>>>
>>>>>>>> To fix the regression, make pci_restore_state() only retry the
>>>>>>>> writes for BAR registers and only wait if the first read from
>>>>>>>> the register doesn't return the written value.  Additionaly, make
>>>>>>>> it wait for 1 ms, instead of 10 ms, after every failing attempt
>>>>>>>> to write into config space.
>>>>>>>>
>>>>>>>> Reported-by: Mikko Vinni<mmvinni@yahoo.com>
>>>>>>>> Signed-off-by: Rafael J. Wysocki<rjw@sisk.pl>
>>>>>>>> ---
>>>>>>>>  drivers/pci/pci.c |   57
>>>>>>>> ++++++++++++++++++++++++++++++++++++------------------
>>>>>>>>  1 file changed, 39 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>> Index: linux/drivers/pci/pci.c
>>>>>>>> ===================================================================
>>>>>>>> --- linux.orig/drivers/pci/pci.c
>>>>>>>> +++ linux/drivers/pci/pci.c
>>>>>>>> @@ -967,16 +967,47 @@ pci_save_state(struct pci_dev *dev)
>>>>>>>>        return 0;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +static void pci_restore_config_dword(struct pci_dev *pdev, int
>>>>>>>> offset,
>>>>>>>> +                                    u32 saved_val, int retry)
>>>>>>>> +{
>>>>>>>> +       u32 val;
>>>>>>>> +
>>>>>>>> +       pci_read_config_dword(pdev, offset,&val);
>>>>>>>> +       if (val == saved_val)
>>>>>>>> +               return;
>>>>>>>> +
>>>>>>>> +       for (;;) {
>>>>>>>> +               dev_dbg(&pdev->dev, "restoring config space at
>>>>>>>> offset
>>>>>>>> "
>>>>>>>> +                       "%#x (was %#x, writing %#x)\n", offset, val,
>>>>>>>> saved_val);
>>>>>>>> +               pci_write_config_dword(pdev, offset, saved_val);
>>>>>>>> +               if (retry--<= 0)
>>>>>>>> +                       return;
>>>>>>>> +
>>>>>>>> +               pci_read_config_dword(pdev, offset,&val);
>>>>>>>> +               if (val == saved_val)
>>>>>>>> +                       return;
>>>>>>>> +
>>>>>>>> +               mdelay(1);
>>>>>>>> +       }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void pci_restore_config_space(struct pci_dev *pdev, int
>>>>>>>> start,
>>>>>>>> int end,
>>>>>>>> +                                    int retry)
>>>>>>>> +{
>>>>>>>> +       int index;
>>>>>>>> +
>>>>>>>> +       for (index = end; index>= start; index--)
>>>>>>>> +               pci_restore_config_dword(pdev, 4 * index,
>>>>>>>> +
>>>>>>>>  pdev->saved_config_space[index],
>>>>>>>> +                                        retry);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  /**
>>>>>>>>  * pci_restore_state - Restore the saved state of a PCI device
>>>>>>>>  * @dev: - PCI device that we're dealing with
>>>>>>>>  */
>>>>>>>>  void pci_restore_state(struct pci_dev *dev)
>>>>>>>>  {
>>>>>>>> -       int i;
>>>>>>>> -       u32 val;
>>>>>>>> -       int tries;
>>>>>>>> -
>>>>>>>>        if (!dev->state_saved)
>>>>>>>>                return;
>>>>>>>>
>>>>>>>> @@ -984,24 +1015,14 @@ void pci_restore_state(struct pci_dev *d
>>>>>>>>        pci_restore_pcie_state(dev);
>>>>>>>>        pci_restore_ats_state(dev);
>>>>>>>>
>>>>>>>> +       pci_restore_config_space(dev, 10, 15, 0);
>>>>>>>>        /*
>>>>>>>>         * The Base Address register should be programmed before the
>>>>>>>> command
>>>>>>>>         * register(s)
>>>>>>>>         */
>>>>>>>> -       for (i = 15; i>= 0; i--) {
>>>>>>>> -               pci_read_config_dword(dev, i * 4,&val);
>>>>>>>> -               tries = 10;
>>>>>>>> -               while (tries&&      val !=
>>>>>>>> dev->saved_config_space[i]) {
>>>>>>>>
>>>>>>>> -                       dev_dbg(&dev->dev, "restoring config "
>>>>>>>> -                               "space at offset %#x (was %#x,
>>>>>>>> writing
>>>>>>>> %#x)\n",
>>>>>>>> -                               i, val,
>>>>>>>> (int)dev->saved_config_space[i]);
>>>>>>>> -                       pci_write_config_dword(dev,i * 4,
>>>>>>>> -                               dev->saved_config_space[i]);
>>>>>>>> -                       pci_read_config_dword(dev, i * 4,&val);
>>>>>>>> -                       mdelay(10);
>>>>>>>> -                       tries--;
>>>>>>>> -               }
>>>>>>>> -       }
>>>>>>>> +       pci_restore_config_space(dev, 4, 9, 10);
>>>>>>>> +       pci_restore_config_space(dev, 0, 3, 0);
>>>>>>>> +
>>>>>>>>        pci_restore_pcix_state(dev);
>>>>>>>>        pci_restore_msi_state(dev);
>>>>>>>>        pci_restore_iov_state(dev);
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'd feel better about this if there were a way to delay in the FLR
>>>>>>> path instead.  If we delay in the restore path, we're only fixing one
>>>>>>> of the many ways config space can be written.  Other paths that write
>>>>>>> config space will just silently fail.
>>>>>>>
>>>>>>> The PCIe spec (r3.0, sec 6.6.2) mentions waiting for the "pre-FLR
>>>>>>> value for Completion Timeout," but I don't see anything that looks
>>>>>>> like that in pcie_flr() or pci_af_flr().  Are there any other direct
>>>>>>> ways we can detect when the FLR is complete?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm not aware of any.
>>>>>>
>>>>>> Thanks,
>>>>>> Rafael
>>>>>
>>>>>
>>>>>
>>>>> I don't think so, either.
>>>>> I believe an ECN is being worked in the PCI-SIG
>>>>> to add such a notification, though.
>>>>> Even if adopted, need to wait for another crank of the hw before
>>>>> the notification can be used.
>>>>
>>>>
>>>>
>>>> I agree, we can't do something that works only on new hardware -- we
>>>> have to make the existing hardware in the field work.
>>>>
>>>> What about the "waiting for as much time as the pre-FLR value for
>>>> Completion Timeout" part?
>>>>
>>>> Or can we do something like asserting FLR, sleeping 100ms, then
>>>> attempting a write to something in config space and retrying until it
>>>> sticks?  It's kludgy, but I'm not sure it's any worse than putting the
>>>> retries in the restore path, and it would have the advantage that
>>>> other writers of config space wouldn't have to worry.
>>>
>>>
>>> Depending on system config, reading a port that is being FLR'd
>>> can cause AERs, which if a driver is registered for the endpoint,
>>> it will get AERs reported to the driver and potentially complicate the
>>> FLRhandling.
>>>
>>> This implies a hook to temp-disable AER during FLR, then turning it
>>> back on (hw&/or sw).
>>
>> Don't we have the same possibility of causing AERs if we add this
>> retry stuff in pci_restore_state()?  It sounds like the same
>> possibility exists regardless of whether the config access happens in
>> the FLR path or the restore path.
>
> Dont know; have to re-read thread on where/when pci_restore_state() is
> recalled
> wrt link synch/enable state(s).

I don't know what to do with this patch yet.  Here's what I think
happens with this patch applied, in a case where we have to do some
retries:

    pcie_flr
      pci_write_config_word(dev, PCI_EXP_DEVCTL)  /* start FLR */
      msleep(100)
    ...
    pci_restore_state
      pci_read_config_dword(dev, 0x28)  /* CardBus CIS pointer */
      pci_write_config_dword(dev, 0x28)
      pci_read_config_dword(dev, 0x30)  /* Subsystem ID/vendor ID */
      pci_write_config_dword(dev, 0x30)
      ...
      pci_read_config_dword(dev, 0x10)  /* BAR 0 */
      for (;;) {
        pci_write_config_dword(dev, 0x10, saved_val)
        pci_read_config_dword(dev, 0x10, &val)
        if (val == saved_val)
          break;
      }

1) What happens if another code path reads or writes config space
before pci_restore_path()?  As far as I can tell, it fails silently.

2) Apparently (per Don), config access to a device in the middle of
FLR can cause AER.  If that's true, it seems like even this patch
could cause an AER, and the only possible thing is to rely on a delay
in pcie_flr() or temporarily disable AER.

3) If a config space write being accepted is a signal that the FLR is
complete, why don't we first restore a register where we *check*
whether the write is accepted?  The patch restores dwords 10-15 first,
and we don't retry those, so it seems like those updates could be
lost.  In fact, we do lots of other config writes in
pci_restore_pcie_state(), etc., and presumably all those could be
being dropped.

4) Why doesn't pcie_flr() do anything with the Completion Timeout
(PCIe spec r3.0, sec 6.6.2)?  It seems like using this could cause us
to wait longer than the current 100ms in pcie_flr().

5) If we're not retrying, it seems pointless to even read the register
first.  A nit, I know.

I don't feel comfortable that we understand the real problem here, so
I'm reluctant to apply the patch.  Obviously I know little about FLR
and the restore path, so possibly I just need to be educated :)

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux/drivers/pci/pci.c
===================================================================
--- linux.orig/drivers/pci/pci.c
+++ linux/drivers/pci/pci.c
@@ -967,16 +967,47 @@  pci_save_state(struct pci_dev *dev)
 	return 0;
 }
 
+static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
+				     u32 saved_val, int retry)
+{
+	u32 val;
+
+	pci_read_config_dword(pdev, offset, &val);
+	if (val == saved_val)
+		return;
+
+	for (;;) {
+		dev_dbg(&pdev->dev, "restoring config space at offset "
+			"%#x (was %#x, writing %#x)\n", offset, val, saved_val);
+		pci_write_config_dword(pdev, offset, saved_val);
+		if (retry-- <= 0)
+			return;
+
+		pci_read_config_dword(pdev, offset, &val);
+		if (val == saved_val)
+			return;
+
+		mdelay(1);
+	}
+}
+
+static void pci_restore_config_space(struct pci_dev *pdev, int start, int end,
+				     int retry)
+{
+	int index;
+
+	for (index = end; index >= start; index--)
+		pci_restore_config_dword(pdev, 4 * index,
+					 pdev->saved_config_space[index],
+					 retry);
+}
+
 /** 
  * pci_restore_state - Restore the saved state of a PCI device
  * @dev: - PCI device that we're dealing with
  */
 void pci_restore_state(struct pci_dev *dev)
 {
-	int i;
-	u32 val;
-	int tries;
-
 	if (!dev->state_saved)
 		return;
 
@@ -984,24 +1015,14 @@  void pci_restore_state(struct pci_dev *d
 	pci_restore_pcie_state(dev);
 	pci_restore_ats_state(dev);
 
+	pci_restore_config_space(dev, 10, 15, 0);
 	/*
 	 * The Base Address register should be programmed before the command
 	 * register(s)
 	 */
-	for (i = 15; i >= 0; i--) {
-		pci_read_config_dword(dev, i * 4, &val);
-		tries = 10;		
-		while (tries && val != dev->saved_config_space[i]) {
-			dev_dbg(&dev->dev, "restoring config "
-				"space at offset %#x (was %#x, writing %#x)\n",
-				i, val, (int)dev->saved_config_space[i]);
-			pci_write_config_dword(dev,i * 4,
-				dev->saved_config_space[i]);
-			pci_read_config_dword(dev, i * 4, &val);
-			mdelay(10);
-			tries--;
-		}
-	}
+	pci_restore_config_space(dev, 4, 9, 10);
+	pci_restore_config_space(dev, 0, 3, 0);
+
 	pci_restore_pcix_state(dev);
 	pci_restore_msi_state(dev);
 	pci_restore_iov_state(dev);