diff mbox

pci: Disable slot presence detection around bus reset

Message ID 20130214183640.16220.33060.stgit@bling.home
State Superseded
Headers show

Commit Message

Alex Williamson Feb. 14, 2013, 6:37 p.m. UTC
A bus reset can trigger a presence detection change and result in a
suprise hotplug.  This is generally not what we want to happen when
trying to reset a device.  Disable the presence detection control on
on bridges around bus reset.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alex Williamson Feb. 14, 2013, 10:17 p.m. UTC | #1
On Thu, 2013-02-14 at 11:37 -0700, Alex Williamson wrote:
> A bus reset can trigger a presence detection change and result in a
> suprise hotplug.  This is generally not what we want to happen when
> trying to reset a device.  Disable the presence detection control on
> on bridges around bus reset.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)


Hmm, this doesn't seem to be sufficient, still seeing it
occasionally :-\

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 5cb5820..c1f7d77 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>  
>  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>  {
> -	u16 ctrl;
> -	struct pci_dev *pdev;
> +	u16 ctrl, flags, sltctl = 0;
> +	struct pci_dev *pdev, *bridge;
>  
>  	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>  		return -ENOTTY;
> @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>  	if (probe)
>  		return 0;
>  
> -	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
> +	bridge = dev->bus->self;
> +
> +	/*
> +	 * If the parent device supports a slot with presence detection
> +	 * change enabled, holding the bus in reset can trigger that and
> +	 * cause an unwanted surprise removal.  Disable presence detection
> +	 * around the bus reset.
> +	 */
> +	pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
> +	if (flags & PCI_EXP_FLAGS_SLOT) {
> +		pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
> +		if (sltctl & PCI_EXP_SLTCTL_PDCE)
> +			pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
> +						sltctl & ~PCI_EXP_SLTCTL_PDCE);
> +	}
> +
> +	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
>  	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>  	msleep(100);
>  
>  	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>  	msleep(100);
>  
> +	if (sltctl & PCI_EXP_SLTCTL_PDCE)
> +		pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
> +
>  	return 0;
>  }
>  
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Mokrejs Feb. 14, 2013, 11:18 p.m. UTC | #2
Hi Alex,
  I was just going to ask you whether your patch would "explain" why pciehp has
in my experience broken presence detection while acpiphp has not (on 3.7 kernel)
and whether the patch will fix it.
  Some testing I have done in the past on 3.2 kernel and on 3.7.1, with no fixes.
Maybe you are interested in these threads? Actually, another user confirmed that
pciehp is broken on 3.7 while luckily, he also could have shifted to acpiphp.
Still, it is weird the behavior is different for different express cards
(USB3 vs. SATA vs. RS232 vs. firewire).

Four thread subjects on card presence detection:

Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
Re: linux-3.4-rc5: eSATA Sil3132 ExpressCard removal results in warn_slowpath_common
Re: Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force' on kernel command-line for hotplug to work
Re: [PATCH v3 3/6] ACPI/pci_slot: update PCI slot information when PCI hotplug event happens
Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

Maybe you will crack it? ;-)
Thanks,
Martin

Alex Williamson wrote:
> On Thu, 2013-02-14 at 11:37 -0700, Alex Williamson wrote:
>> A bus reset can trigger a presence detection change and result in a
>> suprise hotplug.  This is generally not what we want to happen when
>> trying to reset a device.  Disable the presence detection control on
>> on bridges around bus reset.
>>
>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> ---
>>  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
>>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> 
> Hmm, this doesn't seem to be sufficient, still seeing it
> occasionally :-\
> 
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 5cb5820..c1f7d77 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>>  
>>  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  {
>> -	u16 ctrl;
>> -	struct pci_dev *pdev;
>> +	u16 ctrl, flags, sltctl = 0;
>> +	struct pci_dev *pdev, *bridge;
>>  
>>  	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>>  		return -ENOTTY;
>> @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>>  	if (probe)
>>  		return 0;
>>  
>> -	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
>> +	bridge = dev->bus->self;
>> +
>> +	/*
>> +	 * If the parent device supports a slot with presence detection
>> +	 * change enabled, holding the bus in reset can trigger that and
>> +	 * cause an unwanted surprise removal.  Disable presence detection
>> +	 * around the bus reset.
>> +	 */
>> +	pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
>> +	if (flags & PCI_EXP_FLAGS_SLOT) {
>> +		pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
>> +		if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +			pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
>> +						sltctl & ~PCI_EXP_SLTCTL_PDCE);
>> +	}
>> +
>> +	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
>>  	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>  
>>  	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>> -	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> +	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>>  	msleep(100);
>>  
>> +	if (sltctl & PCI_EXP_SLTCTL_PDCE)
>> +		pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
>> +
>>  	return 0;
>>  }
>>  
>>
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 14, 2013, 11:47 p.m. UTC | #3
On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> A bus reset can trigger a presence detection change and result in a
> suprise hotplug.  This is generally not what we want to happen when
> trying to reset a device.  Disable the presence detection control on
> on bridges around bus reset.

This is a really interesting situation, and I'm not quite ready to
sign up to the idea that this is really a problem and that if it is,
this is the way we want to fix it.

What would happen if we *did* handle this as a hotplug event, with a
removal followed by an add?

The scheme where pci_reset_function() does "pci_save_state(dev);
pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.

We're saving and restoring some of PCI config space around the reset,
but there's no guarantee that we're preserving *all* the important
state in config space because I think devices can have non-architected
device-specific things in config space that we don't know how to
save/restore.

Devices also have internal state not exposed via config space.  That
state is lost during the reset but can't be restored by
pci_restore_state().  So it seems like pci_reset_function() is
pretending to do something it can't really do reliably.

If we make it so a reset is always handled as a remove+add, then we'll
use a more generic path, and we'll get all the stuff you expect when
initializing a new device -- resource assignment, IRQ setup, quirks,
etc.  Quirks in particular seem like something we want, but don't
currently get with pci_reset_function().

Oh, and the "disable presence detect" approach below only works for
things below a PCIe bridge with native hotplug, right?  I wonder what
happens if we reset devices below a bridge using SHPC or acpiphp.

> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 5cb5820..c1f7d77 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>
>  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>  {
> -       u16 ctrl;
> -       struct pci_dev *pdev;
> +       u16 ctrl, flags, sltctl = 0;
> +       struct pci_dev *pdev, *bridge;
>
>         if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>                 return -ENOTTY;
> @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>         if (probe)
>                 return 0;
>
> -       pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
> +       bridge = dev->bus->self;
> +
> +       /*
> +        * If the parent device supports a slot with presence detection
> +        * change enabled, holding the bus in reset can trigger that and
> +        * cause an unwanted surprise removal.  Disable presence detection
> +        * around the bus reset.
> +        */
> +       pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
> +       if (flags & PCI_EXP_FLAGS_SLOT) {
> +               pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
> +               if (sltctl & PCI_EXP_SLTCTL_PDCE)
> +                       pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
> +                                               sltctl & ~PCI_EXP_SLTCTL_PDCE);
> +       }
> +
> +       pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
>         ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>         msleep(100);
>
>         ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
>         msleep(100);
>
> +       if (sltctl & PCI_EXP_SLTCTL_PDCE)
> +               pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
> +
>         return 0;
>  }
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 15, 2013, 3:53 a.m. UTC | #4
On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
> On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > A bus reset can trigger a presence detection change and result in a
> > suprise hotplug.  This is generally not what we want to happen when
> > trying to reset a device.  Disable the presence detection control on
> > on bridges around bus reset.
> 
> This is a really interesting situation, and I'm not quite ready to
> sign up to the idea that this is really a problem and that if it is,
> this is the way we want to fix it.
> 
> What would happen if we *did* handle this as a hotplug event, with a
> removal followed by an add?
> 
> The scheme where pci_reset_function() does "pci_save_state(dev);
> pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
> 
> We're saving and restoring some of PCI config space around the reset,
> but there's no guarantee that we're preserving *all* the important
> state in config space because I think devices can have non-architected
> device-specific things in config space that we don't know how to
> save/restore.
> 
> Devices also have internal state not exposed via config space.  That
> state is lost during the reset but can't be restored by
> pci_restore_state().  So it seems like pci_reset_function() is
> pretending to do something it can't really do reliably.
> 
> If we make it so a reset is always handled as a remove+add, then we'll
> use a more generic path, and we'll get all the stuff you expect when
> initializing a new device -- resource assignment, IRQ setup, quirks,
> etc.  Quirks in particular seem like something we want, but don't
> currently get with pci_reset_function().
> 
> Oh, and the "disable presence detect" approach below only works for
> things below a PCIe bridge with native hotplug, right?  I wonder what
> happens if we reset devices below a bridge using SHPC or acpiphp.

Triggering a remove+add is not useful for the way we use it today.  The
users I'm aware of are KVM device assignment and VFIO, where we trigger
it in an attempt to get the device to a known state so that we have some
hope of repeatability.  In those scenarios the reset is initiated by the
driver.  The interface isn't meant to guarantee the device is returned
to an identical state as it was before reset.  If it did, why would we
call it?  We want to get to a state as near to power on, but still with
config programming, as we can.

Being driver directed, having the reset initiate a remove is pretty near
the last thing we want.  That limits the scope of calling it to only
when the driver can readily release the device.  If we have the device
attached to a guest or userspace driver, that's potentially a lot of
setup and teardown and effectively extending a surprise removal all the
way up the stack.

Obviously a bus reset is a big hammer and we do exhaust all the little
hammers of flr and pm reset before we try it, but in this case, we know
the device that's going away and with all likelihood, it's coming right
back at the same location.  If we take the path of forcing a remove+add,
let's just remove it from the reset_function call path and we'll do
without reset for those devices.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/pci.c |   29 ++++++++++++++++++++++++-----
> >  1 file changed, 24 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 5cb5820..c1f7d77 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
> >
> >  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> >  {
> > -       u16 ctrl;
> > -       struct pci_dev *pdev;
> > +       u16 ctrl, flags, sltctl = 0;
> > +       struct pci_dev *pdev, *bridge;
> >
> >         if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
> >                 return -ENOTTY;
> > @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> >         if (probe)
> >                 return 0;
> >
> > -       pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
> > +       bridge = dev->bus->self;
> > +
> > +       /*
> > +        * If the parent device supports a slot with presence detection
> > +        * change enabled, holding the bus in reset can trigger that and
> > +        * cause an unwanted surprise removal.  Disable presence detection
> > +        * around the bus reset.
> > +        */
> > +       pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
> > +       if (flags & PCI_EXP_FLAGS_SLOT) {
> > +               pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
> > +               if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > +                       pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
> > +                                               sltctl & ~PCI_EXP_SLTCTL_PDCE);
> > +       }
> > +
> > +       pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
> >         ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> > -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> >         msleep(100);
> >
> >         ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> > -       pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > +       pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> >         msleep(100);
> >
> > +       if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > +               pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
> > +
> >         return 0;
> >  }
> >
> >



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson April 24, 2013, 9:33 p.m. UTC | #5
On Thu, 2013-02-14 at 20:53 -0700, Alex Williamson wrote:
> On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
> > On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > > A bus reset can trigger a presence detection change and result in a
> > > suprise hotplug.  This is generally not what we want to happen when
> > > trying to reset a device.  Disable the presence detection control on
> > > on bridges around bus reset.
> > 
> > This is a really interesting situation, and I'm not quite ready to
> > sign up to the idea that this is really a problem and that if it is,
> > this is the way we want to fix it.
> > 
> > What would happen if we *did* handle this as a hotplug event, with a
> > removal followed by an add?
> > 
> > The scheme where pci_reset_function() does "pci_save_state(dev);
> > pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
> > 
> > We're saving and restoring some of PCI config space around the reset,
> > but there's no guarantee that we're preserving *all* the important
> > state in config space because I think devices can have non-architected
> > device-specific things in config space that we don't know how to
> > save/restore.
> > 
> > Devices also have internal state not exposed via config space.  That
> > state is lost during the reset but can't be restored by
> > pci_restore_state().  So it seems like pci_reset_function() is
> > pretending to do something it can't really do reliably.
> > 
> > If we make it so a reset is always handled as a remove+add, then we'll
> > use a more generic path, and we'll get all the stuff you expect when
> > initializing a new device -- resource assignment, IRQ setup, quirks,
> > etc.  Quirks in particular seem like something we want, but don't
> > currently get with pci_reset_function().
> > 
> > Oh, and the "disable presence detect" approach below only works for
> > things below a PCIe bridge with native hotplug, right?  I wonder what
> > happens if we reset devices below a bridge using SHPC or acpiphp.
> 
> Triggering a remove+add is not useful for the way we use it today.  The
> users I'm aware of are KVM device assignment and VFIO, where we trigger
> it in an attempt to get the device to a known state so that we have some
> hope of repeatability.  In those scenarios the reset is initiated by the
> driver.  The interface isn't meant to guarantee the device is returned
> to an identical state as it was before reset.  If it did, why would we
> call it?  We want to get to a state as near to power on, but still with
> config programming, as we can.
> 
> Being driver directed, having the reset initiate a remove is pretty near
> the last thing we want.  That limits the scope of calling it to only
> when the driver can readily release the device.  If we have the device
> attached to a guest or userspace driver, that's potentially a lot of
> setup and teardown and effectively extending a surprise removal all the
> way up the stack.
> 
> Obviously a bus reset is a big hammer and we do exhaust all the little
> hammers of flr and pm reset before we try it, but in this case, we know
> the device that's going away and with all likelihood, it's coming right
> back at the same location.  If we take the path of forcing a remove+add,
> let's just remove it from the reset_function call path and we'll do
> without reset for those devices.  Thanks,

Time to revisit this bug.  Clearly when a driver or userspace calls
pci_reset_function the intention is not to have the device be
hot-unplugged and re-plugged.  So I think we either need to prevent that
from happening or politely decline the reset.

I don't really know how to do this on acpiphp or shpc or whatever other
hotplug controllers we support.  So, what if we add a reset_slot
callback to hotplug_slot_ops?  We could then make pci_parent_bus_reset
do something like:

if (dev->slot && dev->slot->hotplug_slot) {
    if (!dev->slot->hotplug_slot->reset_slot)
        return -ENOTTY;

    return dev->slot->hotplug_slot->reset_slot(dev->slot->hotplug_slot);
} else {
    ... standard secondary bus reset...
}

I'd actually also like to add a pci_reset_bus interface because we do
have cases where the pci_reset_function is not sufficient (device
doesn't do any useful reset of it's own and pci_parent_bus_reset won't
because there are other devices on the bus).  Graphics cards in
particular are biting us here.  When all of the devices on the bus are
owned by a driver, this would provide a less device dependent reset.  It
would use same logic and code as enabled with reset_slot.  Thoughts?
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas April 26, 2013, 7:49 p.m. UTC | #6
On Wed, Apr 24, 2013 at 3:33 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 2013-02-14 at 20:53 -0700, Alex Williamson wrote:
>> On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
>> > On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
>> > <alex.williamson@redhat.com> wrote:
>> > > A bus reset can trigger a presence detection change and result in a
>> > > suprise hotplug.  This is generally not what we want to happen when
>> > > trying to reset a device.  Disable the presence detection control on
>> > > on bridges around bus reset.
>> >
>> > This is a really interesting situation, and I'm not quite ready to
>> > sign up to the idea that this is really a problem and that if it is,
>> > this is the way we want to fix it.
>> >
>> > What would happen if we *did* handle this as a hotplug event, with a
>> > removal followed by an add?
>> >
>> > The scheme where pci_reset_function() does "pci_save_state(dev);
>> > pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
>> >
>> > We're saving and restoring some of PCI config space around the reset,
>> > but there's no guarantee that we're preserving *all* the important
>> > state in config space because I think devices can have non-architected
>> > device-specific things in config space that we don't know how to
>> > save/restore.
>> >
>> > Devices also have internal state not exposed via config space.  That
>> > state is lost during the reset but can't be restored by
>> > pci_restore_state().  So it seems like pci_reset_function() is
>> > pretending to do something it can't really do reliably.
>> >
>> > If we make it so a reset is always handled as a remove+add, then we'll
>> > use a more generic path, and we'll get all the stuff you expect when
>> > initializing a new device -- resource assignment, IRQ setup, quirks,
>> > etc.  Quirks in particular seem like something we want, but don't
>> > currently get with pci_reset_function().
>> >
>> > Oh, and the "disable presence detect" approach below only works for
>> > things below a PCIe bridge with native hotplug, right?  I wonder what
>> > happens if we reset devices below a bridge using SHPC or acpiphp.
>>
>> Triggering a remove+add is not useful for the way we use it today.  The
>> users I'm aware of are KVM device assignment and VFIO, where we trigger
>> it in an attempt to get the device to a known state so that we have some
>> hope of repeatability.  In those scenarios the reset is initiated by the
>> driver.  The interface isn't meant to guarantee the device is returned
>> to an identical state as it was before reset.  If it did, why would we
>> call it?  We want to get to a state as near to power on, but still with
>> config programming, as we can.

I know you don't want the identical state before the reset.  But it
would be good if it were the same state as when the PCI core first
called the .probe() method.

What we have right now is the reset path doing *part* of the device
initialization but not all of it.  The exception I can think of is
that we don't apply any quirks in the reset path.  Maybe running those
would be enough to get to the same state as when the PCI core first
gave it to the driver.  But it's going to be hard to really confirm
that and to keep these paths in sync in the future.  And quirks are
currently entitled to assume that they run *before* a driver gets its
mitts on the device, so there could be issues there.

There probably aren't any quirks for the devices you're interested in,
so my concerns seem sort of academic.  But it would still be nice if
the scheme didn't depend on the absence of quirks.

>> Being driver directed, having the reset initiate a remove is pretty near
>> the last thing we want.  That limits the scope of calling it to only
>> when the driver can readily release the device.  If we have the device
>> attached to a guest or userspace driver, that's potentially a lot of
>> setup and teardown and effectively extending a surprise removal all the
>> way up the stack.
>>
>> Obviously a bus reset is a big hammer and we do exhaust all the little
>> hammers of flr and pm reset before we try it, but in this case, we know
>> the device that's going away and with all likelihood, it's coming right
>> back at the same location.  If we take the path of forcing a remove+add,
>> let's just remove it from the reset_function call path and we'll do
>> without reset for those devices.  Thanks,
>
> Time to revisit this bug.  Clearly when a driver or userspace calls
> pci_reset_function the intention is not to have the device be
> hot-unplugged and re-plugged.  So I think we either need to prevent that
> from happening or politely decline the reset.
>
> I don't really know how to do this on acpiphp or shpc or whatever other
> hotplug controllers we support.  So, what if we add a reset_slot
> callback to hotplug_slot_ops?  We could then make pci_parent_bus_reset
> do something like:
>
> if (dev->slot && dev->slot->hotplug_slot) {
>     if (!dev->slot->hotplug_slot->reset_slot)
>         return -ENOTTY;
>
>     return dev->slot->hotplug_slot->reset_slot(dev->slot->hotplug_slot);
> } else {
>     ... standard secondary bus reset...
> }

If we end up masking the hotplug notification, I do like the idea of
putting the controller-specific knowledge in hotplug_slot_ops rather
than directly in pci_parent_bus_reset().

> I'd actually also like to add a pci_reset_bus interface because we do
> have cases where the pci_reset_function is not sufficient (device
> doesn't do any useful reset of it's own and pci_parent_bus_reset won't
> because there are other devices on the bus).  Graphics cards in
> particular are biting us here.  When all of the devices on the bus are
> owned by a driver, this would provide a less device dependent reset.  It
> would use same logic and code as enabled with reset_slot.  Thoughts?

You're thinking that pci_reset_bus() would do a secondary bus reset,
but only if every device on the bus is owned by the same driver?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson April 29, 2013, 4:08 p.m. UTC | #7
On Fri, 2013-04-26 at 13:49 -0600, Bjorn Helgaas wrote:
> On Wed, Apr 24, 2013 at 3:33 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Thu, 2013-02-14 at 20:53 -0700, Alex Williamson wrote:
> >> On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
> >> > On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
> >> > <alex.williamson@redhat.com> wrote:
> >> > > A bus reset can trigger a presence detection change and result in a
> >> > > suprise hotplug.  This is generally not what we want to happen when
> >> > > trying to reset a device.  Disable the presence detection control on
> >> > > on bridges around bus reset.
> >> >
> >> > This is a really interesting situation, and I'm not quite ready to
> >> > sign up to the idea that this is really a problem and that if it is,
> >> > this is the way we want to fix it.
> >> >
> >> > What would happen if we *did* handle this as a hotplug event, with a
> >> > removal followed by an add?
> >> >
> >> > The scheme where pci_reset_function() does "pci_save_state(dev);
> >> > pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
> >> >
> >> > We're saving and restoring some of PCI config space around the reset,
> >> > but there's no guarantee that we're preserving *all* the important
> >> > state in config space because I think devices can have non-architected
> >> > device-specific things in config space that we don't know how to
> >> > save/restore.
> >> >
> >> > Devices also have internal state not exposed via config space.  That
> >> > state is lost during the reset but can't be restored by
> >> > pci_restore_state().  So it seems like pci_reset_function() is
> >> > pretending to do something it can't really do reliably.
> >> >
> >> > If we make it so a reset is always handled as a remove+add, then we'll
> >> > use a more generic path, and we'll get all the stuff you expect when
> >> > initializing a new device -- resource assignment, IRQ setup, quirks,
> >> > etc.  Quirks in particular seem like something we want, but don't
> >> > currently get with pci_reset_function().
> >> >
> >> > Oh, and the "disable presence detect" approach below only works for
> >> > things below a PCIe bridge with native hotplug, right?  I wonder what
> >> > happens if we reset devices below a bridge using SHPC or acpiphp.
> >>
> >> Triggering a remove+add is not useful for the way we use it today.  The
> >> users I'm aware of are KVM device assignment and VFIO, where we trigger
> >> it in an attempt to get the device to a known state so that we have some
> >> hope of repeatability.  In those scenarios the reset is initiated by the
> >> driver.  The interface isn't meant to guarantee the device is returned
> >> to an identical state as it was before reset.  If it did, why would we
> >> call it?  We want to get to a state as near to power on, but still with
> >> config programming, as we can.
> 
> I know you don't want the identical state before the reset.  But it
> would be good if it were the same state as when the PCI core first
> called the .probe() method.

Well, to clarify, we do want the known PCI state of the device to be the
same before and after reset, otherwise we risk PCI-core getting out of
sync with the actual device state.  What we want cleared is all of the
device internal state, so I think we have to assume that users of this
interface want that to be cleared or know how to restore it.  PCI-core
can't sign-up for restoring state that it's unaware of.

> What we have right now is the reset path doing *part* of the device
> initialization but not all of it.  The exception I can think of is
> that we don't apply any quirks in the reset path.  Maybe running those
> would be enough to get to the same state as when the PCI core first
> gave it to the driver.  But it's going to be hard to really confirm
> that and to keep these paths in sync in the future.  And quirks are
> currently entitled to assume that they run *before* a driver gets its
> mitts on the device, so there could be issues there.
> 
> There probably aren't any quirks for the devices you're interested in,
> so my concerns seem sort of academic.  But it would still be nice if
> the scheme didn't depend on the absence of quirks.

I don't quite understand the value of running quirks in the reset path.
The device is owned by a driver across this reset, so why would we want
to get it back to the pre-.probe() state?  That might be something we
would want for a full device re-init between drivers, but that seems
additional to a reset interface (ie. reset & re-init).

> >> Being driver directed, having the reset initiate a remove is pretty near
> >> the last thing we want.  That limits the scope of calling it to only
> >> when the driver can readily release the device.  If we have the device
> >> attached to a guest or userspace driver, that's potentially a lot of
> >> setup and teardown and effectively extending a surprise removal all the
> >> way up the stack.
> >>
> >> Obviously a bus reset is a big hammer and we do exhaust all the little
> >> hammers of flr and pm reset before we try it, but in this case, we know
> >> the device that's going away and with all likelihood, it's coming right
> >> back at the same location.  If we take the path of forcing a remove+add,
> >> let's just remove it from the reset_function call path and we'll do
> >> without reset for those devices.  Thanks,
> >
> > Time to revisit this bug.  Clearly when a driver or userspace calls
> > pci_reset_function the intention is not to have the device be
> > hot-unplugged and re-plugged.  So I think we either need to prevent that
> > from happening or politely decline the reset.
> >
> > I don't really know how to do this on acpiphp or shpc or whatever other
> > hotplug controllers we support.  So, what if we add a reset_slot
> > callback to hotplug_slot_ops?  We could then make pci_parent_bus_reset
> > do something like:
> >
> > if (dev->slot && dev->slot->hotplug_slot) {
> >     if (!dev->slot->hotplug_slot->reset_slot)
> >         return -ENOTTY;
> >
> >     return dev->slot->hotplug_slot->reset_slot(dev->slot->hotplug_slot);
> > } else {
> >     ... standard secondary bus reset...
> > }
> 
> If we end up masking the hotplug notification, I do like the idea of
> putting the controller-specific knowledge in hotplug_slot_ops rather
> than directly in pci_parent_bus_reset().

Great :)  For now we can assume that any hotplug_slot_ops that doesn't
implement a reset function requires nothing special.

> > I'd actually also like to add a pci_reset_bus interface because we do
> > have cases where the pci_reset_function is not sufficient (device
> > doesn't do any useful reset of it's own and pci_parent_bus_reset won't
> > because there are other devices on the bus).  Graphics cards in
> > particular are biting us here.  When all of the devices on the bus are
> > owned by a driver, this would provide a less device dependent reset.  It
> > would use same logic and code as enabled with reset_slot.  Thoughts?
> 
> You're thinking that pci_reset_bus() would do a secondary bus reset,
> but only if every device on the bus is owned by the same driver?

I don't think it's PCI-core's responsibility to figure out the driver
situation, the caller should determine that.  A motivated driver has
always had the ability to poke the secondary bus reset of a bridge, I
think we just want to create a common interface and wrap it with device
save/restore like the pci_reset_function interface does.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 5cb5820..c1f7d77 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3229,8 +3229,8 @@  static int pci_pm_reset(struct pci_dev *dev, int probe)
 
 static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
 {
-	u16 ctrl;
-	struct pci_dev *pdev;
+	u16 ctrl, flags, sltctl = 0;
+	struct pci_dev *pdev, *bridge;
 
 	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
 		return -ENOTTY;
@@ -3242,15 +3242,34 @@  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
 	if (probe)
 		return 0;
 
-	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
+	bridge = dev->bus->self;
+
+	/*
+	 * If the parent device supports a slot with presence detection
+	 * change enabled, holding the bus in reset can trigger that and
+	 * cause an unwanted surprise removal.  Disable presence detection
+	 * around the bus reset.
+	 */
+	pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
+	if (flags & PCI_EXP_FLAGS_SLOT) {
+		pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
+		if (sltctl & PCI_EXP_SLTCTL_PDCE)
+			pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
+						sltctl & ~PCI_EXP_SLTCTL_PDCE);
+	}
+
+	pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
 	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
-	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
 	msleep(100);
 
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
-	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
 	msleep(100);
 
+	if (sltctl & PCI_EXP_SLTCTL_PDCE)
+		pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
+
 	return 0;
 }