Patchwork PCI: Stop sriov before remove PF

Submitter Yinghai Lu
Date July 19, 2013, 7:14 p.m.
Message ID <1374261258-23036-3-git-send-email-yinghai@kernel.org>
Permalink /patch/260336/
State Not Applicable

Comments

Yinghai Lu - July 19, 2013, 7:14 p.m.
After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
("PCI: Simplify IOV implementation and fix reference count races"),
VFs need to be removed via virtfn_remove() to make sure the reference
to the PF is put back.

Some drivers (like ixgbe) do not call pci_disable_sriov() if
SR-IOV was enabled via the /sys/.../sriov_numvfs setting.
ixgbe allows the PF driver to be detached while the VFs are still
around.

But what happens when the PF is removed via /sys or pciehp?

During hot-remove, each VF still holds one reference to the PF, which
prevents the PF from being removed.
That makes the next hot-add fail, because the old PF dev struct is still
around.

We need to call pci_disable_sriov() during PCI device removal.

This is needed for v3.11.
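
As a rough illustration of the reference counting described above, here is a minimal user-space sketch. It is a toy model, not kernel code: every `toy_*` type and function is hypothetical and only mimics the idea that each VF pins its PF until the VF is removed. It assumes a PF with two VFs and one reference held by the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not kernel API. */
struct toy_dev {
	int refcount;           /* references currently held on this device */
	struct toy_dev *physfn; /* for a VF: the PF it pins, otherwise NULL */
	bool freed;             /* set when the last reference is dropped */
};

static void toy_get(struct toy_dev *d)
{
	d->refcount++;
}

static void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		d->freed = true;
}

/* Creating a VF takes one reference on the PF. */
static void toy_virtfn_add(struct toy_dev *pf, struct toy_dev *vf)
{
	vf->refcount = 1;
	vf->freed = false;
	vf->physfn = pf;
	toy_get(pf);
}

/* Removing a VF is the only place where that PF reference is given back. */
static void toy_virtfn_remove(struct toy_dev *vf)
{
	struct toy_dev *pf = vf->physfn;

	vf->physfn = NULL;
	toy_put(vf);
	toy_put(pf);
}

/* Stand-in for disabling SR-IOV: remove every VF that still exists. */
static void toy_disable_sriov(struct toy_dev *vfs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vfs[i].physfn)
			toy_virtfn_remove(&vfs[i]);
}

/* Returns true if the PF structure is really freed by the removal. */
static bool toy_remove_pf(bool disable_sriov_first)
{
	struct toy_dev pf = { .refcount = 1 };
	struct toy_dev vfs[2] = { {0}, {0} };

	toy_virtfn_add(&pf, &vfs[0]);
	toy_virtfn_add(&pf, &vfs[1]);

	if (disable_sriov_first)
		toy_disable_sriov(vfs, 2);

	toy_put(&pf); /* the core drops its own reference */
	return pf.freed;
}
```

In this model `toy_remove_pf(false)` leaves the PF alive, because the two VFs still hold references, while `toy_remove_pf(true)` lets the last put free it.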

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    3 +++
 1 file changed, 3 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Duyck - July 19, 2013, 9:46 p.m.
On 07/19/2013 12:14 PM, Yinghai Lu wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
> 
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.
> 
> But how about PF get removed via /sys or pciehp?
> 
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
> 
> We need to add pci_disable_sriov() calling during pci dev removing.
> 
> Need this one for v3.11
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
> 
> ---
>  drivers/pci/remove.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>  
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> +	/* remove VF, if PF driver skip that */
> +	pci_disable_sriov(dev);
> +
>  	down_write(&pci_bus_sem);
>  	list_del(&dev->bus_list);
>  	up_write(&pci_bus_sem);
> 

How are you able to hot-remove the PF if the VFs are still holding
references to it?

The issue I see with this patch is that if the PF has any VFs direct
assigned, hot plug removing the PF will cause the guests containing
those VFs to panic.

Thanks,

Alex




--
Yinghai Lu - July 19, 2013, 10:44 p.m.
On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>>
>> But how about PF get removed via /sys or pciehp?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during pci dev removing.
>>
>> Need this one for v3.11
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jiang Liu <liuj97@gmail.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Cc: Donald Dutile <ddutile@redhat.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>
>> ---
>>  drivers/pci/remove.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>
>>  static void pci_destroy_dev(struct pci_dev *dev)
>>  {
>> +     /* remove VF, if PF driver skip that */
>> +     pci_disable_sriov(dev);
>> +
>>       down_write(&pci_bus_sem);
>>       list_del(&dev->bus_list);
>>       up_write(&pci_bus_sem);
>>
>
> How are you able to hot-remove the PF if the VFs are still holding
> references to it?

Usually pci_stop_and_remove_bus_device() always succeeds, and
power gets turned off for that card.

>
> The issue I see with this patch is that if the PF has any VFs direct
> assigned, hot plug removing the PF will cause the guests containing
> those VFs to panic.

Then you should make the guest support hotplug or surprise removal.

If the guest panics because it does not support hotplug, that is the right
behavior.

It is just like a bare-metal machine: if it does not support hotplug, the user
would know what is going to happen if he removes a PCIe card.

Thanks

Yinghai
--
Alexander Duyck - July 19, 2013, 11:22 p.m.
On 07/19/2013 03:44 PM, Yinghai Lu wrote:
> On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
> <alexander.h.duyck@intel.com> wrote:
>> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.
>>>
>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>>
>>> But how about PF get removed via /sys or pciehp?
>>>
>>> During hot-remove, VF will still hold one ref to PF and it
>>> prevent PF to be removed.
>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>
>>> We need to add pci_disable_sriov() calling during pci dev removing.
>>>
>>> Need this one for v3.11
>>>
>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>> Cc: Jiang Liu <liuj97@gmail.com>
>>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>>> Cc: Donald Dutile <ddutile@redhat.com>
>>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>>
>>> ---
>>>  drivers/pci/remove.c |    3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> Index: linux-2.6/drivers/pci/remove.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/remove.c
>>> +++ linux-2.6/drivers/pci/remove.c
>>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>>
>>>  static void pci_destroy_dev(struct pci_dev *dev)
>>>  {
>>> +     /* remove VF, if PF driver skip that */
>>> +     pci_disable_sriov(dev);
>>> +
>>>       down_write(&pci_bus_sem);
>>>       list_del(&dev->bus_list);
>>>       up_write(&pci_bus_sem);
>>>
>> How are you able to hot-remove the PF if the VFs are still holding
>> references to it?
> usually pci_stop_and_remove_bus_device always successfully, and
> power get turned off for that card.

I'm not an expert in this area, but that doesn't seem right.  How is it
you can remove a device if there are still outstanding references to
it?  Is this one of those cases where we have to succeed because the
system is removing the device and there is nothing we can do to stop it?

>> The issue I see with this patch is that if the PF has any VFs direct
>> assigned, hot plug removing the PF will cause the guests containing
>> those VFs to panic.
> Then you should make guest support hotplug or suprise removal.
>
> If the guest does panic because it does support hotplug, that is right behavior.
>
> Just like in bare metal machine, if it does not support hotplug, and user would
> know what is going to happen if he remove one pcie card.
>
> Thanks
>
> Yinghai

I suspect that is much easier said than done.  We probably need somebody
familiar with the KVM side of things to address the feasibility of
something like that.  I believe it was Greg and Don that worked on the
original patches that made it so that we could leave the VFs in place on
driver removal.  They would likely have a better answer as to why it is
preferable to leave the VFs in place than panic a non-compliant guest.

Thanks,

Alex
--
Bjorn Helgaas - July 22, 2013, 11:15 p.m.
On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
>
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.
>
> But how about PF get removed via /sys or pciehp?
>
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
>
> We need to add pci_disable_sriov() calling during pci dev removing.
>
> Need this one for v3.11

Needs explanation.  Pretend Linus is asking why we should put this in
after the merge window :)

I think the answer is that dc087f2f introduced a regression in certain
hot-remove/hot-add scenarios, but an example transcript showing the
issue would help a lot.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
>
> ---
>  drivers/pci/remove.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> +       /* remove VF, if PF driver skip that */
> +       pci_disable_sriov(dev);

How did you decide to call pci_disable_sriov() here rather than, for
example, in pci_stop_dev()?  We already have some PME and ASPM cleanup
in pci_stop_dev(), and this seems sort of similar to those.

>         down_write(&pci_bus_sem);
>         list_del(&dev->bus_list);
>         up_write(&pci_bus_sem);
--
Yinghai Lu - July 23, 2013, 1:59 a.m.
On Mon, Jul 22, 2013 at 4:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>>
>> But how about PF get removed via /sys or pciehp?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during pci dev removing.
>>
>> Need this one for v3.11
>
> Needs explanation.  Pretend Linus is asking why we should put this in
> after the merge window :)
>
> I think the answer is that dc087f2f introduced a regression in certain
> hot-remove/hot-add scenarios, but an example transcript showing the
> issue would help a lot.

For Intel 10Gb Ethernet, the user can use /sys/../sriov_numvfs to enable
SR-IOV after the PF driver is loaded.

Suppose the card is in a pciehp slot, and the user presses the button or uses
/sys/bus/pci/slots/../power to turn off the power.

The PF driver is stopped (but it does not call pci_disable_sriov()), then an
attempt is made to remove the PF, and the power is turned off.
However, the PF's pci_dev struct is not actually freed, because the VFs still
hold references to it.

That is not fun: hot-add will not work, because the old pci_dev struct is
still there and the VF structs still hold their old references to it.

>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jiang Liu <liuj97@gmail.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Cc: Donald Dutile <ddutile@redhat.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>
>> ---
>>  drivers/pci/remove.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>
>>  static void pci_destroy_dev(struct pci_dev *dev)
>>  {
>> +       /* remove VF, if PF driver skip that */
>> +       pci_disable_sriov(dev);
>
> How did you decide to call pci_disable_sriov() here rather than, for
> example, in pci_stop_dev()?  We already have some PME and ASPM cleanup
> in pci_stop_dev(), and this seems sort of similar to those.

Yes, pci_stop_dev() is better.

I was thinking that pci_stop_dev() could be used when the PF's driver is
unloaded or detached.
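
To make the placement question concrete, the following is a small user-space sketch of a two-phase removal. It only records the order of steps; the `toy_*` names are hypothetical stand-ins and do not reproduce the real bodies of pci_stop_dev() or pci_destroy_dev(). It assumes the only difference between the two variants is which phase tears down the VFs.

```c
#include <assert.h>
#include <string.h>

/* Toy trace of a two-phase removal; every name here is hypothetical. */
static char toy_trace[128];

/* Append one step name to the comma-separated trace. */
static void toy_step(const char *name)
{
	if (toy_trace[0] != '\0')
		strcat(toy_trace, ",");
	strcat(toy_trace, name);
}

static void toy_trace_disable_sriov(void)
{
	toy_step("disable_sriov");
}

/* Phase 1: stop the device (the thread mentions PME and ASPM cleanup here). */
static void toy_stop_dev(int sriov_in_stop)
{
	toy_step("stop:begin");
	if (sriov_in_stop)
		toy_trace_disable_sriov();
	toy_step("stop:end");
}

/* Phase 2: unlink the device from its bus and drop it. */
static void toy_destroy_dev(int sriov_in_destroy)
{
	toy_step("destroy:begin");
	if (sriov_in_destroy)
		toy_trace_disable_sriov();
	toy_step("destroy:end");
}

/* Run both phases and return the recorded order of steps. */
static const char *toy_stop_and_remove(int sriov_in_stop)
{
	toy_trace[0] = '\0';
	toy_stop_dev(sriov_in_stop);
	toy_destroy_dev(!sriov_in_stop);
	return toy_trace;
}
```

With the call in the stop phase, the VFs are gone before destruction starts; with the call in the destroy phase, as in the posted patch, they survive until the PF is already being torn down.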

Yinghai
--
Don Dutile - July 23, 2013, 3:34 p.m.
On 07/19/2013 07:22 PM, Alexander Duyck wrote:
> On 07/19/2013 03:44 PM, Yinghai Lu wrote:
>> On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
>> <alexander.h.duyck@intel.com>  wrote:
>>> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>>>>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>>
>>>> But how about PF get removed via /sys or pciehp?
>>>>
>>>> During hot-remove, VF will still hold one ref to PF and it
>>>> prevent PF to be removed.
>>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>>
>>>> We need to add pci_disable_sriov() calling during pci dev removing.
>>>>
>>>> Need this one for v3.11
>>>>
>>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>>> Cc: Jiang Liu<liuj97@gmail.com>
>>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>>> Cc: Donald Dutile<ddutile@redhat.com>
>>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>>
>>>> ---
>>>>   drivers/pci/remove.c |    3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> Index: linux-2.6/drivers/pci/remove.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/drivers/pci/remove.c
>>>> +++ linux-2.6/drivers/pci/remove.c
>>>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>>>
>>>>   static void pci_destroy_dev(struct pci_dev *dev)
>>>>   {
>>>> +     /* remove VF, if PF driver skip that */
>>>> +     pci_disable_sriov(dev);
>>>> +
>>>>        down_write(&pci_bus_sem);
>>>>        list_del(&dev->bus_list);
>>>>        up_write(&pci_bus_sem);
>>>>
>>> How are you able to hot-remove the PF if the VFs are still holding
>>> references to it?
>> usually pci_stop_and_remove_bus_device always successfully, and
>> power get turned off for that card.
>
> I'm not an expert in this area, but that doesn't seem right.  How is it
> you can remove a device if there are still outstanding references to
> it?  Is this one of those cases where we have to succeed because the
> system is removing the device and there is nothing we can do to stop it?
>
>>> The issue I see with this patch is that if the PF has any VFs direct
>>> assigned, hot plug removing the PF will cause the guests containing
>>> those VFs to panic.
>> Then you should make guest support hotplug or suprise removal.
>>
>> If the guest does panic because it does support hotplug, that is right behavior.
>>
>> Just like in bare metal machine, if it does not support hotplug, and user would
>> know what is going to happen if he remove one pcie card.
>>
>> Thanks
>>
>> Yinghai
>
> I suspect that is much easier said than done.  We probably need somebody
> familiar with the KVM side of things to address the feasibility of
> something like that.  I believe it was Greg and Don that worked on the
> original patches that made it so that we could leave the VFs in place on
> driver removal.  They would likely have a better answer as to why it is
> preferable to leave the VFs in place than panic a non-compliant guest.

The virtual effect of leaving the VFs in place was the equivalent of unplugging
the cable from the VF device in the guest. When the PF driver was reloaded, it
caused the virtual effect of the network cable being reconnected.  Before that
patch set (in ixgbe & igb), a PF driver unload in the host with a VF
assigned to a KVM guest caused a *host crash*.
So, start up a KVM (Linux) guest, hot-remove the PF with a VF
assigned to a guest, and with these patches applied, ensure the host doesn't crash.
If it does crash, that's a regression that can't be tolerated, and this patch (set)
will need further work.
- Don

>
> Thanks,
>
> Alex

--
Yinghai Lu - July 23, 2013, 4:10 p.m.
On Tue, Jul 23, 2013 at 8:34 AM, Don Dutile <ddutile@redhat.com> wrote:
>>
>> I suspect that is much easier said than done.  We probably need somebody
>> familiar with the KVM side of things to address the feasibility of
>> something like that.  I believe it was Greg and Don that worked on the
>> original patches that made it so that we could leave the VFs in place on
>> driver removal.  They would likely have a better answer as to why it is
>> preferable to leave the VFs in place than panic a non-compliant guest.
>
>
> The virtual effect of leaving the VFs in place was the equivalent of
> unplugging the cable from the VF device in the guest. When the PF driver
> was reloaded, it caused the virtual effect of the network cable being
> reconnected.  Before that patch set (in ixgbe & igb), a PF driver unload
> in the host would result in the VF assigned to KVM a guest caused a
> *host crash*.
> So, start up a KVM (linux) guest, hot-remove the PF with a VF assigned to
> a guest, and with these patches applied, ensure the host doesn't crash.
> if it does crash, that's a regression that can't be tolerated, and this
> patch (set) will need further work.

In the beginning, we had PCIe native hotplug working even with SR-IOV enabled.

The later patch set (in ixgbe & igb) does not call disable_sriov, which causes
hotplug to stop working, so that was the first regression.

Anyway, a guest/host crash when the PF gets removed is not a regression; that
is the old behavior from when guest/SR-IOV PCI passthrough support was added.

So do we need pci_stub to notify the guest to do a hot-remove when the PF's
driver gets detached?

Thanks

Yinghai
--

Patch

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -34,6 +34,9 @@  static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
+	/* remove VF, if PF driver skip that */
+	pci_disable_sriov(dev);
+
 	down_write(&pci_bus_sem);
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);