diff mbox

[v2,3/3] PCI: Stop sriov after stop PF if PF's driver skip that

Message ID 1374937868-24437-2-git-send-email-yinghai@kernel.org
State Rejected
Headers show

Commit Message

Yinghai Lu July 27, 2013, 3:11 p.m. UTC
After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp finally?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during stop PF .

Need this one for v3.11

-v2: Accoring to Bjorn, move that calling to pci_stop_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    2 ++
 1 file changed, 2 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas July 29, 2013, 7:58 p.m. UTC | #1
On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
>
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.

Is this something ixgbe should be doing differently?

I'm not 100% sold on the idea of the VFs staying active after the
driver releases the PF.  It seems asymmetric because I think the
driver has to claim the PF to *enable* the VFs, but we don't disable
them when releasing the PF.

What's the use case for detaching the PF driver while the VFs are active?

> But how about PF get removed via /sys or pciehp finally?
>
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
>
> We need to add pci_disable_sriov() calling during stop PF .
>
> Need this one for v3.11

Don had a concern that there might be a regression here ... I'm a bit
confused on the details, but you guys need to come to agreement that
this doesn't make things worse.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
>
> ---
>  drivers/pci/remove.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>                 pci_proc_detach_device(dev);
>                 pci_remove_sysfs_dev_files(dev);
>                 device_del(&dev->dev);
> +               /* remove VF, if PF driver skip that */
> +               pci_disable_sriov(dev);
>                 dev->is_added = 0;
>         }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile July 29, 2013, 8:32 p.m. UTC | #2
On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>
> Is this something ixgbe should be doing differently?
>
> I'm not 100% sold on the idea of the VFs staying active after the
> driver releases the PF.  It seems asymmetric because I think the
> driver has to claim the PF to *enable* the VFs, but we don't disable
> them when releasing the PF.
>
> What's the use case for detaching the PF driver while the VFs are active?
>
VF's assigned to (KVM) guest (via device-assignment).
Virtually, it's as if the enet cable is unplugged to the VF in the guest --
the device is still there (the PF wasn't unplugged, just the driver de-configured).

Pre-sysfs-based configuration, the std way to configure the VFs into
a system was to unload the PF driver, and reload it with a vf-enabling parameter
(like max_vfs=<n> in the case of ixgbe, igb).
Now, if someone unloaded the PF driver in the host, the unplanned removal
of the PF enabled the VF to crash the host (maybe AlexD can provide the
details how that occurred).
So, the solution was to 'pause' the VF operation and let packets drop
in the guest, and re-enable the VF operation if the PF driver was re-configured.

So, as I stated in previous postings, this patch is acceptable if
it doesn't cause a guest crash when a VF is assigned to a KVM guest.
If you tested this case, then please state as such in the posting.
If not, then can AlexD test this case ?

- Don
>> But how about PF get removed via /sys or pciehp finally?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during stop PF .
>>
>> Need this one for v3.11
>
> Don had a concern that there might be a regression here ... I'm a bit
> confused on the details, but you guys need to come to agreement that
> this doesn't make things worse.
>
>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>> Cc: Jiang Liu<liuj97@gmail.com>
>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>> Cc: Donald Dutile<ddutile@redhat.com>
>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>
>> ---
>>   drivers/pci/remove.c |    2 ++
>>   1 file changed, 2 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>                  pci_proc_detach_device(dev);
>>                  pci_remove_sysfs_dev_files(dev);
>>                  device_del(&dev->dev);
>> +               /* remove VF, if PF driver skip that */
>> +               pci_disable_sriov(dev);
>>                  dev->is_added = 0;
>>          }
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duyck, Alexander H July 29, 2013, 9:07 p.m. UTC | #3
On 07/29/2013 01:32 PM, Don Dutile wrote:
> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.
>>>
>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>
>> Is this something ixgbe should be doing differently?
>>
>> I'm not 100% sold on the idea of the VFs staying active after the
>> driver releases the PF.  It seems asymmetric because I think the
>> driver has to claim the PF to *enable* the VFs, but we don't disable
>> them when releasing the PF.
>>
>> What's the use case for detaching the PF driver while the VFs are
>> active?
>>
> VF's assigned to (KVM) guest (via device-assignment).
> Virtually, it's as if the enet cable is unplugged to the VF in the
> guest --
> the device is still there (the PF wasn't unplugged, just the driver
> de-configured).
>
> Pre-sysfs-based configuration, the std way to configure the VFs into
> a system was to unload the PF driver, and reload it with a vf-enabling
> parameter
> (like max_vfs=<n> in the case of ixgbe, igb).
> Now, if someone unloaded the PF driver in the host, the unplanned removal
> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> details how that occurred).
> So, the solution was to 'pause' the VF operation and let packets drop
> in the guest, and re-enable the VF operation if the PF driver was
> re-configured.
>
> So, as I stated in previous postings, this patch is acceptable if
> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
> If you tested this case, then please state as such in the posting.
> If not, then can AlexD test this case ?
>
> - Don

I actually haven't done much with direct assignment to guests.  I
believe it was Greg who did that work to fix this issue.

I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
a copy of the net tree and submit it to our validation team for testing
to see if they end up being able to reproduce the kernel panic issue
that was originally addressed by allowing the VFs to be persistent.

Thanks,

Alex

>>> But how about PF get removed via /sys or pciehp finally?
>>>
>>> During hot-remove, VF will still hold one ref to PF and it
>>> prevent PF to be removed.
>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>
>>> We need to add pci_disable_sriov() calling during stop PF .
>>>
>>> Need this one for v3.11
>>
>> Don had a concern that there might be a regression here ... I'm a bit
>> confused on the details, but you guys need to come to agreement that
>> this doesn't make things worse.
>>
>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>> Cc: Jiang Liu<liuj97@gmail.com>
>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>> Cc: Donald Dutile<ddutile@redhat.com>
>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>
>>> ---
>>>   drivers/pci/remove.c |    2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> Index: linux-2.6/drivers/pci/remove.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/remove.c
>>> +++ linux-2.6/drivers/pci/remove.c
>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>>                  pci_proc_detach_device(dev);
>>>                  pci_remove_sysfs_dev_files(dev);
>>>                  device_del(&dev->dev);
>>> +               /* remove VF, if PF driver skip that */
>>> +               pci_disable_sriov(dev);
>>>                  dev->is_added = 0;
>>>          }
>>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile July 29, 2013, 9:31 p.m. UTC | #4
On 07/29/2013 05:07 PM, Alexander Duyck wrote:
> On 07/29/2013 01:32 PM, Don Dutile wrote:
>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>>>>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>
>>> Is this something ixgbe should be doing differently?
>>>
>>> I'm not 100% sold on the idea of the VFs staying active after the
>>> driver releases the PF.  It seems asymmetric because I think the
>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>> them when releasing the PF.
>>>
>>> What's the use case for detaching the PF driver while the VFs are
>>> active?
>>>
>> VF's assigned to (KVM) guest (via device-assignment).
>> Virtually, it's as if the enet cable is unplugged to the VF in the
>> guest --
>> the device is still there (the PF wasn't unplugged, just the driver
>> de-configured).
>>
>> Pre-sysfs-based configuration, the std way to configure the VFs into
>> a system was to unload the PF driver, and reload it with a vf-enabling
>> parameter
>> (like max_vfs=<n>  in the case of ixgbe, igb).
>> Now, if someone unloaded the PF driver in the host, the unplanned removal
>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>> details how that occurred).
>> So, the solution was to 'pause' the VF operation and let packets drop
>> in the guest, and re-enable the VF operation if the PF driver was
>> re-configured.
>>
>> So, as I stated in previous postings, this patch is acceptable if
>> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
>> If you tested this case, then please state as such in the posting.
>> If not, then can AlexD test this case ?
>>
>> - Don
>
> I actually haven't done much with direct assignment to guests.  I
> believe it was Greg who did that work to fix this issue.
>
> I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
> a copy of the net tree and submit it to our validation team for testing
> to see if they end up being able to reproduce the kernel panic issue
> that was originally addressed by allowing the VFs to be persistent.
>
I'd appreciate hearing Jeff's test results....

> Thanks,
>
> Alex
>
>>>> But how about PF get removed via /sys or pciehp finally?
>>>>
>>>> During hot-remove, VF will still hold one ref to PF and it
>>>> prevent PF to be removed.
>>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>>
>>>> We need to add pci_disable_sriov() calling during stop PF .
>>>>
>>>> Need this one for v3.11
>>>
>>> Don had a concern that there might be a regression here ... I'm a bit
>>> confused on the details, but you guys need to come to agreement that
>>> this doesn't make things worse.
>>>
>>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>>> Cc: Jiang Liu<liuj97@gmail.com>
>>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>>> Cc: Donald Dutile<ddutile@redhat.com>
>>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>>
>>>> ---
>>>>    drivers/pci/remove.c |    2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> Index: linux-2.6/drivers/pci/remove.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/drivers/pci/remove.c
>>>> +++ linux-2.6/drivers/pci/remove.c
>>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>>>                   pci_proc_detach_device(dev);
>>>>                   pci_remove_sysfs_dev_files(dev);
>>>>                   device_del(&dev->dev);
>>>> +               /* remove VF, if PF driver skip that */
>>>> +               pci_disable_sriov(dev);
>>>>                   dev->is_added = 0;
>>>>           }
>>>>
>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T July 29, 2013, 9:52 p.m. UTC | #5
On Mon, 2013-07-29 at 17:31 -0400, Don Dutile wrote:
> On 07/29/2013 05:07 PM, Alexander Duyck wrote:
> > On 07/29/2013 01:32 PM, Don Dutile wrote:
> >> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
> >>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
> >>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> >>>> (PCI: Simplify IOV implementation and fix reference count races)
> >>>> VF need to be removed via virtfn_remove to make sure ref to PF
> >>>> is put back.
> >>>>
> >>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
> >>>> sriov is enabled via /sys/.../sriov_numvfs setting.
> >>>> ixgbe does allow driver for PF get detached, but still have VFs
> >>>> around.
> >>>
> >>> Is this something ixgbe should be doing differently?
> >>>
> >>> I'm not 100% sold on the idea of the VFs staying active after the
> >>> driver releases the PF.  It seems asymmetric because I think the
> >>> driver has to claim the PF to *enable* the VFs, but we don't disable
> >>> them when releasing the PF.
> >>>
> >>> What's the use case for detaching the PF driver while the VFs are
> >>> active?
> >>>
> >> VF's assigned to (KVM) guest (via device-assignment).
> >> Virtually, it's as if the enet cable is unplugged to the VF in the
> >> guest --
> >> the device is still there (the PF wasn't unplugged, just the driver
> >> de-configured).
> >>
> >> Pre-sysfs-based configuration, the std way to configure the VFs into
> >> a system was to unload the PF driver, and reload it with a vf-enabling
> >> parameter
> >> (like max_vfs=<n>  in the case of ixgbe, igb).
> >> Now, if someone unloaded the PF driver in the host, the unplanned removal
> >> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> >> details how that occurred).
> >> So, the solution was to 'pause' the VF operation and let packets drop
> >> in the guest, and re-enable the VF operation if the PF driver was
> >> re-configured.
> >>
> >> So, as I stated in previous postings, this patch is acceptable if
> >> it doesn't cause a guest crash when a VF is assigned to a KVM guest.
> >> If you tested this case, then please state as such in the posting.
> >> If not, then can AlexD test this case ?
> >>
> >> - Don
> >
> > I actually haven't done much with direct assignment to guests.  I
> > believe it was Greg who did that work to fix this issue.
> >
> > I'm adding Jeff Kirsher to the CC.  Perhaps he can pull this patch into
> > a copy of the net tree and submit it to our validation team for testing
> > to see if they end up being able to reproduce the kernel panic issue
> > that was originally addressed by allowing the VFs to be persistent.
> >
> I'd appreciate hearing Jeff's test results....

Can I apply this patch as a standalone? or will I need the entire 3
patch series?

> 
> > Thanks,
> >
> > Alex
> >
> >>>> But how about PF get removed via /sys or pciehp finally?
> >>>>
> >>>> During hot-remove, VF will still hold one ref to PF and it
> >>>> prevent PF to be removed.
> >>>> That make the next hot-add fails, as old PF dev struct is still around.
> >>>>
> >>>> We need to add pci_disable_sriov() calling during stop PF .
> >>>>
> >>>> Need this one for v3.11
> >>>
> >>> Don had a concern that there might be a regression here ... I'm a bit
> >>> confused on the details, but you guys need to come to agreement that
> >>> this doesn't make things worse.
> >>>
> >>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
> >>>> Cc: Jiang Liu<liuj97@gmail.com>
> >>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
> >>>> Cc: Donald Dutile<ddutile@redhat.com>
> >>>> Cc: Greg Rose<gregory.v.rose@intel.com>
> >>>>
> >>>> ---
> >>>>    drivers/pci/remove.c |    2 ++
> >>>>    1 file changed, 2 insertions(+)
> >>>>
> >>>> Index: linux-2.6/drivers/pci/remove.c
> >>>> ===================================================================
> >>>> --- linux-2.6.orig/drivers/pci/remove.c
> >>>> +++ linux-2.6/drivers/pci/remove.c
> >>>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
> >>>>                   pci_proc_detach_device(dev);
> >>>>                   pci_remove_sysfs_dev_files(dev);
> >>>>                   device_del(&dev->dev);
> >>>> +               /* remove VF, if PF driver skip that */
> >>>> +               pci_disable_sriov(dev);
> >>>>                   dev->is_added = 0;
> >>>>           }
> >>>>
> >>
> >
>
Yinghai Lu July 29, 2013, 11:14 p.m. UTC | #6
On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> Can I apply this patch as a standalone? or will I need the entire 3
> patch series?

would be better if you can test those three on top of pci/for-linus branch.
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T July 29, 2013, 11:23 p.m. UTC | #7
On Mon, 2013-07-29 at 16:14 -0700, Yinghai Lu wrote:
> On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > Can I apply this patch as a standalone? or will I need the entire 3
> > patch series?
> 
> would be better if you can test those three on top of pci/for-linus branch.
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus

I will have to do that, because I was only able to get patch 3 of the
series to apply to David Miller's net tree.
Don Dutile July 30, 2013, 3:04 p.m. UTC | #8
On 07/29/2013 07:23 PM, Jeff Kirsher wrote:
> On Mon, 2013-07-29 at 16:14 -0700, Yinghai Lu wrote:
>> On Mon, Jul 29, 2013 at 2:52 PM, Jeff Kirsher
>> <jeffrey.t.kirsher@intel.com>  wrote:
>>> Can I apply this patch as a standalone? or will I need the entire 3
>>> patch series?
>>
>> would be better if you can test those three on top of pci/for-linus branch.
>> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus
>
> I will have to do that, because I was only able to get patch 3 of the
> series to apply to David Miller's net tree.
I could only find 1/3 of the other 2 patches; don't know why 2/3 is missing in my mailbox.

Given that Yinghai has requested all 3, and you agreed, sounds like you have your answer. ;-)

- Don

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Aug. 1, 2013, 8:16 p.m. UTC | #9
On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>
>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  wrote:
>>>
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.

OK, I'm lost here.  I think this is the scenario where Yinghai saw a
regression (please correct me if not):

  00:03.0 Root port to [bus 02]
  02:00.0 SR-IOV NIC (PF in slot 2)
  02:00.1 VF (for example)

  # echo -n 0 > /sys/bus/pci/slots/2/power
  02:00.0 PF is powered off
  02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
actually deallocated

  # echo -n 1 > /sys/bus/pci/slots/2/power
  pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
0000:02:00, cannot hot-add

Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
count races"), this scenario (powering the slot off then back on)
apparently worked, and hot-adding 02:00.0 worked fine.

But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
still existed even after we removed power from the PF.  But obviously
the hardware VFs are disabled when we power the slot back up.  It
doesn't make sense to me to have pci_devs for these VFs that are no
longer enabled, so maybe I'm missing something here.

>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>
>> Is this something ixgbe should be doing differently?
>>
>> I'm not 100% sold on the idea of the VFs staying active after the
>> driver releases the PF.  It seems asymmetric because I think the
>> driver has to claim the PF to *enable* the VFs, but we don't disable
>> them when releasing the PF.
>>
>> What's the use case for detaching the PF driver while the VFs are active?
>>
> VF's assigned to (KVM) guest (via device-assignment).
> Virtually, it's as if the enet cable is unplugged to the VF in the guest --
> the device is still there (the PF wasn't unplugged, just the driver
> de-configured).

OK, but why is it necessary to detach and reattach the PF driver at
all?  Are you trying to update the PF driver or something?  Change the
number of VFs?

> Pre-sysfs-based configuration, the std way to configure the VFs into
> a system was to unload the PF driver, and reload it with a vf-enabling
> parameter
> (like max_vfs=<n> in the case of ixgbe, igb).

Yes.  But I assume the first time the PF was loaded, there were no VFs
enabled.  So disabling VFs at unload wouldn't cause any problem there.
 Then you reload the driver with "max_vfs=<n>".  The driver enables
VFs.  Is there any reason to unload the PF driver again?

> Now, if someone unloaded the PF driver in the host, the unplanned removal
> of the PF enabled the VF to crash the host (maybe AlexD can provide the
> details how that occurred).
> So, the solution was to 'pause' the VF operation and let packets drop
> in the guest, and re-enable the VF operation if the PF driver was
> re-configured.

I don't get this.  Why should we tolerate "unplanned removal of the
PF"?  If you yank the card, I would *expect* anything using it to
crash.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile Aug. 1, 2013, 9:21 p.m. UTC | #10
On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>
>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>>
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>
> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
> regression (please correct me if not):
>
>    00:03.0 Root port to [bus 02]
>    02:00.0 SR-IOV NIC (PF in slot 2)
>    02:00.1 VF (for example)
>
>    # echo -n 0>  /sys/bus/pci/slots/2/power
>    02:00.0 PF is powered off
>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
> actually deallocated
>
>    # echo -n 1>  /sys/bus/pci/slots/2/power
>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
> 0000:02:00, cannot hot-add
>
> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
> count races"), this scenario (powering the slot off then back on)
> apparently worked, and hot-adding 02:00.0 worked fine.
>
> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
> still existed even after we removed power from the PF.  But obviously
> the hardware VFs are disabled when we power the slot back up.  It
> doesn't make sense to me to have pci_devs for these VFs that are no
> longer enabled, so maybe I'm missing something here.
>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>
>>> Is this something ixgbe should be doing differently?
>>>
>>> I'm not 100% sold on the idea of the VFs staying active after the
>>> driver releases the PF.  It seems asymmetric because I think the
>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>> them when releasing the PF.
>>>
>>> What's the use case for detaching the PF driver while the VFs are active?
>>>
>> VF's assigned to (KVM) guest (via device-assignment).
>> Virtually, it's as if the enet cable is unplugged to the VF in the guest --
>> the device is still there (the PF wasn't unplugged, just the driver
>> de-configured).
>
> OK, but why is it necessary to detach and reattach the PF driver at
> all?  Are you trying to update the PF driver or something?  Change the
> number of VFs?
>
One case is to update the PF driver; the other is 'operator error' :-/
Want to not crash the guest w/VFs, and allow operator to 'mend their error',
i.e., re-load the PF driver back.

>> Pre-sysfs-based configuration, the std way to configure the VFs into
>> a system was to unload the PF driver, and reload it with a vf-enabling
>> parameter
>> (like max_vfs=<n>  in the case of ixgbe, igb).
>
> Yes.  But I assume the first time the PF was loaded, there were no VFs
> enabled.  So disabling VFs at unload wouldn't cause any problem there.
>   Then you reload the driver with "max_vfs=<n>".  The driver enables
> VFs.  Is there any reason to unload the PF driver again?
>
>> Now, if someone unloaded the PF driver in the host, the unplanned removal
>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>> details how that occurred).
>> So, the solution was to 'pause' the VF operation and let packets drop
>> in the guest, and re-enable the VF operation if the PF driver was
>> re-configured.
>
> I don't get this.  Why should we tolerate "unplanned removal of the
> PF"?  If you yank the card, I would *expect* anything using it to
> crash.
>
Last time I checked, PCIe handles surprise removal, electrically,
on unplanned removal -- it's capacitively coupled so crashes due to
odd signalling transition can be handled (malformed pcie packet, w/c,
which aer handles/dismisses).

Note, the i(x)gb(e) code was designed to handle operator error around
PF driver removal.  hot plug/unplug was probably not tested against
the 'save the VF state' code in the i(x)gb(e)vf drivers.
Again, try asking the driver owner(s).
  
> Bjorn

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Aug. 1, 2013, 9:41 p.m. UTC | #11
On Thu, Aug 1, 2013 at 3:21 PM, Don Dutile <ddutile@redhat.com> wrote:
> On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
>>
>> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>>>
>>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>>
>>>>
>>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>   wrote:
>>>>>
>>>>>
>>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>>> is put back.
>>
>>
>> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
>> regression (please correct me if not):
>>
>>    00:03.0 Root port to [bus 02]
>>    02:00.0 SR-IOV NIC (PF in slot 2)
>>    02:00.1 VF (for example)
>>
>>    # echo -n 0>  /sys/bus/pci/slots/2/power
>>    02:00.0 PF is powered off
>>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
>> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
>> actually deallocated
>>
>>    # echo -n 1>  /sys/bus/pci/slots/2/power
>>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
>> 0000:02:00, cannot hot-add
>>
>> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
>> count races"), this scenario (powering the slot off then back on)
>> apparently worked, and hot-adding 02:00.0 worked fine.
>>
>> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
>> still existed even after we removed power from the PF.  But obviously
>> the hardware VFs are disabled when we power the slot back up.  It
>> doesn't make sense to me to have pci_devs for these VFs that are no
>> longer enabled, so maybe I'm missing something here.
>>
>>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>>> around.
>>>>
>>>>
>>>> Is this something ixgbe should be doing differently?
>>>>
>>>> I'm not 100% sold on the idea of the VFs staying active after the
>>>> driver releases the PF.  It seems asymmetric because I think the
>>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>>> them when releasing the PF.
>>>>
>>>> What's the use case for detaching the PF driver while the VFs are
>>>> active?
>>>>
>>> VF's assigned to (KVM) guest (via device-assignment).
>>> Virtually, it's as if the enet cable is unplugged to the VF in the guest
>>> --
>>> the device is still there (the PF wasn't unplugged, just the driver
>>> de-configured).
>>
>>
>> OK, but why is it necessary to detach and reattach the PF driver at
>> all?  Are you trying to update the PF driver or something?  Change the
>> number of VFs?
>>
> One case is to update the PF driver; the other is 'operator error' :-/
> Want to not crash the guest w/VFs, and allow operator to 'mend their error',
> i.e., re-load the PF driver back.

Neither of these sounds very compelling to me.

We don't support updating drivers for non-SR-IOV devices; why should
we bother doing something different just because this device has
SR-IOV enabled?

Is there a way to address the operator error issue by making rmmod
fail if there are VFs assigned to a guest?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duyck, Alexander H Aug. 1, 2013, 9:55 p.m. UTC | #12
On 08/01/2013 02:21 PM, Don Dutile wrote:
> On 08/01/2013 04:16 PM, Bjorn Helgaas wrote:
>> On Mon, Jul 29, 2013 at 2:32 PM, Don Dutile<ddutile@redhat.com>  wrote:
>>> On 07/29/2013 03:58 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Sat, Jul 27, 2013 at 9:11 AM, Yinghai Lu<yinghai@kernel.org>  
>>>> wrote:
>>>>>
>>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>>> is put back.
>>
>> OK, I'm lost here.  I think this is the scenario where Yinghai saw a
>> regression (please correct me if not):
>>
>>    00:03.0 Root port to [bus 02]
>>    02:00.0 SR-IOV NIC (PF in slot 2)
>>    02:00.1 VF (for example)
>>
>>    # echo -n 0>  /sys/bus/pci/slots/2/power
>>    02:00.0 PF is powered off
>>    02:00.0 PF pci_dev is released, but VF 02:00.1 pci_dev still exists
>> and holds a reference to the PF pci_dev, so the 02:00.0 pci_dev is not
>> actually deallocated
>>
>>    # echo -n 1>  /sys/bus/pci/slots/2/power
>>    pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already exists at
>> 0000:02:00, cannot hot-add
>>
>> Prior to dc087f2f6 ("Simplify IOV implementation and fix reference
>> count races"), this scenario (powering the slot off then back on)
>> apparently worked, and hot-adding 02:00.0 worked fine.
>>
>> But what about the VF pci_devs?  Prior to dc087f2f6, I assume they
>> still existed even after we removed power from the PF.  But obviously
>> the hardware VFs are disabled when we power the slot back up.  It
>> doesn't make sense to me to have pci_devs for these VFs that are no
>> longer enabled, so maybe I'm missing something here.
>>

Actually this is part of what I don't understand as well.  I would have
thought that if the device was hot-plugged that the VF devices would
have been removed as devices on the PCI bus since they all lived on the
same bus.  Why is it the VFs are still present to hold a reference to
the PF at all?

I'm not sure why we even have to disable SR-IOV in this case.  It seems
like the hot-plug should have walked though the bus already and removed
the VFs.

>>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>>> around.
>>>>
>>>> Is this something ixgbe should be doing differently?
>>>>
>>>> I'm not 100% sold on the idea of the VFs staying active after the
>>>> driver releases the PF.  It seems asymmetric because I think the
>>>> driver has to claim the PF to *enable* the VFs, but we don't disable
>>>> them when releasing the PF.
>>>>
>>>> What's the use case for detaching the PF driver while the VFs are
>>>> active?
>>>>
>>> VF's assigned to (KVM) guest (via device-assignment).
>>> Virtually, it's as if the enet cable is unplugged to the VF in the
>>> guest --
>>> the device is still there (the PF wasn't unplugged, just the driver
>>> de-configured).
>>
>> OK, but why is it necessary to detach and reattach the PF driver at
>> all?  Are you trying to update the PF driver or something?  Change the
>> number of VFs?
>>
> One case is to update the PF driver; the other is 'operator error' :-/
> Want to not crash the guest w/VFs, and allow operator to 'mend their
> error',
> i.e., re-load the PF driver back.
>

That pretty much matches my understanding of things.  Basically we
cannot modify the number of VFs until all of the VFs are unassigned.

>>> Pre-sysfs-based configuration, the std way to configure the VFs into
>>> a system was to unload the PF driver, and reload it with a vf-enabling
>>> parameter
>>> (like max_vfs=<n>  in the case of ixgbe, igb).
>>
>> Yes.  But I assume the first time the PF was loaded, there were no VFs
>> enabled.  So disabling VFs at unload wouldn't cause any problem there.
>>   Then you reload the driver with "max_vfs=<n>".  The driver enables
>> VFs.  Is there any reason to unload the PF driver again?
>>
>>> Now, if someone unloaded the PF driver in the host, the unplanned
>>> removal
>>> of the PF enabled the VF to crash the host (maybe AlexD can provide the
>>> details how that occurred).
>>> So, the solution was to 'pause' the VF operation and let packets drop
>>> in the guest, and re-enable the VF operation if the PF driver was
>>> re-configured.
>>
>> I don't get this.  Why should we tolerate "unplanned removal of the
>> PF"?  If you yank the card, I would *expect* anything using it to
>> crash.
>>
> Last time I checked, PCIe handles surprise removal, electrically,
> on unplanned removal -- it's capacitively coupled so crashes due to
> odd signalling transition can be handled (malformed pcie packet, w/c,
> which aer handles/dismisses).
>

PCIe handles the surprise removal, but the kernel doesn't do much about
it.  From what I have seen in the past surprise removal results in
completion timeouts and reads returning ~0.  The device itself doesn't
get removed from the bus by the device going away.

> Note, the i(x)gb(e) code was designed to handle operator error around
> PF driver removal.  hot plug/unplug was probably not tested against
> the 'save the VF state' code in the i(x)gb(e)vf drivers.
> Again, try asking the driver owner(s).

I'm suspecting hot-plug will probably cause the same issues the we saw
with the PF driver removal.  I'm still waiting on the response from
testing before I know one way or the other.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -25,6 +25,8 @@  static void pci_stop_dev(struct pci_dev
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		device_del(&dev->dev);
+		/* remove VF, if PF driver skip that */
+		pci_disable_sriov(dev);
 		dev->is_added = 0;
 	}