diff mbox

[Xen-devel,v5,9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

Message ID 5480528F.8010106@citrix.com
State Not Applicable
Headers show

Commit Message

David Vrabel Dec. 4, 2014, 12:24 p.m. UTC
On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> 
> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>
>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>
>>> Instead of doing all this complex dance, we depend on the toolstack 
>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>> It bypasses the need to worry about the PCI lock. 
>>
>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>> proposed). 
>>
> 
> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).

It is only needed if the core won't provide one.

+static int pcistub_try_create_reset_file(struct pci_dev *pci)
+{
+       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
+       struct device *dev = &pci->dev;
+       int ret;
+
+       /* Already have a per-function reset? */
+       if (pci_probe_reset_function(pci) == 0)
+               return 0;
+
+       ret = device_create_file(dev, &dev_attr_reset);
+       if (ret < 0)
+               return ret;
+       dev_data->created_reset_file = true;
+       return 0;
+}

David

Comments

Sander Eikelenboom Dec. 4, 2014, 1:10 p.m. UTC | #1
Thursday, December 4, 2014, 1:24:47 PM, you wrote:

> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> 
>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>
>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>
>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>> It bypasses the need to worry about the PCI lock. 
>>>
>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>> proposed). 
>>>
>> 
>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).

> It is only needed if the core won't provide one.

> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> +{
> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> +       struct device *dev = &pci->dev;
> +       int ret;
> +
> +       /* Already have a per-function reset? */
> +       if (pci_probe_reset_function(pci) == 0)
> +               return 0;
> +
> +       ret = device_create_file(dev, &dev_attr_reset);
> +       if (ret < 0)
> +               return ret;
+       dev_data->>created_reset_file = true;
> +       return 0;
> +}

Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
"pci.c:__pci_dev_reset" ?

The problem with that function is that from my testing it seems that the 
first option "pci_dev_specific_reset" always seems to return succes, so all the
other options are skipped (flr, pm, slot, bus). However the device it self is 
not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
none virtualization purposes and it's probably the least intrusive. For 
virtualization however it would be nice to be sure it resets properly, or have a 
way to force a specific reset routine.) 

So it's the ordering and skipping of the other resets that seems to make
this workaround necessary in the first place.

> David





--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Vrabel Dec. 4, 2014, 1:43 p.m. UTC | #2
On 04/12/14 13:10, Sander Eikelenboom wrote:
> 
> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> 
>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>
>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>
>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>
>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>> It bypasses the need to worry about the PCI lock. 
>>>>
>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>> proposed). 
>>>>
>>>
>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> 
>> It is only needed if the core won't provide one.
> 
>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> +{
>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> +       struct device *dev = &pci->dev;
>> +       int ret;
>> +
>> +       /* Already have a per-function reset? */
>> +       if (pci_probe_reset_function(pci) == 0)
>> +               return 0;
>> +
>> +       ret = device_create_file(dev, &dev_attr_reset);
>> +       if (ret < 0)
>> +               return ret;
> +       dev_data->>created_reset_file = true;
>> +       return 0;
>> +}
> 
> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> "pci.c:__pci_dev_reset" ?
> 
> The problem with that function is that from my testing it seems that the 
> first option "pci_dev_specific_reset" always seems to return succes, so all the
> other options are skipped (flr, pm, slot, bus). However the device it self is 
> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
> none virtualization purposes and it's probably the least intrusive. For 
> virtualization however it would be nice to be sure it resets properly, or have a 
> way to force a specific reset routine.)

Then you need work with the maintainer for those specific devices or
drivers to fix their specific reset function.

I'm not adding stuff to pciback to workaround broken quirks.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom Dec. 4, 2014, 2:09 p.m. UTC | #3
Thursday, December 4, 2014, 2:43:06 PM, you wrote:

> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> 
>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>
>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>
>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>
>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>> proposed). 
>>>>>
>>>>
>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>> 
>>> It is only needed if the core won't provide one.
>> 
>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>> +{
>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>> +       struct device *dev = &pci->dev;
>>> +       int ret;
>>> +
>>> +       /* Already have a per-function reset? */
>>> +       if (pci_probe_reset_function(pci) == 0)
>>> +               return 0;
>>> +
>>> +       ret = device_create_file(dev, &dev_attr_reset);
>>> +       if (ret < 0)
>>> +               return ret;
>> +       dev_data->>created_reset_file = true;
>>> +       return 0;
>>> +}
>> 
>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>> "pci.c:__pci_dev_reset" ?
>> 
>> The problem with that function is that from my testing it seems that the 
>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> other options are skipped (flr, pm, slot, bus). However the device it self is 
>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
>> none virtualization purposes and it's probably the least intrusive. For 
>> virtualization however it would be nice to be sure it resets properly, or have a 
>> way to force a specific reset routine.)

> Then you need work with the maintainer for those specific devices or
> drivers to fix their specific reset function.

> I'm not adding stuff to pciback to workaround broken quirks.

OK that's a pretty clear message there, so if one wants to use pci and vga
passthrough one should better use KVM and vfio-pci. 

vfio-pci has:
- logic to do the try-slot-bus-reset logic
- it has quirks specific to vga passthrough
implemented in from the looks of it a quite clean driver.
(the main issue with it so far was you could only seize devices based on 
vendor and device id, which can be a problem when you have multiple devices.
However that was resolved recently if i am correct.)

And neither of those will be supported by xen-pciback if i get your message 
right ?

--
Sander

> David


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom Dec. 4, 2014, 2:14 p.m. UTC | #4
Hello Sander,

Thursday, December 4, 2014, 3:09:09 PM, you wrote:


> Thursday, December 4, 2014, 2:43:06 PM, you wrote:

>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>> 
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>> 
>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>>
>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>>
>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>>
>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>>> proposed). 
>>>>>>
>>>>>
>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>> 
>>>> It is only needed if the core won't provide one.
>>> 
>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>> +{
>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>> +       struct device *dev = &pci->dev;
>>>> +       int ret;
>>>> +
>>>> +       /* Already have a per-function reset? */
>>>> +       if (pci_probe_reset_function(pci) == 0)
>>>> +               return 0;
>>>> +
>>>> +       ret = device_create_file(dev, &dev_attr_reset);
>>>> +       if (ret < 0)
>>>> +               return ret;
>>> +       dev_data->>created_reset_file = true;
>>>> +       return 0;
>>>> +}
>>> 
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>> "pci.c:__pci_dev_reset" ?
>>> 
>>> The problem with that function is that from my testing it seems that the 
>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
>>> none virtualization purposes and it's probably the least intrusive. For 
>>> virtualization however it would be nice to be sure it resets properly, or have a 
>>> way to force a specific reset routine.)

>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.

>> I'm not adding stuff to pciback to workaround broken quirks.

> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci. 

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic
> - it has quirks specific to vga passthrough
Hrmm have to correct my self because the vga-pt quirks are part of the vfio-pci 
part in qemu.

The try-slot-bus-reset logic is part of the kernel vfio-pci driver though and they faced the same locking issue it 
seems:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=890ed578df82f5b7b5a874f9f2fa4f117305df5f

> implemented in from the looks of it a quite clean driver.
> (the main issue with it so far was you could only seize devices based on 
> vendor and device id, which can be a problem when you have multiple devices.
> However that was resolved recently if i am correct.)

> And neither of those will be supported by xen-pciback if i get your message 
> right ?

> --
> Sander

>> David


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Vrabel Dec. 4, 2014, 2:31 p.m. UTC | #5
On 04/12/14 14:09, Sander Eikelenboom wrote:
> 
> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> 
>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>
>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>>
>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>>
>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>>
>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>>> proposed). 
>>>>>>
>>>>>
>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>>
>>>> It is only needed if the core won't provide one.
>>>
>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>> +{
>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>> +       struct device *dev = &pci->dev;
>>>> +       int ret;
>>>> +
>>>> +       /* Already have a per-function reset? */
>>>> +       if (pci_probe_reset_function(pci) == 0)
>>>> +               return 0;
>>>> +
>>>> +       ret = device_create_file(dev, &dev_attr_reset);
>>>> +       if (ret < 0)
>>>> +               return ret;
>>> +       dev_data->>created_reset_file = true;
>>>> +       return 0;
>>>> +}
>>>
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>> "pci.c:__pci_dev_reset" ?
>>>
>>> The problem with that function is that from my testing it seems that the 
>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
>>> none virtualization purposes and it's probably the least intrusive. For 
>>> virtualization however it would be nice to be sure it resets properly, or have a 
>>> way to force a specific reset routine.)
> 
>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.
> 
>> I'm not adding stuff to pciback to workaround broken quirks.
> 
> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci.

Have you (or anyone else) ever raised the problem with the broken reset
quirk for certain devices with the relevant maintainer?

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic

Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
as well.

It makes no sense for both pciback and vfio-pci to workaround problems
with pci_function_reset() in different ways -- it should be fixed in the
core PCI code so both can benefit and make use of the same code.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom Dec. 4, 2014, 2:50 p.m. UTC | #6
Thursday, December 4, 2014, 3:31:11 PM, you wrote:

> On 04/12/14 14:09, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> 
>>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>>
>>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>>
>>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>>
>>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>>>>>>
>>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>>>
>>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>>>
>>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>>>> proposed). 
>>>>>>>
>>>>>>
>>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>>>
>>>>> It is only needed if the core won't provide one.
>>>>
>>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>>> +{
>>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>>> +       struct device *dev = &pci->dev;
>>>>> +       int ret;
>>>>> +
>>>>> +       /* Already have a per-function reset? */
>>>>> +       if (pci_probe_reset_function(pci) == 0)
>>>>> +               return 0;
>>>>> +
>>>>> +       ret = device_create_file(dev, &dev_attr_reset);
>>>>> +       if (ret < 0)
>>>>> +               return ret;
>>>> +       dev_data->>created_reset_file = true;
>>>>> +       return 0;
>>>>> +}
>>>>
>>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>>> "pci.c:__pci_dev_reset" ?
>>>>
>>>> The problem with that function is that from my testing it seems that the 
>>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
>>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
>>>> none virtualization purposes and it's probably the least intrusive. For 
>>>> virtualization however it would be nice to be sure it resets properly, or have a 
>>>> way to force a specific reset routine.)
>> 
>>> Then you need work with the maintainer for those specific devices or
>>> drivers to fix their specific reset function.
>> 
>>> I'm not adding stuff to pciback to workaround broken quirks.
>> 
>> OK that's a pretty clear message there, so if one wants to use pci and vga
>> passthrough one should better use KVM and vfio-pci.

> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?

>> vfio-pci has:
>> - logic to do the try-slot-bus-reset logic

> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.

Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
you can say that's incorrect, but then you would have to remove 50% of
the kernel and Xen code as well.

(i do in general agree it's better to strive for a generic solution though,
that's exactly why i brought up that that function doesn't seem to work perfect
for virtualization purposes) 

> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

Well perhaps Bjorn knows why the order of resets and skipping the rest as
implemented in "pci.c:__pci_dev_reset" was implemented in that way ?

Especially what is the expectation about pci_dev_specific_reset doing a proper 
reset for say a vga-card:
- i know it doesn't work on a radeon card (doesn't blank screen, on next guest 
  boot reports it's already posted, powermanagement doesn't work).
- while with a slot/bus reset, that all just works fine, screen blanks 
  immediately and everything else also works.

Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
he introduced the workaround in vfio-pci instead of trying to fix it in core pci 
code ?

--
Sander


> David



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Dec. 4, 2014, 3:39 p.m. UTC | #7
On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
> 
> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >> 
> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >> 
> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >>>>
> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >>>>
> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >>>>>>
> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> >>>>>>>
> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> >>>>>>>>
> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
> >>>>>>>> It bypasses the need to worry about the PCI lock. 
> >>>>>>>
> >>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
> >>>>>>> proposed). 
> >>>>>>>
> >>>>>>
> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >>>>
> >>>>> It is only needed if the core won't provide one.
> >>>>
> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >>>>> +{
> >>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >>>>> +       struct device *dev = &pci->dev;
> >>>>> +       int ret;
> >>>>> +
> >>>>> +       /* Already have a per-function reset? */
> >>>>> +       if (pci_probe_reset_function(pci) == 0)
> >>>>> +               return 0;
> >>>>> +
> >>>>> +       ret = device_create_file(dev, &dev_attr_reset);
> >>>>> +       if (ret < 0)
> >>>>> +               return ret;
> >>>> +       dev_data->>created_reset_file = true;
> >>>>> +       return 0;
> >>>>> +}
> >>>>
> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> >>>> "pci.c:__pci_dev_reset" ?
> >>>>
> >>>> The problem with that function is that from my testing it seems that the 
> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
> >>>> none virtualization purposes and it's probably the least intrusive. For 
> >>>> virtualization however it would be nice to be sure it resets properly, or have a 
> >>>> way to force a specific reset routine.)
> >> 
> >>> Then you need work with the maintainer for those specific devices or
> >>> drivers to fix their specific reset function.
> >> 
> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >> 
> >> OK that's a pretty clear message there, so if one wants to use pci and vga
> >> passthrough one should better use KVM and vfio-pci.
> 
> > Have you (or anyone else) ever raised the problem with the broken reset
> > quirk for certain devices with the relevant maintainer?
> 
> >> vfio-pci has:
> >> - logic to do the try-slot-bus-reset logic
> 
> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> > as well.
> 
> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
> you can say that's incorrect, but then you would have to remove 50% of
> the kernel and Xen code as well.
> 
> (i do in general agree it's better to strive for a generic solution though,
> that's exactly why i brought up that that function doesn't seem to work perfect
> for virtualization purposes) 
> 
> > It makes no sense for both pciback and vfio-pci to workaround problems
> > with pci_function_reset() in different ways -- it should be fixed in the
> > core PCI code so both can benefit and make use of the same code.
> 
> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
> 
> Especially what is the expectation about pci_dev_specific_reset doing a proper 
> reset for say a vga-card:
> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest 
>   boot reports it's already posted, powermanagement doesn't work).
> - while with a slot/bus reset, that all just works fine, screen blanks 
>   immediately and everything else also works.
> 
> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
> he introduced the workaround in vfio-pci instead of trying to fix it in core pci 
> code ?

I don't know what workaround you're talking about.  As devices are
released from the user, vfio-pci attempts to reset them.  If
pci_reset_function() returns success we mark the device clean, otherwise
it gets marked dirty.  Each time a device is released, if there are
dirty devices we test whether we can try a bus/slot reset to clean them.
In the case of assigning a GPU this typically means that the GPU or
audio function come through first, there's no reset mechanism so it gets
marked dirty, the next device comes through and we manage to try a bus
reset.  vfio-pci does not have any device specific resets, all
functionality is added to the PCI-core, thank-you-very-much.  I even
posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
bad so that pci_reset_function() won't claim that worked.  All VGA
access quirks are done in QEMU, the kernel doesn't have any business in
remapping config space over MMIO regions or trapping other config space
backdoors.

I have never heard of problems with the dev specific reset claiming to
work and not doing anything, there are only a few of these, it should be
easy to debug.

I didn't read the original patch, but the title alone of this patch is
quite confusing.  FLR is specifically a function-level-reset, so one
would expect 'do_flr' to be function specific.  The pci-sysfs 'reset'
attribute is already function specific.  If pci_reset_function() isn't
doing the job and we need to use bus/slot reset, it's clearly not an
FLR.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom Dec. 4, 2014, 4:25 p.m. UTC | #8
Thursday, December 4, 2014, 4:39:06 PM, you wrote:

> On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
>> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
>> 
>> > On 04/12/14 14:09, Sander Eikelenboom wrote:
>> >> 
>> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> >> 
>> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> >>>>
>> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> >>>>
>> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> >>>>>>
>> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>> >>>>>>>
>> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>> >>>>>>>>
>> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>> >>>>>>>> It bypasses the need to worry about the PCI lock. 
>> >>>>>>>
>> >>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>> >>>>>>> proposed). 
>> >>>>>>>
>> >>>>>>
>> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>> >>>>
>> >>>>> It is only needed if the core won't provide one.
>> >>>>
>> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> >>>>> +{
>> >>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> >>>>> +       struct device *dev = &pci->dev;
>> >>>>> +       int ret;
>> >>>>> +
>> >>>>> +       /* Already have a per-function reset? */
>> >>>>> +       if (pci_probe_reset_function(pci) == 0)
>> >>>>> +               return 0;
>> >>>>> +
>> >>>>> +       ret = device_create_file(dev, &dev_attr_reset);
>> >>>>> +       if (ret < 0)
>> >>>>> +               return ret;
>> >>>> +       dev_data->>created_reset_file = true;
>> >>>>> +       return 0;
>> >>>>> +}
>> >>>>
>> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>> >>>> "pci.c:__pci_dev_reset" ?
>> >>>>
>> >>>> The problem with that function is that from my testing it seems that the 
>> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
>> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
>> >>>> none virtualization purposes and it's probably the least intrusive. For 
>> >>>> virtualization however it would be nice to be sure it resets properly, or have a 
>> >>>> way to force a specific reset routine.)
>> >> 
>> >>> Then you need work with the maintainer for those specific devices or
>> >>> drivers to fix their specific reset function.
>> >> 
>> >>> I'm not adding stuff to pciback to workaround broken quirks.
>> >> 
>> >> OK that's a pretty clear message there, so if one wants to use pci and vga
>> >> passthrough one should better use KVM and vfio-pci.
>> 
>> > Have you (or anyone else) ever raised the problem with the broken reset
>> > quirk for certain devices with the relevant maintainer?
>> 
>> >> vfio-pci has:
>> >> - logic to do the try-slot-bus-reset logic
>> 
>> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
>> > as well.
>> 
>> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
>> you can say that's incorrect, but then you would have to remove 50% of
>> the kernel and Xen code as well.
>> 
>> (i do in general agree it's better to strive for a generic solution though,
>> that's exactly why i brought up that that function doesn't seem to work perfect
>> for virtualization purposes) 
>> 
>> > It makes no sense for both pciback and vfio-pci to workaround problems
>> > with pci_function_reset() in different ways -- it should be fixed in the
>> > core PCI code so both can benefit and make use of the same code.
>> 
>> Well perhaps Bjorn knows why the order of resets and skipping the rest as
>> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
>> 
>> Especially what is the expectation about pci_dev_specific_reset doing a proper 
>> reset for say a vga-card:
>> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest 
>>   boot reports it's already posted, powermanagement doesn't work).
>> - while with a slot/bus reset, that all just works fine, screen blanks 
>>   immediately and everything else also works.
>> 
>> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
>> he introduced the workaround in vfio-pci instead of trying to fix it in core pci 
>> code ?

> I don't know what workaround you're talking about.  As devices are
> released from the user, vfio-pci attempts to reset them.  If
> pci_reset_function() returns success we mark the device clean, otherwise
> it gets marked dirty.  Each time a device is released, if there are
> dirty devices we test whether we can try a bus/slot reset to clean them.
> In the case of assigning a GPU this typically means that the GPU or
> audio function come through first, there's no reset mechanism so it gets
> marked dirty, the next device comes through and we manage to try a bus
> reset.  vfio-pci does not have any device specific resets, all
> functionality is added to the PCI-core, thank-you-very-much.  I even
> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> bad so that pci_reset_function() won't claim that worked.  All VGA
> access quirks are done in QEMU, the kernel doesn't have any business in
> remapping config space over MMIO regions or trapping other config space
> backdoors.

Thanks for your insightful reply!

With "workaround" I was trying to refer to "vfio_pci_try_bus_reset()" which
implements how to reset the devices, it indeed uses function you introduced in
pci core code (with a solution for locking issues Konrad also seems to have 
ran into: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17

David seems to be arguing the whole "vfio_pci_try_bus_reset()" should be not 
needed and just doing calling "pci_reset_function()" (directly or by
echo "1" > /sys/bus/pci/devices/BDF/reset shoud always magically do the
right thing. 
(Which in my opinion seems the contradict with the mere existence
of "vfio_pci_try_bus_reset()" (i don't think you would have implemented it 
when you would have deemed it unnecessary)) 

> I have never heard of problems with the dev specific reset claiming to
> work and not doing anything, there are only a few of these, it should be
> easy to debug.

> I didn't read the original patch, but the title alone of this patch is
> quite confusing.  FLR is specifically a function-level-reset, so one
> would expect 'do_flr' to be function specific.  The pci-sysfs 'reset'
> attribute is already function specific.  If pci_reset_function() isn't
> doing the job and we need to use bus/slot reset, it's clearly not an
> FLR.  Thanks,
> Alex

The name "do_flr" is coming from the Xen xl toolstack which historically has 
code that tries to reset devices using a echo "BDF" > /sys/bus/pci/drivers/pciback/do_flr

But the name "do_flr" and the debug messages indeed are incorrect (it's not 
doing a flr nor a D3/PM reset), confusing and should not be used.

And as you seem to have solved the locking issue for vfio-pci, it is probably 
possible for xen-pciback to do the same. Instead of letting xen-pciback
work around the locking problem by deferring to the xl toolstack the resetting
logic could be kept into xen-pciback it self. 
That would also mean that the sysfs attribute would be unnecessary and make 
the naming issue moot.

--
Sander


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Dec. 4, 2014, 4:55 p.m. UTC | #9
On Thu, 2014-12-04 at 17:25 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 4:39:06 PM, you wrote:
> 
> > On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> >> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
> >> 
> >> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >> >> 
> >> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >> >> 
> >> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >> >>>>
> >> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >> >>>>
> >> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >> >>>>>>
> >> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> >> >>>>>>>
> >> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> >> >>>>>>>>
> >> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
> >> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
> >> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
> >> >>>>>>>> It bypasses the need to worry about the PCI lock. 
> >> >>>>>>>
> >> >>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
> >> >>>>>>> proposed). 
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >> >>>>
> >> >>>>> It is only needed if the core won't provide one.
> >> >>>>
> >> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >> >>>>> +{
> >> >>>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >> >>>>> +       struct device *dev = &pci->dev;
> >> >>>>> +       int ret;
> >> >>>>> +
> >> >>>>> +       /* Already have a per-function reset? */
> >> >>>>> +       if (pci_probe_reset_function(pci) == 0)
> >> >>>>> +               return 0;
> >> >>>>> +
> >> >>>>> +       ret = device_create_file(dev, &dev_attr_reset);
> >> >>>>> +       if (ret < 0)
> >> >>>>> +               return ret;
> >> >>>> +       dev_data->>created_reset_file = true;
> >> >>>>> +       return 0;
> >> >>>>> +}
> >> >>>>
> >> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> >> >>>> "pci.c:__pci_dev_reset" ?
> >> >>>>
> >> >>>> The problem with that function is that from my testing it seems that the 
> >> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is 
> >> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
> >> >>>> none virtualization purposes and it's probably the least intrusive. For 
> >> >>>> virtualization however it would be nice to be sure it resets properly, or have a 
> >> >>>> way to force a specific reset routine.)
> >> >> 
> >> >>> Then you need work with the maintainer for those specific devices or
> >> >>> drivers to fix their specific reset function.
> >> >> 
> >> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >> >> 
> >> >> OK that's a pretty clear message there, so if one wants to use pci and vga
> >> >> passthrough one should better use KVM and vfio-pci.
> >> 
> >> > Have you (or anyone else) ever raised the problem with the broken reset
> >> > quirk for certain devices with the relevant maintainer?
> >> 
> >> >> vfio-pci has:
> >> >> - logic to do the try-slot-bus-reset logic
> >> 
> >> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> >> > as well.
> >> 
> >> Depends on what you call an "incorrect fix" .. it fixes a quirk .. 
> >> you can say that's incorrect, but then you would have to remove 50% of
> >> the kernel and Xen code as well.
> >> 
> >> (i do in general agree it's better to strive for a generic solution though,
> >> that's exactly why i brought up that that function doesn't seem to work perfect
> >> for virtualization purposes) 
> >> 
> >> > It makes no sense for both pciback and vfio-pci to workaround problems
> >> > with pci_function_reset() in different ways -- it should be fixed in the
> >> > core PCI code so both can benefit and make use of the same code.
> >> 
> >> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> >> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
> >> 
> >> Especially what is the expectation about pci_dev_specific_reset doing a proper 
> >> reset for say a vga-card:
> >> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest 
> >>   boot reports it's already posted, powermanagement doesn't work).
> >> - while with a slot/bus reset, that all just works fine, screen blanks 
> >>   immediately and everything else also works.
> >> 
> >> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
> >> he introduced the workaround in vfio-pci instead of trying to fix it in core pci 
> >> code ?
> 
> > I don't know what workaround you're talking about.  As devices are
> > released from the user, vfio-pci attempts to reset them.  If
> > pci_reset_function() returns success we mark the device clean, otherwise
> > it gets marked dirty.  Each time a device is released, if there are
> > dirty devices we test whether we can try a bus/slot reset to clean them.
> > In the case of assigning a GPU this typically means that the GPU or
> > audio function come through first, there's no reset mechanism so it gets
> > marked dirty, the next device comes through and we manage to try a bus
> > reset.  vfio-pci does not have any device specific resets, all
> > functionality is added to the PCI-core, thank-you-very-much.  I even
> > posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> > bad so that pci_reset_function() won't claim that worked.  All VGA
> > access quirks are done in QEMU, the kernel doesn't have any business in
> > remapping config space over MMIO regions or trapping other config space
> > backdoors.
> 
> Thanks for your insightful reply!
> 
> With "workaround" I was trying to refer to "vfio_pci_try_bus_reset()" which
> implements how to reset the devices, it indeed uses function you introduced in
> pci core code (with a solution for locking issues Konrad also seems to have 
> ran into: 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
> 
> David seems to be arguing the whole "vfio_pci_try_bus_reset()" should be not 
> needed and just doing calling "pci_reset_function()" (directly or by
> echo "1" > /sys/bus/pci/devices/BDF/reset shoud always magically do the
> right thing. 
> (Which in my opinion seems the contradict with the mere existence
> of "vfio_pci_try_bus_reset()" (i don't think you would have implemented it 
> when you would have deemed it unnecessary)) 

That truly would be magic because a bus/slot reset and function reset
are completely different beasts.  QEMU, through vfio-pci, makes use of
both.  Take for instance hot-plugging the second port of a dual-port NIC
to a guest, where the first port may be (a) assigned to the same guest,
(b) assigned to a different guest, (c) in-use by the host, or (d)
not-in-use.  For a hotplug I can only make use of a bus/slot reset in
one of those cases (d).  For a cold-plug or VM reset, only two (a,d).  I
don't see how pci_reset_function() can have that sort of visibility to
the ownership and usage of a given device.  vfio-pci doesn't even have
this visibility, which is why the distinction is made in QEMU.  vfio-pci
is just a conduit and gatekeeper to the PCI-core interfaces, for
instance preventing QEMU from doing a reset in the (b) and (c) cases.
What prevents that in the Xen case?  Userspace?

> > I have never heard of problems with the dev specific reset claiming to
> > work and not doing anything, there are only a few of these, it should be
> > easy to debug.
> 
> > I didn't read the original patch, but the title alone of this patch is
> > quite confusing.  FLR is specifically a function-level-reset, so one
> > would expect 'do_flr' to be function specific.  The pci-sysfs 'reset'
> > attribute is already function specific.  If pci_reset_function() isn't
> > doing the job and we need to use bus/slot reset, it's clearly not an
> > FLR.  Thanks,
> > Alex
> 
> The name "do_flr" is coming from the Xen xl toolstack which historically has 
> code that tries to reset devices using a echo "BDF" > /sys/bus/pci/drivers/pciback/do_flr

Redundant to /sys/bus/pci/devices/DDDD:BB:DD.F/reset

> But the name "do_flr" and the debug messages indeed are incorrect (it's not 
> doing a flr nor a D3/PM reset), confusing and should not be used.
> 
> And as you seem to have solved the locking issue for vfio-pci, it is probably 
> possible for xen-pciback to do the same. Instead of letting xen-pciback
> work around the locking problem by deferring to the xl toolstack the resetting
> logic could be kept into xen-pciback it self. 
> That would also mean that the sysfs attribute would be unnecessary and make 
> the naming issue moot.

I would consider the try_*_reset() interfaces to be a workaround for
existing locking issues which are much harder to solve.  It makes the
vfio-pci reset-on-release a best effort approach, which is generally
fine.  For vfio I can't rely on a toolstack, nor maybe should you.
There's always a chance that the VM/user is sent a kill -9 and it's the
kernel's job to protect itself and return things to a quiescent state.
This is why I don't simply have QEMU send a bus reset on shutdown or put
reset policy that can affect other users or the host in userspace.
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Konrad Rzeszutek Wilk Dec. 4, 2014, 7:05 p.m. UTC | #10
On Thu, Dec 04, 2014 at 02:31:11PM +0000, David Vrabel wrote:
> On 04/12/14 14:09, Sander Eikelenboom wrote:
> > 
> > Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> > 
> >> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >>>
> >>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >>>
> >>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >>>>>
> >>>>> On Dec 4, 2014 6:30 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> >>>>>>
> >>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
> >>>>>>>
> >>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
> >>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
> >>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
> >>>>>>> It bypasses the need to worry about the PCI lock. 
> >>>>>>
> >>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
> >>>>>> proposed). 
> >>>>>>
> >>>>>
> >>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >>>
> >>>> It is only needed if the core won't provide one.
> >>>
> >>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >>>> +{
> >>>> +       struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >>>> +       struct device *dev = &pci->dev;
> >>>> +       int ret;
> >>>> +
> >>>> +       /* Already have a per-function reset? */
> >>>> +       if (pci_probe_reset_function(pci) == 0)
> >>>> +               return 0;
> >>>> +
> >>>> +       ret = device_create_file(dev, &dev_attr_reset);
> >>>> +       if (ret < 0)
> >>>> +               return ret;
> >>> +       dev_data->>created_reset_file = true;
> >>>> +       return 0;
> >>>> +}
> >>>
> >>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
> >>> "pci.c:__pci_dev_reset" ?
> >>>
> >>> The problem with that function is that from my testing it seems that the 
> >>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >>> other options are skipped (flr, pm, slot, bus). However the device it self is 
> >>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for 
> >>> none virtualization purposes and it's probably the least intrusive. For 
> >>> virtualization however it would be nice to be sure it resets properly, or have a 
> >>> way to force a specific reset routine.)
> > 
> >> Then you need work with the maintainer for those specific devices or
> >> drivers to fix their specific reset function.
> > 
> >> I'm not adding stuff to pciback to workaround broken quirks.
> > 
> > OK that's a pretty clear message there, so if one wants to use pci and vga
> > passthrough one should better use KVM and vfio-pci.
> 
> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?
> 
> > vfio-pci has:
> > - logic to do the try-slot-bus-reset logic
> 
> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.
> 
> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

We seem to be going in circles.

This thread: http://patchwork.ozlabs.org/patch/368668/

has an interesting discussion that pretty much touches all of this
and I believe it ends with David agreeing with Alex that an
bus-reset is a perfect use-case.

I believe the contention was on how to expose this interface
to the user-space. David's was thinking to use 'reset' while
I used 'do_flr' (which is a misleading name but hte toolstack
already does it). Perhaps we should just have (as David suggested)
an 'bus_reset' on the devices.

I am going to go on a limb and presume that David was thinking
that this 'bus_reset' would be exposed in the generic PCI land?

> 
> David
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Vrabel Dec. 5, 2014, 10:30 a.m. UTC | #11
On 04/12/14 15:39, Alex Williamson wrote:
> 
> I don't know what workaround you're talking about.  As devices are
> released from the user, vfio-pci attempts to reset them.  If
> pci_reset_function() returns success we mark the device clean, otherwise
> it gets marked dirty.  Each time a device is released, if there are
> dirty devices we test whether we can try a bus/slot reset to clean them.
> In the case of assigning a GPU this typically means that the GPU or
> audio function come through first, there's no reset mechanism so it gets
> marked dirty, the next device comes through and we manage to try a bus
> reset.  vfio-pci does not have any device specific resets, all
> functionality is added to the PCI-core, thank-you-very-much.  I even
> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> bad so that pci_reset_function() won't claim that worked.  All VGA
> access quirks are done in QEMU, the kernel doesn't have any business in
> remapping config space over MMIO regions or trapping other config space
> backdoors.

Thanks for the info Alex, I hadn't got around to actually looking and
the vfio-pci code and was just going to what Sander said.

We probably do need to have a more in depth look at now PCI devices and
handled by both the toolstack and pciback but in the short term I would
like a simple solution that does not extend the ABI.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Konrad Rzeszutek Wilk Dec. 5, 2014, 5:22 p.m. UTC | #12
On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
> On 04/12/14 15:39, Alex Williamson wrote:
> > 
> > I don't know what workaround you're talking about.  As devices are
> > released from the user, vfio-pci attempts to reset them.  If
> > pci_reset_function() returns success we mark the device clean, otherwise
> > it gets marked dirty.  Each time a device is released, if there are
> > dirty devices we test whether we can try a bus/slot reset to clean them.
> > In the case of assigning a GPU this typically means that the GPU or
> > audio function come through first, there's no reset mechanism so it gets
> > marked dirty, the next device comes through and we manage to try a bus
> > reset.  vfio-pci does not have any device specific resets, all
> > functionality is added to the PCI-core, thank-you-very-much.  I even
> > posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> > bad so that pci_reset_function() won't claim that worked.  All VGA
> > access quirks are done in QEMU, the kernel doesn't have any business in
> > remapping config space over MMIO regions or trapping other config space
> > backdoors.
> 
> Thanks for the info Alex, I hadn't got around to actually looking and
> the vfio-pci code and was just going to what Sander said.
> 
> We probably do need to have a more in depth look at now PCI devices and
> handled by both the toolstack and pciback but in the short term I would
> like a simple solution that does not extend the ABI.

Could you enumerate the 'simple solution' then please? I am having
a frustrating time figuring out what it is that you are proposing.


> 
> David
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Vrabel Dec. 8, 2014, 10:38 a.m. UTC | #13
On 05/12/14 17:22, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
>> On 04/12/14 15:39, Alex Williamson wrote:
>>>
>>> I don't know what workaround you're talking about.  As devices are
>>> released from the user, vfio-pci attempts to reset them.  If
>>> pci_reset_function() returns success we mark the device clean, otherwise
>>> it gets marked dirty.  Each time a device is released, if there are
>>> dirty devices we test whether we can try a bus/slot reset to clean them.
>>> In the case of assigning a GPU this typically means that the GPU or
>>> audio function come through first, there's no reset mechanism so it gets
>>> marked dirty, the next device comes through and we manage to try a bus
>>> reset.  vfio-pci does not have any device specific resets, all
>>> functionality is added to the PCI-core, thank-you-very-much.  I even
>>> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
>>> bad so that pci_reset_function() won't claim that worked.  All VGA
>>> access quirks are done in QEMU, the kernel doesn't have any business in
>>> remapping config space over MMIO regions or trapping other config space
>>> backdoors.
>>
>> Thanks for the info Alex, I hadn't got around to actually looking and
>> the vfio-pci code and was just going to what Sander said.
>>
>> We probably do need to have a more in depth look at now PCI devices and
>> handled by both the toolstack and pciback but in the short term I would
>> like a simple solution that does not extend the ABI.
> 
> Could you enumerate the 'simple solution' then please? I am having
> a frustrating time figuring out what it is that you are proposing.

I've posted it repeatedly.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Konrad Rzeszutek Wilk Dec. 8, 2014, 4:04 p.m. UTC | #14
On Mon, Dec 08, 2014 at 10:38:09AM +0000, David Vrabel wrote:
> On 05/12/14 17:22, Konrad Rzeszutek Wilk wrote:
> > On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
> >> On 04/12/14 15:39, Alex Williamson wrote:
> >>>
> >>> I don't know what workaround you're talking about.  As devices are
> >>> released from the user, vfio-pci attempts to reset them.  If
> >>> pci_reset_function() returns success we mark the device clean, otherwise
> >>> it gets marked dirty.  Each time a device is released, if there are
> >>> dirty devices we test whether we can try a bus/slot reset to clean them.
> >>> In the case of assigning a GPU this typically means that the GPU or
> >>> audio function come through first, there's no reset mechanism so it gets
> >>> marked dirty, the next device comes through and we manage to try a bus
> >>> reset.  vfio-pci does not have any device specific resets, all
> >>> functionality is added to the PCI-core, thank-you-very-much.  I even
> >>> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> >>> bad so that pci_reset_function() won't claim that worked.  All VGA
> >>> access quirks are done in QEMU, the kernel doesn't have any business in
> >>> remapping config space over MMIO regions or trapping other config space
> >>> backdoors.
> >>
> >> Thanks for the info Alex, I hadn't got around to actually looking and
> >> the vfio-pci code and was just going to what Sander said.
> >>
> >> We probably do need to have a more in depth look at now PCI devices and
> >> handled by both the toolstack and pciback but in the short term I would
> >> like a simple solution that does not extend the ABI.
> > 
> > Could you enumerate the 'simple solution' then please? I am having
> > a frustrating time figuring out what it is that you are proposing.
> 
> I've posted it repeatedly.

Are you referring to http://lists.xenproject.org/archives/html/xen-devel/2014-07/msg01270.html
which is still waiting for your feedback?

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From e65a814106bc9994ec47b5317f7cdc975270d27f Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 10 Jul 2014 13:09:25 +0100
Subject: [PATCH 2/2] xen-pciback: provide a "reset" sysfs file to try harder
 at an SBR

The standard implementation of pci_reset_function() does not try an
SBR if there are other sibling devices.  This is a common
configuration for multi-function devices (e.g., GPUs with a HDMI audio
device function).

If all the functions are co-assigned to the same domain at the same
time, then it is safe to perform an SBR to reset all functions at the
same time.

Add a "reset" sysfs file with the same interface as the standard one
(i.e., write 1 to reset) that will try an SBR if all sibling devices
are unbound or bound to pciback.

Note that this is weaker than the requirement for functions to be
simultaneously co-assigned, but the toolstack is expected to ensure
this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 4e8ba38..d6c29aa 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -14,6 +14,7 @@ 
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/atomic.h>
+#include <linux/delay.h>
 #include <xen/events.h>
 #include <asm/xen/pci.h>
 #include <asm/xen/hypervisor.h>
@@ -60,6 +61,114 @@  static LIST_HEAD(pcistub_devices);
 static int initialize_devices;
 static LIST_HEAD(seized_devices);
 
+/*
+ * pci_reset_function() will only work if there is a mechanism to
+ * reset that single function (e.g., FLR or a D-state transition).
+ * For PCI hardware that has two or more functions but no per-function
+ * reset, we can do a bus reset iff all the functions are co-assigned
+ * to the same domain.
+ *
+ * If a function has no per-function reset mechanism the 'reset' sysfs
+ * file that the toolstack uses to reset a function prior to assigning
+ * the device will be missing.  In this case, pciback adds its own
+ * which will try a bus reset.
+ *
+ * Note: pciback does not check for co-assigment before doing a bus
+ * reset, only that the devices are bound to pciback.  The toolstack
+ * is assumed to have done the right thing.
+ */
+static int __pcistub_reset_function(struct pci_dev *dev)
+{
+	struct pci_dev *pdev;
+	u16 ctrl;
+	int ret;
+
+	ret = __pci_reset_function_locked(dev);
+	if (ret == 0)
+		return 0;
+
+	if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
+		return -ENOTTY;
+
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list) {
+		if (pdev != dev && (!pdev->driver
+				    || strcmp(pdev->driver->name, "pciback")))
+			return -ENOTTY;
+		pci_save_state(pdev);
+	}
+
+	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
+	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(200);
+
+	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(200);
+
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+		pci_restore_state(pdev);
+
+	return 0;
+}
+
+static int pcistub_reset_function(struct pci_dev *dev)
+{
+	int ret;
+
+	device_lock(&dev->dev);
+	ret = __pcistub_reset_function(dev);
+	device_unlock(&dev->dev);
+
+	return ret;
+}
+
+static ssize_t pcistub_reset_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	unsigned long val;
+	ssize_t result = strict_strtoul(buf, 0, &val);
+
+	if (result < 0)
+		return result;
+
+	if (val != 1)
+		return -EINVAL;
+
+	result = pcistub_reset_function(pdev);
+	if (result < 0)
+		return result;
+	return count;
+}
+static DEVICE_ATTR(reset, 0200, NULL, pcistub_reset_store);
+
+static int pcistub_try_create_reset_file(struct pci_dev *pci)
+{
+	struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
+	struct device *dev = &pci->dev;
+	int ret;
+
+	/* Already have a per-function reset? */
+	if (pci_probe_reset_function(pci) == 0)
+		return 0;
+
+	ret = device_create_file(dev, &dev_attr_reset);
+	if (ret < 0)
+		return ret;
+	dev_data->created_reset_file = true;
+	return 0;
+}
+
+static void pcistub_remove_reset_file(struct pci_dev *pci)
+{
+	struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
+
+	if (dev_data && dev_data->created_reset_file)
+		device_remove_file(&pci->dev, &dev_attr_reset);
+}
+
 static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 {
 	struct pcistub_device *psdev;
@@ -100,7 +209,8 @@  static void pcistub_device_release(struct kref *kref)
 	/* Call the reset function which does not take lock as this
 	 * is called from "unbind" which takes a device_lock mutex.
 	 */
-	__pci_reset_function_locked(dev);
+	__pcistub_reset_function(psdev->dev);
+
 	if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
 		dev_dbg(&dev->dev, "Could not reload PCI state\n");
 	else
@@ -268,7 +378,7 @@  void pcistub_put_pci_dev(struct pci_dev *dev)
 	/* This is OK - we are running from workqueue context
 	 * and want to inhibit the user from fiddling with 'reset'
 	 */
-	pci_reset_function(dev);
+	pcistub_reset_function(dev);
 	pci_restore_state(psdev->dev);
 
 	/* This disables the device. */
@@ -392,7 +502,7 @@  static int pcistub_init_device(struct pci_dev *dev)
 		dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
 	else {
 		dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-		__pci_reset_function_locked(dev);
+		__pcistub_reset_function(dev);
 		pci_restore_state(dev);
 	}
 	/* Now disable the device (this also ensures some private device
@@ -401,6 +511,10 @@  static int pcistub_init_device(struct pci_dev *dev)
 	dev_dbg(&dev->dev, "reset device\n");
 	xen_pcibk_reset_device(dev);
 
+	err = pcistub_try_create_reset_file(dev);
+	if (err < 0)
+		goto config_release;
+
 	dev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 	return 0;
 
@@ -526,6 +640,8 @@  static void pcistub_remove(struct pci_dev *dev)
 
 	dev_dbg(&dev->dev, "removing\n");
 
+	pcistub_remove_reset_file(dev);
+
 	spin_lock_irqsave(&pcistub_devices_lock, flags);
 
 	xen_pcibk_config_quirk_release(dev);
diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h
index f72af87..708ade9 100644
--- a/drivers/xen/xen-pciback/pciback.h
+++ b/drivers/xen/xen-pciback/pciback.h
@@ -42,6 +42,7 @@  struct xen_pcibk_device {
 struct xen_pcibk_dev_data {
 	struct list_head config_fields;
 	struct pci_saved_state *pci_saved_state;
+	bool created_reset_file;
 	unsigned int permissive:1;
 	unsigned int warned_on_write:1;
 	unsigned int enable_intx:1;