Patchwork [v3] pci: clean all funcs when hot-removing multifunc device

Submitter Amos Kong
Date May 10, 2012, 3:44 p.m.
Message ID <20120510154423.11306.85353.stgit@t>
Permalink /patch/158337/
State New

Comments

Amos Kong - May 10, 2012, 3:44 p.m.
Hotplug CallTrace:
int acpiphp_enable_slot(struct acpiphp_slot *slot)
    \_enable_device(slot);
       \_pci_bus_add_devices(bus);
            # un-added new devs(all funcs in slot) will be added
            list_for_each_entry(dev, &bus->devices, bus_list) {
                if (dev->is_added)
                        continue;
                pci_bus_add_device(dev);
                device_add(&dev->dev);
                dev->is_added = 1;

'dev->is_added' tracks whether a pci dev has been added to the bus. All funcs
in the same slot are added to the bus in enable_device(slot), so we need to
clean up all funcs of the same slot in disable_device(slot).
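
The skip-if-already-added loop from the call trace above can be modelled in user space. This is only a toy sketch: `struct toy_dev` and `toy_bus_add_devices()` are made-up names for illustration, not kernel API, and the real pci_bus_add_devices() does considerably more than set a flag.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the flag matters here. */
struct toy_dev {
	unsigned int devfn;
	int is_added;	/* 1 once the device has been added to the bus */
};

/*
 * Mirrors the loop in the call trace: devices that are already added are
 * skipped, every other device on the list is added and marked.
 * Returns how many devices were newly added by this call.
 */
size_t toy_bus_add_devices(struct toy_dev *devs, size_t ndev)
{
	size_t added = 0;
	size_t i;

	for (i = 0; i < ndev; i++) {
		if (devs[i].is_added)
			continue;
		devs[i].is_added = 1;
		added++;
	}
	return added;
}
```

Calling the function a second time on the same array adds nothing, which is why every function picked up on the enable path also has to be cleaned on the disable path.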


But hot-remove has a bug: https://bugzilla.kernel.org/show_bug.cgi?id=43219
(dmesg and DSDT are attached to the bz). Details:

. Boot up a Linux VM with 8 pci block devices which are the 8
functions in one pci slot.
| # qemu-kvm ...
| -drive file=images/u0,if=none,id=drv0,format=qcow2,cache=none \
| -device virtio-blk-pci,drive=drv0,id=v0,multifunction=on,addr=0x03.0 \
| ....
| -drive file=images/u7,if=none,id=drv7,format=qcow2,cache=none \
| -device virtio-blk-pci,drive=drv7,id=v7,multifunction=on,addr=0x03.7 \

. Check devices in guest.
| vm)# ls /dev/vd*
|    vda vdb vdc vde vdf vdg vdh
| vm)# lspci |grep block
| 00:03.0 SCSI storage controller: Red Hat, Inc Virtio block device
|    ...
| 00:03.7 SCSI storage controller: Red Hat, Inc Virtio block device
|

. Func1~7 still exist in guest after hot-removing the whole slot
by qemu monitor cmd.
| vm)# lspci |grep block    (00:03.0 disappeared)
| 00:03.1 SCSI storage controller: Red Hat, Inc Virtio block device (rev ff)
|    ...
| 00:03.7 SCSI storage controller: Red Hat, Inc Virtio block device (rev ff)
| vm)# ls /dev/vd*          (vda disappeared)
|    vdb vdc vde vdf vdg vdh
| vm)# mkfs /dev/vdb
|    INFO: task mkfs.ext2:1784 blocked for more than 120 seconds.

The pciphp spec handles a pci slot as a whole device, and seabios defines
only one device per slot in the ACPI DSDT table.
In acpiphp_glue.c:register_slot(), only one entry (for func#0) is added
to the 'slot->funcs' list. When we release the whole slot, only the
entries in 'slot->funcs' are cleaned, so func#1~7 are never removed
from the system.

| drivers/pci/hotplug/acpiphp_glue.c:
| static int disable_device(struct acpiphp_slot *slot) {
| 	list_for_each_entry(func, &slot->funcs, sibling) {
| 		pdev = pci_get_slot(slot->bridge->pci_bus,
| 		       PCI_DEVFN(slot->device, func->function));
| 		..clean code.. // this code is executed only once (for func#0)
|                 __pci_remove_bus_device(pdev);
|                 pci_dev_put(pdev);
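
The difference between walking 'slot->funcs' and matching on the slot number can be reproduced in a small user-space model. This is a sketch under assumptions: `struct sim_dev`, `remove_listed_funcs()` and `remove_whole_slot()` are hypothetical names, and the array stands in for the bus->devices list. Only the PCI_DEVFN/PCI_SLOT encoding follows the kernel macros.

```c
#include <stddef.h>

/* Same devfn encoding as the kernel: 5 bits of slot, 3 bits of function. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)

/* Hypothetical stand-in for struct pci_dev. */
struct sim_dev {
	unsigned int devfn;
	int present;	/* 1 while the device is still on the bus */
};

/*
 * Old behaviour: only functions recorded in the funcs list are looked up
 * and removed. With one DSDT device per slot that list holds just func 0.
 */
size_t remove_listed_funcs(struct sim_dev *bus, size_t ndev, unsigned int slot,
			   const unsigned int *funcs, size_t nfuncs)
{
	size_t removed = 0;
	size_t i, j;

	for (i = 0; i < nfuncs; i++) {
		for (j = 0; j < ndev; j++) {
			if (bus[j].present &&
			    bus[j].devfn == PCI_DEVFN(slot, funcs[i])) {
				bus[j].present = 0;
				removed++;
			}
		}
	}
	return removed;
}

/* Patched behaviour: remove every device whose slot number matches. */
size_t remove_whole_slot(struct sim_dev *bus, size_t ndev, unsigned int slot)
{
	size_t removed = 0;
	size_t i;

	for (i = 0; i < ndev; i++) {
		if (!bus[i].present || PCI_SLOT(bus[i].devfn) != slot)
			continue;
		bus[i].present = 0;
		removed++;
	}
	return removed;
}
```

With eight functions in slot 3 and a funcs list containing only function 0, the first helper removes one device and leaves seven behind, while the second removes everything in the slot and leaves other slots alone.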


Hotplugging multifunc devices in Windows guests (WinXP/Win7) works fine.

---
v1 thread: http://marc.info/?t=131597601700003&r=1&w=2

Changes from v1:
- rebase patch to latest linux.git
- remove unnecessary multiplefunction check
- rename 'i' to meaningful 'fn'
- fix coding style

Changes from v2:
- update detail reason(calltrace) to commitlog
- remove hardcode 8, find funcs in pci devlist

Signed-off-by: Amos Kong <akong@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)
Jiang Liu - May 10, 2012, 5:09 p.m.
On 05/10/2012 11:44 PM, Amos Kong wrote:

> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 806c44f..a7442d9 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>  static int disable_device(struct acpiphp_slot *slot)
>  {
>  	struct acpiphp_func *func;
> -	struct pci_dev *pdev;
> +	struct pci_dev *pdev, *tmp;
>  	struct pci_bus *bus = slot->bridge->pci_bus;
>  
>  	/* The slot will be enabled when func 0 is added, so check
> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>  			func->bridge = NULL;
>  		}
>  
> -		pdev = pci_get_slot(slot->bridge->pci_bus,
> -				    PCI_DEVFN(slot->device, func->function));
> -		if (pdev) {
> +		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
> +			if (PCI_SLOT(pdev->devfn) != slot->device)
> +				continue;
> +
The pci_bus_sem lock should be acquired when walking the bus->devices list.
Otherwise it may cause invalid memory access if another thread is modifying
the bus->devices list concurrently.

BTW, what's the relationship with "[PATCH v3] hotplug: add device per func
in ACPI DSDT tables"? Seems they are both solving the same issue.

>  			pci_stop_bus_device(pdev);
>  			if (pdev->subordinate) {
>  				disable_bridges(pdev->subordinate);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin - May 10, 2012, 6:55 p.m.
On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
> On 05/10/2012 11:44 PM, Amos Kong wrote:
> 
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index 806c44f..a7442d9 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
> >  static int disable_device(struct acpiphp_slot *slot)
> >  {
> >  	struct acpiphp_func *func;
> > -	struct pci_dev *pdev;
> > +	struct pci_dev *pdev, *tmp;
> >  	struct pci_bus *bus = slot->bridge->pci_bus;
> >  
> >  	/* The slot will be enabled when func 0 is added, so check
> > @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
> >  			func->bridge = NULL;
> >  		}
> >  
> > -		pdev = pci_get_slot(slot->bridge->pci_bus,
> > -				    PCI_DEVFN(slot->device, func->function));
> > -		if (pdev) {
> > +		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
> > +			if (PCI_SLOT(pdev->devfn) != slot->device)
> > +				continue;
> > +
> The pci_bus_sem lock should be acquired when walking the bus->devices list.
> Otherwise it may cause invalid memory access if another thread is modifying
> the bus->devices list concurrently.
> 
> BTW, what's the relationship with "[PATCH v3] hotplug: add device per func
> in ACPI DSDT tables"? Seems they are both solving the same issue.

That's a bios patch. It's needed if you want broken linux to work.  This
makes linux behave properly on the original bios.

> >  			pci_stop_bus_device(pdev);
> >  			if (pdev->subordinate) {
> >  				disable_bridges(pdev->subordinate);
> > 
Amos Kong - May 10, 2012, 11:54 p.m.
On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
> On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
>> On 05/10/2012 11:44 PM, Amos Kong wrote:
>>
>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>> index 806c44f..a7442d9 100644
>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>>  static int disable_device(struct acpiphp_slot *slot)
>>>  {
>>>  	struct acpiphp_func *func;
>>> -	struct pci_dev *pdev;
>>> +	struct pci_dev *pdev, *tmp;
>>>  	struct pci_bus *bus = slot->bridge->pci_bus;
>>>  
>>>  	/* The slot will be enabled when func 0 is added, so check
>>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>>  			func->bridge = NULL;
>>>  		}
>>>  
>>> -		pdev = pci_get_slot(slot->bridge->pci_bus,
>>> -				    PCI_DEVFN(slot->device, func->function));
>>> -		if (pdev) {
>>> +		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>>> +			if (PCI_SLOT(pdev->devfn) != slot->device)
>>> +				continue;
>>> +
>> The pci_bus_sem lock should be acquired when walking the bus->devices list.
>> Otherwise it may cause invalid memory access if another thread is modifying
>> the bus->devices list concurrently.


>> BTW, what's the relationship with "[PATCH v3] hotplug: add device per func
>> in ACPI DSDT tables"? Seems they are both solving the same issue.

Two things need to be done when we disable a slot: clean up the
configuration (in the OS) and power off the slot.

Currently the second part (power off) works fine: all funcs disappear
from "(qemu)#info block" after hot-removing the slot.
The only problem is that func 1~7 are not unconfigured, so I NAKed the
seabios patch and am trying to fix this problem in the pci driver.

(btw, winxp & win7 hotplug works currently)

/**
 * acpiphp_disable_slot - power off slot
 * @slot: ACPI PHP slot
 */
int acpiphp_disable_slot(struct acpiphp_slot *slot)
{
        mutex_lock(&slot->crit_sect);

        /* unconfigure all functions */
        retval = disable_device(slot);

        /* power off all functions */
        retval = power_off_slot(slot);
        ....
}

> That's a bios patch. It's needed if you want broken linux to work.  This
> makes linux behave properly on the original bios.
> 
>>>  			pci_stop_bus_device(pdev);
>>>  			if (pdev->subordinate) {
>>>  				disable_bridges(pdev->subordinate);
Amos Kong - May 11, 2012, 12:24 a.m.
On 05/11/2012 07:54 AM, Amos Kong wrote:
> On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
>> On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
>>> On 05/10/2012 11:44 PM, Amos Kong wrote:
>>>
>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>> index 806c44f..a7442d9 100644
>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>>>  static int disable_device(struct acpiphp_slot *slot)
>>>>  {
>>>>  	struct acpiphp_func *func;
>>>> -	struct pci_dev *pdev;
>>>> +	struct pci_dev *pdev, *tmp;
>>>>  	struct pci_bus *bus = slot->bridge->pci_bus;
>>>>  
>>>>  	/* The slot will be enabled when func 0 is added, so check
>>>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>>>  			func->bridge = NULL;
>>>>  		}
>>>>  
>>>> -		pdev = pci_get_slot(slot->bridge->pci_bus,
>>>> -				    PCI_DEVFN(slot->device, func->function));
>>>> -		if (pdev) {
>>>> +		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>>>> +			if (PCI_SLOT(pdev->devfn) != slot->device)
>>>> +				continue;
>>>> +
>>> The pci_bus_sem lock should be acquired when walking the bus->devices list.
>>> Otherwise it may cause invalid memory access if another thread is modifying
>>> the bus->devices list concurrently.

The pci_bus_sem lock is only required for writing the &bus->devices list, right?
And that protection already exists in pci_destroy_dev().


static int disable_device(struct acpiphp_slot *slot)
    \_ list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
        \_  __pci_remove_bus_device(pdev);
               \_ pci_destroy_dev(dev);


static void pci_destroy_dev(struct pci_dev *dev)
{
        /* Remove the device from the device lists, and prevent any further
         * list accesses from this device */
        down_write(&pci_bus_sem);
        list_del(&dev->bus_list);
        dev->bus_list.next = dev->bus_list.prev = NULL;
        up_write(&pci_bus_sem);
        ...
}
Jiang Liu - May 11, 2012, 2 p.m.
On 05/11/2012 08:24 AM, Amos Kong wrote:
> On 05/11/2012 07:54 AM, Amos Kong wrote:
>> On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
>>> On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
>>>> On 05/10/2012 11:44 PM, Amos Kong wrote:
>>>>
>>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>>> index 806c44f..a7442d9 100644
>>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>>>>  static int disable_device(struct acpiphp_slot *slot)
>>>>>  {
>>>>>  	struct acpiphp_func *func;
>>>>> -	struct pci_dev *pdev;
>>>>> +	struct pci_dev *pdev, *tmp;
>>>>>  	struct pci_bus *bus = slot->bridge->pci_bus;
>>>>>  
>>>>>  	/* The slot will be enabled when func 0 is added, so check
>>>>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>>>>  			func->bridge = NULL;
>>>>>  		}
>>>>>  
>>>>> -		pdev = pci_get_slot(slot->bridge->pci_bus,
>>>>> -				    PCI_DEVFN(slot->device, func->function));
>>>>> -		if (pdev) {
>>>>> +		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>>>>> +			if (PCI_SLOT(pdev->devfn) != slot->device)
>>>>> +				continue;
>>>>> +
>>>> The pci_bus_sem lock should be acquired when walking the bus->devices list.
>>>> Otherwise it may cause invalid memory access if another thread is modifying
>>>> the bus->devices list concurrently.
> 
> pci_bus_sem lock is only request for writing &bus->devices list, right ?
> and this protection already exists in pci_destory_dev().
That's for the writer. For a reader to walk the pci_bus->devices list, you also
need to acquire the reader lock with down_read(&pci_bus_sem). Please refer to
pci_get_slot() for an example. This is especially important for a native OS
because there may be multiple PCI slots/devices on the bus.
Bjorn Helgaas - May 16, 2012, 3:26 p.m.
On Fri, May 11, 2012 at 8:00 AM, Jiang Liu <liuj97@gmail.com> wrote:
> On 05/11/2012 08:24 AM, Amos Kong wrote:
>> On 05/11/2012 07:54 AM, Amos Kong wrote:
>>> On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
>>>> On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
>>>>> On 05/10/2012 11:44 PM, Amos Kong wrote:
>>>>>
>>>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> index 806c44f..a7442d9 100644
>>>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>>>>>  static int disable_device(struct acpiphp_slot *slot)
>>>>>>  {
>>>>>>   struct acpiphp_func *func;
>>>>>> - struct pci_dev *pdev;
>>>>>> + struct pci_dev *pdev, *tmp;
>>>>>>   struct pci_bus *bus = slot->bridge->pci_bus;
>>>>>>
>>>>>>   /* The slot will be enabled when func 0 is added, so check
>>>>>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>>>>>                   func->bridge = NULL;
>>>>>>           }
>>>>>>
>>>>>> -         pdev = pci_get_slot(slot->bridge->pci_bus,
>>>>>> -                             PCI_DEVFN(slot->device, func->function));
>>>>>> -         if (pdev) {
>>>>>> +         list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>>>>>> +                 if (PCI_SLOT(pdev->devfn) != slot->device)
>>>>>> +                         continue;

I think the concept is good: in enable_device(), we use
pci_scan_slot(), which scans all possible functions in the slot.  So
in disable_device() we should do something symmetric to remove all the
functions.

>>>>>> +
>>>>> The pci_bus_sem lock should be acquired when walking the bus->devices list.
>>>>> Otherwise it may cause invalid memory access if another thread is modifying
>>>>> the bus->devices list concurrently.
>>
>> pci_bus_sem lock is only request for writing &bus->devices list, right ?
>> and this protection already exists in pci_destory_dev().
> That's for writer. For reader to walk the pci_bus->devices list, you also need
> to acquire the reader lock by down_read(&pci_bus_sem). Please refer to
> pci_get_slot() for example. This especially import for native OS because there
> may be multiple PCI slots/devices on the bus.

There is a lot of existing code that walks bus->devices without
holding pci_bus_sem, but most of it is boot-time code that is arguably
safe (though I think things like pcibios_fixup_bus() are poorly
designed and don't fit well in the hotplug-enabled world).

In this case, I do think we need to protect against updates while
we're walking bus->devices.  It's probably not trivial because
__pci_remove_bus_device() calls pci_destroy_dev(), where we do the
down_write(), so simply wrapping the whole thing with down_read() will
cause a deadlock.

Kenji-san, Yinghai, do you have any input?

Bjorn
Jianjun Kong - May 20, 2012, 2:36 a.m.
On Wed, May 16, 2012 at 11:26 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 11, 2012 at 8:00 AM, Jiang Liu <liuj97@gmail.com> wrote:
>> On 05/11/2012 08:24 AM, Amos Kong wrote:
>>> On 05/11/2012 07:54 AM, Amos Kong wrote:
>>>> On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
>>>>> On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
>>>>>> On 05/10/2012 11:44 PM, Amos Kong wrote:
>>>>>>
>>>>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>>> index 806c44f..a7442d9 100644
>>>>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>>>>>>  static int disable_device(struct acpiphp_slot *slot)
>>>>>>>  {
>>>>>>>   struct acpiphp_func *func;
>>>>>>> - struct pci_dev *pdev;
>>>>>>> + struct pci_dev *pdev, *tmp;
>>>>>>>   struct pci_bus *bus = slot->bridge->pci_bus;
>>>>>>>
>>>>>>>   /* The slot will be enabled when func 0 is added, so check
>>>>>>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>>>>>>                   func->bridge = NULL;
>>>>>>>           }
>>>>>>>
>>>>>>> -         pdev = pci_get_slot(slot->bridge->pci_bus,
>>>>>>> -                             PCI_DEVFN(slot->device, func->function));
>>>>>>> -         if (pdev) {
>>>>>>> +         list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>>>>>>> +                 if (PCI_SLOT(pdev->devfn) != slot->device)
>>>>>>> +                         continue;
>
> I think the concept is good: in enable_device(), we use
> pci_scan_slot(), which scans all possible functions in the slot.  So
> in disable_device() we should do something symmetric to remove all the
> functions.

Right!

>>>>>>> +
>>>>>> The pci_bus_sem lock should be acquired when walking the bus->devices list.
>>>>>> Otherwise it may cause invalid memory access if another thread is modifying
>>>>>> the bus->devices list concurrently.
>>>
>>> pci_bus_sem lock is only request for writing &bus->devices list, right ?
>>> and this protection already exists in pci_destory_dev().
>> That's for writer. For reader to walk the pci_bus->devices list, you also need
>> to acquire the reader lock by down_read(&pci_bus_sem). Please refer to
>> pci_get_slot() for example. This especially import for native OS because there
>> may be multiple PCI slots/devices on the bus.
>
> There is a lot of existing code that walks bus->devices without
> holding pci_bus_sem, but most of it is boot-time code that is arguably
> safe (though I think things like pcibios_fixup_bus() are poorly
> designed and don't fit well in the hotplug-enabled world).

disable_device() is not boot-time code; we might hot-remove devices
while the system is running.

> In this case, I do think we need to protect against updates while
> we're walking bus->devices.  It's probably not trivial because
> __pci_remove_bus_device() calls pci_destroy_dev(), where we do the
> down_write(), so simply wrapping the whole thing with down_read() will
> cause a deadlock.

I posted a V4 that adds pci_bus_sem protection; please help review it.
Thanks to Jiang Liu for the guidance.

> Kenji-san, Yinghai, do you have any input?
>
> Bjorn

Patch

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 806c44f..a7442d9 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -885,7 +885,7 @@  static void disable_bridges(struct pci_bus *bus)
 static int disable_device(struct acpiphp_slot *slot)
 {
 	struct acpiphp_func *func;
-	struct pci_dev *pdev;
+	struct pci_dev *pdev, *tmp;
 	struct pci_bus *bus = slot->bridge->pci_bus;
 
 	/* The slot will be enabled when func 0 is added, so check
@@ -902,9 +902,10 @@  static int disable_device(struct acpiphp_slot *slot)
 			func->bridge = NULL;
 		}
 
-		pdev = pci_get_slot(slot->bridge->pci_bus,
-				    PCI_DEVFN(slot->device, func->function));
-		if (pdev) {
+		list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
+			if (PCI_SLOT(pdev->devfn) != slot->device)
+				continue;
+
 			pci_stop_bus_device(pdev);
 			if (pdev->subordinate) {
 				disable_bridges(pdev->subordinate);