diff mbox series

PCI: hv: Fix a bug on removing child devices on the bus

Message ID 1629789620-11049-1-git-send-email-longli@linuxonhyperv.com
State New
Headers show
Series PCI: hv: Fix a bug on removing child devices on the bus | expand

Commit Message

Long Li Aug. 24, 2021, 7:20 a.m. UTC
From: Long Li <longli@microsoft.com>

In hv_pci_bus_exit, the code is holding a spinlock while calling
pci_destroy_slot(), which takes a mutex.

This is not safe for spinlock. Fix this by moving the children to be
deleted to a list on the stack, and removing them after spinlock is
released.

Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the device")

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: "Krzysztof Wilczyński" <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Wei Liu Aug. 24, 2021, 11:02 a.m. UTC | #1
On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> In hv_pci_bus_exit, the code is holding a spinlock while calling
> pci_destroy_slot(), which takes a mutex.
> 
> This is not safe for spinlock. Fix this by moving the children to be
> deleted to a list on the stack, and removing them after spinlock is
> released.
> 
> Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the device")
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: "Krzysztof Wilczyński" <kw@linux.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Michael Kelley <mikelley@microsoft.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index a53bd8728d0d..d4f3cce18957 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  	struct hv_pci_dev *hpdev, *tmp;
>  	unsigned long flags;
>  	int ret;
> +	struct list_head removed;

This can be moved to where it is needed -- the if(!keep_dev) branch --
to limit its scope.

>  
>  	/*
>  	 * After the host sends the RESCIND_CHANNEL message, it doesn't
> @@ -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  		return 0;
>  
>  	if (!keep_devs) {
> -		/* Delete any children which might still exist. */
> +		INIT_LIST_HEAD(&removed);
> +
> +		/* Move all present children to the list on stack */
>  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> -		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry) {
> +		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry)
> +			list_move_tail(&hpdev->list_entry, &removed);
> +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> +
> +		/* Remove all children in the list */
> +		while (!list_empty(&removed)) {
> +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> +						 list_entry);

list_for_each_entry_safe can also be used here, right?

Wei.

>  			list_del(&hpdev->list_entry);
>  			if (hpdev->pci_slot)
>  				pci_destroy_slot(hpdev->pci_slot);
> @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  			put_pcichild(hpdev);
>  			put_pcichild(hpdev);
>  		}
> -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
>  	}
>  
>  	ret = hv_send_resources_released(hdev);
> -- 
> 2.25.1
>
Bjorn Helgaas Aug. 24, 2021, 12:25 p.m. UTC | #2
"Fix a bug ..." is not a very useful subject line.  It doesn't say
anything about what the patch *does*.  It doesn't hint at a locking
change.

On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> In hv_pci_bus_exit, the code is holding a spinlock while calling
> pci_destroy_slot(), which takes a mutex.

It's unfortunate that slots are not better integrated into the PCI
core.  I'm sorry your driver even has to worry about this.
> 
> This is not safe for spinlock. Fix this by moving the children to be
> deleted to a list on the stack, and removing them after spinlock is
> released.
> 
> Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the device")
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: "Krzysztof Wilczyński" <kw@linux.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Michael Kelley <mikelley@microsoft.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

A lore link to Dan's report would be useful here.

> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index a53bd8728d0d..d4f3cce18957 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  	struct hv_pci_dev *hpdev, *tmp;
>  	unsigned long flags;
>  	int ret;
> +	struct list_head removed;
>  
>  	/*
>  	 * After the host sends the RESCIND_CHANNEL message, it doesn't
> @@ -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  		return 0;
>  
>  	if (!keep_devs) {
> -		/* Delete any children which might still exist. */
> +		INIT_LIST_HEAD(&removed);
> +
> +		/* Move all present children to the list on stack */
>  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> -		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry) {
> +		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry)
> +			list_move_tail(&hpdev->list_entry, &removed);
> +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> +
> +		/* Remove all children in the list */
> +		while (!list_empty(&removed)) {
> +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> +						 list_entry);
>  			list_del(&hpdev->list_entry);
>  			if (hpdev->pci_slot)
>  				pci_destroy_slot(hpdev->pci_slot);
> @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
>  			put_pcichild(hpdev);
>  			put_pcichild(hpdev);
>  		}
> -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
>  	}
>  
>  	ret = hv_send_resources_released(hdev);
> -- 
> 2.25.1
>
Long Li Aug. 24, 2021, 5:28 p.m. UTC | #3
> Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> 
> On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > In hv_pci_bus_exit, the code is holding a spinlock while calling
> > pci_destroy_slot(), which takes a mutex.
> >
> > This is not safe for spinlock. Fix this by moving the children to be
> > deleted to a list on the stack, and removing them after spinlock is
> > released.
> >
> > Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the
> > device")
> >
> > Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > Cc: Wei Liu <wei.liu@kernel.org>
> > Cc: Dexuan Cui <decui@microsoft.com>
> > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: "Krzysztof Wilczyński" <kw@linux.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Michael Kelley <mikelley@microsoft.com>
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index a53bd8728d0d..d4f3cce18957 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> bool keep_devs)
> >  	struct hv_pci_dev *hpdev, *tmp;
> >  	unsigned long flags;
> >  	int ret;
> > +	struct list_head removed;
> 
> This can be moved to where it is needed -- the if(!keep_dev) branch -- to limit its
> scope.
> 
> >
> >  	/*
> >  	 * After the host sends the RESCIND_CHANNEL message, it doesn't @@
> > -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool
> keep_devs)
> >  		return 0;
> >
> >  	if (!keep_devs) {
> > -		/* Delete any children which might still exist. */
> > +		INIT_LIST_HEAD(&removed);
> > +
> > +		/* Move all present children to the list on stack */
> >  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> > -		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> list_entry) {
> > +		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> list_entry)
> > +			list_move_tail(&hpdev->list_entry, &removed);
> > +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > +
> > +		/* Remove all children in the list */
> > +		while (!list_empty(&removed)) {
> > +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> > +						 list_entry);
> 
> list_for_each_entry_safe can also be used here, right?
> 
> Wei.

I will address your comments.

Long

> 
> >  			list_del(&hpdev->list_entry);
> >  			if (hpdev->pci_slot)
> >  				pci_destroy_slot(hpdev->pci_slot);
> > @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> bool keep_devs)
> >  			put_pcichild(hpdev);
> >  			put_pcichild(hpdev);
> >  		}
> > -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> >  	}
> >
> >  	ret = hv_send_resources_released(hdev);
> > --
> > 2.25.1
> >
Long Li Aug. 24, 2021, 5:30 p.m. UTC | #4
> Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> 
> "Fix a bug ..." is not a very useful subject line.  It doesn't say anything about what
> the patch *does*.  It doesn't hint at a locking change.
> 
> On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > In hv_pci_bus_exit, the code is holding a spinlock while calling
> > pci_destroy_slot(), which takes a mutex.
> 
> It's unfortunate that slots are not better integrated into the PCI core.  I'm sorry
> your driver even has to worry about this.
> >
> > This is not safe for spinlock. Fix this by moving the children to be
> > deleted to a list on the stack, and removing them after spinlock is
> > released.
> >
> > Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the
> > device")
> >
> > Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > Cc: Wei Liu <wei.liu@kernel.org>
> > Cc: Dexuan Cui <decui@microsoft.com>
> > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: "Krzysztof Wilczyński" <kw@linux.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Michael Kelley <mikelley@microsoft.com>
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> A lore link to Dan's report would be useful here.

I will address your comments and send v2.

Long

> 
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index a53bd8728d0d..d4f3cce18957 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> bool keep_devs)
> >  	struct hv_pci_dev *hpdev, *tmp;
> >  	unsigned long flags;
> >  	int ret;
> > +	struct list_head removed;
> >
> >  	/*
> >  	 * After the host sends the RESCIND_CHANNEL message, it doesn't @@
> > -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool
> keep_devs)
> >  		return 0;
> >
> >  	if (!keep_devs) {
> > -		/* Delete any children which might still exist. */
> > +		INIT_LIST_HEAD(&removed);
> > +
> > +		/* Move all present children to the list on stack */
> >  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> > -		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> list_entry) {
> > +		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> list_entry)
> > +			list_move_tail(&hpdev->list_entry, &removed);
> > +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > +
> > +		/* Remove all children in the list */
> > +		while (!list_empty(&removed)) {
> > +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> > +						 list_entry);
> >  			list_del(&hpdev->list_entry);
> >  			if (hpdev->pci_slot)
> >  				pci_destroy_slot(hpdev->pci_slot);
> > @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> bool keep_devs)
> >  			put_pcichild(hpdev);
> >  			put_pcichild(hpdev);
> >  		}
> > -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> >  	}
> >
> >  	ret = hv_send_resources_released(hdev);
> > --
> > 2.25.1
> >
Michael Kelley (LINUX) Aug. 25, 2021, 7:11 p.m. UTC | #5
From: Long Li <longli@microsoft.com> Sent: Tuesday, August 24, 2021 10:28 AM
> 
> > Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> >
> > On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> > > From: Long Li <longli@microsoft.com>
> > >
> > > In hv_pci_bus_exit, the code is holding a spinlock while calling
> > > pci_destroy_slot(), which takes a mutex.
> > >
> > > This is not safe for spinlock. Fix this by moving the children to be
> > > deleted to a list on the stack, and removing them after spinlock is
> > > released.
> > >
> > > Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the
> > > device")
> > >
> > > Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> > > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > Cc: Wei Liu <wei.liu@kernel.org>
> > > Cc: Dexuan Cui <decui@microsoft.com>
> > > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > Cc: Rob Herring <robh@kernel.org>
> > > Cc: "Krzysztof Wilczyński" <kw@linux.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: Michael Kelley <mikelley@microsoft.com>
> > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > Signed-off-by: Long Li <longli@microsoft.com>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > b/drivers/pci/controller/pci-hyperv.c
> > > index a53bd8728d0d..d4f3cce18957 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> > bool keep_devs)
> > >  	struct hv_pci_dev *hpdev, *tmp;
> > >  	unsigned long flags;
> > >  	int ret;
> > > +	struct list_head removed;
> >
> > This can be moved to where it is needed -- the if(!keep_dev) branch -- to limit its
> > scope.
> >
> > >
> > >  	/*
> > >  	 * After the host sends the RESCIND_CHANNEL message, it doesn't @@
> > > -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool
> > keep_devs)
> > >  		return 0;
> > >
> > >  	if (!keep_devs) {
> > > -		/* Delete any children which might still exist. */
> > > +		INIT_LIST_HEAD(&removed);
> > > +
> > > +		/* Move all present children to the list on stack */
> > >  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> > > -		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > list_entry) {
> > > +		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > list_entry)
> > > +			list_move_tail(&hpdev->list_entry, &removed);
> > > +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > > +
> > > +		/* Remove all children in the list */
> > > +		while (!list_empty(&removed)) {
> > > +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> > > +						 list_entry);
> >
> > list_for_each_entry_safe can also be used here, right?
> >
> > Wei.
> 
> I will address your comments.
> 
> Long

I thought list_for_each_entry_safe() is for use when list manipulation
is *not* protected by a lock and you want to safely walk the list
even if an entry gets removed.  If the list is protected by a lock or
not subject to contention (as is the case here), then
list_for_each_entry() is the simpler implementation.  The original
implementation didn't need to use the _safe version because of
the spin lock.

Or do I have it backwards?  

Michael

> 
> >
> > >  			list_del(&hpdev->list_entry);
> > >  			if (hpdev->pci_slot)
> > >  				pci_destroy_slot(hpdev->pci_slot);
> > > @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> > bool keep_devs)
> > >  			put_pcichild(hpdev);
> > >  			put_pcichild(hpdev);
> > >  		}
> > > -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > >  	}
> > >
> > >  	ret = hv_send_resources_released(hdev);
> > > --
> > > 2.25.1
> > >
Long Li Aug. 25, 2021, 8:25 p.m. UTC | #6
> Subject: RE: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> 
> From: Long Li <longli@microsoft.com> Sent: Tuesday, August 24, 2021 10:28
> AM
> >
> > > Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on
> > > the bus
> > >
> > > On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com
> wrote:
> > > > From: Long Li <longli@microsoft.com>
> > > >
> > > > In hv_pci_bus_exit, the code is holding a spinlock while calling
> > > > pci_destroy_slot(), which takes a mutex.
> > > >
> > > > This is not safe for spinlock. Fix this by moving the children to
> > > > be deleted to a list on the stack, and removing them after
> > > > spinlock is released.
> > > >
> > > > Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing
> > > > the
> > > > device")
> > > >
> > > > Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> > > > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > > Cc: Wei Liu <wei.liu@kernel.org>
> > > > Cc: Dexuan Cui <decui@microsoft.com>
> > > > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > > Cc: Rob Herring <robh@kernel.org>
> > > > Cc: "Krzysztof Wilczyński" <kw@linux.com>
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: Michael Kelley <mikelley@microsoft.com>
> > > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > > Signed-off-by: Long Li <longli@microsoft.com>
> > > > ---
> > > >  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
> > > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > > b/drivers/pci/controller/pci-hyperv.c
> > > > index a53bd8728d0d..d4f3cce18957 100644
> > > > --- a/drivers/pci/controller/pci-hyperv.c
> > > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > > @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device
> > > > *hdev,
> > > bool keep_devs)
> > > >  	struct hv_pci_dev *hpdev, *tmp;
> > > >  	unsigned long flags;
> > > >  	int ret;
> > > > +	struct list_head removed;
> > >
> > > This can be moved to where it is needed -- the if(!keep_dev) branch
> > > -- to limit its scope.
> > >
> > > >
> > > >  	/*
> > > >  	 * After the host sends the RESCIND_CHANNEL message, it doesn't
> > > > @@
> > > > -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device
> > > > *hdev, bool
> > > keep_devs)
> > > >  		return 0;
> > > >
> > > >  	if (!keep_devs) {
> > > > -		/* Delete any children which might still exist. */
> > > > +		INIT_LIST_HEAD(&removed);
> > > > +
> > > > +		/* Move all present children to the list on stack */
> > > >  		spin_lock_irqsave(&hbus->device_list_lock, flags);
> > > > -		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > > list_entry) {
> > > > +		list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > > list_entry)
> > > > +			list_move_tail(&hpdev->list_entry, &removed);
> > > > +		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > > > +
> > > > +		/* Remove all children in the list */
> > > > +		while (!list_empty(&removed)) {
> > > > +			hpdev = list_first_entry(&removed, struct hv_pci_dev,
> > > > +						 list_entry);
> > >
> > > list_for_each_entry_safe can also be used here, right?
> > >
> > > Wei.
> >
> > I will address your comments.
> >
> > Long
> 
> I thought list_for_each_entry_safe() is for use when list manipulation is *not*
> protected by a lock and you want to safely walk the list even if an entry gets
> removed.  If the list is protected by a lock or not subject to contention (as is the
> case here), then
> list_for_each_entry() is the simpler implementation.  The original
> implementation didn't need to use the _safe version because of the spin lock.
> 
> Or do I have it backwards?
> 
> Michael

I think we need list_for_each_entry_safe() because we delete the list elements while going through them:

Here is the comment on list_for_each_entry_safe():
/**
 * Loop through the list, keeping a backup pointer to the element. This
 * macro allows for the deletion of a list element while looping through the
 * list.
 *
 * See list_for_each_entry for more details.
 */

> 
> >
> > >
> > > >  			list_del(&hpdev->list_entry);
> > > >  			if (hpdev->pci_slot)
> > > >  				pci_destroy_slot(hpdev->pci_slot);
> > > > @@ -3239,7 +3249,6 @@ static int hv_pci_bus_exit(struct hv_device
> > > > *hdev,
> > > bool keep_devs)
> > > >  			put_pcichild(hpdev);
> > > >  			put_pcichild(hpdev);
> > > >  		}
> > > > -		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > > >  	}
> > > >
> > > >  	ret = hv_send_resources_released(hdev);
> > > > --
> > > > 2.25.1
> > > >
Rob Herring Aug. 25, 2021, 8:32 p.m. UTC | #7
On Wed, Aug 25, 2021 at 2:11 PM Michael Kelley <mikelley@microsoft.com> wrote:
>
> From: Long Li <longli@microsoft.com> Sent: Tuesday, August 24, 2021 10:28 AM
> >
> > > Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> > >
> > > On Tue, Aug 24, 2021 at 12:20:20AM -0700, longli@linuxonhyperv.com wrote:
> > > > From: Long Li <longli@microsoft.com>
> > > >
> > > > In hv_pci_bus_exit, the code is holding a spinlock while calling
> > > > pci_destroy_slot(), which takes a mutex.
> > > >
> > > > This is not safe for spinlock. Fix this by moving the children to be
> > > > deleted to a list on the stack, and removing them after spinlock is
> > > > released.
> > > >
> > > > Fixes: 94d22763207a ("PCI: hv: Fix a race condition when removing the
> > > > device")
> > > >
> > > > Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> > > > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > > > Cc: Stephen Hemminger <sthemmin@microsoft.com>
> > > > Cc: Wei Liu <wei.liu@kernel.org>
> > > > Cc: Dexuan Cui <decui@microsoft.com>
> > > > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > > > Cc: Rob Herring <robh@kernel.org>
> > > > Cc: "Krzysztof Wilczyński" <kw@linux.com>
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: Michael Kelley <mikelley@microsoft.com>
> > > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > > Signed-off-by: Long Li <longli@microsoft.com>
> > > > ---
> > > >  drivers/pci/controller/pci-hyperv.c | 15 ++++++++++++---
> > > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > > b/drivers/pci/controller/pci-hyperv.c
> > > > index a53bd8728d0d..d4f3cce18957 100644
> > > > --- a/drivers/pci/controller/pci-hyperv.c
> > > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > > @@ -3220,6 +3220,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev,
> > > bool keep_devs)
> > > >   struct hv_pci_dev *hpdev, *tmp;
> > > >   unsigned long flags;
> > > >   int ret;
> > > > + struct list_head removed;
> > >
> > > This can be moved to where it is needed -- the if(!keep_dev) branch -- to limit its
> > > scope.
> > >
> > > >
> > > >   /*
> > > >    * After the host sends the RESCIND_CHANNEL message, it doesn't @@
> > > > -3229,9 +3230,18 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool
> > > keep_devs)
> > > >           return 0;
> > > >
> > > >   if (!keep_devs) {
> > > > -         /* Delete any children which might still exist. */
> > > > +         INIT_LIST_HEAD(&removed);
> > > > +
> > > > +         /* Move all present children to the list on stack */
> > > >           spin_lock_irqsave(&hbus->device_list_lock, flags);
> > > > -         list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > > list_entry) {
> > > > +         list_for_each_entry_safe(hpdev, tmp, &hbus->children,
> > > list_entry)
> > > > +                 list_move_tail(&hpdev->list_entry, &removed);
> > > > +         spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> > > > +
> > > > +         /* Remove all children in the list */
> > > > +         while (!list_empty(&removed)) {
> > > > +                 hpdev = list_first_entry(&removed, struct hv_pci_dev,
> > > > +                                          list_entry);
> > >
> > > list_for_each_entry_safe can also be used here, right?
> > >
> > > Wei.
> >
> > I will address your comments.
> >
> > Long
>
> I thought list_for_each_entry_safe() is for use when list manipulation
> is *not* protected by a lock and you want to safely walk the list
> even if an entry gets removed.  If the list is protected by a lock or
> not subject to contention (as is the case here), then
> list_for_each_entry() is the simpler implementation.  The original
> implementation didn't need to use the _safe version because of
> the spin lock.
>
> Or do I have it backwards?

"_safe" only means "safe against removal of list entry" as the
kerneldoc says. But that means removal within the loop iteration, not
any writer. A lock is needed in either case if there's another writer.

Don't ask me about the RCU variant though...

Rob
Michael Kelley (LINUX) Aug. 26, 2021, 4:50 p.m. UTC | #8
From: Long Li <longli@microsoft.com> Sent: Wednesday, August 25, 2021 1:25 PM

> >
> > I thought list_for_each_entry_safe() is for use when list manipulation is *not*
> > protected by a lock and you want to safely walk the list even if an entry gets
> > removed.  If the list is protected by a lock or not subject to contention (as is the
> > case here), then
> > list_for_each_entry() is the simpler implementation.  The original
> > implementation didn't need to use the _safe version because of the spin lock.
> >
> > Or do I have it backwards?
> >
> > Michael
> 
> I think we need list_for_each_entry_safe() because we delete the list elements while going through them:
> 
> Here is the comment on list_for_each_entry_safe():
> /**
>  * Loop through the list, keeping a backup pointer to the element. This
>  * macro allows for the deletion of a list element while looping through the
>  * list.
>  *
>  * See list_for_each_entry for more details.
>  */
> 

Got it.  Thanks (and to Rob Herring).   I read that comment but
with the wrong assumptions and didn't understand it correctly.

Interestingly, pci-hyperv.c has another case of looping through
this list and removing items where the _safe version is not used.
See pci_devices_present_work() where the missing children are
moved to a list on the stack.

Michael
Wei Liu Aug. 26, 2021, 7:41 p.m. UTC | #9
On Thu, Aug 26, 2021 at 04:50:28PM +0000, Michael Kelley wrote:
> From: Long Li <longli@microsoft.com> Sent: Wednesday, August 25, 2021 1:25 PM
> 
> > >
> > > I thought list_for_each_entry_safe() is for use when list manipulation is *not*
> > > protected by a lock and you want to safely walk the list even if an entry gets
> > > removed.  If the list is protected by a lock or not subject to contention (as is the
> > > case here), then
> > > list_for_each_entry() is the simpler implementation.  The original
> > > implementation didn't need to use the _safe version because of the spin lock.
> > >
> > > Or do I have it backwards?
> > >
> > > Michael
> > 
> > I think we need list_for_each_entry_safe() because we delete the list elements while going through them:
> > 
> > Here is the comment on list_for_each_entry_safe():
> > /**
> >  * Loop through the list, keeping a backup pointer to the element. This
> >  * macro allows for the deletion of a list element while looping through the
> >  * list.
> >  *
> >  * See list_for_each_entry for more details.
> >  */
> > 
> 
> Got it.  Thanks (and to Rob Herring).   I read that comment but
> with the wrong assumptions and didn't understand it correctly.
> 
> Interestingly, pci-hyperv.c has another case of looping through
> this list and removing items where the _safe version is not used.
> See pci_devices_present_work() where the missing children are
> moved to a list on the stack.

That can be converted too, I think.

The original code is not wrong per-se. It is just not as concise as
using list_for_each_entry_safe.

Wei.

> 
> Michael
Long Li Aug. 26, 2021, 8:09 p.m. UTC | #10
> Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> 
> On Thu, Aug 26, 2021 at 04:50:28PM +0000, Michael Kelley wrote:
> > From: Long Li <longli@microsoft.com> Sent: Wednesday, August 25, 2021
> > 1:25 PM
> >
> > > >
> > > > I thought list_for_each_entry_safe() is for use when list
> > > > manipulation is *not* protected by a lock and you want to safely
> > > > walk the list even if an entry gets removed.  If the list is
> > > > protected by a lock or not subject to contention (as is the case
> > > > here), then
> > > > list_for_each_entry() is the simpler implementation.  The original
> > > > implementation didn't need to use the _safe version because of the spin
> lock.
> > > >
> > > > Or do I have it backwards?
> > > >
> > > > Michael
> > >
> > > I think we need list_for_each_entry_safe() because we delete the list
> elements while going through them:
> > >
> > > Here is the comment on list_for_each_entry_safe():
> > > /**
> > >  * Loop through the list, keeping a backup pointer to the element.
> > > This
> > >  * macro allows for the deletion of a list element while looping
> > > through the
> > >  * list.
> > >  *
> > >  * See list_for_each_entry for more details.
> > >  */
> > >
> >
> > Got it.  Thanks (and to Rob Herring).   I read that comment but
> > with the wrong assumptions and didn't understand it correctly.
> >
> > Interestingly, pci-hyperv.c has another case of looping through this
> > list and removing items where the _safe version is not used.
> > See pci_devices_present_work() where the missing children are moved to
> > a list on the stack.
> 
> That can be converted too, I think.
> 
> The original code is not wrong per-se. It is just not as concise as using
> list_for_each_entry_safe.
> 
> Wei.

I assume we are talking about the following code in pci_devices_present_work():

                list_for_each_entry(hpdev, &hbus->children, list_entry) {
                        if (hpdev->reported_missing) {
                                found = true;
                                put_pcichild(hpdev);
                                list_move_tail(&hpdev->list_entry, &removed);
                                break;
                        }
                }

This code is correct as there is a "break" after a list entry is removed from the list. So there is no need to use the _safe version. This code can be converted to use the _safe version.
Wei Liu Aug. 26, 2021, 8:15 p.m. UTC | #11
On Thu, Aug 26, 2021 at 08:09:19PM +0000, Long Li wrote:
> > Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> > 
> > On Thu, Aug 26, 2021 at 04:50:28PM +0000, Michael Kelley wrote:
> > > From: Long Li <longli@microsoft.com> Sent: Wednesday, August 25, 2021
> > > 1:25 PM
> > >
> > > > >
> > > > > I thought list_for_each_entry_safe() is for use when list
> > > > > manipulation is *not* protected by a lock and you want to safely
> > > > > walk the list even if an entry gets removed.  If the list is
> > > > > protected by a lock or not subject to contention (as is the case
> > > > > here), then
> > > > > list_for_each_entry() is the simpler implementation.  The original
> > > > > implementation didn't need to use the _safe version because of the spin
> > lock.
> > > > >
> > > > > Or do I have it backwards?
> > > > >
> > > > > Michael
> > > >
> > > > I think we need list_for_each_entry_safe() because we delete the list
> > elements while going through them:
> > > >
> > > > Here is the comment on list_for_each_entry_safe():
> > > > /**
> > > >  * Loop through the list, keeping a backup pointer to the element.
> > > > This
> > > >  * macro allows for the deletion of a list element while looping
> > > > through the
> > > >  * list.
> > > >  *
> > > >  * See list_for_each_entry for more details.
> > > >  */
> > > >
> > >
> > > Got it.  Thanks (and to Rob Herring).   I read that comment but
> > > with the wrong assumptions and didn't understand it correctly.
> > >
> > > Interestingly, pci-hyperv.c has another case of looping through this
> > > list and removing items where the _safe version is not used.
> > > See pci_devices_present_work() where the missing children are moved to
> > > a list on the stack.
> > 
> > That can be converted too, I think.
> > 
> > The original code is not wrong per-se. It is just not as concise as using
> > list_for_each_entry_safe.
> > 
> > Wei.
> 
> I assume we are talking about the following code in pci_devices_present_work():
> 
>                 list_for_each_entry(hpdev, &hbus->children, list_entry) {
>                         if (hpdev->reported_missing) {
>                                 found = true;
>                                 put_pcichild(hpdev);
>                                 list_move_tail(&hpdev->list_entry, &removed);
>                                 break;
>                         }
>                 }
> 
> This code is correct as there is a "break" after a list entry is
> removed from the list. So there is no need to use the _safe version.
> This code can be converted to use the _safe version.

After this block there is another block like

  while (!list_empty(removed)) {
	...
  	list_del(...)

  }

I assumed Michael was referring to that block. :-)

Wei.
Long Li Aug. 26, 2021, 8:20 p.m. UTC | #12
> Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on the bus
> 
> On Thu, Aug 26, 2021 at 08:09:19PM +0000, Long Li wrote:
> > > Subject: Re: [PATCH] PCI: hv: Fix a bug on removing child devices on
> > > the bus
> > >
> > > On Thu, Aug 26, 2021 at 04:50:28PM +0000, Michael Kelley wrote:
> > > > From: Long Li <longli@microsoft.com> Sent: Wednesday, August 25,
> > > > 2021
> > > > 1:25 PM
> > > >
> > > > > >
> > > > > > I thought list_for_each_entry_safe() is for use when list
> > > > > > manipulation is *not* protected by a lock and you want to
> > > > > > safely walk the list even if an entry gets removed.  If the
> > > > > > list is protected by a lock or not subject to contention (as
> > > > > > is the case here), then
> > > > > > list_for_each_entry() is the simpler implementation.  The
> > > > > > original implementation didn't need to use the _safe version
> > > > > > because of the spin
> > > lock.
> > > > > >
> > > > > > Or do I have it backwards?
> > > > > >
> > > > > > Michael
> > > > >
> > > > > I think we need list_for_each_entry_safe() because we delete the
> > > > > list
> > > elements while going through them:
> > > > >
> > > > > Here is the comment on list_for_each_entry_safe():
> > > > > /**
> > > > >  * Loop through the list, keeping a backup pointer to the element.
> > > > > This
> > > > >  * macro allows for the deletion of a list element while looping
> > > > > through the
> > > > >  * list.
> > > > >  *
> > > > >  * See list_for_each_entry for more details.
> > > > >  */
> > > > >
> > > >
> > > > Got it.  Thanks (and to Rob Herring).   I read that comment but
> > > > with the wrong assumptions and didn't understand it correctly.
> > > >
> > > > Interestingly, pci-hyperv.c has another case of looping through
> > > > this list and removing items where the _safe version is not used.
> > > > See pci_devices_present_work() where the missing children are
> > > > moved to a list on the stack.
> > >
> > > That can be converted too, I think.
> > >
> > > The original code is not wrong per-se. It is just not as concise as
> > > using list_for_each_entry_safe.
> > >
> > > Wei.
> >
> > I assume we are talking about the following code in
> pci_devices_present_work():
> >
> >                 list_for_each_entry(hpdev, &hbus->children, list_entry) {
> >                         if (hpdev->reported_missing) {
> >                                 found = true;
> >                                 put_pcichild(hpdev);
> >                                 list_move_tail(&hpdev->list_entry, &removed);
> >                                 break;
> >                         }
> >                 }
> >
> > This code is correct as there is a "break" after a list entry is
> > removed from the list. So there is no need to use the _safe version.
> > This code can be converted to use the _safe version.
> 
> After this block there is another block like
> 
>   while (!list_empty(removed)) {
> 	...
>   	list_del(...)
> 
>   }
> 
> I assumed Michael was referring to that block. :-)
> 
> Wei.

This block is also correct. We don't have a bug here but there is a better way to code it.

Long
diff mbox series

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index a53bd8728d0d..d4f3cce18957 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3220,6 +3220,7 @@  static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
 	struct hv_pci_dev *hpdev, *tmp;
 	unsigned long flags;
 	int ret;
+	struct list_head removed;
 
 	/*
 	 * After the host sends the RESCIND_CHANNEL message, it doesn't
@@ -3229,9 +3230,18 @@  static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
 		return 0;
 
 	if (!keep_devs) {
-		/* Delete any children which might still exist. */
+		INIT_LIST_HEAD(&removed);
+
+		/* Move all present children to the list on stack */
 		spin_lock_irqsave(&hbus->device_list_lock, flags);
-		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry) {
+		list_for_each_entry_safe(hpdev, tmp, &hbus->children, list_entry)
+			list_move_tail(&hpdev->list_entry, &removed);
+		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
+
+		/* Remove all children in the list */
+		while (!list_empty(&removed)) {
+			hpdev = list_first_entry(&removed, struct hv_pci_dev,
+						 list_entry);
 			list_del(&hpdev->list_entry);
 			if (hpdev->pci_slot)
 				pci_destroy_slot(hpdev->pci_slot);
@@ -3239,7 +3249,6 @@  static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
 			put_pcichild(hpdev);
 			put_pcichild(hpdev);
 		}
-		spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 	}
 
 	ret = hv_send_resources_released(hdev);