diff mbox series

PCI: hv: Fix a race condition when removing the device

Message ID 1618860054-928-1-git-send-email-longli@linuxonhyperv.com
State New
Headers show
Series PCI: hv: Fix a race condition when removing the device | expand

Commit Message

Long Li April 19, 2021, 7:20 p.m. UTC
From: Long Li <longli@microsoft.com>

On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().

This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.

Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Michael Kelley (LINUX) April 21, 2021, 5:33 p.m. UTC | #1
From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent: Monday, April 19, 2021 12:21 PM
> 
> On removing the device, any work item (hv_pci_devices_present() or
> hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
> and race with hv_pci_remove().
> 
> This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
> and decide to rescind the channel immediately after that.
> 
> Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

I can see that this change follows the same pattern as in hv_pci_suspend().   The
comments there give a full explanation of the issue and the solution.  But
interestingly, the current code also has a reference count mechanism on
the hbus.  And code near the end of hv_pci_remove() decrements the reference
count and then waits for all users to finish before destroying the workqueue.
With this change, is this reference counting mechanism still needed?   If the
workqueue has already been emptied, it seems like the wait_for_completion()
near the end of hv_pci_remove() would never be waiting for anything.  It makes
me wonder if moving the reference count checking code from near the end of
hv_pci_remove() up to near the beginning would solve the problem as well
(and maybe in hv_pci_suspend also?).

Michael 

> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 27a17a1e4a7c..116815404313 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device *hdev)
> 
>  	hbus = hv_get_drvdata(hdev);
>  	if (hbus->state == hv_pcibus_installed) {
> +		tasklet_disable(&hdev->channel->callback_event);
> +		hbus->state = hv_pcibus_removing;
> +		tasklet_enable(&hdev->channel->callback_event);
> +
> +		flush_workqueue(hbus->wq);
> +		/*
> +		 * At this point, no work is running or can be scheduled
> +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> +		 * or hv_pci_eject_device(), it's safe to proceed.
> +		 */
> +
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> --
> 2.27.0
Long Li April 21, 2021, 7:56 p.m. UTC | #2
> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent:
> Monday, April 19, 2021 12:21 PM
> >
> > On removing the device, any work item (hv_pci_devices_present() or
> > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > running and race with hv_pci_remove().
> >
> > This can happen because the host may send PCI_EJECT or
> > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> after that.
> >
> > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> remove.
> 
> I can see that this change follows the same pattern as in hv_pci_suspend().
> The
> comments there give a full explanation of the issue and the solution.  But
> interestingly, the current code also has a reference count mechanism on the
> hbus.  And code near the end of hv_pci_remove() decrements the reference
> count and then waits for all users to finish before destroying the workqueue.
> With this change, is this reference counting mechanism still needed?   If the
> workqueue has already been emptied, it seems like the
> wait_for_completion() near the end of hv_pci_remove() would never be
> waiting for anything.  It makes me wonder if moving the reference count
> checking code from near the end of
> hv_pci_remove() up to near the beginning would solve the problem as well
> (and maybe in hv_pci_suspend also?).

Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use a dedicated workqueue for hbus since they were introduced.

But we still need to call tasklet_disable/enable() the same way hv_pci_suspend() does, the reason is that we need to protect hbus->state. This value needs to be consistent for the driver. For example, a CPU may decide to schedule a work on a work queue that we just flushed or destroyed, by reading the wrong hbus->state.

> 
> Michael
> 
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 27a17a1e4a7c..116815404313 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device
> > *hdev)
> >
> >  	hbus = hv_get_drvdata(hdev);
> >  	if (hbus->state == hv_pcibus_installed) {
> > +		tasklet_disable(&hdev->channel->callback_event);
> > +		hbus->state = hv_pcibus_removing;
> > +		tasklet_enable(&hdev->channel->callback_event);
> > +
> > +		flush_workqueue(hbus->wq);
> > +		/*
> > +		 * At this point, no work is running or can be scheduled
> > +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > +		 */
> > +
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > --
> > 2.27.0
Michael Kelley (LINUX) April 21, 2021, 9:05 p.m. UTC | #3
From: Long Li <longli@microsoft.com> Sent: Wednesday, April 21, 2021 12:57 PM
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent:
> > Monday, April 19, 2021 12:21 PM
> > >
> > > On removing the device, any work item (hv_pci_devices_present() or
> > > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > > running and race with hv_pci_remove().
> > >
> > > This can happen because the host may send PCI_EJECT or
> > > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> > after that.
> > >
> > > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> > remove.
> >
> > I can see that this change follows the same pattern as in hv_pci_suspend().
> > The comments there give a full explanation of the issue and the solution.  But
> > interestingly, the current code also has a reference count mechanism on the
> > hbus.  And code near the end of hv_pci_remove() decrements the reference
> > count and then waits for all users to finish before destroying the workqueue.
> > With this change, is this reference counting mechanism still needed?   If the
> > workqueue has already been emptied, it seems like the
> > wait_for_completion() near the end of hv_pci_remove() would never be
> > waiting for anything.  It makes me wonder if moving the reference count
> > checking code from near the end of hv_pci_remove() up to near the beginning
> > would solve the problem as well (and maybe in hv_pci_suspend also?).
> 
> Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use
> a dedicated workqueue for hbus since they were introduced.
> 
> But we still need to call tasklet_disable/enable() the same way hv_pci_suspend() does, the
> reason is that we need to protect hbus->state. This value needs to be consistent for the
> driver. For example, a CPU may decide to schedule a work on a work queue that we just
> flushed or destroyed, by reading the wrong hbus->state.
> 

Yes, I would agree the tasklet disable/enable are needed, especially since tasklet_disable()
is what ensures that the tasklet is not currently running.

If the hbus ref counting isn't needed any longer, I would strongly recommend adding
a patch to the series that removes it.  This synchronization stuff is hard enough to
understand and reason about; having a leftover mechanism that doesn't really do
anything useful makes it nearly impossible. :-)

Dexuan -- I'm hoping you can take a look as well and see if you agree.

Michael

> >
> > Michael
> >
> > >
> > > Signed-off-by: Long Li <longli@microsoft.com>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > b/drivers/pci/controller/pci-hyperv.c
> > > index 27a17a1e4a7c..116815404313 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device
> > > *hdev)
> > >
> > >  	hbus = hv_get_drvdata(hdev);
> > >  	if (hbus->state == hv_pcibus_installed) {
> > > +		tasklet_disable(&hdev->channel->callback_event);
> > > +		hbus->state = hv_pcibus_removing;
> > > +		tasklet_enable(&hdev->channel->callback_event);
> > > +
> > > +		flush_workqueue(hbus->wq);
> > > +		/*
> > > +		 * At this point, no work is running or can be scheduled
> > > +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> > > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > > +		 */
> > > +
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > --
> > > 2.27.0
Dexuan Cui April 22, 2021, 2:31 a.m. UTC | #4
> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Wednesday, April 21, 2021 2:06 PM
>  ...
> > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have
> > changed to use
> > a dedicated workqueue for hbus since they were introduced.
> >
> > But we still need to call tasklet_disable/enable() the same way
> > hv_pci_suspend() does, the
> > reason is that we need to protect hbus->state. This value needs to be
> consistent for the
> > driver. For example, a CPU may decide to schedule a work on a work queue
> that we just
> > flushed or destroyed, by reading the wrong hbus->state.
> >
> 
> Yes, I would agree the tasklet disable/enable are needed, especially since
> tasklet_disable()
> is what ensures that the tasklet is not currently running.
> 
> If the hbus ref counting isn't needed any longer, I would strongly recommend
> adding
> a patch to the series that removes it.  This synchronization stuff is hard
> enough to
> understand and reason about; having a leftover mechanism that doesn't really
> do
> anything useful makes it nearly impossible. :-)
> 
> Dexuan -- I'm hoping you can take a look as well and see if you agree.
> 
> Michael

I also think we can remove the reference counting.

But it looks like there is still race in hv_pci_remove() even with Long's
patch: in hv_pci_remove(), we disable the tasklet, change hbus->state to
hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and set
hbus->state to hv_pcibus_removed -- what if the channel callback runs
again? -- now hbus->state is no longer hv_pcibus_removing, so
hv_pci_devices_present() -> hv_pci_start_relations_work() and
hv_pci_eject_device() can still add new work items to hbus->wq, and the new
work items may race with the vmbus_close().

It looks like we should remove the state hv_pcibus_removed?

Thanks,
-- Dexuan
Long Li April 22, 2021, 3:57 a.m. UTC | #5
> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> > From: Michael Kelley <mikelley@microsoft.com>
> > Sent: Wednesday, April 21, 2021 2:06 PM  ...
> > > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we
> > > have changed to use a dedicated workqueue for hbus since they were
> > > introduced.
> > >
> > > But we still need to call tasklet_disable/enable() the same way
> > > hv_pci_suspend() does, the
> > > reason is that we need to protect hbus->state. This value needs to
> > > be
> > consistent for the
> > > driver. For example, a CPU may decide to schedule a work on a work
> > > queue
> > that we just
> > > flushed or destroyed, by reading the wrong hbus->state.
> > >
> >
> > Yes, I would agree the tasklet disable/enable are needed, especially
> > since
> > tasklet_disable()
> > is what ensures that the tasklet is not currently running.
> >
> > If the hbus ref counting isn't needed any longer, I would strongly
> > recommend adding a patch to the series that removes it.  This
> > synchronization stuff is hard enough to understand and reason about;
> > having a leftover mechanism that doesn't really do anything useful
> > makes it nearly impossible. :-)
> >
> > Dexuan -- I'm hoping you can take a look as well and see if you agree.
> >
> > Michael
> 
> I also think we can remove the reference counting.
> 
> But it looks like there is still race in hv_pci_remove() even with Long's
> patch: in hv_pci_remove(), we disable the tasklet, change hbus->state to
> hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and set
> hbus->state to hv_pcibus_removed -- what if the channel callback runs
> again? -- now hbus->state is no longer hv_pcibus_removing, so
> hv_pci_devices_present() -> hv_pci_start_relations_work() and
> hv_pci_eject_device() can still add new work items to hbus->wq, and the
> new work items may race with the vmbus_close().

Good catch, adding a check for hv_pcibus_removed should fix it. I'm sending a v2.

> 
> It looks like we should remove the state hv_pcibus_removed?
> 
> Thanks,
> -- Dexuan
Long Li April 22, 2021, 4:19 a.m. UTC | #6
> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> > Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the
> > device
> >
> > > From: Michael Kelley <mikelley@microsoft.com>
> > > Sent: Wednesday, April 21, 2021 2:06 PM  ...
> > > > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as
> > > > we have changed to use a dedicated workqueue for hbus since they
> > > > were introduced.
> > > >
> > > > But we still need to call tasklet_disable/enable() the same way
> > > > hv_pci_suspend() does, the
> > > > reason is that we need to protect hbus->state. This value needs to
> > > > be
> > > consistent for the
> > > > driver. For example, a CPU may decide to schedule a work on a work
> > > > queue
> > > that we just
> > > > flushed or destroyed, by reading the wrong hbus->state.
> > > >
> > >
> > > Yes, I would agree the tasklet disable/enable are needed, especially
> > > since
> > > tasklet_disable()
> > > is what ensures that the tasklet is not currently running.
> > >
> > > If the hbus ref counting isn't needed any longer, I would strongly
> > > recommend adding a patch to the series that removes it.  This
> > > synchronization stuff is hard enough to understand and reason about;
> > > having a leftover mechanism that doesn't really do anything useful
> > > makes it nearly impossible. :-)
> > >
> > > Dexuan -- I'm hoping you can take a look as well and see if you agree.
> > >
> > > Michael
> >
> > I also think we can remove the reference counting.
> >
> > But it looks like there is still race in hv_pci_remove() even with
> > Long's
> > patch: in hv_pci_remove(), we disable the tasklet, change hbus->state
> > to hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and
> > set
> > hbus->state to hv_pcibus_removed -- what if the channel callback runs
> > again? -- now hbus->state is no longer hv_pcibus_removing, so
> > hv_pci_devices_present() -> hv_pci_start_relations_work() and
> > hv_pci_eject_device() can still add new work items to hbus->wq, and
> > the new work items may race with the vmbus_close().
> 
> Good catch, adding a check for hv_pcibus_removed should fix it. I'm sending
> a v2.

I don't see the reason why we want to assign hbus->state to hv_pcibus_removed at all, so just removing it will take care of the race.

> 
> >
> > It looks like we should remove the state hv_pcibus_removed?
> >
> > Thanks,
> > -- Dexuan
diff mbox series

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 27a17a1e4a7c..116815404313 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3305,6 +3305,17 @@  static int hv_pci_remove(struct hv_device *hdev)
 
 	hbus = hv_get_drvdata(hdev);
 	if (hbus->state == hv_pcibus_installed) {
+		tasklet_disable(&hdev->channel->callback_event);
+		hbus->state = hv_pcibus_removing;
+		tasklet_enable(&hdev->channel->callback_event);
+
+		flush_workqueue(hbus->wq);
+		/*
+		 * At this point, no work is running or can be scheduled
+		 * on hbus-wq. We can't race with hv_pci_devices_present()
+		 * or hv_pci_eject_device(), it's safe to proceed.
+		 */
+
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);