diff mbox series

[v2,2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal

Message ID 1619070346-21557-2-git-send-email-longli@linuxonhyperv.com
State New
Headers show
Series [v2,1/2] PCI: hv: Fix a race condition when removing the device | expand

Commit Message

Long Li April 22, 2021, 5:45 a.m. UTC
From: Long Li <longli@microsoft.com>

With the new method of flushing/stopping the workqueue before doing bus
removal, the old mechanisum of using refcount and wait for completion
is no longer needed. Remove those dead code.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 34 +++--------------------------
 1 file changed, 3 insertions(+), 31 deletions(-)

Comments

Dexuan Cui April 23, 2021, 7:18 a.m. UTC | #1
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> Sent: Wednesday, April 21, 2021 10:46 PM
> 
> With the new method of flushing/stopping the workqueue before doing bus
> removal, the old mechanisum of using refcount and wait for completion

mechanisum -> mechanism

> is no longer needed. Remove those dead code.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---

The patch looks good to me. BTW, can we also remove get_pcichild() and
put_pcichild() in an extra patch? I suspect we don't really need those either.
Long Li April 23, 2021, 6:40 p.m. UTC | #2
> Subject: RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting
> functions for handling bus device removal
> 
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > Sent: Wednesday, April 21, 2021 10:46 PM
> >
> > With the new method of flushing/stopping the workqueue before doing
> > bus removal, the old mechanisum of using refcount and wait for
> > completion
> 
> mechanisum -> mechanism
> 
> > is no longer needed. Remove those dead code.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> 
> The patch looks good to me. BTW, can we also remove get_pcichild() and
> put_pcichild() in an extra patch? I suspect we don't really need those either.

Those two functions are for protecting accessing to the devices on the hbus. There are interactions from PCI layer that need guarantee from hbus that the device is present at the time of access.

Why do you think we don't' need those?
Dexuan Cui April 25, 2021, 2:57 a.m. UTC | #3
> From: Long Li <longli@microsoft.com>
> Sent: Friday, April 23, 2021 11:40 AM
> To: Dexuan Cui <decui@microsoft.com>; longli@linuxonhyperv.com; KY
> 
> > Subject: RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting
> > functions for handling bus device removal
> >
> > > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > > Sent: Wednesday, April 21, 2021 10:46 PM
> > >
> > > With the new method of flushing/stopping the workqueue before doing
> > > bus removal, the old mechanisum of using refcount and wait for
> > > completion
> >
> > mechanisum -> mechanism
> >
> > > is no longer needed. Remove those dead code.
> > >
> > > Signed-off-by: Long Li <longli@microsoft.com>
> >
> > The patch looks good to me. BTW, can we also remove get_pcichild() and
> > put_pcichild() in an extra patch? I suspect we don't really need those either.
> 
> Those two functions are for protecting accessing to the devices on the hbus.
> There are interactions from PCI layer that need guarantee from hbus that the
> device is present at the time of access.
> 
> Why do you think we don't' need those?

IMO there is proper locking and synchronization logic in the PCI layer, so
I don't think the 'hpdev' struct can vanish when it's being accessed from the
PCI layer.

I think the 'hpdev' struct can only be freed in two scenarios:
1) the PCI device is removed: the VM receives the PCI_EJECT message,
hv_eject_device_work() calls pci_stop_and_remove_bus_device() to deregister
the pci_dev from the PCI layer, and then does other cleanup, and finally
call kfree(hpdev) in the third put_pcichild() in hv_eject_device_work().

2) the pci-hyperv driver is unloaded: in this case, hv_pci_remove() calls
pci_remove_root_bus() to deregister the pci_dev, and then calls
hv_pci_bus_exit() -> hv_pci_start_relations_work(), and eventually
pci_devices_present_work() decreases the ref counter to zero and free
the 'hpdev'.

In both the case, when the hpdev or the pdev is still being used by the
PCI layer, I think the pci_stop_and_remove_bus_device() and
pci_remove_root_bus() should be blocked, i.e. the hpdev can't be freed
even if we don't have the ref counter mechanism.

For example. we know the 'lspci' program can read the PCI device's config
space directly via the sysfile /sys/bus/pci/devices/XXXX/config; when
'lspci' is reading the config space, the code path can be:

do_syscall_64
  ksys_pread64
    vfs_read
      new_sync_read
        kernfs_fop_read_iter
          kernfs_file_read_iter
            kernfs_get_active -> atomic_inc_unless_negative(&kn->active).
            pci_read_config
              pci_user_read_config_dword
                hv_pcifront_read_config
                  _hv_pcifront_read_config

At this moment, if the host tries to remove the PCI device, the PCI_EJECT
code path will be blocked due to the kn->active:

hv_eject_device_work()
  pci_stop_and_remove_bus_device()
    pci_stop_bus_device
      pci_remove_sysfs_dev_files
        kernfs_remove_by_name_ns
          __kernfs_remove
            kernfs_drain
               wait_event(root->deactivate_waitq,
                        atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

I don't check all the scenarios and code paths, but generally speaking
I suppose the PCI subsystem should already have the proper locking and
synchronization logic we need here.

PS, I'm not sure if the host can remove the device by only sending
a PCI_BUS_RELATIONS message with bus_rel->device_count == 0 (i.e. no
PCI_EJECT message): in this case, if we don't have the ref counter for
hpdev, pci_devices_present_work() frees the hpdev before calling
pci_scan_child_bus() and this can be a potential race condition. But IMO
this can be easily fixed by moving "free the hpdev" to a later place, afer
pci_scan_child_bus() is called.

Thanks,
-- Dexuan
diff mbox series

Patch

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index fc948a2ed703..e6b4ee323068 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -452,7 +452,6 @@  struct hv_pcibus_device {
 	/* Protocol version negotiated with the host */
 	enum pci_protocol_version_t protocol_version;
 	enum hv_pcibus_state state;
-	refcount_t remove_lock;
 	struct hv_device *hdev;
 	resource_size_t low_mmio_space;
 	resource_size_t high_mmio_space;
@@ -460,7 +459,6 @@  struct hv_pcibus_device {
 	struct resource *low_mmio_res;
 	struct resource *high_mmio_res;
 	struct completion *survey_event;
-	struct completion remove_event;
 	struct pci_bus *pci_bus;
 	spinlock_t config_lock;	/* Avoid two threads writing index page */
 	spinlock_t device_list_lock;	/* Protect lists below */
@@ -593,9 +591,6 @@  static void put_pcichild(struct hv_pci_dev *hpdev)
 		kfree(hpdev);
 }
 
-static void get_hvpcibus(struct hv_pcibus_device *hv_pcibus);
-static void put_hvpcibus(struct hv_pcibus_device *hv_pcibus);
-
 /*
  * There is no good way to get notified from vmbus_onoffer_rescind(),
  * so let's use polling here, since this is not a hot path.
@@ -2067,10 +2062,8 @@  static void pci_devices_present_work(struct work_struct *work)
 	}
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
-	if (!dr) {
-		put_hvpcibus(hbus);
+	if (!dr)
 		return;
-	}
 
 	/* First, mark all existing children as reported missing. */
 	spin_lock_irqsave(&hbus->device_list_lock, flags);
@@ -2153,7 +2146,6 @@  static void pci_devices_present_work(struct work_struct *work)
 		break;
 	}
 
-	put_hvpcibus(hbus);
 	kfree(dr);
 }
 
@@ -2194,12 +2186,10 @@  static int hv_pci_start_relations_work(struct hv_pcibus_device *hbus,
 	list_add_tail(&dr->list_entry, &hbus->dr_list);
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
-	if (pending_dr) {
+	if (pending_dr)
 		kfree(dr_wrk);
-	} else {
-		get_hvpcibus(hbus);
+	else
 		queue_work(hbus->wq, &dr_wrk->wrk);
-	}
 
 	return 0;
 }
@@ -2342,8 +2332,6 @@  static void hv_eject_device_work(struct work_struct *work)
 	put_pcichild(hpdev);
 	put_pcichild(hpdev);
 	/* hpdev has been freed. Do not use it any more. */
-
-	put_hvpcibus(hbus);
 }
 
 /**
@@ -2367,7 +2355,6 @@  static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
 	hpdev->state = hv_pcichild_ejecting;
 	get_pcichild(hpdev);
 	INIT_WORK(&hpdev->wrk, hv_eject_device_work);
-	get_hvpcibus(hbus);
 	queue_work(hbus->wq, &hpdev->wrk);
 }
 
@@ -2967,17 +2954,6 @@  static int hv_send_resources_released(struct hv_device *hdev)
 	return 0;
 }
 
-static void get_hvpcibus(struct hv_pcibus_device *hbus)
-{
-	refcount_inc(&hbus->remove_lock);
-}
-
-static void put_hvpcibus(struct hv_pcibus_device *hbus)
-{
-	if (refcount_dec_and_test(&hbus->remove_lock))
-		complete(&hbus->remove_event);
-}
-
 #define HVPCI_DOM_MAP_SIZE (64 * 1024)
 static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
 
@@ -3097,14 +3073,12 @@  static int hv_pci_probe(struct hv_device *hdev,
 	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
-	refcount_set(&hbus->remove_lock, 1);
 	INIT_LIST_HEAD(&hbus->children);
 	INIT_LIST_HEAD(&hbus->dr_list);
 	INIT_LIST_HEAD(&hbus->resources_for_children);
 	spin_lock_init(&hbus->config_lock);
 	spin_lock_init(&hbus->device_list_lock);
 	spin_lock_init(&hbus->retarget_msi_interrupt_lock);
-	init_completion(&hbus->remove_event);
 	hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
@@ -3332,8 +3306,6 @@  static int hv_pci_remove(struct hv_device *hdev)
 	hv_pci_free_bridge_windows(hbus);
 	irq_domain_remove(hbus->irq_domain);
 	irq_domain_free_fwnode(hbus->sysdata.fwnode);
-	put_hvpcibus(hbus);
-	wait_for_completion(&hbus->remove_event);
 
 	hv_put_dom_num(hbus->sysdata.domain);