diff mbox series

[RFC,2/6] pci: Set pci_dev->is_added before calling device_add

Message ID 20180817044902.31420-3-benh@kernel.crashing.org
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series pci: Rework is_added race fix and address bridge enable races | expand

Commit Message

Benjamin Herrenschmidt Aug. 17, 2018, 4:48 a.m. UTC
This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com>
after my revert of his commit but in a much simpler way.

The main issues is that is_added was being set after the driver
got bound and started, and thus setting it could race with other
changes to struct pci_dev.

This fixes it by setting the flag first, which also has the
advantage of matching the fact that we are clearing it *after*
unbinding in the remove path, thus the flag is now symtetric
and always set while the driver code is running.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/pci/bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Aug. 17, 2018, 4:25 p.m. UTC | #1
On Fri, Aug 17, 2018 at 02:48:58PM +1000, Benjamin Herrenschmidt wrote:
> This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com>
> after my revert of his commit but in a much simpler way.
> 
> The main issues is that is_added was being set after the driver
> got bound and started, and thus setting it could race with other
> changes to struct pci_dev.

The "bind driver, then set dev->added = 1" order seems to have been
there since the beginning of dev->is_added:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03

This patch seems reasonable, but I'm a little dubious about the
existence of "is_added" in the first place.  As far as I can tell, the
only other buses with something similar are the MEN Chameleon bus and
the Intel Management Engine Interface.

The PCI uses of "is_added" don't seem *that* critical or unique to
PCI, so I'm not 100% convinced we need it at all.  But I haven't
looked into it enough to be able to propose an alternative.

> This fixes it by setting the flag first, which also has the
> advantage of matching the fact that we are clearing it *after*
> unbinding in the remove path, thus the flag is now symtetric
> and always set while the driver code is running.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  drivers/pci/bus.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 35b7fc87eac5..48ae63673aa8 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -321,16 +321,16 @@ void pci_bus_add_device(struct pci_dev *dev)
>  	pci_proc_attach_device(dev);
>  	pci_bridge_d3_update(dev);
>  
> +	dev->is_added = 1;
>  	dev->match_driver = true;
>  	retval = device_attach(&dev->dev);
>  	if (retval < 0 && retval != -EPROBE_DEFER) {
> +		dev->is_added = 0;
>  		pci_warn(dev, "device attach failed (%d)\n", retval);
>  		pci_proc_detach_device(dev);
>  		pci_remove_sysfs_dev_files(dev);
>  		return;
>  	}
> -
> -	dev->is_added = 1;
>  }
>  EXPORT_SYMBOL_GPL(pci_bus_add_device);
>  
> -- 
> 2.17.1
>
Lukas Wunner Aug. 17, 2018, 6:15 p.m. UTC | #2
On Fri, Aug 17, 2018 at 11:25:34AM -0500, Bjorn Helgaas wrote:
> The "bind driver, then set dev->added = 1" order seems to have been
> there since the beginning of dev->is_added:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03
> 
> This patch seems reasonable, but I'm a little dubious about the
> existence of "is_added" in the first place.  As far as I can tell, the
> only other buses with something similar are the MEN Chameleon bus and
> the Intel Management Engine Interface.
> 
> The PCI uses of "is_added" don't seem *that* critical or unique to
> PCI, so I'm not 100% convinced we need it at all.  But I haven't
> looked into it enough to be able to propose an alternative.

Addition of new PCI devices, e.g. on boot or on hotplug, consists of two
stages:  First, new devices are created by pci_scan_slot(), afterwards
they're bound to a driver by pci_bus_add_devices().  The idea is that
the bus is scanned in full before drivers are attached to devices.

In the second step, i.e. in pci_bus_add_devices(), the is_added flag is
set on devices.  The flag is significant because pci_scan_slot() returns
the number of newly discovered PCI functions in the given slot, and it
calculates that number based on the is_added flag.

More specifically, it invokes pci_scan_single_device() which either
returns an existing device or a newly created device.  The is_added flag
is basically a way for pci_scan_single_device() to communicate back to
pci_scan_slot() which of the two code paths it took.

The two steps (enumeration and driver attachment) are protected by
pci_lock_rescan_remove().  Initially, when a pci_dev is newly allocated
with kzalloc(), is_added is 0.  The transition from 0 to 1 happens while
under pci_lock_rescan_remove().  When that lock is released, is_added is
always 1, barring a device_attach() error, in which case it will remain
at 0.

AFAICS, there is no second mutex needed to ensure synchronization of
is_added, the existing mutex should be sufficient and the only problem
are RMW races when accessing adjacent flags. Those were correctly addressed
by Hari's patch.  Benjamin seems to be alleging that concurrency issues
exist beyond the RMW races, I fail to see them, sorry.

Thanks,

Lukas
Benjamin Herrenschmidt Aug. 18, 2018, 3:28 a.m. UTC | #3
On Fri, 2018-08-17 at 11:25 -0500, Bjorn Helgaas wrote:
> On Fri, Aug 17, 2018 at 02:48:58PM +1000, Benjamin Herrenschmidt wrote:
> > This re-fixes the bug reported by Hari Vyas <hari.vyas@broadcom.com>
> > after my revert of his commit but in a much simpler way.
> > 
> > The main issues is that is_added was being set after the driver
> > got bound and started, and thus setting it could race with other
> > changes to struct pci_dev.
> 
> The "bind driver, then set dev->added = 1" order seems to have been
> there since the beginning of dev->is_added:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8a1bc9013a03
> 
> This patch seems reasonable, but I'm a little dubious about the
> existence of "is_added" in the first place.  As far as I can tell, the
> only other buses with something similar are the MEN Chameleon bus and
> the Intel Management Engine Interface.
> 
> The PCI uses of "is_added" don't seem *that* critical or unique to
> PCI, so I'm not 100% convinced we need it at all.  But I haven't
> looked into it enough to be able to propose an alternative.

This is a whole different conversation you are taking us into :-)

is_added is currently needed for a number of reasons, mostly relating
to partial hotplug, and historically comes from the fact that we
separated the PCI probing & tree construction from the registration
with the device-model. This of course comes from the fact that the
device model didn't actually exist yet when the PCI code was
created :-)

So let's keep things separate shall we ? I'd rather fix this correctly
now, and get rid of that pesky atomic priv_flags which I think is just
going to be a long term add to the mess rather than an improvement, and
separately we can discuss whether is_added is something that can go
away, but I suspect this will come in the form of either a deeper
rework of how we do PCI probing, or simply finding a struct device/kobj
field we can use as a hint that we've added the device already for
hotplug.

> > This fixes it by setting the flag first, which also has the
> > advantage of matching the fact that we are clearing it *after*
> > unbinding in the remove path, thus the flag is now symtetric
> > and always set while the driver code is running.
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >  drivers/pci/bus.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 35b7fc87eac5..48ae63673aa8 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -321,16 +321,16 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  	pci_proc_attach_device(dev);
> >  	pci_bridge_d3_update(dev);
> >  
> > +	dev->is_added = 1;
> >  	dev->match_driver = true;
> >  	retval = device_attach(&dev->dev);
> >  	if (retval < 0 && retval != -EPROBE_DEFER) {
> > +		dev->is_added = 0;
> >  		pci_warn(dev, "device attach failed (%d)\n", retval);
> >  		pci_proc_detach_device(dev);
> >  		pci_remove_sysfs_dev_files(dev);
> >  		return;
> >  	}
> > -
> > -	dev->is_added = 1;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_bus_add_device);
> >  
> > -- 
> > 2.17.1
> >
Benjamin Herrenschmidt Aug. 18, 2018, 3:41 a.m. UTC | #4
On Fri, 2018-08-17 at 20:15 +0200, Lukas Wunner wrote:
> 
> The two steps (enumeration and driver attachment) are protected by
> pci_lock_rescan_remove().  Initially, when a pci_dev is newly allocated
> with kzalloc(), is_added is 0.  The transition from 0 to 1 happens while
> under pci_lock_rescan_remove().  When that lock is released, is_added is
> always 1, barring a device_attach() error, in which case it will remain
> at 0.
> 
> AFAICS, there is no second mutex needed to ensure synchronization of
> is_added, the existing mutex should be sufficient and the only problem
> are RMW races when accessing adjacent flags. Those were correctly addressed
> by Hari's patch.  Benjamin seems to be alleging that concurrency issues
> exist beyond the RMW races, I fail to see them, sorry.

Im saying that fixing those issues using atomic bitops is a generally
unsafe practice even if it happens to work in a specific case.

In this one, I argue that the root problem was simply that is_added was
set in the wrong place. The "fix" from Hari touches 9 files, adds
horrible relative includes to an architecture and generally bloats
things while a single 3 liner would have been sufficient.

The current use of is_added is asymetric (it's cleared during
device_attach but set during detach) which is bogus and the entire race
stems from the fact that it is set after device_attach returns.

So setting it early is, imho, the right fix, is much simpler, and fixes
the odd imbalance we have to begin with.

Ben.
diff mbox series

Patch

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 35b7fc87eac5..48ae63673aa8 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -321,16 +321,16 @@  void pci_bus_add_device(struct pci_dev *dev)
 	pci_proc_attach_device(dev);
 	pci_bridge_d3_update(dev);
 
+	dev->is_added = 1;
 	dev->match_driver = true;
 	retval = device_attach(&dev->dev);
 	if (retval < 0 && retval != -EPROBE_DEFER) {
+		dev->is_added = 0;
 		pci_warn(dev, "device attach failed (%d)\n", retval);
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		return;
 	}
-
-	dev->is_added = 1;
 }
 EXPORT_SYMBOL_GPL(pci_bus_add_device);