diff mbox

switchtec: cleanup cdev init

Message ID 1486749440-24309-1-git-send-email-logang@deltatee.com
State Not Applicable
Headers show

Commit Message

Logan Gunthorpe Feb. 10, 2017, 5:57 p.m. UTC
Per a suggestion from Greg Kroah-Hartman [1], don't set the cdev's
kobject parent. This allows us to use device_register instead of init
and add.

[1] https://lkml.org/lkml/2017/2/10/370
---
 drivers/pci/switch/switchtec.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Comments

Logan Gunthorpe Feb. 18, 2017, 8:22 p.m. UTC | #1
Hi,

Please don't apply this patch and instead apply the switchtec driver as
we submitted in v2. As per the discussion in [1], not using the cdev's
kobj parent results in incorrect reference counting and a possible use
of the cdev after its containing structure is freed.

I've also done a quick review around the kernel and found the pattern in
question to be quite prevalent. It's used in, at least, these drivers:

drivers/dax/dax.c:708
drivers/input/evdev.c:1419
drivers/input/joydev.c:908
drivers/input/mousedev.c:904
drivers/gpio/gpiolib.c:1039
drivers/char/tpm/tpm-chip.c:190
drivers/platform/chrome/cros_ec_dev.c:411
drivers/infiniband/hw/hfi1/device.c:72
drivers/infiniband/core/uverbs_main.c:1168
drivers/infiniband/core/user_mad.c:1186
drivers/infiniband/core/user_mad.c:1205
drivers/iio/industrialio-core.c:1721
drivers/media/cec/cec-core.c:140
drivers/media/media-devnode.c:258

Dan Williams has proposed a helper API in [2] that could make this code
appear significantly less suspect which I think would be a good idea.
However, for now, I feel the switchtec code should also follow this
pattern (ie. the way it was submitted in v2) and can be changed or
cleaned up when/if the above numerous examples are fixed up.

Thanks,

Logan

[1] https://lkml.org/lkml/2017/2/10/589
[2] https://lkml.org/lkml/2017/2/13/700




On 10/02/17 10:57 AM, Logan Gunthorpe wrote:
> Per a suggestion from Greg Kroah-Hartman [1], don't set the cdev's
> kobject parent. This allows us to use device_register instead of init
> and add.
> 
> [1] https://lkml.org/lkml/2017/2/10/370
> ---
>  drivers/pci/switch/switchtec.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
> index 82bfd18..014eaec 100644
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -1222,24 +1222,23 @@ static struct switchtec_dev *stdev_create(struct pci_dev *pdev)
>  		return ERR_PTR(minor);
>  
>  	dev = &stdev->dev;
> -	device_initialize(dev);
>  	dev->devt = MKDEV(MAJOR(switchtec_devt), minor);
> -	dev->class = switchtec_class;
> -	dev->parent = &pdev->dev;
> -	dev->groups = switchtec_device_groups;
> -	dev->release = stdev_release;
> -	dev_set_name(dev, "switchtec%d", minor);
>  
>  	cdev = &stdev->cdev;
>  	cdev_init(cdev, &switchtec_fops);
>  	cdev->owner = THIS_MODULE;
> -	cdev->kobj.parent = &dev->kobj;
>  
>  	rc = cdev_add(&stdev->cdev, dev->devt, 1);
>  	if (rc)
>  		goto err_cdev;
>  
> -	rc = device_add(dev);
> +	dev->class = switchtec_class;
> +	dev->parent = &pdev->dev;
> +	dev->groups = switchtec_device_groups;
> +	dev->release = stdev_release;
> +	dev_set_name(dev, "switchtec%d", minor);
> +
> +	rc = device_register(dev);
>  	if (rc) {
>  		cdev_del(&stdev->cdev);
>  		put_device(dev);
>
Dan Williams Feb. 19, 2017, 9:43 p.m. UTC | #2
On Sat, Feb 18, 2017 at 12:22 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hi,
>
> Please don't apply this patch and instead apply the switchtec driver as
> we submitted in v2. As per the discussion in [1], not using the cdev's
> kobj parent results in incorrect reference counting and a possible use
> of the cdev after its containing structure is freed.

Is this race present for all file operations? I've only seen it with
mmap() and late faults. So if these other drivers do not support mmap
it's not clear they can trigger the failure.
Logan Gunthorpe Feb. 20, 2017, 4:22 a.m. UTC | #3
On 19/02/17 02:43 PM, Dan Williams wrote:
> Is this race present for all file operations? I've only seen it with
> mmap() and late faults. So if these other drivers do not support mmap
> it's not clear they can trigger the failure.

I don't see how it's related to mmap only. I think there's a number of
variations on this but the race I see happens if you try to take a
reference to the device in the open/close handlers of a char device file.

If a driver puts the last remaining reference in the release handler,
then the device structure will be freed, and on returning from the
release handler, the char_dev code will try to put the last reference to
the cdev structure which may have already been free'd. There needs to be
a way for the cdev structure to take a reference on the device structure
so it doesn't get freed and the kobj.parent trick seems to accomplish
this fairly well.

Really, in any situation where there's a cdev and a device in the same
structure, the life cycles of the two become linked but their reference
counts are not and that is the problem here.

I feel like a number of developers have made the same realization
independently so this would probably be a good thing to clean up.

See these commits for examples spanning around 5 years:

4a215aa Input: fix use-after-free introduced with dynamic minor changes
0d5b7da iio: Prevent race between IIO chardev opening and IIO device
ba0ef85 tpm: Fix initialization of the cdev

Also, seems both my brother and I have independently looked into this
exact issue:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

So, I think it would be a good idea to clean it up with Dan's proposed
API cleanup. If I have some time tomorrow, I may throw together a patch
set with that patch and the changes to the instances around the kernel.

Logan
Jason Gunthorpe Feb. 21, 2017, 6:37 p.m. UTC | #4
On Sun, Feb 19, 2017 at 09:22:35PM -0700, Logan Gunthorpe wrote:

> Really, in any situation where there's a cdev and a device in the same
> structure, the life cycles of the two become linked but their reference
> counts are not and that is the problem here.

Yes, the cdev must hold a kref on the containing struct otherwise
userspace can trigger a use after free. This cannot be fixed with an
approach inside the open/release function either as the cdev core code
itself relies on the memory to exist.

I've suggested something like this before:

https://lkml.org/lkml/2015/7/8/1066

So I hope this will make it in, it is a step in the right direction.

If it does, would you make another patch to go further? I think
cdev_init should take enough arguments to hold the enclosing kref, API
wise there should be no API to init a cdev without the caller
specifying the enclosing struct's kref. That is the only way we will
stamp this bug-class out.

Eg look at kernel/time/posix-clock.c, it is wrong in the same way as
well - the kref_put in posix_clock_release is not enough to make it
work, clk->cdev is referenced after posix_clock_release returns by the
cdev core so this has a use-after-free.

Jason
diff mbox

Patch

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 82bfd18..014eaec 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -1222,24 +1222,23 @@  static struct switchtec_dev *stdev_create(struct pci_dev *pdev)
 		return ERR_PTR(minor);
 
 	dev = &stdev->dev;
-	device_initialize(dev);
 	dev->devt = MKDEV(MAJOR(switchtec_devt), minor);
-	dev->class = switchtec_class;
-	dev->parent = &pdev->dev;
-	dev->groups = switchtec_device_groups;
-	dev->release = stdev_release;
-	dev_set_name(dev, "switchtec%d", minor);
 
 	cdev = &stdev->cdev;
 	cdev_init(cdev, &switchtec_fops);
 	cdev->owner = THIS_MODULE;
-	cdev->kobj.parent = &dev->kobj;
 
 	rc = cdev_add(&stdev->cdev, dev->devt, 1);
 	if (rc)
 		goto err_cdev;
 
-	rc = device_add(dev);
+	dev->class = switchtec_class;
+	dev->parent = &pdev->dev;
+	dev->groups = switchtec_device_groups;
+	dev->release = stdev_release;
+	dev_set_name(dev, "switchtec%d", minor);
+
+	rc = device_register(dev);
 	if (rc) {
 		cdev_del(&stdev->cdev);
 		put_device(dev);