Message ID | 1486749440-24309-1-git-send-email-logang@deltatee.com |
---|---|
State | Not Applicable |
Headers | show |
Hi, Please don't apply this patch and instead apply the switchtec driver as we submitted in v2. As per the discussion in [1], not using the cdev's kobj parent results in incorrect reference counting and a possible use of the cdev after its containing structure is freed. I've also done a quick review around the kernel and found the pattern in question to be quite prevalent. It's used in, at least, these drivers: drivers/dax/dax.c:708 drivers/input/evdev.c:1419 drivers/input/joydev.c:908 drivers/input/mousedev.c:904 drivers/gpio/gpiolib.c:1039 drivers/char/tpm/tpm-chip.c:190 drivers/platform/chrome/cros_ec_dev.c:411 drivers/infiniband/hw/hfi1/device.c:72 drivers/infiniband/core/uverbs_main.c:1168 drivers/infiniband/core/user_mad.c:1186 drivers/infiniband/core/user_mad.c:1205 drivers/iio/industrialio-core.c:1721 drivers/media/cec/cec-core.c:140 drivers/media/media-devnode.c:258 Dan Williams has proposed a helper API in [2] that could make this code appear significantly less suspect which I think would be a good idea. However, for now, I feel the switchtec code should also follow this pattern (ie. the way it was submitted in v2) and can be changed or cleaned up when/if the above numerous examples are fixed up. Thanks, Logan [1] https://lkml.org/lkml/2017/2/10/589 [2] https://lkml.org/lkml/2017/2/13/700 On 10/02/17 10:57 AM, Logan Gunthorpe wrote: > Per a suggestion from Greg Kroah-Hartman [1], don't set the cdev's > kobject parent. This allows us to use device_register instead of init > and add. > > [1] https://lkml.org/lkml/2017/2/10/370 > --- > drivers/pci/switch/switchtec.c | 15 +++++++-------- > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c > index 82bfd18..014eaec 100644 > --- a/drivers/pci/switch/switchtec.c > +++ b/drivers/pci/switch/switchtec.c > @@ -1222,24 +1222,23 @@ static struct switchtec_dev *stdev_create(struct pci_dev *pdev) > return ERR_PTR(minor); > > dev = &stdev->dev; > - device_initialize(dev); > dev->devt = MKDEV(MAJOR(switchtec_devt), minor); > - dev->class = switchtec_class; > - dev->parent = &pdev->dev; > - dev->groups = switchtec_device_groups; > - dev->release = stdev_release; > - dev_set_name(dev, "switchtec%d", minor); > > cdev = &stdev->cdev; > cdev_init(cdev, &switchtec_fops); > cdev->owner = THIS_MODULE; > - cdev->kobj.parent = &dev->kobj; > > rc = cdev_add(&stdev->cdev, dev->devt, 1); > if (rc) > goto err_cdev; > > - rc = device_add(dev); > + dev->class = switchtec_class; > + dev->parent = &pdev->dev; > + dev->groups = switchtec_device_groups; > + dev->release = stdev_release; > + dev_set_name(dev, "switchtec%d", minor); > + > + rc = device_register(dev); > if (rc) { > cdev_del(&stdev->cdev); > put_device(dev); >
On Sat, Feb 18, 2017 at 12:22 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hi, > > Please don't apply this patch and instead apply the switchtec driver as > we submitted in v2. As per the discussion in [1], not using the cdev's > kobj parent results in incorrect reference counting and a possible use > of the cdev after its containing structure is freed. Is this race present for all file operations? I've only seen it with mmap() and late faults. So if these other drivers do not support mmap it's not clear they can trigger the failure.
On 19/02/17 02:43 PM, Dan Williams wrote: > Is this race present for all file operations? I've only seen it with > mmap() and late faults. So if these other drivers do not support mmap > it's not clear they can trigger the failure. I don't see how it's related to mmap only. I think there's a number of variations on this but the race I see happens if you try to take a reference to the device in the open/close handlers of a char device file. If a driver puts the last remaining reference in the release handler, then the device structure will be freed, and on returning from the release handler, the char_dev code will try to put the last reference to the cdev structure which may have already been free'd. There needs to be a way for the cdev structure to take a reference on the device structure so it doesn't get freed and the kobj.parent trick seems to accomplish this fairly well. Really, in any situation where there's a cdev and a device in the same structure, the life cycles of the two become linked but their reference counts are not and that is the problem here. I feel like a number of developers have made the same realization independently so this would probably be a good thing to clean up. See these commits for examples spanning around 5 years: 4a215aa Input: fix use-after-free introduced with dynamic minor changes 0d5b7da iio: Prevent race between IIO chardev opening and IIO device ba0ef85 tpm: Fix initialization of the cdev Also, seems both my brother and I have independently looked into this exact issue: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html So, I think it would be a good idea to clean it up with Dan's proposed API cleanup. If I have some time tomorrow, I may throw together a patch set with that patch and the changes to the instances around the kernel. Logan
On Sun, Feb 19, 2017 at 09:22:35PM -0700, Logan Gunthorpe wrote: > Really, in any situation where there's a cdev and a device in the same > structure, the life cycles of the two become linked but their reference > counts are not and that is the problem here. Yes, the cdev must hold a kref on the containing struct otherwise userspace can trigger a use after free. This cannot be fixed with an approach inside the open/release function either as the cdev core code itself relies on the memory to exist. I've suggested something like this before: https://lkml.org/lkml/2015/7/8/1066 So I hope this will make it in, it is a step in the right direction. If it does, would you make another patch to go further? I think cdev_init should take enough arguments to hold the enclosing kref, API wise there should be no API to init a cdev without the caller specifying the enclosing struct's kref. That is the only way we will stamp this bug-class out. Eg look at kernel/time/posix-clock.c, it is wrong in the same way as well - the kref_put in posix_clock_release is not enough to make it work, clk->cdev is referenced after posix_clock_release returns by the cdev core so this has a use-after-free. Jason
diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c index 82bfd18..014eaec 100644 --- a/drivers/pci/switch/switchtec.c +++ b/drivers/pci/switch/switchtec.c @@ -1222,24 +1222,23 @@ static struct switchtec_dev *stdev_create(struct pci_dev *pdev) return ERR_PTR(minor); dev = &stdev->dev; - device_initialize(dev); dev->devt = MKDEV(MAJOR(switchtec_devt), minor); - dev->class = switchtec_class; - dev->parent = &pdev->dev; - dev->groups = switchtec_device_groups; - dev->release = stdev_release; - dev_set_name(dev, "switchtec%d", minor); cdev = &stdev->cdev; cdev_init(cdev, &switchtec_fops); cdev->owner = THIS_MODULE; - cdev->kobj.parent = &dev->kobj; rc = cdev_add(&stdev->cdev, dev->devt, 1); if (rc) goto err_cdev; - rc = device_add(dev); + dev->class = switchtec_class; + dev->parent = &pdev->dev; + dev->groups = switchtec_device_groups; + dev->release = stdev_release; + dev_set_name(dev, "switchtec%d", minor); + + rc = device_register(dev); if (rc) { cdev_del(&stdev->cdev); put_device(dev);