[tpmdd-devel,v5,4/5] Initialize TPM and get durations and timeouts
diff mbox

Message ID 20160211194810.GA24211@obsidianresearch.com
State New
Headers show

Commit Message

Jason Gunthorpe Feb. 11, 2016, 7:48 p.m. UTC
On Thu, Feb 11, 2016 at 02:10:56PM -0500, Stefan Berger wrote:

>    > Make sense. Don't change the names all the drivers would have to be
>    > churn'd. tpm_chip_alloc, tpmm_chip_alloc.
>    >
>    That's right:
>    struct tpm_chip *tpmm_chip_alloc(struct device *dev,
>                                     const struct tpm_class_ops *ops)
>    {
>            struct tpm_chip *chip;
>            chip = tpm_chip_alloc(ops);
>            if (IS_ERR(chip))
>                    return chip;
>            tpmm_chip_dev(chip, dev);

No, just call the one devm_ function here (Based this work on top of
Jarkko's patch that adds the devm function)

>    > It is probably OK just to use put_device(&chip->dev), tpm_chip_put is
>    > already taken for something else. Don't call it free, it isn't free.
>    tpm_chip_free undoes what tpm_chip_alloc did.

No, once a chip is created we want the kref to be functional, that
means it can only ever be free'd by put_device. Do not create a
tpm_chip_free.

>     * Allocates a new struct tpm_chip instance and assigns a free
>     * device number for it.
>     */
>    struct tpm_chip *tpm_chip_alloc(const struct tpm_class_ops *ops)
>    {
>            struct tpm_chip *chip;
>            chip = kzalloc(sizeof(*chip), GFP_KERNEL);
>            if (chip == NULL)
>                    return ERR_PTR(-ENOMEM);

I see, why you got confused - don't try and remove device_initialize,
it doesn't care about the parent pointer.

Move the dev_set_drvdata into tpmm_chip_alloc
Move the cdev.owner to before cdev_add

A tpm_chip should never be in a state where the kref doesn't work.

Something like this [be careful merging this on top of Jarkko's final
patch adding the devm call].

pdev could even be passed as null for tpm_chip_alloc, as long as it is
properly set before registration, but I'd probably init you vtpm
device, then alloc the chip copy the id, then register the vtpm.

>From 4b1b31c351f838d608644660eb178b4c1f92bb7c Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Thu, 11 Feb 2016 12:45:48 -0700
Subject: [PATCH] tpm: Split out the devm stuff from tpmm_chip_alloc

tpm_chip_alloc becomes a typical subsystem allocate call.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/char/tpm/tpm-chip.c | 46 +++++++++++++++++++++++++++++++++++----------
 drivers/char/tpm/tpm.h      |  2 ++
 2 files changed, 38 insertions(+), 10 deletions(-)

Comments

Stefan Berger Feb. 11, 2016, 10:10 p.m. UTC | #1
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/11/2016 
02:48:10 PM:

> 
> On Thu, Feb 11, 2016 at 02:10:56PM -0500, Stefan Berger wrote:
> 
> >    > Make sense. Don't change the names all the drivers would have to 
be
> >    > churn'd. tpm_chip_alloc, tpmm_chip_alloc.
> >    >
> >    That's right:
> >    struct tpm_chip *tpmm_chip_alloc(struct device *dev,
> >                                     const struct tpm_class_ops *ops)
> >    {
> >            struct tpm_chip *chip;
> >            chip = tpm_chip_alloc(ops);
> >            if (IS_ERR(chip))
> >                    return chip;
> >            tpmm_chip_dev(chip, dev);
> 
> No, just call the one devm_ function here (Based this work on top of
> Jarkko's patch that adds the devm function)

Ok.

So here's the new patch.

https://github.com/stefanberger/linux/commit/a1037db7fb239c46449b054bfd1a1d18b54179f3

I removed dev from parameter to tpm_chip_alloc. We don't need to provide 
it here but it has to be a parameter to tpmm_chip_dev.

   Stefan
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 11, 2016, 10:18 p.m. UTC | #2
On Thu, Feb 11, 2016 at 05:10:28PM -0500, Stefan Berger wrote:
>    Ok.
>    So here's the new patch.
>    [1]https://github.com/stefanberger/linux/commit/a1037db7fb239c46449b054
>    bfd1a1d18b54179f3

What is the point of tpmm_chip_dev?

Just have the vtpm driver do device_initialize, then tpm_alloc and
have the release function do put_device on the chip. No need for devm
at all

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 11, 2016, 10:26 p.m. UTC | #3
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/11/2016 
05:18:22 PM:


> 
> On Thu, Feb 11, 2016 at 05:10:28PM -0500, Stefan Berger wrote:
> >    Ok.
> >    So here's the new patch.
> >    [1]
https://github.com/stefanberger/linux/commit/a1037db7fb239c46449b054
> >    bfd1a1d18b54179f3
> 
> What is the point of tpmm_chip_dev?

So that the usage model of the chip is the same. We get this in the 
tpm-vtpm.c with tpm_alloc_chip + tpmm_chip_dev while all others can call 
tpmm_chip_alloc, which combines the two.

> 
> Just have the vtpm driver do device_initialize, then tpm_alloc and
> have the release function do put_device on the chip. No need for devm
> at all

I would also like to have a tpm_chip_put (static inline in tpm.h ?) that 
wraps the put_device. To me this is more intuitive than calling 
put_device() as a counter-part to tpm_chip_alloc.

    Stefan


> 
> Jason
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 11, 2016, 11:56 p.m. UTC | #4
On Thu, Feb 11, 2016 at 05:26:24PM -0500, Stefan Berger wrote:

>    > What is the point of tpmm_chip_dev?
>    So that the usage model of the chip is the same. We get this in the
>    tpm-vtpm.c with tpm_alloc_chip + tpmm_chip_dev while all others can
>    call tpmm_chip_alloc, which combines the two.

No need, just don't use devm in vtpm, that is even better. The
standard devm idiom is a with and without version.

>    > Just have the vtpm driver do device_initialize, then tpm_alloc and
>    > have the release function do put_device on the chip. No need for devm
>    > at all

>    I would also like to have a tpm_chip_put (static inline in tpm.h ?)
>    that wraps the put_device. To me this is more intuitive than calling
>    put_device() as a counter-part to tpm_chip_alloc.

Many in the kernel community would call this sort of wrapping
obfuscation.. We don't have a put_platform_device, etc for
instance. Naked put_device in an error path is fine.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 3:56 a.m. UTC | #5
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/11/2016 
06:56:11 PM:


> 
> On Thu, Feb 11, 2016 at 05:26:24PM -0500, Stefan Berger wrote:
> 
> >    > What is the point of tpmm_chip_dev?
> >    So that the usage model of the chip is the same. We get this in the
> >    tpm-vtpm.c with tpm_alloc_chip + tpmm_chip_dev while all others can
> >    call tpmm_chip_alloc, which combines the two.
> 
> No need, just don't use devm in vtpm, that is even better. The
> standard devm idiom is a with and without version.

Updated the branch. Are you going to upstream your patch? Otherwise I 
would just add your Signed-off-by to it if that's ok ?

https://github.com/stefanberger/linux/tree/vtpm-driver.v3

> 
> >    > Just have the vtpm driver do device_initialize, then tpm_alloc 
and
> >    > have the release function do put_device on the chip. No need for 
devm
> >    > at all
> 
> >    I would also like to have a tpm_chip_put (static inline in tpm.h ?)
> >    that wraps the put_device. To me this is more intuitive than 
calling
> >    put_device() as a counter-part to tpm_chip_alloc.
> 
> Many in the kernel community would call this sort of wrapping
> obfuscation.. We don't have a put_platform_device, etc for
> instance. Naked put_device in an error path is fine.

I guess it's a matter of getting used to. I still like being 'guided' by 
function names...

   Stefan

> 
> Jason
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 6:13 p.m. UTC | #6
Stefan Berger/Watson/IBM@IBMUS wrote on 02/11/2016 10:56:47 PM:

> 
> Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/11/
> 2016 06:56:11 PM:
> 
> 
> > 
> > On Thu, Feb 11, 2016 at 05:26:24PM -0500, Stefan Berger wrote:
> > 
> > >    > What is the point of tpmm_chip_dev?
> > >    So that the usage model of the chip is the same. We get this in 
the
> > >    tpm-vtpm.c with tpm_alloc_chip + tpmm_chip_dev while all others 
can
> > >    call tpmm_chip_alloc, which combines the two.
> > 
> > No need, just don't use devm in vtpm, that is even better. The
> > standard devm idiom is a with and without version.
> 
> Updated the branch. Are you going to upstream your patch? Otherwise 
> I would just add your Signed-off-by to it if that's ok ?
> 
> https://github.com/stefanberger/linux/tree/vtpm-driver.v3

I converted tpm-chip.c to use IDA as well. 

A test for 4096 vTPM instances:

for ((i = 0; i < 4096; i++)); do ./vtpmctrl & done

   Stefan
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 12, 2016, 6:40 p.m. UTC | #7
On Thu, Feb 11, 2016 at 10:56:47PM -0500, Stefan Berger wrote:
>    Updated the branch. Are you going to upstream your patch? Otherwise I
>    would just add your Signed-off-by to it if that's ok ?

Yes, sorry about that. Just add it to your series with your
tested-by/reviewed-by. It will need some rebasing to go on top of
Jarkko's final patch that adds the devm_ (just delete it from
tpm_alloc and use my version for tpmm_alloc)

The other thing that has come to my mind is device removal. Right now
the tpm core is still fairly broken in that regard, and we have
largely got away with ignoring that because drviers are rarely
removed.

However, you are adding something that actually requires removal to
work and be race free, so we'll also have to fix core removal :(

The basic approach I'd suggest would be the same as what the rtc
subsystem does.

Add a mutex around 'chip->ops' and every place that
uses ops has to hold the lock.

Upon removal grab the mutex and null ops.

Every place that graps the lock for ops has to check for null. Null
means the device is being destroyed and the call must fail.

Make changes to tpm_chip_find_get to check for null, and make it do
put_device. Reconsider the use of module locking there, not sure it
makes any sense to do that if we have proper safe removal support.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 8:31 p.m. UTC | #8
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/12/2016 
01:40:51 PM:


> 
> On Thu, Feb 11, 2016 at 10:56:47PM -0500, Stefan Berger wrote:
> >    Updated the branch. Are you going to upstream your patch? Otherwise 
I
> >    would just add your Signed-off-by to it if that's ok ?
> 
> Yes, sorry about that. Just add it to your series with your
> tested-by/reviewed-by. It will need some rebasing to go on top of
> Jarkko's final patch that adds the devm_ (just delete it from
> tpm_alloc and use my version for tpmm_alloc)
> 
> The other thing that has come to my mind is device removal. Right now
> the tpm core is still fairly broken in that regard, and we have
> largely got away with ignoring that because drviers are rarely
> removed.
> 
> However, you are adding something that actually requires removal to
> work and be race free, so we'll also have to fix core removal :(

Where is the race? I tested the following:

The tpm_vtpm module use counter increases with every server file 
descriptor being handed out, so every run of ./vtpmctrl increases it by 1.

Every opening of /dev/tpm%d (exec 1XY<>/dev/tpm%d) increases the tpm_vtpm 
module use counter also by 1.

If the vtpmctrl's die, the module use counter decreases for every vtpmctrl 
termiating. However, the use counter is still at the number of open 
/dev/tpm%d devices.

vtpm_dev and chip structures only get free'd once the file descriptor is 
close. So it looks like expected good behavior to me. We cannot remove the 
'backend' module following the usage counter increase, so this is good 
too.

     Stefan


> 
> The basic approach I'd suggest would be the same as what the rtc
> subsystem does.
> 
> Add a mutex around 'chip->ops' and every place that
> uses ops has to hold the lock.
> 
> Upon removal grab the mutex and null ops.
> 
> Every place that graps the lock for ops has to check for null. Null
> means the device is being destroyed and the call must fail.
> 
> Make changes to tpm_chip_find_get to check for null, and make it do
> put_device. Reconsider the use of module locking there, not sure it
> makes any sense to do that if we have proper safe removal support.
> 
> Jason
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 12, 2016, 8:39 p.m. UTC | #9
On Fri, Feb 12, 2016 at 03:31:13PM -0500, Stefan Berger wrote:

>    Where is the race? I tested the following:
>    The tpm_vtpm module use counter increases with every server file
>    descriptor being handed out, so every run of ./vtpmctrl increases it by
>    1.
>    Every opening of /dev/tpm%d (exec 1XY<>/dev/tpm%d) increases the
>    tpm_vtpm module use counter also by 1.
>    If the vtpmctrl's die, the module use counter decreases for every
>    vtpmctrl termiating. However, the use counter is still at the number of
>    open /dev/tpm%d devices.
>    vtpm_dev and chip structures only get free'd once the file descriptor
>    is close. So it looks like expected good behavior to me. We cannot
>    remove the 'backend' module following the usage counter increase, so
>    this is good too.

We don't expect tpm_chip_unregister to run concurrently with any
in-progress operation, that isn't properly synchronized.

So your driver will call tpm_chip_unregister and then put_device it's
structure, but the tpm_chip is still active and could still generate a
callback to vtpm code resulting in use-after-free.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 8:44 p.m. UTC | #10
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/12/2016 
03:39:56 PM:

> 
> On Fri, Feb 12, 2016 at 03:31:13PM -0500, Stefan Berger wrote:
> 
> >    Where is the race? I tested the following:
> >    The tpm_vtpm module use counter increases with every server file
> >    descriptor being handed out, so every run of ./vtpmctrl increases 
it by
> >    1.
> >    Every opening of /dev/tpm%d (exec 1XY<>/dev/tpm%d) increases the
> >    tpm_vtpm module use counter also by 1.
> >    If the vtpmctrl's die, the module use counter decreases for every
> >    vtpmctrl termiating. However, the use counter is still at the 
number of
> >    open /dev/tpm%d devices.
> >    vtpm_dev and chip structures only get free'd once the file 
descriptor
> >    is close. So it looks like expected good behavior to me. We cannot
> >    remove the 'backend' module following the usage counter increase, 
so
> >    this is good too.
> 
> We don't expect tpm_chip_unregister to run concurrently with any
> in-progress operation, that isn't properly synchronized.
> 
> So your driver will call tpm_chip_unregister and then put_device it's
> structure, but the tpm_chip is still active and could still generate a
> callback to vtpm code resulting in use-after-free.

What I observed is that both tpm_chip and vtpm_dev structures are freed 
once the last one of two sides (/dev/tpmX or server side file descriptor) 
closes. Closing means there's no more code being executed by an 
application. So how can we have something still executing in the driver 
when both sides closed?

  Stefan
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 12, 2016, 9:15 p.m. UTC | #11
On Fri, Feb 12, 2016 at 03:44:14PM -0500, Stefan Berger wrote:
>    What I observed is that both tpm_chip and vtpm_dev structures are freed
>    once the last one of two sides (/dev/tpmX or server side file
>    descriptor) closes.

Hmmm...

I don't see how that can happen. Looking at the tpm cdev, it is
continues to exist even after tpm_unregister returns (cdev_del does
not force close existing opens). Certainly the kAPI (ie
tpm_chip_find_get) will continue to use the chip without blocking
tpm_unregister.

I see no mechanism for the cdev/kAPI to continue to hold a kref on the
vtpm struct.

The obvious one would be because the vtpm struct is a parent of the
chip, but that kref is let go during device_del.

So, we have a situation where tpm_unregister can return, release the
kref on the vtpm and still have outstanding users, which will result
in a use after-free.

[BTW, your lastest vtpm on github still has a problem with error
 unwind. Move the put_device(&vtpm_dev->chip->dev);  from
 vtpm_delete_vtpm_dev() and put it in vtpm_dev_release() with a NULL
 test.

 The put_device is missing after the tpm_chip_unregister call, the
 above is the safest way to fix it.
 
 This is why you shouldn't wrapper put_device, anything but naked
 put_device is probably wrong]

[Also, err_kfree should not exist in vtpm_create_vtpm_dev, always
 put_device after device_initialize returns, the comment near the device_add
 is wrong, it is using the get_device done implicitly by
 device_initialize]

[Don't forget to error check dev_set_name]

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 10:23 p.m. UTC | #12
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/12/2016 
04:15:38 PM:

> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> To: Stefan Berger/Watson/IBM@IBMUS
> Cc: dhowells@redhat.com, dwmw2@infradead.org, Jarkko Sakkinen 
> <jarkko.sakkinen@linux.intel.com>, tpmdd-devel@lists.sourceforge.net
> Date: 02/12/2016 04:15 PM
> Subject: Re: [tpmdd-devel] [PATCH v5 4/5] Initialize TPM and get 
> durations and timeouts
> 
> On Fri, Feb 12, 2016 at 03:44:14PM -0500, Stefan Berger wrote:
> >    What I observed is that both tpm_chip and vtpm_dev structures are 
freed
> >    once the last one of two sides (/dev/tpmX or server side file
> >    descriptor) closes.
> 
> Hmmm...
> 
> I don't see how that can happen. Looking at the tpm cdev, it is

Well, following my tests, it does happen.

> continues to exist even after tpm_unregister returns (cdev_del does
> not force close existing opens). Certainly the kAPI (ie
> tpm_chip_find_get) will continue to use the chip without blocking
> tpm_unregister.
> 
> I see no mechanism for the cdev/kAPI to continue to hold a kref on the
> vtpm struct.
> 
> The obvious one would be because the vtpm struct is a parent of the
> chip, but that kref is let go during device_del.
> 
> So, we have a situation where tpm_unregister can return, release the
> kref on the vtpm and still have outstanding users, which will result
> in a use after-free.

Maybe you should give it a try ... I don't see these problems and any 
change seems like ill-fixing it. What will be accessed after 
tpm_unregister? The only case we have tpm_unregister is here:

static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev)
{
        tpm_chip_unregister(vtpm_dev->chip);

        vtpm_fops_undo_open(vtpm_dev);

        vtpm_delete_vtpm_dev(vtpm_dev);
}

followed by:

static inline void vtpm_delete_vtpm_dev(struct vtpm_dev *vtpm_dev)
{
        put_device(&vtpm_dev->chip->dev); /* frees chip */
        device_unregister(&vtpm_dev->dev); /* does put_device */
}

I don't see a problem with that.

> 
> [BTW, your lastest vtpm on github still has a problem with error
>  unwind. Move the put_device(&vtpm_dev->chip->dev);  from
>  vtpm_delete_vtpm_dev() and put it in vtpm_dev_release() with a NULL
>  test.

I tested all the error paths. The vtpm_delete_vtpm_dev is the counter-part 
to vtpm_create_vtpm_dev and only ever gets called if vtpm_create_vtpm_dev 
ran successfully and is the function to call to unwind 
vtpm_create_vtpm_dev. So if we unwind there in reverse order then we 
should be ok. If not, why not ?

> 
>  The put_device is missing after the tpm_chip_unregister call, the
>  above is the safest way to fix it.

No, it's the vtpm_delete_vtpm_dev function that unwinds it. I tested this 
and structures get freed properly.

> 
>  This is why you shouldn't wrapper put_device, anything but naked
>  put_device is probably wrong]
> 
> [Also, err_kfree should not exist in vtpm_create_vtpm_dev, always
>  put_device after device_initialize returns, the comment near the 
device_add
>  is wrong, it is using the get_device done implicitly by
>  device_initialize]

Fixed.
> 
> [Don't forget to error check dev_set_name]

Fixed. Looks like 80% of the drivers don't check here.

   Stefan
> 
> Jason
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 12, 2016, 10:46 p.m. UTC | #13
On Fri, Feb 12, 2016 at 05:23:16PM -0500, Stefan Berger wrote:

>    > I don't see how that can happen. Looking at the tpm cdev, it is

>    Well, following my tests, it does happen.

How are you testing for use-after-free bugs? Did you test the kapi
interface? Did you get it in *exactly* the right configuration to
cause this?

>    > So, we have a situation where tpm_unregister can return, release the
>    > kref on the vtpm and still have outstanding users, which will result
>    > in a use after-free.

>    Maybe you should give it a try ... I don't see these problems and any
>    change seems like ill-fixing it. What will be accessed after
>    tpm_unregister? The only case we have tpm_unregister is here:

>    static void vtpm_delete_device_pair(structvtpm_dev *vtpm_dev)
>    {
>            tpm_chip_unregister(vtpm_dev->chip);
>            vtpm_fops_undo_open(vtpm_dev);
>            vtpm_delete_vtpm_dev(vtpm_dev);
>    }

>    followed by:

>    static inline voidvtpm_delete_vtpm_dev(struct vtpm_dev *vtpm_dev)
>    {
>            put_device(&vtpm_dev->chip->dev); /* frees chip */
>            device_unregister(&vtpm_dev->dev); /* does put_device */
>    }

>    I don't see a problem with that.

device_unregister will put_device(vtpmt_dev) which will kfree it since
no refs are left.

tpm_chip is still alive because the cdev/kapi are still holding a ref
on it.

The cdev/kapi can still call vtpm_tpm_op_send which will then deref
chip->priv which is the free'd vtpm_dev.

Any attempt to test for this scenario is very likely to hit the
STATE_OPENED_BIT test and exit without crashing. However that is just
a user-after free getting lucky. Testing is not a substitute for
proper analysis.

Show me an analysis of what in cdev/kapi is holding the kref on
vtpm_dev if you still think this can't happen.

As I said, the only kref the tpm core takes on vtpm_dev is dropped
by tpm_chip_unregister.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 10:47 p.m. UTC | #14
Stefan Berger/Watson/IBM@IBMUS wrote on 02/12/2016 05:23:16 PM:

> 
> Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/12/
> 2016 04:15:38 PM:

> > 
> > On Fri, Feb 12, 2016 at 03:44:14PM -0500, Stefan Berger wrote:
> > >    What I observed is that both tpm_chip and vtpm_dev structuresare 
freed
> > >    once the last one of two sides (/dev/tpmX or server side file
> > >    descriptor) closes.
> > 
> > Hmmm...
> > 
> > I don't see how that can happen. Looking at the tpm cdev, it is
> 
> Well, following my tests, it does happen.

Also I am zeroing tpm_chip and vtpm_dev structures before the free. 
Nothing bad happens in any combination of device opening / closing tests I 
did.

    Stefan
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 12, 2016, 11:14 p.m. UTC | #15
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/12/2016 
05:46:42 PM:

> 
> On Fri, Feb 12, 2016 at 05:23:16PM -0500, Stefan Berger wrote:
> 
> >    > I don't see how that can happen. Looking at the tpm cdev, it is
> 
> >    Well, following my tests, it does happen.
> 
> How are you testing for use-after-free bugs? Did you test the kapi
> interface? Did you get it in *exactly* the right configuration to
> cause this?

I think there are two cases that matter for the final unwind:

1) server side closes last
2) client side closes last

Any other cases?

Here's what happens in these cases:

1) we unwind with vtpm_delete_device_pair + vtpm_delete_vtpm_dev [see 
below]. The chip an vtpm_dev are freed, in that order by just these 
functions.

2) we unwind in tpm_release and the following kicks it off:

http://lxr.free-electrons.com/source/drivers/char/tpm/tpm-dev.c#L169

169         put_device(priv->chip->pdev);
170         kfree(priv);
171         return 0;
172 }
173 

The above put_device (in the front-end) releases vtpm_dev first and then 
after tpm_release finishes the chip is released (following the cdev's 
end). 

Once tpm_release happened, we closed the file descriptor and there's no 
more code doing other business in either the front or backend drivers 
other than unwinding.

   Stefan

> 
> >    > So, we have a situation where tpm_unregister can return, release 
the
> >    > kref on the vtpm and still have outstanding users, which will 
result
> >    > in a use after-free.
> 
> >    Maybe you should give it a try ... I don't see these problems and 
any
> >    change seems like ill-fixing it. What will be accessed after
> >    tpm_unregister? The only case we have tpm_unregister is here:
> 
> >    static void vtpm_delete_device_pair(structvtpm_dev *vtpm_dev)
> >    {
> >            tpm_chip_unregister(vtpm_dev->chip);
> >            vtpm_fops_undo_open(vtpm_dev);
> >            vtpm_delete_vtpm_dev(vtpm_dev);
> >    }
> 
> >    followed by:
> 
> >    static inline voidvtpm_delete_vtpm_dev(struct vtpm_dev *vtpm_dev)
> >    {
> >            put_device(&vtpm_dev->chip->dev); /* frees chip */
> >            device_unregister(&vtpm_dev->dev); /* does put_device */
> >    }
> 
> >    I don't see a problem with that.
> 
> device_unregister will put_device(vtpmt_dev) which will kfree it since
> no refs are left.
> 
> tpm_chip is still alive because the cdev/kapi are still holding a ref
> on it.
> 
> The cdev/kapi can still call vtpm_tpm_op_send which will then deref
> chip->priv which is the free'd vtpm_dev.
> 
> Any attempt to test for this scenario is very likely to hit the
> STATE_OPENED_BIT test and exit without crashing. However that is just
> a user-after free getting lucky. Testing is not a substitute for
> proper analysis.
> 
> Show me an analysis of what in cdev/kapi is holding the kref on
> vtpm_dev if you still think this can't happen.
> 
> As I said, the only kref the tpm core takes on vtpm_dev is dropped
> by tpm_chip_unregister.
> 
> Jason
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Jason Gunthorpe Feb. 12, 2016, 11:21 p.m. UTC | #16
On Fri, Feb 12, 2016 at 06:14:42PM -0500, Stefan Berger wrote:
>    169        put_device(priv->chip->pdev);
>    170        kfree(priv);

Okay, I thought I took that bogus thing out a long time ago, but that
is why you can't show this with the cdev.

But as I keep saying the kapi doesn't have anything like that.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140

Patch
diff mbox

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 999cbd9b388c..8134003b023c 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -77,15 +77,15 @@  static void tpm_dev_release(struct device *dev)
 /**
  * tpmm_chip_alloc() - allocate a new struct tpm_chip instance
  * @dev: device to which the chip is associated
+ *       At this point pdev must be initalized, but does not have to be
+ *       registered.
  * @ops: struct tpm_class_ops instance
  *
  * Allocates a new struct tpm_chip instance and assigns a free
- * device number for it. Caller does not have to worry about
- * freeing the allocated resources. When the devices is removed
- * devres calls tpmm_chip_remove() to do the job.
+ * device number for it. Must be pair'd with put_device(&chip->dev).
  */
-struct tpm_chip *tpmm_chip_alloc(struct device *dev,
-				 const struct tpm_class_ops *ops)
+struct tpm_chip *tpm_chip_alloc(struct device *dev,
+				const struct tpm_class_ops *ops)
 {
 	struct tpm_chip *chip;
 
@@ -109,13 +109,12 @@  struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 	}
 
 	set_bit(chip->dev_num, dev_mask);
+	device_initialize(&chip->dev);
 
 	scnprintf(chip->devname, sizeof(chip->devname), "tpm%d", chip->dev_num);
 
 	chip->pdev = dev;
 
-	dev_set_drvdata(dev, chip);
-
 	chip->dev.class = tpm_class;
 	chip->dev.release = tpm_dev_release;
 	chip->dev.parent = chip->pdev;
@@ -130,20 +129,47 @@  struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 
 	dev_set_name(&chip->dev, "%s", chip->devname);
 
-	device_initialize(&chip->dev);
-
 	cdev_init(&chip->cdev, &tpm_fops);
-	chip->cdev.owner = chip->pdev->driver->owner;
 	chip->cdev.kobj.parent = &chip->dev.kobj;
 
 	return chip;
 }
+EXPORT_SYMBOL_GPL(tpm_chip_alloc);
+
+/**
+ * tpmm_chip_alloc() - allocate a new struct tpm_chip instance
+ * @dev: device to which the chip is associated
+ *       Must be fully registered and able to handle devm.
+ * @ops: struct tpm_class_ops instance
+ *
+ * Same as tpm_chip_alloc except devm is used to do the put_device
+ */
+struct tpm_chip *tpmm_chip_alloc(struct device *pdev,
+				 const struct tpm_class_ops *ops)
+{
+	struct tpm_chip *chip;
+	int rc;
+
+	chip = tpm_chip_alloc(pdev, ops);
+	if (IS_ERR(chip))
+		return chip;
+	rc = devm_add_action(pdev, (void (*)(void *)) put_device, &chip->dev);
+	if (rc) {
+		put_device(&chip->dev);
+		return ERR_PTR(rc);
+	}
+
+	dev_set_drvdata(pdev, chip);
+
+	return chip;
+}
 EXPORT_SYMBOL_GPL(tpmm_chip_alloc);
 
 static int tpm_dev_add_device(struct tpm_chip *chip)
 {
 	int rc;
 
+	chip->cdev.owner = chip->pdev->driver->owner;
 	rc = cdev_add(&chip->cdev, chip->dev.devt, 1);
 	if (rc) {
 		dev_err(&chip->dev,
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 2fd5ee2ed7ad..3a5452f09b96 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -508,6 +508,8 @@  extern int wait_for_tpm_stat(struct tpm_chip *, u8, unsigned long,
 			     wait_queue_head_t *, bool);
 
 struct tpm_chip *tpm_chip_find_get(int chip_num);
+extern struct tpm_chip *tpm_chip_alloc(struct device *dev,
+				       const struct tpm_class_ops *ops);
 extern struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 				       const struct tpm_class_ops *ops);
 extern int tpm_chip_register(struct tpm_chip *chip);