[tpmdd-devel] tpm: fix a race condition tpm2_unseal_trusted()
diff mbox

Message ID 1468973792-17598-1-git-send-email-jarkko.sakkinen@linux.intel.com
State New
Headers show

Commit Message

Jarkko Sakkinen July 20, 2016, 12:16 a.m. UTC
Unseal and load operations should be done as an atomic unit. This
commit fixes the issue by moving TPM mutex handling to tpm_try_get_ops()
and tpm_put_ops(), which is probably more logical place for it anyway.

Fixes: 954650efb79f ("tpm: seal/unseal for TPM 2.0")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 drivers/char/tpm/tpm-chip.c      | 2 ++
 drivers/char/tpm/tpm-interface.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe July 20, 2016, 4:48 p.m. UTC | #1
On Wed, Jul 20, 2016 at 03:16:32AM +0300, Jarkko Sakkinen wrote:
> Unseal and load operations should be done as an atomic unit. This
> commit fixes the issue by moving TPM mutex handling to tpm_try_get_ops()
> and tpm_put_ops(), which is probably more logical place for it anyway.

No..

'get_ops' is to be used to hold a persisent kref to a single tpm. It
cannot block other tpm access.

Eg a upper protocol might get_ops to for a long period to ensure it
consistently talks to the same TPM in a multi-tpm system.

We need something else to solve whatever you are concerned with
here..

Jason

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
Jarkko Sakkinen July 20, 2016, 8:53 p.m. UTC | #2
On Wed, Jul 20, 2016 at 10:48:18AM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 20, 2016 at 03:16:32AM +0300, Jarkko Sakkinen wrote:
> > Unseal and load operations should be done as an atomic unit. This
> > commit fixes the issue by moving TPM mutex handling to tpm_try_get_ops()
> > and tpm_put_ops(), which is probably more logical place for it anyway.
> 
> No..
> 
> 'get_ops' is to be used to hold a persisent kref to a single tpm. It
> cannot block other tpm access.
> 
> Eg a upper protocol might get_ops to for a long period to ensure it
> consistently talks to the same TPM in a multi-tpm system.
> 
> We need something else to solve whatever you are concerned with
> here..

The only use cases I see at the moment for it work this way:

1. Call tpm_try_get_ops.
2. Send a TPM command.
3. Call tpm_put_ops.

I did not find any other form of use. The only use is to make sure that
there are no transactions running before the ops are cleared. Or did I
overlook something perhaps?

Trusted key unseal operation with TPM2 is broken into two operations:

1. Load the given key blob.
2. Unseal the data.

Without locking and unlocking mutex only once there is a race condition.

/Jarkko

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
Jason Gunthorpe July 20, 2016, 9:13 p.m. UTC | #3
On Wed, Jul 20, 2016 at 11:53:14PM +0300, Jarkko Sakkinen wrote:

> The only use cases I see at the moment for it work this way:
> 
> 1. Call tpm_try_get_ops.
> 2. Send a TPM command.
> 3. Call tpm_put_ops.

Right, but that is just a reflection of what the in kernel users are
doing today, not necessarily what they should be doing.

We should not break the put/get semantics..

> I did not find any other form of use. The only use is to make sure that
> there are no transactions running before the ops are cleared. Or did I
> overlook something perhaps?

The put/get is intended to allow a kapi user to hold a ref to tpm
without it geting destroyed. It is not intended to be an exclusive lock.

> Trusted key unseal operation with TPM2 is broken into two operations:
> 
> 1. Load the given key blob.
> 2. Unseal the data.
> 
> Without locking and unlocking mutex only once there is a race condition.

Well, the race condition is fundamentally because we don't have key
virtualization in the kernel :|

Those sorts of compound ops should hold the tpm_mutex manually, not
through the get_ops scheme.

Jason

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
Jarkko Sakkinen July 21, 2016, 9:02 a.m. UTC | #4
On Wed, Jul 20, 2016 at 03:13:32PM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 20, 2016 at 11:53:14PM +0300, Jarkko Sakkinen wrote:
> 
> > The only use cases I see at the moment for it work this way:
> > 
> > 1. Call tpm_try_get_ops.
> > 2. Send a TPM command.
> > 3. Call tpm_put_ops.
> 
> Right, but that is just a reflection of what the in kernel users are
> doing today, not necessarily what they should be doing.
> 
> We should not break the put/get semantics..
> 
> > I did not find any other form of use. The only use is to make sure that
> > there are no transactions running before the ops are cleared. Or did I
> > overlook something perhaps?
> 
> The put/get is intended to allow a kapi user to hold a ref to tpm
> without it geting destroyed. It is not intended to be an exclusive lock.

These operations *are not* exposed to kapi. They are interal to the
driver. That's why it does not make sense speak about kapi user.

All the places where it's used it's always used in the way that you also
take the exclusive lock.

You are speaking how it could be used. I'm looking at how it's actually
used. Shouldn't implementation reflect that rather than future
prospects?

> > Trusted key unseal operation with TPM2 is broken into two operations:
> > 
> > 1. Load the given key blob.
> > 2. Unseal the data.
> > 
> > Without locking and unlocking mutex only once there is a race condition.
> 
> Well, the race condition is fundamentally because we don't have key
> virtualization in the kernel :|

I do agree and I'll share my thought about TPM2 context swapping in
LSS 2016.

> Those sorts of compound ops should hold the tpm_mutex manually, not
> through the get_ops scheme.

The next best option would be to have unlocked transmit_cmd function.

/Jarkko

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
Jason Gunthorpe July 21, 2016, 4:25 p.m. UTC | #5
On Thu, Jul 21, 2016 at 12:02:45PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jul 20, 2016 at 03:13:32PM -0600, Jason Gunthorpe wrote:
> > On Wed, Jul 20, 2016 at 11:53:14PM +0300, Jarkko Sakkinen wrote:
> > 
> > > The only use cases I see at the moment for it work this way:
> > > 
> > > 1. Call tpm_try_get_ops.
> > > 2. Send a TPM command.
> > > 3. Call tpm_put_ops.
> > 
> > Right, but that is just a reflection of what the in kernel users are
> > doing today, not necessarily what they should be doing.
> > 
> > We should not break the put/get semantics..
> > 
> > > I did not find any other form of use. The only use is to make sure that
> > > there are no transactions running before the ops are cleared. Or did I
> > > overlook something perhaps?
> > 
> > The put/get is intended to allow a kapi user to hold a ref to tpm
> > without it geting destroyed. It is not intended to be an exclusive lock.
> 
> These operations *are not* exposed to kapi. They are interal to the
> driver. That's why it does not make sense speak about kapi user.

Right now yes, but look at other subsystems and you will see
operations like that, because that is typical design pattern. When I
wrote them I made sure they could be used in that typical way.

We have issues in our kapi users with regards to hot plug and multiple
tpms. Fortunately that basically never happens, but it does indicate
the API is not sufficient..

> You are speaking how it could be used. I'm looking at how it's actually
> used. Shouldn't implementation reflect that rather than future
> prospects?

Well, there are common patterns in the kernel for how these things
work and the get/put stuff does not imply an exclusive lock. That is
why it is called get/put :)

> > Those sorts of compound ops should hold the tpm_mutex manually, not
> > through the get_ops scheme.
> 
> The next best option would be to have unlocked transmit_cmd function.

The locking needs a good scrub, it is one of the last legacy things
left now that doesn't make alot of sense, especially if we need to
lock these compound operations.

The get/put locking cleanup was just a small step toward something
more typical..

Jason

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
Jarkko Sakkinen Aug. 9, 2016, 10:36 a.m. UTC | #6
On Thu, Jul 21, 2016 at 10:25:36AM -0600, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2016 at 12:02:45PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jul 20, 2016 at 03:13:32PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Jul 20, 2016 at 11:53:14PM +0300, Jarkko Sakkinen wrote:
> > > 
> > > > The only use cases I see at the moment for it work this way:
> > > > 
> > > > 1. Call tpm_try_get_ops.
> > > > 2. Send a TPM command.
> > > > 3. Call tpm_put_ops.
> > > 
> > > Right, but that is just a reflection of what the in kernel users are
> > > doing today, not necessarily what they should be doing.
> > > 
> > > We should not break the put/get semantics..
> > > 
> > > > I did not find any other form of use. The only use is to make sure that
> > > > there are no transactions running before the ops are cleared. Or did I
> > > > overlook something perhaps?
> > > 
> > > The put/get is intended to allow a kapi user to hold a ref to tpm
> > > without it geting destroyed. It is not intended to be an exclusive lock.
> > 
> > These operations *are not* exposed to kapi. They are interal to the
> > driver. That's why it does not make sense speak about kapi user.
> 
> Right now yes, but look at other subsystems and you will see
> operations like that, because that is typical design pattern. When I
> wrote them I made sure they could be used in that typical way.
> 
> We have issues in our kapi users with regards to hot plug and multiple
> tpms. Fortunately that basically never happens, but it does indicate
> the API is not sufficient..

Functionally my patch should not break anything. I understand the need
for clean up of locking but why doing this now to make the driver
non-racy would make clean up later on any harder?

I would rather think of clean up after the code is non-racy than doing a
huge clean up for racy code. Correct functionality is more important
than clean code because it has direct effect to users.

/Jarkko

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. http://sdm.link/zohodev2dev
Jason Gunthorpe Aug. 9, 2016, 3:49 p.m. UTC | #7
On Tue, Aug 09, 2016 at 01:36:29PM +0300, Jarkko Sakkinen wrote:

> Functionally my patch should not break anything. I understand the need
> for clean up of locking but why doing this now to make the driver
> non-racy would make clean up later on any harder?

Then rename the functions so they don't confusingly look like the
standard kernel pattern that doesn't lock.

Jason

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. http://sdm.link/zohodev2dev

Patch
diff mbox

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index e595013..9749f59 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -56,6 +56,7 @@  int tpm_try_get_ops(struct tpm_chip *chip)
 	if (!chip->ops)
 		goto out_lock;
 
+	mutex_lock(&chip->tpm_mutex);
 	return 0;
 out_lock:
 	up_read(&chip->ops_sem);
@@ -73,6 +74,7 @@  EXPORT_SYMBOL_GPL(tpm_try_get_ops);
  */
 void tpm_put_ops(struct tpm_chip *chip)
 {
+	mutex_unlock(&chip->tpm_mutex);
 	up_read(&chip->ops_sem);
 	put_device(&chip->dev);
 }
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1abe2d7..a2a9c36 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -350,8 +350,6 @@  ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
 		return -E2BIG;
 	}
 
-	mutex_lock(&chip->tpm_mutex);
-
 	rc = chip->ops->send(chip, (u8 *) buf, count);
 	if (rc < 0) {
 		dev_err(&chip->dev,
@@ -393,7 +391,6 @@  out_recv:
 		dev_err(&chip->dev,
 			"tpm_transmit: tpm_recv: error %zd\n", rc);
 out:
-	mutex_unlock(&chip->tpm_mutex);
 	return rc;
 }