[tpmdd-devel,v3,09/11] tpm: Driver for supporting multiple emulated TPMs

Message ID 1455885728-10315-10-git-send-email-stefanb@linux.vnet.ibm.com
State New

Commit Message

Stefan Berger Feb. 19, 2016, 12:42 p.m. UTC
This patch implements a driver for supporting multiple emulated TPMs in a
system.

The driver implements a device /dev/vtpmx that is used to create
a client device pair /dev/tpmX (e.g., /dev/tpm10) and a server side that
is accessed using a file descriptor returned by an ioctl.
The device /dev/tpmX is the usual TPM device created by the core TPM
driver. Applications or kernel subsystems can send TPM commands to it
and the corresponding server-side file descriptor receives these
commands and delivers them to an emulated TPM.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
---
 drivers/char/tpm/Kconfig    |  10 +
 drivers/char/tpm/Makefile   |   1 +
 drivers/char/tpm/tpm-vtpm.c | 543 ++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild   |   1 +
 include/uapi/linux/vtpm.h   |  38 ++++
 5 files changed, 593 insertions(+)
 create mode 100644 drivers/char/tpm/tpm-vtpm.c
 create mode 100644 include/uapi/linux/vtpm.h
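
For orientation, the userspace flow described in the commit message could
look roughly like the sketch below. It assumes the struct and ioctl number
from the include/uapi/linux/vtpm.h proposed in this patch (reproduced inline,
since that header is not installed anywhere), and the helper name
vtpm_open_pair() is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Copied from the vtpm.h proposed in this patch (with uint32_t in place
 * of __u32); this is an assumption about the final ABI, not a header
 * that ships with any kernel.
 */
struct vtpm_new_pair {
	uint32_t flags;		/* input */
	uint32_t tpm_dev_num;	/* output */
	uint32_t fd;		/* output */
	uint32_t major;		/* output */
	uint32_t minor;		/* output */
};

#define VTPM_FLAG_TPM2	1
#define VTPM_TPM	0xa0
#define VTPM_NEW_DEV	_IOW(VTPM_TPM, 0x00, struct vtpm_new_pair)

/*
 * Hypothetical helper: ask the control device for a new device pair.
 * Returns the server-side file descriptor, or a negative errno value.
 */
static int vtpm_open_pair(const char *ctl_path, uint32_t flags,
			  struct vtpm_new_pair *out)
{
	int ctl, rc;

	ctl = open(ctl_path, O_RDWR);
	if (ctl < 0)
		return -errno;

	memset(out, 0, sizeof(*out));
	out->flags = flags;

	rc = ioctl(ctl, VTPM_NEW_DEV, out);
	if (rc < 0)
		rc = -errno;	/* save errno before close() can change it */
	else
		rc = (int)out->fd;

	close(ctl);
	return rc;
}
```

An emulator would then read() TPM commands from the returned descriptor and
write() the responses back, while clients talk to the /dev/tpmX device whose
number is reported in tpm_dev_num.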

Comments

Jason Gunthorpe Feb. 22, 2016, 7:27 p.m. UTC | #1
On Fri, Feb 19, 2016 at 07:42:06AM -0500, Stefan Berger wrote:

> +#define VTPM_NUM_DEVICES TPM_NUM_DEVICES

Never used

> +	rc = copy_to_user(buf, vtpm_dev->buffer, len);
> +	memset(vtpm_dev->buffer, 0, len);
> +	vtpm_dev->req_len = 0;
> +
> +	spin_unlock(&vtpm_dev->buf_lock);

No, do not call copy_to_user (or copy_from_user) while holding a spin lock.

> +	chip->vendor.irq = 1;

Do not set this, I am trying to remove it..

Overall, I think everything has turned out very nice indeed.

Jason

------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
Stefan Berger Feb. 23, 2016, 1:45 a.m. UTC | #2
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/22/2016 
02:27:41 PM:


> Subject: Re: [tpmdd-devel] [PATCH v3 09/11] tpm: Driver for 
> supporting multiple emulated TPMs
> 
> On Fri, Feb 19, 2016 at 07:42:06AM -0500, Stefan Berger wrote:
> 
> > +#define VTPM_NUM_DEVICES TPM_NUM_DEVICES
> 
> Never used
> 

Dropped.

> > +   rc = copy_to_user(buf, vtpm_dev->buffer, len);
> > +   memset(vtpm_dev->buffer, 0, len);
> > +   vtpm_dev->req_len = 0;
> > +
> > +   spin_unlock(&vtpm_dev->buf_lock);
> 
> No, do not call copy_to_user in a spin lock, (or copy_from_user)

I suppose we don't copy into an intermediary buffer, right? Just copy 
without the protection then??


> 
> > +   chip->vendor.irq = 1;
> 
> Do not set this, I am trying to remove it..

dropped.

> 
> Overall, I think everything has turned out very nice indeed.

Thanks. I could not race it anymore with lots of concurrency, so a bit 
hesitant about the reordering of the IDR stuff.

Two things:
- the ioctl takes flags; should we return an error on flags that are not 
supported but set by userspace?
- the sysfs works but I wished we could give some control over whether it 
shows any entries. Can we have a flag in the ioctl on whether to show 
these files in sysfs?

  Stefan
Jason Gunthorpe Feb. 23, 2016, 2:17 a.m. UTC | #3
On Mon, Feb 22, 2016 at 08:45:51PM -0500, Stefan Berger wrote:

> Two things:
> - the ioctl takes flags; should we return an error on flags that are not
> supported but set by userspace?

Typically yes. Otherwise you cannot introduce new flags in
future.
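
A minimal sketch of that check, written as plain C so it can be compiled
outside the kernel. VTPM_FLAGS_SUPPORTED and vtpm_check_flags() are
hypothetical names; only VTPM_FLAG_TPM2 comes from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define VTPM_FLAG_TPM2		1u	/* from the proposed vtpm.h */

/* Hypothetical: mask of every flag the driver currently understands */
#define VTPM_FLAGS_SUPPORTED	(VTPM_FLAG_TPM2)

/*
 * Reject any bit outside the supported mask. Old kernels then fail
 * loudly when userspace passes a flag introduced later, instead of
 * silently ignoring it.
 */
static int vtpm_check_flags(uint32_t flags)
{
	if (flags & ~VTPM_FLAGS_SUPPORTED)
		return -EINVAL;

	return 0;
}
```

In the ioctl handler this would run right after copy_from_user() and before
any device is created.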

> - the sysfs works but I wished we could give some control over whether it shows
> any entries. Can we have a flag in the ioctl on whether to show these files in
> sysfs?

That is something to address in the future namespace patch series I
expect you'll prepare..

Jason

Jarkko Sakkinen Feb. 23, 2016, 10:22 a.m. UTC | #4
On Fri, Feb 19, 2016 at 07:42:06AM -0500, Stefan Berger wrote:
> This patch implements a driver for supporting multiple emulated TPMs in a
> system.
> 
> The driver implements a device /dev/vtpmx that is used to create
> a client device pair /dev/tpmX (e.g., /dev/tpm10) and a server side that
> is accessed using a file descriptor returned by an ioctl.
> The device /dev/tpmX is the usual TPM device created by the core TPM
> driver. Applications or kernel subsystems can send TPM commands to it
> and the corresponding server-side file descriptor receives these
> commands and delivers them to an emulated TPM.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> ---
>  drivers/char/tpm/Kconfig    |  10 +
>  drivers/char/tpm/Makefile   |   1 +
>  drivers/char/tpm/tpm-vtpm.c | 543 ++++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/Kbuild   |   1 +
>  include/uapi/linux/vtpm.h   |  38 ++++
>  5 files changed, 593 insertions(+)
>  create mode 100644 drivers/char/tpm/tpm-vtpm.c
>  create mode 100644 include/uapi/linux/vtpm.h
> 
> diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
> index 3b84a8b..4c4e843 100644
> --- a/drivers/char/tpm/Kconfig
> +++ b/drivers/char/tpm/Kconfig
> @@ -122,5 +122,15 @@ config TCG_CRB
>  	  from within Linux.  To compile this driver as a module, choose
>  	  M here; the module will be called tpm_crb.
>  
> +config TCG_VTPM
> +	tristate "VTPM Interface"
> +	depends on TCG_TPM
> +	---help---
> +	  This driver supports an emulated TPM (vTPM) running in userspace.
> +	  A device /dev/vtpmx is provided that creates a device pair
> +	  /dev/tpmX and a server-side file descriptor on which the vTPM
> +	  can receive commands.
> +
> +
>  source "drivers/char/tpm/st33zp24/Kconfig"
>  endif # TCG_TPM
> diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
> index 56e8f1f..d947db2 100644
> --- a/drivers/char/tpm/Makefile
> +++ b/drivers/char/tpm/Makefile
> @@ -23,3 +23,4 @@ obj-$(CONFIG_TCG_IBMVTPM) += tpm_ibmvtpm.o
>  obj-$(CONFIG_TCG_TIS_ST33ZP24) += st33zp24/
>  obj-$(CONFIG_TCG_XEN) += xen-tpmfront.o
>  obj-$(CONFIG_TCG_CRB) += tpm_crb.o
> +obj-$(CONFIG_TCG_VTPM) += tpm-vtpm.o
> diff --git a/drivers/char/tpm/tpm-vtpm.c b/drivers/char/tpm/tpm-vtpm.c
> new file mode 100644
> index 0000000..823c94a
> --- /dev/null
> +++ b/drivers/char/tpm/tpm-vtpm.c
> @@ -0,0 +1,543 @@
> +/*
> + * Copyright (C) 2015, 2016 IBM Corporation
> + *
> + * Author: Stefan Berger <stefanb@us.ibm.com>
> + *
> + * Maintained by: <tpmdd-devel@lists.sourceforge.net>
> + *
> + * Device driver for vTPM.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +#include <linux/uaccess.h>
> +#include <linux/wait.h>
> +#include <linux/miscdevice.h>
> +#include <linux/vtpm.h>
> +#include <linux/file.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/poll.h>
> +#include <linux/compat.h>
> +
> +#include "tpm.h"
> +
> +#define VTPM_NUM_DEVICES TPM_NUM_DEVICES
> +
> +struct vtpm_dev {
> +	struct tpm_chip *chip;
> +
> +	u32 flags;                   /* public API flags */
> +
> +	long state;
> +#define STATE_OPENED_BIT        0
> +#define STATE_WAIT_RESPONSE_BIT 1    /* waiting for emulator to give response */

I'd prefer something like this before declaring the struct:

enum vtpm_dev_states {
	VTPM_DEV_OPENED			= BIT(0),
	VTPM_DEV_WAITING_FOR_RESPONSE	= BIT(1),
};

This whole use of set/clear_bit macros when you don't have variable
number of bits just makes code less transparent.

> +
> +	spinlock_t buf_lock;         /* lock for buffers */
> +
> +	wait_queue_head_t wq;
> +
> +	size_t req_len;              /* length of queued TPM request */
> +	size_t resp_len;             /* length of queued TPM response */
> +	u8 buffer[TPM_BUFSIZE];      /* request/response buffer */

I'd use alloc_page() with GFP_HIGHUSER in order to be a better citizen
in 32-bit environments. You can kmap() it when you need it.

> +};
> +
> +
> +static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev);
> +
> +/*
> + * Functions related to 'server side'
> + */
> +
> +/**
> + * vtpm_fops_read - Read TPM commands on 'server side'
> + *
> + * Return value:
> + *	Number of bytes read or negative error code
> + */
> +static ssize_t vtpm_fops_read(struct file *filp, char __user *buf,
> +			      size_t count, loff_t *off)
> +{
> +	struct vtpm_dev *vtpm_dev = filp->private_data;
> +	size_t len;
> +	int sig, rc;
> +
> +	sig = wait_event_interruptible(vtpm_dev->wq, vtpm_dev->req_len != 0);
> +	if (sig)
> +		return -EINTR;
> +
> +	spin_lock(&vtpm_dev->buf_lock);
> +
> +	len = vtpm_dev->req_len;
> +
> +	if (count < len) {
> +		spin_unlock(&vtpm_dev->buf_lock);
> +		pr_debug("Invalid size in recv: count=%zd, req_len=%zd\n",
> +			 count, len);
> +		return -EIO;
> +	}
> +
> +	rc = copy_to_user(buf, vtpm_dev->buffer, len);
> +	memset(vtpm_dev->buffer, 0, len);
> +	vtpm_dev->req_len = 0;

I saw already Jason's comment but maybe you could just use a mutex
instead of a spinlock?

> +
> +	spin_unlock(&vtpm_dev->buf_lock);
> +
> +	if (rc)
> +		return -EFAULT;
> +
> +	set_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);

vtpm_dev->state |= VTPM_DEV_WAITING_FOR_RESPONSE;

> +
> +	return len;
> +}
> +
> +/**
> + * vtpm_fops_write - Write TPM responses on 'server side'
> + *
> + * Return value:
> + *	Number of bytes written or negative error value
> + */
> +static ssize_t vtpm_fops_write(struct file *filp, const char __user *buf,
> +			       size_t count, loff_t *off)
> +{
> +	struct vtpm_dev *vtpm_dev = filp->private_data;
> +
> +	if (count > sizeof(vtpm_dev->buffer) ||
> +	    !test_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state))
> +		return -EIO;
> +
> +	clear_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);

vtpm_dev->state &= ~VTPM_DEV_WAITING_FOR_RESPONSE;

> +
> +	spin_lock(&vtpm_dev->buf_lock);
> +
> +	vtpm_dev->req_len = 0;
> +
> +	if (copy_from_user(vtpm_dev->buffer, buf, count)) {
> +		spin_unlock(&vtpm_dev->buf_lock);
> +		return -EFAULT;
> +	}
> +
> +	vtpm_dev->resp_len = count;
> +
> +	spin_unlock(&vtpm_dev->buf_lock);
> +
> +	wake_up_interruptible(&vtpm_dev->wq);
> +
> +	return count;
> +}
> +
> +/*
> + * vtpm_fops_poll: Poll status on 'server side'
> + *
> + * Return value:
> + *      Poll flags
> + */
> +static unsigned int vtpm_fops_poll(struct file *filp, poll_table *wait)
> +{
> +	struct vtpm_dev *vtpm_dev = filp->private_data;
> +	unsigned ret;
> +
> +	poll_wait(filp, &vtpm_dev->wq, wait);
> +
> +	ret = POLLOUT;
> +	if (vtpm_dev->req_len)
> +		ret |= POLLIN | POLLRDNORM;
> +
> +	return ret;
> +}
> +
> +/*
> + * vtpm_fops_open - Open vTPM device on 'server side'
> + *
> + * Called when setting up the anonymous file descriptor
> + */
> +static void vtpm_fops_open(struct file *filp)
> +{
> +	struct vtpm_dev *vtpm_dev = filp->private_data;
> +
> +	set_bit(STATE_OPENED_BIT, &vtpm_dev->state);

vtpm_dev->state |= VTPM_DEV_OPENED;

> +}
> +
> +/**
> + * vtpm_fops_undo_open - counter-part to vtpm_fops_open
> + *
> + * Call to undo vtpm_fops_open
> + */
> +static void vtpm_fops_undo_open(struct vtpm_dev *vtpm_dev)
> +{
> +	clear_bit(STATE_OPENED_BIT, &vtpm_dev->state);


vtpm_dev->state &= ~VTPM_DEV_OPENED;

> +
> +	/* no more TPM responses -- wake up anyone waiting for them */
> +	wake_up_interruptible(&vtpm_dev->wq);
> +}
> +
> +/*
> + * vtpm_fops_release: Close 'server side'
> + *
> + * Return value:
> + *      Always returns 0.
> + */
> +static int vtpm_fops_release(struct inode *inode, struct file *filp)
> +{
> +	struct vtpm_dev *vtpm_dev = filp->private_data;
> +
> +	filp->private_data = NULL;
> +
> +	vtpm_delete_device_pair(vtpm_dev);
> +
> +	return 0;
> +}
> +
> +static const struct file_operations vtpm_fops = {
> +	.owner = THIS_MODULE,
> +	.llseek = no_llseek,
> +	.read = vtpm_fops_read,
> +	.write = vtpm_fops_write,
> +	.poll = vtpm_fops_poll,
> +	.release = vtpm_fops_release,
> +};
> +
> +/*
> + * Functions invoked by the core TPM driver to send TPM commands to
> + * 'server side' and receive responses from there.
> + */
> +
> +/*
> + * Called when core TPM driver reads TPM responses from 'server side'
> + *
> + * Return value:
> + *      Number of TPM response bytes read, negative error value otherwise
> + */
> +static int vtpm_tpm_op_recv(struct tpm_chip *chip, u8 *buf, size_t count)
> +{
> +	struct vtpm_dev *vtpm_dev = chip->vendor.priv;
> +	int sig;
> +	size_t len;
> +
> +	if (!vtpm_dev)
> +		return -EIO;
> +
> +	/* wait for response or responder gone */
> +	sig = wait_event_interruptible(vtpm_dev->wq,
> +		(vtpm_dev->resp_len != 0
> +		|| !test_bit(STATE_OPENED_BIT, &vtpm_dev->state)));
> +
> +	if (sig)
> +		return -EINTR;
> +
> +	/* process gone ? */
> +	if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state))
> +		return -EPIPE;
> +
> +	spin_lock(&vtpm_dev->buf_lock);
> +
> +	len = vtpm_dev->resp_len;
> +	if (count < len) {
> +		dev_err(&chip->dev,
> +			"Invalid size in recv: count=%zd, resp_len=%zd\n",
> +			count, len);
> +		len = -EIO;
> +		goto out;
> +	}
> +
> +	memcpy(buf, vtpm_dev->buffer, len);
> +	vtpm_dev->resp_len = 0;
> +
> +out:
> +	spin_unlock(&vtpm_dev->buf_lock);
> +
> +	return len;
> +}
> +
> +/*
> + * Called when core TPM driver forwards TPM requests to 'server side'.
> + *
> + * Return value:
> + *      0 in case of success, negative error value otherwise.
> + */
> +static int vtpm_tpm_op_send(struct tpm_chip *chip, u8 *buf, size_t count)
> +{
> +	struct vtpm_dev *vtpm_dev = chip->vendor.priv;
> +	int rc = 0;
> +
> +	if (!vtpm_dev)
> +		return -EIO;
> +
> +	if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state))
> +		return -EPIPE;
> +
> +	if (count > sizeof(vtpm_dev->buffer)) {
> +		dev_err(&chip->dev,
> +			"Invalid size in send: count=%zd, buffer size=%zd\n",
> +			count, sizeof(vtpm_dev->buffer));
> +		return -EIO;
> +	}
> +
> +	spin_lock(&vtpm_dev->buf_lock);
> +
> +	vtpm_dev->resp_len = 0;
> +
> +	vtpm_dev->req_len = count;
> +	memcpy(vtpm_dev->buffer, buf, count);
> +
> +	spin_unlock(&vtpm_dev->buf_lock);
> +
> +	wake_up_interruptible(&vtpm_dev->wq);
> +
> +	clear_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);

vtpm_dev->state &= ~VTPM_DEV_WAITING_FOR_RESPONSE;

> +
> +	return rc;
> +}
> +
> +static void vtpm_tpm_op_cancel(struct tpm_chip *chip)
> +{
> +	/* not supported */
> +}
> +
> +static u8 vtpm_tpm_op_status(struct tpm_chip *chip)
> +{
> +	return 0;
> +}
> +
> +static bool vtpm_tpm_req_canceled(struct tpm_chip  *chip, u8 status)
> +{
> +	return (status == 0);
> +}
> +
> +static const struct tpm_class_ops vtpm_tpm_ops = {
> +	.recv = vtpm_tpm_op_recv,
> +	.send = vtpm_tpm_op_send,
> +	.cancel = vtpm_tpm_op_cancel,
> +	.status = vtpm_tpm_op_status,
> +	.req_complete_mask = 0,
> +	.req_complete_val = 0,
> +	.req_canceled = vtpm_tpm_req_canceled,
> +};
> +
> +/*
> + * Code related to creation and deletion of device pairs
> + */
> +static struct vtpm_dev *vtpm_create_vtpm_dev(void)
> +{
> +	struct vtpm_dev *vtpm_dev;
> +	struct tpm_chip *chip;
> +	int err;
> +
> +	vtpm_dev = kzalloc(sizeof(*vtpm_dev), GFP_KERNEL);
> +	if (vtpm_dev == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	init_waitqueue_head(&vtpm_dev->wq);
> +	spin_lock_init(&vtpm_dev->buf_lock);
> +
> +	chip = tpm_chip_alloc(NULL, &vtpm_tpm_ops);
> +	if (IS_ERR(chip)) {
> +		err = PTR_ERR(chip);
> +		goto err_vtpm_dev_free;
> +	}
> +	chip->vendor.priv = vtpm_dev;
> +	chip->vendor.irq = 1;
> +
> +	vtpm_dev->chip = chip;
> +
> +	return vtpm_dev;
> +
> +err_vtpm_dev_free:
> +	kfree(vtpm_dev);
> +
> +	return ERR_PTR(err);
> +}
> +
> +/*
> + * Undo what has been done in vtpm_create_vtpm_dev
> + */
> +static inline void vtpm_delete_vtpm_dev(struct vtpm_dev *vtpm_dev)
> +{
> +	put_device(&vtpm_dev->chip->dev); /* frees chip */
> +	kfree(vtpm_dev);
> +}
> +
> +/*
> + * Create a /dev/tpm%d and 'server side' file descriptor pair
> + *
> + * Return value:
> + *      Returns file pointer on success, an error value otherwise
> + */
> +static struct file *vtpm_create_device_pair(
> +				       struct vtpm_new_pair *vtpm_new_pair)
> +{
> +	struct vtpm_dev *vtpm_dev;
> +	int rc, fd;
> +	struct file *file;
> +
> +	vtpm_dev = vtpm_create_vtpm_dev();
> +	if (IS_ERR(vtpm_dev))
> +		return ERR_CAST(vtpm_dev);
> +
> +	vtpm_dev->flags = vtpm_new_pair->flags;
> +
> +	/* setup an anonymous file for the server-side */
> +	fd = get_unused_fd_flags(O_RDWR);
> +	if (fd < 0) {
> +		rc = fd;
> +		goto err_delete_vtpm_dev;
> +	}
> +
> +	file = anon_inode_getfile("[vtpms]", &vtpm_fops, vtpm_dev, O_RDWR);
> +	if (IS_ERR(file)) {
> +		rc = PTR_ERR(file);
> +		goto err_put_unused_fd;
> +	}
> +
> +	/* from now on we can unwind with put_unused_fd() + fput() */
> +	/* simulate an open() on the server side */
> +	vtpm_fops_open(file);
> +
> +	if (vtpm_dev->flags & VTPM_FLAG_TPM2)
> +		vtpm_dev->chip->flags |= TPM_CHIP_FLAG_TPM2;
> +
> +	rc = tpm_chip_register(vtpm_dev->chip);
> +	if (rc)
> +		goto err_vtpm_fput;
> +
> +	vtpm_new_pair->fd = fd;
> +	vtpm_new_pair->major = MAJOR(vtpm_dev->chip->dev.devt);
> +	vtpm_new_pair->minor = MINOR(vtpm_dev->chip->dev.devt);
> +	vtpm_new_pair->tpm_dev_num = vtpm_dev->chip->dev_num;
> +
> +	return file;
> +
> +err_vtpm_fput:
> +	put_unused_fd(fd);
> +	fput(file);
> +
> +	return ERR_PTR(rc);
> +
> +err_put_unused_fd:
> +	put_unused_fd(fd);
> +
> +err_delete_vtpm_dev:
> +	vtpm_delete_vtpm_dev(vtpm_dev);
> +
> +	return ERR_PTR(rc);
> +}
> +
> +/*
> + * Counter part to vtpm_create_device_pair.
> + */
> +static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev)
> +{
> +	tpm_chip_unregister(vtpm_dev->chip);
> +
> +	vtpm_fops_undo_open(vtpm_dev);
> +
> +	vtpm_delete_vtpm_dev(vtpm_dev);
> +}
> +
> +/*
> + * Code related to the control device /dev/vtpmx
> + */
> +
> +/*
> + * vtpmx_fops_ioctl: ioctl on /dev/vtpmx
> + *
> + * Return value:
> + *      Returns 0 on success, a negative error code otherwise.
> + */
> +static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl,
> +			    unsigned long arg)
> +{
> +	void __user *argp = (void __user *)arg;
> +	struct vtpm_new_pair *vtpm_new_pair_p;
> +	struct vtpm_new_pair vtpm_new_pair;
> +	struct file *file;
> +
> +	switch (ioctl) {
> +	case VTPM_NEW_DEV:
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +		vtpm_new_pair_p = argp;
> +		if (copy_from_user(&vtpm_new_pair, vtpm_new_pair_p,
> +				   sizeof(vtpm_new_pair)))
> +			return -EFAULT;
> +		file = vtpm_create_device_pair(&vtpm_new_pair);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
> +		if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair,
> +				 sizeof(vtpm_new_pair))) {
> +			put_unused_fd(vtpm_new_pair.fd);
> +			fput(file);
> +			return -EFAULT;
> +		}
> +
> +		fd_install(vtpm_new_pair.fd, file);
> +		return 0;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +#ifdef CONFIG_COMPAT
> +static long vtpmx_fops_compat_ioctl(struct file *f, unsigned int ioctl,
> +				    unsigned long arg)
> +{
> +	return vtpmx_fops_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
> +}
> +#endif
> +
> +static const struct file_operations vtpmx_fops = {
> +	.owner = THIS_MODULE,
> +	.unlocked_ioctl = vtpmx_fops_ioctl,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl = vtpmx_fops_compat_ioctl,
> +#endif
> +	.llseek = noop_llseek,
> +};
> +
> +static struct miscdevice vtpmx_miscdev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "vtpmx",
> +	.fops = &vtpmx_fops,
> +};
> +
> +static int vtpmx_init(void)
> +{
> +	return misc_register(&vtpmx_miscdev);
> +}
> +
> +static void vtpmx_cleanup(void)
> +{
> +	misc_deregister(&vtpmx_miscdev);
> +}
> +
> +static int __init vtpm_module_init(void)
> +{
> +	int rc;
> +
> +	rc = vtpmx_init();
> +	if (rc) {
> +		pr_err("couldn't create vtpmx device\n");
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +static void __exit vtpm_module_exit(void)
> +{
> +	vtpmx_cleanup();
> +}
> +
> +module_init(vtpm_module_init);
> +module_exit(vtpm_module_exit);
> +
> +MODULE_AUTHOR("Stefan Berger (stefanb@us.ibm.com)");
> +MODULE_DESCRIPTION("vTPM Driver");
> +MODULE_VERSION("0.1");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index c2e5d6c..c194d61 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -449,6 +449,7 @@ header-y += virtio_scsi.h
>  header-y += virtio_types.h
>  header-y += vm_sockets.h
>  header-y += vt.h
> +header-y += vtpm.h
>  header-y += wait.h
>  header-y += wanrouter.h
>  header-y += watchdog.h
> diff --git a/include/uapi/linux/vtpm.h b/include/uapi/linux/vtpm.h
> new file mode 100644
> index 0000000..aef8733
> --- /dev/null
> +++ b/include/uapi/linux/vtpm.h
> @@ -0,0 +1,38 @@
> +/*
> + * Definitions for the VTPM interface
> + * Copyright (c) 2015, 2016, IBM Corporation
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef _UAPI_LINUX_VTPM_H
> +#define _UAPI_LINUX_VTPM_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +/* ioctls */
> +
> +struct vtpm_new_pair {
> +	__u32 flags;         /* input */
> +	__u32 tpm_dev_num;   /* output */
> +	__u32 fd;            /* output */
> +	__u32 major;         /* output */
> +	__u32 minor;         /* output */
> +};
> +
> +/* above flags */
> +#define VTPM_FLAG_TPM2           1  /* emulator is TPM 2 */
> +
> +#define VTPM_TPM 0xa0
> +
> +#define VTPM_NEW_DEV         _IOW(VTPM_TPM, 0x00, struct vtpm_new_pair)
> +
> +#endif /* _UAPI_LINUX_VTPM_H */
> -- 
> 2.4.3
> 

/Jarkko

Stefan Berger Feb. 23, 2016, 12:09 p.m. UTC | #5
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 02/23/2016 
05:22:11 AM:


> 
> On Fri, Feb 19, 2016 at 07:42:06AM -0500, Stefan Berger wrote:
> > This patch implements a driver for supporting multiple emulated TPMs in a
> > system.
> > 
> > The driver implements a device /dev/vtpmx that is used to create
> > a client device pair /dev/tpmX (e.g., /dev/tpm10) and a server side that
> > is accessed using a file descriptor returned by an ioctl.
> > The device /dev/tpmX is the usual TPM device created by the core TPM
> > driver. Applications or kernel subsystems can send TPM commands to it
> > and the corresponding server-side file descriptor receives these
> > commands and delivers them to an emulated TPM.
> > 
> > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> > ---
> >  drivers/char/tpm/Kconfig    |  10 +
> >  drivers/char/tpm/Makefile   |   1 +
> >  drivers/char/tpm/tpm-vtpm.c | 543 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/Kbuild   |   1 +
> >  include/uapi/linux/vtpm.h   |  38 ++++
> >  5 files changed, 593 insertions(+)
> >  create mode 100644 drivers/char/tpm/tpm-vtpm.c
> >  create mode 100644 include/uapi/linux/vtpm.h
> > 
> > diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
> > index 3b84a8b..4c4e843 100644
> > --- a/drivers/char/tpm/Kconfig
> > +++ b/drivers/char/tpm/Kconfig
> > @@ -122,5 +122,15 @@ config TCG_CRB
> >       from within Linux.  To compile this driver as a module, choose
> >       M here; the module will be called tpm_crb.
> > 

> > +
> > +struct vtpm_dev {
> > +   struct tpm_chip *chip;
> > +
> > +   u32 flags;                   /* public API flags */
> > +
> > +   long state;
> > +#define STATE_OPENED_BIT        0
> > +#define STATE_WAIT_RESPONSE_BIT 1    /* waiting for emulator to give response */
> 
> I'd prefer something like this before declaring the struct:
> 
> enum vtpm_dev_states {
>    VTPM_DEV_OPENED         = BIT(0),
>    VTPM_DEV_WAITING_FOR_RESPONSE   = BIT(1),
> };
> 
> This whole use of set/clear_bit macros when you don't have variable
> number of bits just makes code less transparent.

Though read-modify-writes with the bit ops are atomic and need no spinlock 
protection to avoid concurrency mess. Now we would need a spinlock for 
such ops.
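
To illustrate the point about atomic read-modify-write, here is a userspace
analogy using C11 atomics. The sim_* helpers are stand-ins invented for this
sketch; they only mimic what the kernel's set_bit()/clear_bit()/test_bit()
provide and are not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
	STATE_OPENED_BIT	= 0,
	STATE_WAIT_RESPONSE_BIT	= 1,
};

/* Atomically set bit nr; safe against concurrent updates without a lock */
static void sim_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/* Atomically clear bit nr */
static void sim_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Read the current value of bit nr */
static bool sim_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}
```

A plain "state |= FLAG" on a non-atomic long is a separate load, OR and
store, so two threads updating different bits can lose one of the updates
unless both hold the same lock.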


> 
> > +
> > +   spinlock_t buf_lock;         /* lock for buffers */
> > +
> > +   wait_queue_head_t wq;
> > +
> > +   size_t req_len;              /* length of queued TPM request */
> > +   size_t resp_len;             /* length of queued TPM response */
> > +   u8 buffer[TPM_BUFSIZE];      /* request/response buffer */
> 
> I'd use alloc_page() with GFP_USERHIGH in order to be a better citizen
> in 32-bit environments. You can kmap() it when you need it.
> 
> > +};
> > +
> > +
> > +static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev);
> > +
> > +/*
> > + * Functions related to 'server side'
> > + */
> > +
> > +/**
> > + * vtpm_fops_read - Read TPM commands on 'server side'
> > + *
> > + * Return value:
> > + *   Number of bytes read or negative error code
> > + */
> > +static ssize_t vtpm_fops_read(struct file *filp, char __user *buf,
> > +               size_t count, loff_t *off)
> > +{
> > +   struct vtpm_dev *vtpm_dev = filp->private_data;
> > +   size_t len;
> > +   int sig, rc;
> > +
> > +   sig = wait_event_interruptible(vtpm_dev->wq, vtpm_dev->req_len != 0);
> > +   if (sig)
> > +      return -EINTR;
> > +
> > +   spin_lock(&vtpm_dev->buf_lock);
> > +
> > +   len = vtpm_dev->req_len;
> > +
> > +   if (count < len) {
> > +      spin_unlock(&vtpm_dev->buf_lock);
> > +      pr_debug("Invalid size in recv: count=%zd, req_len=%zd\n",
> > +          count, len);
> > +      return -EIO;
> > +   }
> > +
> > +   rc = copy_to_user(buf, vtpm_dev->buffer, len);
> > +   memset(vtpm_dev->buffer, 0, len);
> > +   vtpm_dev->req_len = 0;
> 
> I saw already Jason's comment but maybe you could just use a mutex
> instead of a spinlock?

I'll let Jason respond to it whether it's the difference between spinlock 
and mutex or just no locking at all. If no locking, I'd be inclined to 
work with two buffers, one for requests and one for responses.



> > +
> > +/*
> > + * Called when core TPM driver reads TPM responses from 'server side'
> > + *
> > + * Return value:
> > + *      Number of TPM response bytes read, negative error value otherwise
> > + */
> > +static int vtpm_tpm_op_recv(struct tpm_chip *chip, u8 *buf, size_t count)
> > +{
> > +   struct vtpm_dev *vtpm_dev = chip->vendor.priv;
> > +   int sig;
> > +   size_t len;
> > +
> > +   if (!vtpm_dev)
> > +      return -EIO;
> > +
> > +   /* wait for response or responder gone */
> > +   sig = wait_event_interruptible(vtpm_dev->wq,
> > +      (vtpm_dev->resp_len != 0
> > +      || !test_bit(STATE_OPENED_BIT, &vtpm_dev->state)));
> > +
> > +   if (sig)
> > +      return -EINTR;


With us not operating this driver in interrupt mode, we don't need the 
wait_event_interruptible if we make another change further below:


> > +
> > +static u8 vtpm_tpm_op_status(struct tpm_chip *chip)
> > +{
> > +   return 0;
> > +}


I modified this to 

static u8 vtpm_tpm_op_status(struct tpm_chip *chip)
{
        struct vtpm_dev *vtpm_dev = chip->vendor.priv;

        return (vtpm_dev->resp_len) ? VTPM_REQ_COMPLETE_FLAG : 0;
}


> > +
> > +static bool vtpm_tpm_req_canceled(struct tpm_chip  *chip, u8 status)
> > +{
> > +   return (status == 0);
> > +}

This will have to remain like this.

> > +
> > +static const struct tpm_class_ops vtpm_tpm_ops = {
> > +   .recv = vtpm_tpm_op_recv,
> > +   .send = vtpm_tpm_op_send,
> > +   .cancel = vtpm_tpm_op_cancel,
> > +   .status = vtpm_tpm_op_status,
> > +   .req_complete_mask = 0,
> > +   .req_complete_val = 0,

        .req_complete_mask = VTPM_REQ_COMPLETE_FLAG,
        .req_complete_val = VTPM_REQ_COMPLETE_FLAG,

with

#define VTPM_REQ_COMPLETE_FLAG  BIT(0)


Agreed?
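
The effect of the proposed mask/value change can be sketched in plain C,
assuming the TPM core decides that a request has completed by comparing
(status & req_complete_mask) against req_complete_val. req_complete() and
sim_status() are illustrative helpers, not functions from the driver or the
core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VTPM_REQ_COMPLETE_FLAG	(1u << 0)

/* Assumed completion test applied to the value returned by ->status() */
static bool req_complete(uint8_t status, uint8_t mask, uint8_t val)
{
	return (status & mask) == val;
}

/* Mirrors the modified status op: report completion once a response is queued */
static uint8_t sim_status(size_t resp_len)
{
	return resp_len ? VTPM_REQ_COMPLETE_FLAG : 0;
}
```

With mask and value both 0, as in the posted patch, the test is true for any
status, so the wait for a response has to happen inside recv(). With both set
to VTPM_REQ_COMPLETE_FLAG the test is true only once resp_len is non-zero.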

   Stefan
Jarkko Sakkinen Feb. 23, 2016, 6:36 p.m. UTC | #6
On Tue, Feb 23, 2016 at 07:09:44AM -0500, Stefan Berger wrote:
>    Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 02/23/2016
>    05:22:11 AM:
>
>    > > +struct vtpm_dev {
>    > > +   struct tpm_chip *chip;
>    > > +
>    > > +   u32 flags;                   /* public API flags */
>    > > +
>    > > +   long state;
>    > > +#define STATE_OPENED_BIT        0
>    > > +#define STATE_WAIT_RESPONSE_BIT 1    /* waiting for emulator to give response */
>    >
>    > I'd prefer something like this before declaring the struct:
>    >
>    > enum vtpm_dev_states {
>    >    VTPM_DEV_OPENED         = BIT(0),
>    >    VTPM_DEV_WAITING_FOR_RESPONSE   = BIT(1),
>    > };
>    >
>    > This whole use of set/clear_bit macros when you don't have variable
>    > number of bits just makes code less transparent.
> 
>    Though read-modify-writes with the bit ops are atomic and need no spinlock
>    protection to avoid concurrency mess. Now we would need a spinlock for
>    such ops.

Sounds messy. You should refer to flags inside buf_lock like you
otherwise do.

>    I'll let Jason respond to it whether it's the difference between spinlock
>    and mutex or just no locking at all. If no locking, I'd be inclined to
>    work with two buffers, one for requests and one for responses.

You have to switch to mutex because copy_to_user and copy_from_user can
sleep. How would you handle mutual exclusion without any locking (two
concurrent read calls)?
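
A userspace sketch of the mutex variant, with pthread_mutex_t standing in for
a kernel struct mutex and memcpy() standing in for copy_to_user(). struct
vtpm_sim and the vtpm_sim_*() functions are illustrative names, not code from
the patch.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

#define SIM_BUFSIZE 4096	/* stand-in for TPM_BUFSIZE */

struct vtpm_sim {
	pthread_mutex_t buf_lock;	/* sleeping lock, unlike a spinlock */
	size_t req_len;			/* length of queued request */
	unsigned char buffer[SIM_BUFSIZE];
};

static void vtpm_sim_init(struct vtpm_sim *d)
{
	memset(d, 0, sizeof(*d));
	pthread_mutex_init(&d->buf_lock, NULL);
}

/* Queue one request, roughly what the send op does */
static int vtpm_sim_queue(struct vtpm_sim *d, const void *req, size_t len)
{
	if (len > sizeof(d->buffer))
		return -EIO;

	pthread_mutex_lock(&d->buf_lock);
	memcpy(d->buffer, req, len);
	d->req_len = len;
	pthread_mutex_unlock(&d->buf_lock);
	return 0;
}

/* Hand the queued request to a reader; returns its length or -errno */
static ssize_t vtpm_sim_read(struct vtpm_sim *d, unsigned char *dst,
			     size_t count)
{
	ssize_t ret;

	pthread_mutex_lock(&d->buf_lock);
	if (d->req_len == 0) {
		ret = -EAGAIN;		/* nothing queued */
	} else if (count < d->req_len) {
		ret = -EIO;		/* caller's buffer is too small */
	} else {
		/* the copy may block here; that is fine under a mutex */
		memcpy(dst, d->buffer, d->req_len);
		memset(d->buffer, 0, d->req_len);
		ret = (ssize_t)d->req_len;
		d->req_len = 0;
	}
	pthread_mutex_unlock(&d->buf_lock);
	return ret;
}
```

Two concurrent readers are serialized by buf_lock, so only one of them sees a
non-zero req_len for a given request. Depending on the toolchain, building
this may need -pthread.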

/Jarkko

Stefan Berger Feb. 24, 2016, 11:10 p.m. UTC | #7
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/22/2016 
09:17:30 PM:

> 
> On Mon, Feb 22, 2016 at 08:45:51PM -0500, Stefan Berger wrote:
> 
> > Two things:
> > - the ioctl takes flags; should we return an error on flags that are not
> > supported but set by userspace?
> 
> Typically yes. Otherwise you cannot introduce new flags in
> future.
> 
> > - the sysfs works but I wished we could give some control over whether it shows
> > any entries. Can we have a flag in the ioctl on whether to show these files in
> > sysfs?
> 
> That is something to address in the future namespace patch series I
> expect you'll prepare..

It may be a while until we get there ... nevertheless it may be worth some 
thought already.

So we have at least two choices for how to avoid data leakage via sysfs; 
the problem is that sysfs shows all vtpm devices in all containers; the 
good thing is that at least Docker (other mgmt. stacks probably also) 
mount sysfs read-only into 'normal' containers, so that writing (even only 
to cancel) isn't typically possible.

1) allow user space to set a flag whether the sysfs entries are to be 
registered; a typical container mgmt. stack would set the flag to avoid 
data leakage between containers; no vtpm device with that flag set would 
show anything via sysfs

2) we know in which (user) namespace a /dev/tpm%d device is moved into 
following an ioctl on the device where a process's PID is a parameter; we 
could associate the process's (user) namespace with the chip and compare 
the current_user_ns() with chip->user_ns and return an empty string if 
they don't match; here the vtpm device owned by a particular (user) 
namespace would then also show data in sysfs entries if accessed from the 
right namespace; which sysfs entry to look at could be inferred from the 
minor number on /dev/tpm0 inside the container
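
Option 2) could look roughly like the following user-space sketch. The types are reduced stand-ins, chip->user_ns does not exist in the current code, and chip_attr_show() is a made-up name for a sysfs show routine that is handed the caller's namespace explicitly instead of calling current_user_ns():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Opaque stand-in for struct user_namespace; identity is the address. */
struct user_ns_sketch {
	int unused;
};

struct chip_sketch {
	const struct user_ns_sketch *user_ns;	/* hypothetical chip->user_ns */
	const char *attr_value;			/* what the attribute would print */
};

/*
 * Returns the number of bytes placed in buf (without the trailing NUL).
 * A reader outside the owning namespace gets an empty string.
 */
static int chip_attr_show(const struct chip_sketch *chip,
			  const struct user_ns_sketch *current_ns,
			  char *buf, size_t size)
{
	assert(chip != NULL && buf != NULL);

	if (size == 0)
		return 0;
	if (chip->user_ns != current_ns) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, size, "%s\n", chip->attr_value);
}
```

The entries themselves stay visible in every container with this approach; only their contents are hidden.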


   Stefan
Jarkko Sakkinen Feb. 25, 2016, 1:17 p.m. UTC | #8
On Wed, Feb 24, 2016 at 06:10:42PM -0500, Stefan Berger wrote:
>    Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/22/2016
>    09:17:30 PM:
> 
>    >
>    > On Mon, Feb 22, 2016 at 08:45:51PM -0500, Stefan Berger wrote:
>    >
>    > > Two things:
>    > > - the ioctl takes flags; should we return an error on flags that
>    > > are not supported but set by userspace?
>    >
>    > Typically yes. Otherwise you cannot introduce new flags in
>    > future.
>    >
>    > > - the sysfs works but I wished we could give some control over
>    > > whether it shows any entries. Can we have a flag in the ioctl on
>    > > whether to show these files in sysfs?
>    >
>    > That is something to address in the future namespace patch series I
>    > expect you'll prepare..
> 
>    It may be a while until we get there ... nevertheless it may be worth some
>    thought already.
> 
>    So we have at least two choices for how to avoid data leakage via sysfs;
>    the problem is that sysfs shows all vtpm devices in all containers; the
>    good thing is that at least Docker (other mgmt. stacks probably also)
>    mount sysfs read-only into 'normal' containers, so that writing (even only
>    to cancel) isn't typically possible.
> 
>    1) allow user space to set a flag whether the sysfs entries are to be
>    registered; a typical container mgmt. stack would set the flag to avoid
>    data leakage between containers; no vtpm device with that flag set would
>    show anything via sysfs
> 
>    2) we know in which (user) namespace a /dev/tpm%d device is moved into
>    following an ioctl on the device where a process's PID is a parameter; we
>    could associate the process's (user) namespace with the chip and compare
>    the current_user_ns() with chip->user_ns and return an empty string if
>    they don't match; here the vtpm device owned by a particular (user)
>    namespace would then also show data in sysfs entries if accessed from the
>    right namespace; which sysfs entry to look at could be inferred from the
>    minor number on /dev/tpm0 inside the container

3) Do not show any existing sysfs attributes for containers. All but
   'ppi' are nonsense anyway or is there something that you couldn't read
   from /dev/tpm0? TPM 1.x user space tools could implement them by
   using the character device. Technically it is not a backwards
   compatibility break because existing code does not yet support vTPMs.

>       Stefan

How would you address measurement logs?

/Jarkko

Stefan Berger Feb. 25, 2016, 2:12 p.m. UTC | #9
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 02/25/2016 08:17:32 AM:

> 
> On Wed, Feb 24, 2016 at 06:10:42PM -0500, Stefan Berger wrote:
> >    Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/22/2016
> >    09:17:30 PM:
> > 
> >    >
> >    > On Mon, Feb 22, 2016 at 08:45:51PM -0500, Stefan Berger wrote:
> >    >
> >    > > Two things:
> >    > > - the ioctl takes flags; should we return an error on flags that
> >    > > are not supported but set by userspace?
> >    >
> >    > Typically yes. Otherwise you cannot introduce new flags in
> >    > future.
> >    >
> >    > > - the sysfs works but I wished we could give some control over
> >    > > whether it shows any entries. Can we have a flag in the ioctl on
> >    > > whether to show these files in sysfs?
> >    >
> >    > That is something to address in the future namespace patch series
> >    > I expect you'll prepare..
> > 
> >    It may be a while until we get there ... nevertheless it may be worth
> >    some thought already.
> > 
> >    So we have at least two choices for how to avoid data leakage via
> >    sysfs; the problem is that sysfs shows all vtpm devices in all
> >    containers; the good thing is that at least Docker (other mgmt. stacks
> >    probably also) mount sysfs read-only into 'normal' containers, so that
> >    writing (even only to cancel) isn't typically possible.
> > 
> >    1) allow user space to set a flag whether the sysfs entries are to be
> >    registered; a typical container mgmt. stack would set the flag to
> >    avoid data leakage between containers; no vtpm device with that flag
> >    set would show anything via sysfs
> > 
> >    2) we know in which (user) namespace a /dev/tpm%d device is moved into
> >    following an ioctl on the device where a process's PID is a parameter;
> >    we could associate the process's (user) namespace with the chip and
> >    compare the current_user_ns() with chip->user_ns and return an empty
> >    string if they don't match; here the vtpm device owned by a particular
> >    (user) namespace would then also show data in sysfs entries if
> >    accessed from the right namespace; which sysfs entry to look at could
> >    be inferred from the minor number on /dev/tpm0 inside the container

With clone() not necessarily setting the user namespace and setns() being 
able to do that after some fork()s, I think 2) doesn't work so well.

> 
> 3) Do not show any existing sysfs attributes for containers. All but

A separate sysfs tree isn't built for every container, so sysfs is more or
less global and shows pretty much the same content in every container; only
the networking namespace seems to have a way of avoiding that.

>    'ppi' are nonsense anyway or is there something that you couldn't read
>    from /dev/tpm0? TPM 1.x user space tools could implement them by
>    using the character device. Technically it is not a backwards
>    compatibility break because existing code does not yet support vTPMs.

Is this the same as 1) then ?

> 
> >       Stefan
> 
> How would you address measurement logs?

IMA would be namespaced and log separately for every IMA namespace / 
container.

   Stefan

> 
> /Jarkko
>
Jason Gunthorpe Feb. 25, 2016, 5:39 p.m. UTC | #10
On Thu, Feb 25, 2016 at 09:12:57AM -0500, Stefan Berger wrote:

>    > 3) Do not show any existing sysfs attributes for
>    > containers. All but

>    A separate sysfs tree isn't built for every container, so sysfs is more
>    or less global and shows pretty much the same content in every
>    container; only the networking namespace seems to have a way of
>    avoiding that.

TPM should be able to use the same techniques as net, the sysfs.*ns set
of APIs exists for this purpose. I've never looked at how to use them,
but something should be workable there.

Once you figure out how to define what TPMs are in a namespace it
should be doable to use the sysfs_ns APIs to have sysfs follow that
restriction just like net does.

Jason

Stefan Berger Feb. 25, 2016, 6:42 p.m. UTC | #11
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/25/2016 12:39:56 PM:


> 
> On Thu, Feb 25, 2016 at 09:12:57AM -0500, Stefan Berger wrote:
> 
> >    > 3) Do not show any existing sysfs attributes for
> >    > containers. All but
> 
> >    A separate sysfs tree isn't built for every container, so sysfs is
> >    more or less global and shows pretty much the same content in every
> >    container; only the networking namespace seems to have a way of
> >    avoiding that.
> 
> TPM should be able to use the same techniques as net, the sysfs.*ns set
> of APIs exists for this purpose. I've never looked at how to use them,
> but something should be workable there.

It looks like some of them are being used on the kobject level.

> 
> Once you figure out how to define what TPMs are in a namespace it
> should be doable to use the sysfs_ns APIs to have sysfs follow that
> restriction just like net does.

Networking has its own namespace and it looks like all devices get created 
while in that namespace. So the kobject can have its association with that 
namespace right from the beginning. In the case of vtpm we need to create 
the device on the host since we run the TPM emulator on the host out of 
reach of signals from the container. We would only associate the vtpm 
device with the namespace after the clone(), a long time after current 
registration with sysfs. Another difference is that we don't have a device 
namespace, so all our device names and major / minor numbers need to be 
unique and that's also reflected in sysfs.


I have been experimenting with an ioctl that passes along a file 
descriptor to a user namespace (/proc/pid/ns/user) for the purpose of 
associating the vtpm with that user namespace. This is similar to what 
setns() does, except the ioctl associates a vTPM with a namespace. This 
works (once the child is in its final namespace, which the parent needs to 
sync with) and following the proposed filtering on the TPM sysfs attribute 
level, only read()s issued from the user namespace that the vTPM is 
associated with get data. That we can have up to 64k TPM entries in sysfs 
certainly isn't nice.

This 1st ioctl can be called basically at any time and is called on the 
file descriptor returned by the vtpm driver.
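
For reference, the user-space half of this could be sketched as follows. It only shows how a management process might obtain the namespace file descriptor and check which namespace it names (two fds refer to the same namespace when device and inode number match). The experimental ioctl itself is not shown, and the helper names are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<pid>/ns/user; returns an fd, or -1 with errno set. */
static int open_user_ns(pid_t pid)
{
	char path[64];

	assert(pid > 0);
	snprintf(path, sizeof(path), "/proc/%ld/ns/user", (long)pid);
	return open(path, O_RDONLY);
}

/* 1 if both fds name the same namespace, 0 if not, -1 on error. */
static int same_namespace(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

The parent would call open_user_ns() on the child only after the child has reached its final namespace, which is the synchronization point mentioned above.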

Another ioctl is the one we have been discussing previously for 
associating the chip with an IMA namespace (which would be a compile time 
option). Here we need to ensure that the child gets the chip hooked to the 
IMA namespace before the execve() triggers measurements by IMA. Here I 
pass the process Id of that child to then determine IMA namespace to hook 
the chip to and user namespace for vTPM sysfs association. I prefer the 
child's process id over passing two file descriptors in this case...

   Stefan
Jason Gunthorpe Feb. 25, 2016, 8:31 p.m. UTC | #12
On Thu, Feb 25, 2016 at 01:42:10PM -0500, Stefan Berger wrote:

>    It looks like some of them are being used on the kobject level.

Yes, struct devices are kobjects.

>    > Once you figure out how to define what TPMs are in a namespace it
>    > should be doable to use the sysfs_ns APIs to have sysfs follow that
>    > restriction just like net does.

>    Networking has its own namespace and it looks like all devices get
>    created while in that namespace.

No, it is like tpm, a canonical example is something like:

 ip netns add blue
 ip link add veth0 type veth peer name veth1
 ip link set veth1 netns blue

To move an interface, which presumably moves the sysfs stuff as
well. Seems exactly like a mode that could work for TPM.

>    clone(), a long time after current registration with sysfs. Another
>    difference is that we don't have a device namespace, so all our device
>    names and major / minor numbers need to be unique and that's also
>    reflected in sysfs.

major/minor numbers do not need to be unique, the mapping of TPM ID to
physical TPM is something a namespace should control, e.g. TPM ID 0 is
always major/minor 10:224, and can be routed to whichever tpm is
correct for the namespace of the accessing process.

Same for sysfs, within the namespace the vtpm should appear as tpm0.

>    I have been experimenting with an ioctl that passes along a file
>    descriptor to a user namespace (/proc/pid/ns/user) for the purpose of
>    associating the vtpm with that user namespace.

I would have thought you'd use the IMA namespace for this, seems more
natural?

Functionally it doesn't matter; whichever namespace is used, migrate
the sysfs stuff similar to net and virtualize all the IDs.

>    time option). Here we need to ensure that the child gets the chip
>    hooked to the IMA namespace before the execve() triggers measurements
>    by IMA. Here I pass the process Id of that child to then determine IMA
>    namespace to hook the chip to and user namespace for vTPM sysfs
>    association. I prefer the child's process id over passing two file
>    descriptors in this case...

I'm sure people familiar with namespaces/etc will have suggestions on
how to build the uapi side.

I still intensely dislike the use of an ioctl on vtpm because
namespaces will have to be a core tpm feature, an ioctl on the
/dev/tpmX fd would be more appropriate.

Jason

Stefan Berger Feb. 25, 2016, 10:11 p.m. UTC | #13
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 02/25/2016 03:31:17 PM:


> 
> On Thu, Feb 25, 2016 at 01:42:10PM -0500, Stefan Berger wrote:
> 
> >    It looks like some of them are being used on the kobject level.
> 
> Yes, struct devices are kobjects.
> 
> >    > Once you figure out how to define what TPMs are in a namespace it
> >    > should be doable to use the sysfs_ns APIs to have sysfs follow that
> >    > restriction just like net does.
> 
> >    Networking has its own namespace and it looks like all devices get
> >    created while in that namespace.
> 
> No, it is like tpm, a canonical example is something like:
> 
>  ip netns add blue
>  ip link add veth0 type veth peer name veth1
>  ip link set veth1 netns blue

True.

> 
> To move an interface, which presumably moves the sysfs stuff as
> well. Seems exactly like a mode that could work for TPM.
> 
> >    clone(), a long time after current registration with sysfs. Another
> >    difference is that we don't have a device namespace, so all our
> >    device names and major / minor numbers need to be unique and that's
> >    also reflected in sysfs.
> 
> major/minor numbers do not need to be unique, the mapping of TPM ID to
> physical TPM is something a namespace should control, e.g. TPM ID 0 is
> always major/minor 10:224, and can be routed to whichever tpm is
> correct for the namespace of the accessing process.

Though you would agree that the device gets created in the host 
'namespace' first and then is moved into a namespace? We would then 
emulate the major/minor assignment (misc device oddity and so on) in the 
container management stack and assign the first 10:224 inside the 
container? Would the move make the device disappear on the host's main 
namespace? Like /dev/tpm2 now being moved into a namespace and it 
disappearing on the host?

Then in tpm_open() (as well as all the other functions) we would not get
the chip the way we do now, but would instead check the user namespace to
determine where the open() is coming from, then search an ordered list of
chips associated with that namespace and take the first entry? One would
have to be able to conclude from the inode or file what major/minor number
the device is associated with.

       struct tpm_chip *chip =
               container_of(inode->i_cdev, struct tpm_chip, cdev);

Well, I guess that part of this would involve a level of indirection
approximately like this:

struct tpm_file *tf = container_of(inode->i_cdev, struct tpm_file, cdev);

struct tpm_chip *chip = tpm_find_chip_by_namespace(current_user_ns(),
                                                   tf->major, tf->minor);
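
A user-space sketch of what such a lookup might do, assuming a flat table of chips keyed by owning namespace plus the major/minor numbers as seen inside that namespace. The struct layout and find_chip_by_namespace() are hypothetical stand-ins, with the namespace reduced to an opaque pointer:

```c
#include <assert.h>
#include <stddef.h>

struct chip_entry {
	const void *user_ns;	/* owning namespace, opaque in this sketch */
	unsigned int major;	/* device numbers as seen inside the namespace */
	unsigned int minor;
	int host_dev_num;	/* host-wide unique number, like chip->dev_num */
};

/* Returns the matching entry, or NULL if the namespace has no such chip. */
static const struct chip_entry *
find_chip_by_namespace(const struct chip_entry *tbl, size_t n,
		       const void *ns, unsigned int major, unsigned int minor)
{
	size_t i;

	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (tbl[i].user_ns == ns && tbl[i].major == major &&
		    tbl[i].minor == minor)
			return &tbl[i];
	}
	return NULL;
}
```

With this shape two containers can both see their TPM as 10:224 while the host keeps distinct chips behind them.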



> 
> Same for sysfs, within the namespace the vtpm should appear as tpm0.
> 
> >    I have been experimenting with an ioctl that passes along a file
> >    descriptor to a user namespace (/proc/pid/ns/user) for the purpose
> >    of associating the vtpm with that user namespace.
> 
> I would have thought you'd use the IMA namespace for this, seems more
> natural?

I am not yet sure. Maybe it's too early to talk about this ioctl. 

Can we upstream the current driver with patches 1 - 9?


> 
> Functionally it doesn't matter; whichever namespace is used, migrate
> the sysfs stuff similar to net and virtualize all the IDs.
> 
> >    time option). Here we need to ensure that the child gets the chip
> >    hooked to the IMA namespace before the execve() triggers
> >    measurements by IMA. Here I pass the process Id of that child to then
> >    determine IMA namespace to hook the chip to and user namespace for
> >    vTPM sysfs association. I prefer the child's process id over passing
> >    two file descriptors in this case...
> 
> I'm sure people familiar with namespaces/etc will have suggestions on
> how to build the uapi side.

Yes, not quite clear about which way is the best. 

> 
> I still intensely dislike the use of an ioctl on vtpm because
> namespaces will have to be a core tpm feature, an ioctl on the
> /dev/tpmX fd would be more appropriate.

So that would be to move a hardware TPM into a container. If the 
namespacing implementation would need to be device-specific, it could call 
a function on the chip to do that.

   Stefan

Patch

diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index 3b84a8b..4c4e843 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -122,5 +122,15 @@  config TCG_CRB
 	  from within Linux.  To compile this driver as a module, choose
 	  M here; the module will be called tpm_crb.
 
+config TCG_VTPM
+	tristate "VTPM Interface"
+	depends on TCG_TPM
+	---help---
+	  This driver supports an emulated TPM (vTPM) running in userspace.
+	  A device /dev/vtpmx is provided that creates a device pair
+	  /dev/tpmX and a server-side file descriptor on which the vTPM
+	  can receive commands.
+
+
 source "drivers/char/tpm/st33zp24/Kconfig"
 endif # TCG_TPM
diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index 56e8f1f..d947db2 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -23,3 +23,4 @@  obj-$(CONFIG_TCG_IBMVTPM) += tpm_ibmvtpm.o
 obj-$(CONFIG_TCG_TIS_ST33ZP24) += st33zp24/
 obj-$(CONFIG_TCG_XEN) += xen-tpmfront.o
 obj-$(CONFIG_TCG_CRB) += tpm_crb.o
+obj-$(CONFIG_TCG_VTPM) += tpm-vtpm.o
diff --git a/drivers/char/tpm/tpm-vtpm.c b/drivers/char/tpm/tpm-vtpm.c
new file mode 100644
index 0000000..823c94a
--- /dev/null
+++ b/drivers/char/tpm/tpm-vtpm.c
@@ -0,0 +1,543 @@ 
+/*
+ * Copyright (C) 2015, 2016 IBM Corporation
+ *
+ * Author: Stefan Berger <stefanb@us.ibm.com>
+ *
+ * Maintained by: <tpmdd-devel@lists.sourceforge.net>
+ *
+ * Device driver for vTPM.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <linux/miscdevice.h>
+#include <linux/vtpm.h>
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <linux/poll.h>
+#include <linux/compat.h>
+
+#include "tpm.h"
+
+#define VTPM_NUM_DEVICES TPM_NUM_DEVICES
+
+struct vtpm_dev {
+	struct tpm_chip *chip;
+
+	u32 flags;                   /* public API flags */
+
+	long state;
+#define STATE_OPENED_BIT        0
+#define STATE_WAIT_RESPONSE_BIT 1    /* waiting for emulator to give response */
+
+	spinlock_t buf_lock;         /* lock for buffers */
+
+	wait_queue_head_t wq;
+
+	size_t req_len;              /* length of queued TPM request */
+	size_t resp_len;             /* length of queued TPM response */
+	u8 buffer[TPM_BUFSIZE];      /* request/response buffer */
+};
+
+
+static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev);
+
+/*
+ * Functions related to 'server side'
+ */
+
+/**
+ * vtpm_fops_read - Read TPM commands on 'server side'
+ *
+ * Return value:
+ *	Number of bytes read or negative error code
+ */
+static ssize_t vtpm_fops_read(struct file *filp, char __user *buf,
+			      size_t count, loff_t *off)
+{
+	struct vtpm_dev *vtpm_dev = filp->private_data;
+	size_t len;
+	int sig, rc;
+
+	sig = wait_event_interruptible(vtpm_dev->wq, vtpm_dev->req_len != 0);
+	if (sig)
+		return -EINTR;
+
+	spin_lock(&vtpm_dev->buf_lock);
+
+	len = vtpm_dev->req_len;
+
+	if (count < len) {
+		spin_unlock(&vtpm_dev->buf_lock);
+		pr_debug("Invalid size in recv: count=%zd, req_len=%zd\n",
+			 count, len);
+		return -EIO;
+	}
+
+	rc = copy_to_user(buf, vtpm_dev->buffer, len);
+	memset(vtpm_dev->buffer, 0, len);
+	vtpm_dev->req_len = 0;
+
+	spin_unlock(&vtpm_dev->buf_lock);
+
+	if (rc)
+		return -EFAULT;
+
+	set_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);
+
+	return len;
+}
+
+/**
+ * vtpm_fops_write - Write TPM responses on 'server side'
+ *
+ * Return value:
+ *	Number of bytes written or negative error value
+ */
+static ssize_t vtpm_fops_write(struct file *filp, const char __user *buf,
+			       size_t count, loff_t *off)
+{
+	struct vtpm_dev *vtpm_dev = filp->private_data;
+
+	if (count > sizeof(vtpm_dev->buffer) ||
+	    !test_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state))
+		return -EIO;
+
+	clear_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);
+
+	spin_lock(&vtpm_dev->buf_lock);
+
+	vtpm_dev->req_len = 0;
+
+	if (copy_from_user(vtpm_dev->buffer, buf, count)) {
+		spin_unlock(&vtpm_dev->buf_lock);
+		return -EFAULT;
+	}
+
+	vtpm_dev->resp_len = count;
+
+	spin_unlock(&vtpm_dev->buf_lock);
+
+	wake_up_interruptible(&vtpm_dev->wq);
+
+	return count;
+}
+
+/*
+ * vtpm_fops_poll: Poll status on 'server side'
+ *
+ * Return value:
+ *      Poll flags
+ */
+static unsigned int vtpm_fops_poll(struct file *filp, poll_table *wait)
+{
+	struct vtpm_dev *vtpm_dev = filp->private_data;
+	unsigned ret;
+
+	poll_wait(filp, &vtpm_dev->wq, wait);
+
+	ret = POLLOUT;
+	if (vtpm_dev->req_len)
+		ret |= POLLIN | POLLRDNORM;
+
+	return ret;
+}
+
+/*
+ * vtpm_fops_open - Open vTPM device on 'server side'
+ *
+ * Called when setting up the anonymous file descriptor
+ */
+static void vtpm_fops_open(struct file *filp)
+{
+	struct vtpm_dev *vtpm_dev = filp->private_data;
+
+	set_bit(STATE_OPENED_BIT, &vtpm_dev->state);
+}
+
+/**
+ * vtpm_fops_undo_open - counter-part to vtpm_fops_open
+ *
+ * Call to undo vtpm_fops_open
+ */
+static void vtpm_fops_undo_open(struct vtpm_dev *vtpm_dev)
+{
+	clear_bit(STATE_OPENED_BIT, &vtpm_dev->state);
+
+	/* no more TPM responses -- wake up anyone waiting for them */
+	wake_up_interruptible(&vtpm_dev->wq);
+}
+
+/*
+ * vtpm_fops_release: Close 'server side'
+ *
+ * Return value:
+ *      Always returns 0.
+ */
+static int vtpm_fops_release(struct inode *inode, struct file *filp)
+{
+	struct vtpm_dev *vtpm_dev = filp->private_data;
+
+	filp->private_data = NULL;
+
+	vtpm_delete_device_pair(vtpm_dev);
+
+	return 0;
+}
+
+static const struct file_operations vtpm_fops = {
+	.owner = THIS_MODULE,
+	.llseek = no_llseek,
+	.read = vtpm_fops_read,
+	.write = vtpm_fops_write,
+	.poll = vtpm_fops_poll,
+	.release = vtpm_fops_release,
+};
+
+/*
+ * Functions invoked by the core TPM driver to send TPM commands to
+ * 'server side' and receive responses from there.
+ */
+
+/*
+ * Called when core TPM driver reads TPM responses from 'server side'
+ *
+ * Return value:
+ *      Number of TPM response bytes read, negative error value otherwise
+ */
+static int vtpm_tpm_op_recv(struct tpm_chip *chip, u8 *buf, size_t count)
+{
+	struct vtpm_dev *vtpm_dev = chip->vendor.priv;
+	int sig;
+	size_t len;
+
+	if (!vtpm_dev)
+		return -EIO;
+
+	/* wait for response or responder gone */
+	sig = wait_event_interruptible(vtpm_dev->wq,
+		(vtpm_dev->resp_len != 0
+		|| !test_bit(STATE_OPENED_BIT, &vtpm_dev->state)));
+
+	if (sig)
+		return -EINTR;
+
+	/* process gone ? */
+	if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state))
+		return -EPIPE;
+
+	spin_lock(&vtpm_dev->buf_lock);
+
+	len = vtpm_dev->resp_len;
+	if (count < len) {
+		dev_err(&chip->dev,
+			"Invalid size in recv: count=%zd, resp_len=%zd\n",
+			count, len);
+		len = -EIO;
+		goto out;
+	}
+
+	memcpy(buf, vtpm_dev->buffer, len);
+	vtpm_dev->resp_len = 0;
+
+out:
+	spin_unlock(&vtpm_dev->buf_lock);
+
+	return len;
+}
+
+/*
+ * Called when core TPM driver forwards TPM requests to 'server side'.
+ *
+ * Return value:
+ *      0 in case of success, negative error value otherwise.
+ */
+static int vtpm_tpm_op_send(struct tpm_chip *chip, u8 *buf, size_t count)
+{
+	struct vtpm_dev *vtpm_dev = chip->vendor.priv;
+	int rc = 0;
+
+	if (!vtpm_dev)
+		return -EIO;
+
+	if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state))
+		return -EPIPE;
+
+	if (count > sizeof(vtpm_dev->buffer)) {
+		dev_err(&chip->dev,
+			"Invalid size in send: count=%zd, buffer size=%zd\n",
+			count, sizeof(vtpm_dev->buffer));
+		return -EIO;
+	}
+
+	spin_lock(&vtpm_dev->buf_lock);
+
+	vtpm_dev->resp_len = 0;
+
+	vtpm_dev->req_len = count;
+	memcpy(vtpm_dev->buffer, buf, count);
+
+	spin_unlock(&vtpm_dev->buf_lock);
+
+	wake_up_interruptible(&vtpm_dev->wq);
+
+	clear_bit(STATE_WAIT_RESPONSE_BIT, &vtpm_dev->state);
+
+	return rc;
+}
+
+static void vtpm_tpm_op_cancel(struct tpm_chip *chip)
+{
+	/* not supported */
+}
+
+static u8 vtpm_tpm_op_status(struct tpm_chip *chip)
+{
+	return 0;
+}
+
+static bool vtpm_tpm_req_canceled(struct tpm_chip  *chip, u8 status)
+{
+	return (status == 0);
+}
+
+static const struct tpm_class_ops vtpm_tpm_ops = {
+	.recv = vtpm_tpm_op_recv,
+	.send = vtpm_tpm_op_send,
+	.cancel = vtpm_tpm_op_cancel,
+	.status = vtpm_tpm_op_status,
+	.req_complete_mask = 0,
+	.req_complete_val = 0,
+	.req_canceled = vtpm_tpm_req_canceled,
+};
+
+/*
+ * Code related to creation and deletion of device pairs
+ */
+static struct vtpm_dev *vtpm_create_vtpm_dev(void)
+{
+	struct vtpm_dev *vtpm_dev;
+	struct tpm_chip *chip;
+	int err;
+
+	vtpm_dev = kzalloc(sizeof(*vtpm_dev), GFP_KERNEL);
+	if (vtpm_dev == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	init_waitqueue_head(&vtpm_dev->wq);
+	spin_lock_init(&vtpm_dev->buf_lock);
+
+	chip = tpm_chip_alloc(NULL, &vtpm_tpm_ops);
+	if (IS_ERR(chip)) {
+		err = PTR_ERR(chip);
+		goto err_vtpm_dev_free;
+	}
+	chip->vendor.priv = vtpm_dev;
+	chip->vendor.irq = 1;
+
+	vtpm_dev->chip = chip;
+
+	return vtpm_dev;
+
+err_vtpm_dev_free:
+	kfree(vtpm_dev);
+
+	return ERR_PTR(err);
+}
+
+/*
+ * Undo what has been done in vtpm_create_vtpm_dev
+ */
+static inline void vtpm_delete_vtpm_dev(struct vtpm_dev *vtpm_dev)
+{
+	put_device(&vtpm_dev->chip->dev); /* frees chip */
+	kfree(vtpm_dev);
+}
+
+/*
+ * Create a /dev/tpm%d and 'server side' file descriptor pair
+ *
+ * Return value:
+ *      Returns file pointer on success, an error value otherwise
+ */
+static struct file *vtpm_create_device_pair(
+				       struct vtpm_new_pair *vtpm_new_pair)
+{
+	struct vtpm_dev *vtpm_dev;
+	int rc, fd;
+	struct file *file;
+
+	vtpm_dev = vtpm_create_vtpm_dev();
+	if (IS_ERR(vtpm_dev))
+		return ERR_CAST(vtpm_dev);
+
+	vtpm_dev->flags = vtpm_new_pair->flags;
+
+	/* setup an anonymous file for the server-side */
+	fd = get_unused_fd_flags(O_RDWR);
+	if (fd < 0) {
+		rc = fd;
+		goto err_delete_vtpm_dev;
+	}
+
+	file = anon_inode_getfile("[vtpms]", &vtpm_fops, vtpm_dev, O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+
+	/* from now on we can unwind with put_unused_fd() + fput() */
+	/* simulate an open() on the server side */
+	vtpm_fops_open(file);
+
+	if (vtpm_dev->flags & VTPM_FLAG_TPM2)
+		vtpm_dev->chip->flags |= TPM_CHIP_FLAG_TPM2;
+
+	rc = tpm_chip_register(vtpm_dev->chip);
+	if (rc)
+		goto err_vtpm_fput;
+
+	vtpm_new_pair->fd = fd;
+	vtpm_new_pair->major = MAJOR(vtpm_dev->chip->dev.devt);
+	vtpm_new_pair->minor = MINOR(vtpm_dev->chip->dev.devt);
+	vtpm_new_pair->tpm_dev_num = vtpm_dev->chip->dev_num;
+
+	return file;
+
+err_vtpm_fput:
+	put_unused_fd(fd);
+	fput(file);
+
+	return ERR_PTR(rc);
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+
+err_delete_vtpm_dev:
+	vtpm_delete_vtpm_dev(vtpm_dev);
+
+	return ERR_PTR(rc);
+}
+
+/*
+ * Counter part to vtpm_create_device_pair.
+ */
+static void vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev)
+{
+	tpm_chip_unregister(vtpm_dev->chip);
+
+	vtpm_fops_undo_open(vtpm_dev);
+
+	vtpm_delete_vtpm_dev(vtpm_dev);
+}
+
+/*
+ * Code related to the control device /dev/vtpmx
+ */
+
+/*
+ * vtpmx_fops_ioctl: ioctl on /dev/vtpmx
+ *
+ * Return value:
+ *      Returns 0 on success, a negative error code otherwise.
+ */
+static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl,
+			    unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct vtpm_new_pair *vtpm_new_pair_p;
+	struct vtpm_new_pair vtpm_new_pair;
+	struct file *file;
+
+	switch (ioctl) {
+	case VTPM_NEW_DEV:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		vtpm_new_pair_p = argp;
+		if (copy_from_user(&vtpm_new_pair, vtpm_new_pair_p,
+				   sizeof(vtpm_new_pair)))
+			return -EFAULT;
+		file = vtpm_create_device_pair(&vtpm_new_pair);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+		if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair,
+				 sizeof(vtpm_new_pair))) {
+			put_unused_fd(vtpm_new_pair.fd);
+			fput(file);
+			return -EFAULT;
+		}
+
+		fd_install(vtpm_new_pair.fd, file);
+		return 0;
+
+	default:
+		return -EINVAL;
+	}
+}
+
+#ifdef CONFIG_COMPAT
+static long vtpmx_fops_compat_ioctl(struct file *f, unsigned int ioctl,
+				    unsigned long arg)
+{
+	return vtpmx_fops_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
+}
+#endif
+
+static const struct file_operations vtpmx_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = vtpmx_fops_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = vtpmx_fops_compat_ioctl,
+#endif
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice vtpmx_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vtpmx",
+	.fops = &vtpmx_fops,
+};
+
+static int vtpmx_init(void)
+{
+	return misc_register(&vtpmx_miscdev);
+}
+
+static void vtpmx_cleanup(void)
+{
+	misc_deregister(&vtpmx_miscdev);
+}
+
+static int __init vtpm_module_init(void)
+{
+	int rc;
+
+	rc = vtpmx_init();
+	if (rc) {
+		pr_err("couldn't create vtpmx device\n");
+		return rc;
+	}
+
+	return 0;
+}
+
+static void __exit vtpm_module_exit(void)
+{
+	vtpmx_cleanup();
+}
+
+module_init(vtpm_module_init);
+module_exit(vtpm_module_exit);
+
+MODULE_AUTHOR("Stefan Berger (stefanb@us.ibm.com)");
+MODULE_DESCRIPTION("vTPM Driver");
+MODULE_VERSION("0.1");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index c2e5d6c..c194d61 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -449,6 +449,7 @@  header-y += virtio_scsi.h
 header-y += virtio_types.h
 header-y += vm_sockets.h
 header-y += vt.h
+header-y += vtpm.h
 header-y += wait.h
 header-y += wanrouter.h
 header-y += watchdog.h
diff --git a/include/uapi/linux/vtpm.h b/include/uapi/linux/vtpm.h
new file mode 100644
index 0000000..aef8733
--- /dev/null
+++ b/include/uapi/linux/vtpm.h
@@ -0,0 +1,38 @@ 
+/*
+ * Definitions for the VTPM interface
+ * Copyright (c) 2015, 2016, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _UAPI_LINUX_VTPM_H
+#define _UAPI_LINUX_VTPM_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* ioctls */
+
+struct vtpm_new_pair {
+	__u32 flags;         /* input */
+	__u32 tpm_dev_num;   /* output */
+	__u32 fd;            /* output */
+	__u32 major;         /* output */
+	__u32 minor;         /* output */
+};
+
+/* values for vtpm_new_pair.flags */
+#define VTPM_FLAG_TPM2           1  /* emulator is TPM 2 */
+
+#define VTPM_TPM 0xa0
+
+#define VTPM_NEW_DEV         _IOWR(VTPM_TPM, 0x00, struct vtpm_new_pair)
+
+#endif /* _UAPI_LINUX_VTPM_H */
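
For illustration only (not part of the patch): a minimal userspace sketch of
how an emulator might request a new device pair through the control device.
The helper vtpm_request_pair() is a hypothetical name. The struct is a local
mirror of struct vtpm_new_pair from the header above, so the sketch compiles
without the new uapi header installed. The ioctl request number is passed in
by the caller rather than hard-coded, since it must match whatever
<linux/vtpm.h> defines as VTPM_NEW_DEV.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Local mirror of struct vtpm_new_pair from include/uapi/linux/vtpm.h.
 * Assumption: a real program would include <linux/vtpm.h> instead.
 */
struct vtpm_new_pair {
	uint32_t flags;         /* input */
	uint32_t tpm_dev_num;   /* output */
	uint32_t fd;            /* output */
	uint32_t major;         /* output */
	uint32_t minor;         /* output */
};

/*
 * Hypothetical helper: open the control device at ctl_path (e.g.
 * "/dev/vtpmx"), issue the given ioctl request with the requested flags,
 * and leave the kernel's answer in *out.
 *
 * Returns 0 on success, a negative errno value otherwise.
 */
static int vtpm_request_pair(const char *ctl_path, unsigned long request,
			     uint32_t flags, struct vtpm_new_pair *out)
{
	int ctl_fd;
	int rc = 0;

	memset(out, 0, sizeof(*out));
	out->flags = flags;

	ctl_fd = open(ctl_path, O_RDWR);
	if (ctl_fd < 0)
		return -errno;

	/* save errno before close() has a chance to overwrite it */
	if (ioctl(ctl_fd, request, out) < 0)
		rc = -errno;

	/*
	 * The server-side descriptor returned in out->fd is a separate
	 * open file, so closing the control descriptor does not affect it.
	 */
	close(ctl_fd);
	return rc;
}
```

An emulator would call this as
vtpm_request_pair("/dev/vtpmx", VTPM_NEW_DEV, VTPM_FLAG_TPM2, &pair), which
requires CAP_SYS_ADMIN. On success, pair.fd is the server-side descriptor on
which it receives TPM commands and returns responses, and pair.tpm_dev_num
identifies the matching /dev/tpmX client device.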