Patchwork [v3,03/17] powerpc: Add PFO support to the VIO bus

login
register
mail settings
Submitter Kent Yoder
Date April 12, 2012, 3:08 p.m.
Message ID <1334243302.18090.10.camel@key-ThinkPad-W510>
Download mbox | patch
Permalink /patch/152089/
State Accepted
Headers show

Comments

Kent Yoder - April 12, 2012, 3:08 p.m.
Add support for the Platform Facilities Option (PFO) to the VIO bus.
These devices have a separate root node in OpenFirmware which
requires additional parsing to map into the existing VIO device
structure fields. This adds the interface for PFO device drivers to
make synchronous hypervisor calls.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/vio.h |   46 +++++++
 arch/powerpc/kernel/vio.c      |  273 ++++++++++++++++++++++++++++++++++------
 2 files changed, 280 insertions(+), 39 deletions(-)
Benjamin Herrenschmidt - May 1, 2012, 3:06 a.m.
Hrm... I don't like that much:

> +	if (op->timeout)
> +		deadline = jiffies + msecs_to_jiffies(op->timeout);
> +
> +	while (true) {
> +		hret = plpar_hcall_norets(H_COP, op->flags,
> +				vdev->resource_id,
> +				op->in, op->inlen, op->out,
> +				op->outlen, op->csbcpb);
> +
> +		if (hret == H_SUCCESS ||
> +		    (hret != H_NOT_ENOUGH_RESOURCES &&
> +		     hret != H_BUSY && hret != H_RESOURCE) ||
> +		    (op->timeout && time_after(deadline, jiffies)))
> +			break;
> +
> +		dev_dbg(dev, "%s: hcall ret(%ld), retrying.\n", __func__, hret);
> +	}
> +

Is this meant to be called in atomic context ? If not, maybe it should
at the very least do a cond_resched() ?

Else, what about ceding the processor ? Or at the very least reducing
the thread priority for a bit ?

Shouldn't we also enforce to always have a timeout ? IE. Something like
30s or so if nothing specified to avoid having the kernel just hard
lock...

In general I don't like that sort of synchronous code, I'd rather return
the busy status up the chain which gives a chance to the caller to take
more appropriate measures depending on what it's doing, but that really
depends what you use that synchronous call for. I suppose if it's for
configuration type operations, it's ok...

Cheers,
Ben.
Benjamin Herrenschmidt - May 1, 2012, 4:10 a.m.
> Else, what about ceding the processor ? Or at the very least reducing
> the thread priority for a bit ?
> 
> Shouldn't we also enforce to always have a timeout ? IE. Something like
> 30s or so if nothing specified to avoid having the kernel just hard
> lock...
> 
> In general I don't like that sort of synchronous code, I'd rather return
> the busy status up the chain which gives a chance to the caller to take
> more appropriate measures depending on what it's doing, but that really
> depends what you use that synchronous call for. I suppose if it's for
> configuration type operations, it's ok...

In any case, don't resend the whole series, just that one patch.

Cheers,
Ben.
Robert Jennings - May 10, 2012, 7:08 p.m.
* Benjamin Herrenschmidt (benh@kernel.crashing.org) wrote:
> Hrm... I don't like that much:
> 
> > +	if (op->timeout)
> > +		deadline = jiffies + msecs_to_jiffies(op->timeout);
> > +
> > +	while (true) {
> > +		hret = plpar_hcall_norets(H_COP, op->flags,
> > +				vdev->resource_id,
> > +				op->in, op->inlen, op->out,
> > +				op->outlen, op->csbcpb);
> > +
> > +		if (hret == H_SUCCESS ||
> > +		    (hret != H_NOT_ENOUGH_RESOURCES &&
> > +		     hret != H_BUSY && hret != H_RESOURCE) ||
> > +		    (op->timeout && time_after(deadline, jiffies)))
> > +			break;
> > +
> > +		dev_dbg(dev, "%s: hcall ret(%ld), retrying.\n", __func__, hret);
> > +	}
> > +
> 
> Is this meant to be called in atomic context ? If not, maybe it should
> at the very least do a cond_resched() ?
> 
> Else, what about ceding the processor ? Or at the very least reducing
> the thread priority for a bit ?
> 
> Shouldn't we also enforce to always have a timeout ? IE. Something like
> 30s or so if nothing specified to avoid having the kernel just hard
> lock...
> 
> In general I don't like that sort of synchronous code, I'd rather return
> the busy status up the chain which gives a chance to the caller to take
> more appropriate measures depending on what it's doing, but that really
> depends what you use that synchronous call for. I suppose if it's for
> configuration type operations, it's ok...

This function is called in atomic context, it is used by PFO-type device
drivers to perform operations with the nest accelerator unit (like
crypto acceleration).

Having the timeout and retries in this function is the wrong thing to do.
We'll resubmit this without the loop and the caller will be responsible for
retrying the operations.

I would rather have the caller cede the processor or alter thread
priority where appropriate than doing that in this function.  I don't
think this should be done in this crypto driver.

--Rob Jennings
Benjamin Herrenschmidt - May 10, 2012, 9:58 p.m.
On Thu, 2012-05-10 at 14:08 -0500, Robert Jennings wrote:
> * Benjamin Herrenschmidt (benh@kernel.crashing.org) wrote:

> > Is this meant to be called in atomic context ? If not, maybe it should
> > at the very least do a cond_resched() ?
> > 
> > Else, what about ceding the processor ? Or at the very least reducing
> > the thread priority for a bit ?
> > 
> > Shouldn't we also enforce to always have a timeout ? IE. Something like
> > 30s or so if nothing specified to avoid having the kernel just hard
> > lock...
> > 
> > In general I don't like that sort of synchronous code, I'd rather return
> > the busy status up the chain which gives a chance to the caller to take
> > more appropriate measures depending on what it's doing, but that really
> > depends what you use that synchronous call for. I suppose if it's for
> > configuration type operations, it's ok...
> 
> This function is called in atomic context, it is used by PFO-type device
> drivers to perform operations with the nest accelerator unit (like
> crypto acceleration).
> 
> Having the timeout and retries in this function is the wrong thing to do.
> We'll resubmit this without the loop and the caller will be responsible for
> retrying the operations.
>
> I would rather have the caller cede the processor or alter thread
> priority where appropriate than doing that in this function.  I don't
> think this should be done in this crypto driver.

That sounds right indeed... as long as the upper crypto layer has a
concept of "try again later"... if it doesn't it will result in random
funny failures :-)

Cheers,
Ben.
Benjamin Herrenschmidt - May 14, 2012, 12:32 a.m.
On Fri, 2012-05-11 at 07:58 +1000, Benjamin Herrenschmidt wrote:
> > 
> > Having the timeout and retries in this function is the wrong thing to do.
> > We'll resubmit this without the loop and the caller will be responsible for
> > retrying the operations.
> >
> > I would rather have the caller cede the processor or alter thread
> > priority where appropriate than doing that in this function.  I don't
> > think this should be done in this crypto driver.
> 
> That sounds right indeed... as long as the upper crypto layer has a
> concept of "try again later"... if it doesn't it will result in random
> funny failures :-)

Ping ?

So I'm merging 1 to 5 (ie, up to and including the hwrng driver). I will
still merge the rest if you send a fix for that in the next day or so
but not much longer.

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/include/asm/vio.h b/arch/powerpc/include/asm/vio.h
index 6bfd5ff..b19adf7 100644
--- a/arch/powerpc/include/asm/vio.h
+++ b/arch/powerpc/include/asm/vio.h
@@ -46,6 +46,48 @@ 
 
 struct iommu_table;
 
+/*
+ * Platform Facilities Option (PFO)-specific data
+ */
+
+/* Starting unit address for PFO devices on the VIO BUS */
+#define VIO_BASE_PFO_UA	0x50000000
+
+/**
+ * vio_pfo_op - PFO operation parameters
+ *
+ * @flags: h_call subfunctions and modifiers
+ * @in: Input data block logical real address
+ * @inlen: If non-negative, the length of the input data block.  If negative,
+ *	the length of the input data descriptor list in bytes.
+ * @out: Output data block logical real address
+ * @outlen: If non-negative, the length of the input data block.  If negative,
+ *	the length of the input data descriptor list in bytes.
+ * @csbcpb: Logical real address of the 4k naturally-aligned storage block
+ *	containing the CSB & optional FC field specific CPB
+ * @timeout: # of milliseconds to retry h_call, 0 for no timeout.
+ * @hcall_err: pointer to return the h_call return value, else NULL
+ */
+struct vio_pfo_op {
+	u64 flags;
+	s64 in;
+	s64 inlen;
+	s64 out;
+	s64 outlen;
+	u64 csbcpb;
+	void *done;
+	unsigned long handle;
+	unsigned int timeout;
+	long hcall_err;
+};
+
+/* End PFO specific data */
+
+enum vio_dev_family {
+	VDEVICE,	/* The OF node is a child of /vdevice */
+	PFO,		/* The OF node is a child of /ibm,platform-facilities */
+};
+
 /**
  * vio_dev - This structure is used to describe virtual I/O devices.
  *
@@ -58,6 +100,7 @@  struct vio_dev {
 	const char *name;
 	const char *type;
 	uint32_t unit_address;
+	uint32_t resource_id;
 	unsigned int irq;
 	struct {
 		size_t desired;
@@ -65,6 +108,7 @@  struct vio_dev {
 		size_t allocated;
 		atomic_t allocs_failed;
 	} cmo;
+	enum vio_dev_family family;
 	struct device dev;
 };
 
@@ -95,6 +139,8 @@  extern void vio_cmo_set_dev_desired(struct vio_dev *viodev, size_t desired);
 
 extern void __devinit vio_unregister_device(struct vio_dev *dev);
 
+extern int vio_h_cop_sync(struct vio_dev *vdev, struct vio_pfo_op *op);
+
 struct device_node;
 
 extern struct vio_dev *vio_register_device_node(
diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index a3a9990..cb87301 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -14,7 +14,9 @@ 
  *      2 of the License, or (at your option) any later version.
  */
 
+#include <linux/cpu.h>
 #include <linux/types.h>
+#include <linux/delay.h>
 #include <linux/stat.h>
 #include <linux/device.h>
 #include <linux/init.h>
@@ -709,13 +711,26 @@  static int vio_cmo_bus_probe(struct vio_dev *viodev)
 	struct vio_driver *viodrv = to_vio_driver(dev->driver);
 	unsigned long flags;
 	size_t size;
+	bool dma_capable = false;
+
+	/* A device requires entitlement if it has a DMA window property */
+	switch (viodev->family) {
+	case VDEVICE:
+		if (of_get_property(viodev->dev.of_node,
+					"ibm,my-dma-window", NULL))
+			dma_capable = true;
+		break;
+	case PFO:
+		dma_capable = false;
+		break;
+	default:
+		dev_warn(dev, "unknown device family: %d\n", viodev->family);
+		BUG();
+		break;
+	}
 
-	/*
-	 * Check to see that device has a DMA window and configure
-	 * entitlement for the device.
-	 */
-	if (of_get_property(viodev->dev.of_node,
-	                    "ibm,my-dma-window", NULL)) {
+	/* Configure entitlement for the device. */
+	if (dma_capable) {
 		/* Check that the driver is CMO enabled and get desired DMA */
 		if (!viodrv->get_desired_dma) {
 			dev_err(dev, "%s: device driver does not support CMO\n",
@@ -1050,6 +1065,94 @@  static void vio_cmo_sysfs_init(void) { }
 EXPORT_SYMBOL(vio_cmo_entitlement_update);
 EXPORT_SYMBOL(vio_cmo_set_dev_desired);
 
+
+/*
+ * Platform Facilities Option (PFO) support
+ */
+
+/**
+ * vio_h_cop_sync - Perform a synchronous PFO co-processor operation
+ *
+ * @vdev - Pointer to a struct vio_dev for device
+ * @op - Pointer to a struct vio_pfo_op for the operation parameters
+ *
+ * Calls the hypervisor to synchronously perform the PFO operation
+ * described in @op.  In the case of a busy response from the hypervisor,
+ * the operation will be re-submitted indefinitely unless a non-zero timeout
+ * is specified or an error occurs. The timeout places a limit on when to
+ * stop re-submitting a operation, the total time can be exceeded if an
+ * operation is in progress.
+ *
+ * If op->hcall_ret is not NULL, this will be set to the return from the
+ * last h_cop_op call or it will be 0 if an error not involving the h_call
+ * was encountered.
+ *
+ * Returns:
+ *	0 on success,
+ *	-EINVAL if the h_call fails due to an invalid parameter,
+ *	-E2BIG if the h_call can not be performed synchronously,
+ *	-EBUSY if a timeout is specified and has elapsed,
+ *	-EACCES if the memory area for data/status has been rescinded, or
+ *	-EPERM if a hardware fault has been indicated
+ */
+int vio_h_cop_sync(struct vio_dev *vdev, struct vio_pfo_op *op)
+{
+	struct device *dev = &vdev->dev;
+	unsigned long deadline = 0;
+	long hret = 0;
+	int ret = 0;
+
+	if (op->timeout)
+		deadline = jiffies + msecs_to_jiffies(op->timeout);
+
+	while (true) {
+		hret = plpar_hcall_norets(H_COP, op->flags,
+				vdev->resource_id,
+				op->in, op->inlen, op->out,
+				op->outlen, op->csbcpb);
+
+		if (hret == H_SUCCESS ||
+		    (hret != H_NOT_ENOUGH_RESOURCES &&
+		     hret != H_BUSY && hret != H_RESOURCE) ||
+		    (op->timeout && time_after(deadline, jiffies)))
+			break;
+
+		dev_dbg(dev, "%s: hcall ret(%ld), retrying.\n", __func__, hret);
+	}
+
+	switch (hret) {
+	case H_SUCCESS:
+		ret = 0;
+		break;
+	case H_OP_MODE:
+	case H_TOO_BIG:
+		ret = -E2BIG;
+		break;
+	case H_RESCINDED:
+		ret = -EACCES;
+		break;
+	case H_HARDWARE:
+		ret = -EPERM;
+		break;
+	case H_NOT_ENOUGH_RESOURCES:
+	case H_RESOURCE:
+	case H_BUSY:
+		ret = -EBUSY;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	if (ret)
+		dev_dbg(dev, "%s: Sync h_cop_op failure (ret:%d) (hret:%ld)\n",
+				__func__, ret, hret);
+
+	op->hcall_err = hret;
+	return ret;
+}
+EXPORT_SYMBOL(vio_h_cop_sync);
+
 static struct iommu_table *vio_build_iommu_table(struct vio_dev *dev)
 {
 	const unsigned char *dma_window;
@@ -1211,35 +1314,87 @@  static void __devinit vio_dev_release(struct device *dev)
 struct vio_dev *vio_register_device_node(struct device_node *of_node)
 {
 	struct vio_dev *viodev;
+	struct device_node *parent_node;
 	const unsigned int *unit_address;
+	const unsigned int *pfo_resid = NULL;
+	enum vio_dev_family family;
+	const char *of_node_name = of_node->name ? of_node->name : "<unknown>";
 
-	/* we need the 'device_type' property, in order to match with drivers */
-	if (of_node->type == NULL) {
-		printk(KERN_WARNING "%s: node %s missing 'device_type'\n",
-				__func__,
-				of_node->name ? of_node->name : "<unknown>");
+	/*
+	 * Determine if this node is a under the /vdevice node or under the
+	 * /ibm,platform-facilities node.  This decides the device's family.
+	 */
+	parent_node = of_get_parent(of_node);
+	if (parent_node) {
+		if (!strcmp(parent_node->full_name, "/ibm,platform-facilities"))
+			family = PFO;
+		else if (!strcmp(parent_node->full_name, "/vdevice"))
+			family = VDEVICE;
+		else {
+			pr_warn("%s: parent(%s) of %s not recognized.\n",
+					__func__,
+					parent_node->full_name,
+					of_node_name);
+			of_node_put(parent_node);
+			return NULL;
+		}
+		of_node_put(parent_node);
+	} else {
+		pr_warn("%s: could not determine the parent of node %s.\n",
+				__func__, of_node_name);
 		return NULL;
 	}
 
-	unit_address = of_get_property(of_node, "reg", NULL);
-	if (unit_address == NULL) {
-		printk(KERN_WARNING "%s: node %s missing 'reg'\n",
-				__func__,
-				of_node->name ? of_node->name : "<unknown>");
-		return NULL;
+	if (family == PFO) {
+		if (of_get_property(of_node, "interrupt-controller", NULL)) {
+			pr_debug("%s: Skipping the interrupt controller %s.\n",
+					__func__, of_node_name);
+			return NULL;
+		}
 	}
 
 	/* allocate a vio_dev for this node */
 	viodev = kzalloc(sizeof(struct vio_dev), GFP_KERNEL);
-	if (viodev == NULL)
+	if (viodev == NULL) {
+		pr_warn("%s: allocation failure for VIO device.\n", __func__);
 		return NULL;
+	}
 
-	viodev->irq = irq_of_parse_and_map(of_node, 0);
+	/* we need the 'device_type' property, in order to match with drivers */
+	viodev->family = family;
+	if (viodev->family == VDEVICE) {
+		if (of_node->type != NULL)
+			viodev->type = of_node->type;
+		else {
+			pr_warn("%s: node %s is missing the 'device_type' "
+					"property.\n", __func__, of_node_name);
+			goto out;
+		}
+
+		unit_address = of_get_property(of_node, "reg", NULL);
+		if (unit_address == NULL) {
+			pr_warn("%s: node %s missing 'reg'\n",
+					__func__, of_node_name);
+			goto out;
+		}
+		dev_set_name(&viodev->dev, "%x", *unit_address);
+		viodev->irq = irq_of_parse_and_map(of_node, 0);
+		viodev->unit_address = *unit_address;
+	} else {
+		/* PFO devices need their resource_id for submitting COP_OPs
+		 * This is an optional field for devices, but is required when
+		 * performing synchronous ops */
+		pfo_resid = of_get_property(of_node, "ibm,resource-id", NULL);
+		if (pfo_resid != NULL)
+			viodev->resource_id = *pfo_resid;
+
+		unit_address = NULL;
+		dev_set_name(&viodev->dev, "%s", of_node_name);
+		viodev->type = of_node_name;
+		viodev->irq = 0;
+	}
 
-	dev_set_name(&viodev->dev, "%x", *unit_address);
 	viodev->name = of_node->name;
-	viodev->type = of_node->type;
-	viodev->unit_address = *unit_address;
 	viodev->dev.of_node = of_node_get(of_node);
 
 	if (firmware_has_feature(FW_FEATURE_CMO))
@@ -1267,16 +1422,51 @@  struct vio_dev *vio_register_device_node(struct device_node *of_node)
 	}
 
 	return viodev;
+
+out:	/* Use this exit point for any return prior to device_register */
+	kfree(viodev);
+
+	return NULL;
 }
 EXPORT_SYMBOL(vio_register_device_node);
 
+/*
+ * vio_bus_scan_for_devices - Scan OF and register each child device
+ * @root_name - OF node name for the root of the subtree to search.
+ *		This must be non-NULL
+ *
+ * Starting from the root node provide, register the device node for
+ * each child beneath the root.
+ */
+static void vio_bus_scan_register_devices(char *root_name)
+{
+	struct device_node *node_root, *node_child;
+
+	if (!root_name)
+		return;
+
+	node_root = of_find_node_by_name(NULL, root_name);
+	if (node_root) {
+
+		/*
+		 * Create struct vio_devices for each virtual device in
+		 * the device tree. Drivers will associate with them later.
+		 */
+		node_child = of_get_next_child(node_root, NULL);
+		while (node_child) {
+			vio_register_device_node(node_child);
+			node_child = of_get_next_child(node_root, node_child);
+		}
+		of_node_put(node_root);
+	}
+}
+
 /**
  * vio_bus_init: - Initialize the virtual IO bus
  */
 static int __init vio_bus_init(void)
 {
 	int err;
-	struct device_node *node_vroot;
 
 	if (firmware_has_feature(FW_FEATURE_CMO))
 		vio_cmo_sysfs_init();
@@ -1301,19 +1491,8 @@  static int __init vio_bus_init(void)
 	if (firmware_has_feature(FW_FEATURE_CMO))
 		vio_cmo_bus_init();
 
-	node_vroot = of_find_node_by_name(NULL, "vdevice");
-	if (node_vroot) {
-		struct device_node *of_node;
-
-		/*
-		 * Create struct vio_devices for each virtual device in
-		 * the device tree. Drivers will associate with them later.
-		 */
-		for (of_node = node_vroot->child; of_node != NULL;
-				of_node = of_node->sibling)
-			vio_register_device_node(of_node);
-		of_node_put(node_vroot);
-	}
+	vio_bus_scan_register_devices("vdevice");
+	vio_bus_scan_register_devices("ibm,platform-facilities");
 
 	return 0;
 }
@@ -1436,12 +1615,28 @@  struct vio_dev *vio_find_node(struct device_node *vnode)
 {
 	const uint32_t *unit_address;
 	char kobj_name[20];
+	struct device_node *vnode_parent;
+	const char *dev_type;
+
+	vnode_parent = of_get_parent(vnode);
+	if (!vnode_parent)
+		return NULL;
+
+	dev_type = of_get_property(vnode_parent, "device_type", NULL);
+	of_node_put(vnode_parent);
+	if (!dev_type)
+		return NULL;
 
 	/* construct the kobject name from the device node */
-	unit_address = of_get_property(vnode, "reg", NULL);
-	if (!unit_address)
+	if (!strcmp(dev_type, "vdevice")) {
+		unit_address = of_get_property(vnode, "reg", NULL);
+		if (!unit_address)
+			return NULL;
+		snprintf(kobj_name, sizeof(kobj_name), "%x", *unit_address);
+	} else if (!strcmp(dev_type, "ibm,platform-facilities"))
+		snprintf(kobj_name, sizeof(kobj_name), "%s", vnode->name);
+	else
 		return NULL;
-	snprintf(kobj_name, sizeof(kobj_name), "%x", *unit_address);
 
 	return vio_find_name(kobj_name);
 }