Patchwork [RFC,1/2] ivring: Add a ring-buffer driver on IVShmem

Submitter Yoshihiro YUNOMAE
Date June 5, 2012, 1:01 p.m.
Message ID <20120605130117.15479.32680.stgit@ltc189.sdl.hitachi.co.jp>
Permalink /patch/163104/
State New

Comments

Yoshihiro YUNOMAE - June 5, 2012, 1:01 p.m.
This patch adds a ring-buffer driver for the IVShmem device, a virtual RAM
device in QEMU. The driver can be used as a ring buffer for kernel logging or
tracing of a guest OS, with data recorded by kernel code or SystemTap.

The ring-buffer driver is implemented very simply. The first 4kB of the shared
memory region holds the control structure of the ring buffer. This region
stores the values used to manage the ring buffer: the bit width and mask of the
whole memory size, the writing position, and the threshold value for notifying
a reader on the host OS. The reader uses this region to find the writing
position. The usable region for recording data is therefore "total memory
size - 4kB". The driver records arbitrary data from the start to the end of
this writable region and then wraps around to the start.
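
The layout described above can be sketched in plain userspace C. This is only
an illustration, not the driver code: the field names follow ivring.h from this
patch, while ring_hdr, floor_log2(), ring_hdr_setup() and ring_data_capacity()
are hypothetical names. Like the patch's use of __fls(), the sketch rounds the
region size down to a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_HDR_SIZE 4096u	/* first 4kB holds the control structure */

/* Userspace mirror of the control structure (field names from ivring.h). */
struct ring_hdr {
	char     magic[4];	/* "RING" */
	uint32_t version;
	int32_t  reader;	/* -1 while no reader is attached */
	int32_t  writer;	/* -1 while no writer is attached */
	uint32_t total_mask;	/* (1 << total_bits) - 1 */
	uint32_t total_bits;	/* log2 of the usable region size */
	uint64_t pos;		/* writing position, grows monotonically */
	uint64_t threshold;	/* notify the reader once pos exceeds this */
};

/* floor(log2(v)) for v > 0; hypothetical stand-in for the kernel's __fls() */
static uint32_t floor_log2(uint32_t v)
{
	uint32_t bits = 0;

	assert(v != 0);
	while (v >>= 1)
		bits++;
	return bits;
}

/* Fill in a fresh header for a shared region of shm_size bytes. */
static void ring_hdr_setup(struct ring_hdr *hdr, uint32_t shm_size)
{
	assert(shm_size > RING_HDR_SIZE);
	memset(hdr, 0, sizeof(*hdr));
	memcpy(hdr->magic, "RING", 4);
	hdr->version = 1;
	hdr->reader = -1;
	hdr->writer = -1;
	hdr->total_bits = floor_log2(shm_size);
	hdr->total_mask = ~(~0u << hdr->total_bits);
	hdr->pos = RING_HDR_SIZE;	/* data starts right after the header */
	hdr->threshold = ~0ULL;		/* effectively "never notify" */
}

/* Bytes available for recorded data: region size minus the header page. */
static uint32_t ring_data_capacity(const struct ring_hdr *hdr)
{
	return (1u << hdr->total_bits) - RING_HDR_SIZE;
}
```

With a 1MB region this gives total_bits = 20 and a data capacity of 1MB - 4kB.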

When the written size exceeds a threshold value, the driver can notify a reader
to read the data by using writel(). As of this patch series, the reader does
not yet have any function for receiving the notification. The notification
feature will be used in the near future.
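
As a rough illustration of the notification path, the sketch below shows the
value that the patch writes to the IVShmem doorbell register with writel()
(peer ID in the upper 16 bits, vector in the low 8 bits) and the threshold
test. doorbell_value() and should_notify() are hypothetical helper names used
only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Doorbell register value as composed in ivring_write_doorbell():
 * destination peer ID in bits 31..16, interrupt vector in bits 7..0.
 */
static uint32_t doorbell_value(int peer, int vector)
{
	return (((uint32_t)peer & 0xffffu) << 16) |
	       ((uint32_t)vector & 0x00ffu);
}

/*
 * Notify only when a reader is registered (ID != -1) and the writing
 * position has moved past the threshold.
 */
static bool should_notify(uint64_t pos, uint64_t threshold, int32_t reader)
{
	return reader != -1 && threshold < pos;
}
```

Because the patch initializes the threshold to ~0ULL, should_notify() stays
false until a reader lowers the threshold.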

When a writer records data in this ring buffer, a spinlock is used to keep
writers on multiple CPUs from racing with each other. To avoid the spinlock, a
lockless ring buffer like ftrace's, with one ring buffer per CPU, will be
implemented in the near future.
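
The locked write path can be sketched in userspace as follows. This is an
assumption-laden model, not the driver: a C11 atomic_flag stands in for the
kernel spinlock, struct ring and the ring_*() functions are hypothetical names,
and the size check is slightly stricter than the one in ivring_write(). The
wrap-around arithmetic follows the patch: when a record crosses the end of the
region, the remainder continues right after the 4kB header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 4096u	/* header page at the start of the region */

struct ring {
	atomic_flag lock;	/* stand-in for the per-channel spinlock */
	uint8_t    *base;	/* start of the shared region */
	uint32_t    bits;	/* log2 of the region size */
	uint64_t    pos;	/* monotonically increasing write position */
};

static void ring_init(struct ring *r, uint8_t *base, uint32_t bits)
{
	assert(bits > 12 && bits < 32);
	atomic_flag_clear(&r->lock);
	r->base = base;
	r->bits = bits;
	r->pos = HDR_SIZE;
}

static void ring_lock(struct ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void ring_unlock(struct ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Returns the number of bytes written, or -1 if the record cannot fit. */
static int ring_write(struct ring *r, const void *buf, size_t size)
{
	const uint8_t *src = buf;
	uint32_t total = 1u << r->bits;
	uint32_t off;
	size_t room;

	if (size > total - HDR_SIZE)
		return -1;

	ring_lock(r);
	off = (uint32_t)r->pos & (total - 1);
	room = total - off;
	if (size >= room) {
		/* Fill the tail, then continue just after the header. */
		memcpy(r->base + off, src, room);
		memcpy(r->base + HDR_SIZE, src + room, size - room);
		r->pos += size + HDR_SIZE;	/* skip over the header */
	} else {
		memcpy(r->base + off, src, size);
		r->pos += size;
	}
	ring_unlock(r);
	return (int)size;
}
```

Serializing every writer on one lock is simple but is exactly the contention
that the planned per-CPU lockless buffers are meant to remove.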

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: linux-kernel@vger.kernel.org
Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: qemu-devel@nongnu.org
Cc: systemtap@sourceware.org
---

 drivers/Kconfig          |    1 
 drivers/Makefile         |    1 
 drivers/ivshmem/Kconfig  |    9 +
 drivers/ivshmem/Makefile |    5 
 drivers/ivshmem/ivring.c |  551 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/ivshmem/ivring.h |   77 ++++++
 6 files changed, 644 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ivshmem/Kconfig
 create mode 100644 drivers/ivshmem/Makefile
 create mode 100644 drivers/ivshmem/ivring.c
 create mode 100644 drivers/ivshmem/ivring.h
Anthony Liguori - June 5, 2012, 11:03 p.m.
On 06/05/2012 09:10 PM, Borislav Petkov wrote:
> On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
>> [...]
>
> Yet another ring buffer?
>
> We already have an ftrace and perf ring buffer, can't you use one of those?

Not to mention virtio :-)

Why not just make a virtio device for this kind of thing?

Regards,

Anthony Liguori

Greg KH - June 5, 2012, 11:22 p.m.
On Wed, Jun 06, 2012 at 07:03:06AM +0800, Anthony Liguori wrote:
> On 06/05/2012 09:10 PM, Borislav Petkov wrote:
> >On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
> >>[...]
> >
> >Yet another ring buffer?
> >
> >We already have an ftrace and perf ring buffer, can't you use one of those?
> 
> Not to mention virtio :-)
> 
> Why not just make a virtio device for this kind of thing?

Yeah, that's exactly what I was thinking, why reinvent things again?

greg k-h
Masami Hiramatsu - June 6, 2012, 2:44 p.m.
(2012/06/06 8:22), Greg Kroah-Hartman wrote:
> On Wed, Jun 06, 2012 at 07:03:06AM +0800, Anthony Liguori wrote:
>> On 06/05/2012 09:10 PM, Borislav Petkov wrote:
>>> On Tue, Jun 05, 2012 at 10:01:17PM +0900, Yoshihiro YUNOMAE wrote:
>>>> [...]
>>>
>>> Yet another ring buffer?
>>>
>>> We already have an ftrace and perf ring buffer, can't you use one of those?
>>
>> Not to mention virtio :-)
>>
>> Why not just make a virtio device for this kind of thing?
> 
> Yeah, that's exactly what I was thinking, why reinvent things again?

Agreed. Actually, we consider this just a concept prototype.
Because of the many restrictions of this device, especially for
scalability (which Yoshihiro will cover in a talk at LinuxCon Japan),
we are considering moving to a virtio-based shmem device.

AFAICS, it seems possible to use it in a virtio-balloon-like way to pass
the actual pages of the guest ring buffer to the host. Then the reader
can read the pages directly from qemu.

Thank you,

Patch

diff --git a/drivers/Kconfig b/drivers/Kconfig
index bfc9186..e01adcd 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -148,4 +148,5 @@  source "drivers/iio/Kconfig"
 
 source "drivers/vme/Kconfig"
 
+source "drivers/ivshmem/Kconfig"
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 2ba29ff..1ebdd03 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -23,6 +23,7 @@  obj-y				+= amba/
 # really early.
 obj-$(CONFIG_DMA_ENGINE)	+= dma/
 
+obj-$(CONFIG_IVRING_MANAGER)	+= ivshmem/
 obj-$(CONFIG_VIRTIO)		+= virtio/
 obj-$(CONFIG_XEN)		+= xen/
 
diff --git a/drivers/ivshmem/Kconfig b/drivers/ivshmem/Kconfig
new file mode 100644
index 0000000..e84364a
--- /dev/null
+++ b/drivers/ivshmem/Kconfig
@@ -0,0 +1,9 @@ 
+#
+# IVShmem support drivers
+#
+
+config IVRING_MANAGER
+	tristate "IVRing management driver"
+	help
+	  Allows IVShmem, a virtual PCI RAM device in QEMU, to be used as a
+	  ring buffer for tracing a guest.
diff --git a/drivers/ivshmem/Makefile b/drivers/ivshmem/Makefile
new file mode 100644
index 0000000..e725f8c
--- /dev/null
+++ b/drivers/ivshmem/Makefile
@@ -0,0 +1,5 @@ 
+#
+# Makefile for IVShmem drivers
+#
+
+obj-$(CONFIG_IVRING_MANAGER)	+= ivring.o
diff --git a/drivers/ivshmem/ivring.c b/drivers/ivshmem/ivring.c
new file mode 100644
index 0000000..5cbcfb6
--- /dev/null
+++ b/drivers/ivshmem/ivring.c
@@ -0,0 +1,551 @@ 
+/*
+ * Ring buffer on IVShmem Driver
+ *
+ * (C) 2012 Hitachi, Ltd.
+ * Written by Hitachi Yokohama Research Laboratory.
+ *
+ * Created by Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ *            Akihiro Nagai <akihiro.nagai.hw@hitachi.com>
+ *            Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
+ * based on UIOIVShmem Driver, http://www.gitorious.org/nahanni/guest-code,
+ *                                   (C) 2009 Cam Macdonell <cam@cs.ualberta.ca>
+ * based on Hilscher CIF card driver (C) 2007 Hans J. Koch <hjk@linutronix.de>
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include "./ivring.h"
+
+
+#define IVSHM_OFFS_INTRMASK	0
+#define IVSHM_OFFS_INTRSTATUS	4
+#define IVSHM_OFFS_IVPOSITION	8
+#define IVSHM_OFFS_DOORBELL	12
+
+#define MSIX_NAMEBUF_SIZE	128
+#define DEFAULT_NR_VECTORS	4
+
+#define IVRING_DEVNAME	"ivring"
+
+struct ivring_mem {
+	unsigned long	addr;
+	unsigned long	size;
+	void __iomem	*ioaddr;
+};
+
+struct ivring_info {
+	struct pci_dev		*dev;
+	int			irq;
+	struct ivring_mem	mem[2]; /* 0:control, 1:shmem */
+	struct msix_entry	*msix_entries;
+	char			(*msix_names)[MSIX_NAMEBUF_SIZE];
+	int			nvectors;
+	int			posn;
+	struct ivring_hdr	*hdr;
+};
+
+#define MAX_IVRING_CHN	16
+
+static struct ivring_info *ivring_channels[MAX_IVRING_CHN];
+static spinlock_t ivring_locks[MAX_IVRING_CHN];
+
+static void ivring_init_locks(void)
+{
+	int i;
+
+	for (i = 0; i < MAX_IVRING_CHN; i++)
+		spin_lock_init(&ivring_locks[i]);
+}
+
+#define ivring_lock(id, flags) \
+	spin_lock_irqsave(&ivring_locks[id], flags)
+
+#define ivring_unlock(id, flags) \
+	spin_unlock_irqrestore(&ivring_locks[id], flags)
+
+/* Device I/O helper: does not check whether mem[0].ioaddr is ready */
+static int ivring_read_position(struct ivring_info *info)
+{
+	void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_IVPOSITION;
+	u32 val = readl(addr);
+
+	/* return as a signed value */
+	return (int)val;
+}
+
+/* Note: this operation is destructive. Intr status is cleared after reading */
+static u32 ivring_read_intr(struct ivring_info *info)
+{
+	void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRSTATUS;
+	return readl(addr);
+}
+
+static void ivring_write_intrmask(struct ivring_info *info, u32 mask)
+{
+	void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_INTRMASK;
+	writel(mask, addr);
+}
+
+static void ivring_write_doorbell(struct ivring_info *info, int posn, int vec)
+{
+	u32 door = ((posn & 0xffff) << 16) | (vec & 0x00ff);
+	void __iomem *addr = (u8 *)info->mem[0].ioaddr + IVSHM_OFFS_DOORBELL;
+	writel(door, addr);
+}
+
+static unsigned long ivring_shmsize(struct ivring_info *info)
+{
+	return info->mem[1].size;
+}
+
+static int ivring_hdr_init(struct ivring_hdr *hdr, u32 shmsize)
+{
+	if (strncmp(hdr->magic, IVRING_MAGIC, 4) == 0) {
+		printk(KERN_INFO "Ring header is already initialized\n");
+		printk(KERN_INFO "reader %d, writer %d, pos %llx\n",
+			 (int)hdr->reader, (int)hdr->writer, hdr->pos);
+		if (hdr->version != IVRING_VERSION) {
+			printk(KERN_ERR "Ring version is different! (%d)\n",
+				(int)hdr->version);
+			return -EINVAL;
+		}
+		return 0;
+	}
+	memset(hdr, 0, IVRING_OFFSET);
+	memcpy(hdr->magic, IVRING_MAGIC, 4);
+	hdr->version = IVRING_VERSION;
+	hdr->reader = -1;
+	hdr->writer = -1;
+	hdr->total_bits = __fls(shmsize);
+	hdr->total_mask = ~(~0 << hdr->total_bits);
+	hdr->threshold = IVRING_INIT_THRESHOLD;
+	hdr->pos = IVRING_STARTPOS;
+	return 1;
+}
+
+static void ivring_notify_reader(struct ivring_info *info)
+{
+	if (info->hdr->reader != -1) {
+		pr_debug("Notify update to reader %d\n", info->hdr->reader);
+		ivring_write_doorbell(info, info->hdr->reader, IVRING_VECTOR);
+	}
+}
+
+static int ivring_init_hdr(struct ivring_info *info)
+{
+	if (!info->mem[1].ioaddr) {
+		printk(KERN_ERR "IVRing: IVShmem is not mapped.\n");
+		return -1;
+	}
+
+	info->hdr = info->mem[1].ioaddr;
+	ivring_hdr_init(info->hdr, ivring_shmsize(info));
+
+	info->hdr->writer = info->posn;
+	ivring_notify_reader(info);
+	return 0;
+}
+
+static void ivring_cleanup_hdr(struct ivring_info *info)
+{
+	if (!info->hdr || info->hdr->writer != info->posn)
+		return;
+
+	info->hdr->writer = -1;
+	/* Don't clear pos */
+	ivring_notify_reader(info);
+}
+
+/**
+ * ivring_ready - get an IVRing channel
+ * @id: ID of a ring-buffer
+ *
+ * Get whether a ring-buffer specified by ID is useable or not.
+ *
+ */
+bool ivring_ready(int id)
+{
+	if (id < 0 || id >= MAX_IVRING_CHN || ivring_channels[id] == NULL)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(ivring_ready);
+
+/**
+ * ivring_write - record data to IVRing
+ * @id: ID of a ring-buffer
+ * @buf: data buffer
+ * @size: data size in bytes
+ *
+ * Record @size bytes starting at @buf into the ring-buffer specified
+ * by @id.
+ *
+ * A spinlock is used and a single buffer is shared by all CPUs, so
+ * DO NOT record data in an SMP environment in this version.
+ *
+ */
+int ivring_write(int id, void *buf, size_t size)
+{
+	struct ivring_info *info;
+	struct ivring_hdr *hdr;
+	unsigned long flags;
+	u32 pos, tbits, room;
+	int ret = 0;
+
+	if (!ivring_ready(id))
+		return -ENOENT;
+
+	ivring_lock(id, flags);
+	info = ivring_channels[id];
+	if (unlikely(info == NULL)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	hdr = info->hdr;
+	tbits = hdr->total_bits;
+	if (unlikely(size >> tbits)) {
+		/* write-size exceeds buffer-size */
+		ret = -E2BIG;
+		goto out;
+	}
+
+	pos = (u32)hdr->pos & hdr->total_mask;
+
+	if (unlikely((pos + size) >> tbits)) {
+		room = (1 << tbits) - pos;
+		memcpy(ivring_pos_addr(hdr, pos), buf, room);
+		memcpy(ivring_pos_addr(hdr, IVRING_OFFSET), buf + room,
+			size - room);
+		hdr->pos += size + IVRING_OFFSET;
+	} else {
+		memcpy(ivring_pos_addr(hdr, pos), buf, size);
+		hdr->pos += size;
+	}
+
+	/*
+	 * Notify reader if counter is over the threshold
+	 * This feature will be used for IVRing reader.
+	 */
+	if (hdr->threshold < hdr->pos) {
+		hdr->threshold = IVRING_INIT_THRESHOLD;
+		ivring_notify_reader(info);
+	}
+	ret = (int)size;
+out:
+	ivring_unlock(id, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ivring_write);
+
+/* Channel management functions */
+static int ivring_register_channel(struct ivring_info *info)
+{
+	int i;
+	unsigned long flags;
+
+	for (i = 0; i < MAX_IVRING_CHN; i++) {
+		ivring_lock(i, flags);
+		if (ivring_channels[i] == NULL) {
+			ivring_channels[i] = info;
+			info->posn = ivring_read_position(info);
+			ivring_init_hdr(info);
+			printk(KERN_INFO "Add ivring id %d, pos %d\n",
+								i, info->posn);
+			ivring_unlock(i, flags);
+			return i;
+		}
+		ivring_unlock(i, flags);
+	}
+
+	return -1;
+}
+
+static void ivring_unregister_channel(struct ivring_info *info)
+{
+	int i;
+	unsigned long flags;
+
+	for (i = 0; i < MAX_IVRING_CHN; i++) {
+		ivring_lock(i, flags);
+		if (ivring_channels[i] == info) {
+			printk(KERN_INFO "Remove ivring id %d, pos %d\n",
+								i, info->posn);
+			ivring_channels[i] = NULL;
+			ivring_cleanup_hdr(info);
+			ivring_unlock(i, flags);
+			break;
+		}
+		ivring_unlock(i, flags);
+	}
+}
+
+/* IVRing interrupt handlers */
+static void ivring_event_handler(int irq, struct ivring_info *info)
+{
+	/* TODO: update noticed - take a reaction? */
+
+	pr_debug("IVRing: Get an interrupt %d:%d.\n", info->posn, irq);
+}
+
+static irqreturn_t ivring_irq_handler(int irq, void *opaque)
+{
+	struct ivring_info *info = opaque;
+
+	if (ivring_read_intr(info) == 0)
+		return IRQ_NONE;
+
+	ivring_event_handler(irq, info);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ivring_msix_handler(int irq, void *opaque)
+{
+	struct ivring_info *info = opaque;
+
+	ivring_event_handler(irq, info);
+
+	return IRQ_HANDLED;
+}
+
+static void free_msix_vectors(struct ivring_info *info)
+{
+	int i;
+
+	if (info->irq != -1 || info->nvectors == 0)
+		/* No need to free it */
+		return;
+
+	for (i = 0; i < info->nvectors; i++)
+		free_irq(info->msix_entries[i].vector, info);
+	pci_disable_msix(info->dev);
+	info->nvectors = 0;
+
+	kfree(info->msix_entries);
+	info->msix_entries = NULL;
+	kfree(info->msix_names);
+	info->msix_names = NULL;
+}
+
+/* Setup MSI-X interrupt vectors */
+static int request_msix_vectors(struct ivring_info *info, int nvectors)
+{
+	int i, err;
+
+	info->msix_entries = kzalloc(nvectors * sizeof(*info->msix_entries),
+					GFP_KERNEL);
+	if (info->msix_entries == NULL)
+		return -ENOMEM;
+
+	info->msix_names = kzalloc(nvectors * sizeof(*info->msix_names),
+					GFP_KERNEL);
+	if (info->msix_names == NULL) {
+		kfree(info->msix_entries);
+		info->msix_entries = NULL;
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nvectors; ++i)
+		info->msix_entries[i].entry = i;
+
+	err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+	if (err > 0) {
+		nvectors = err; /* a positive return value is the
+				 * number of vectors available */
+		err = pci_enable_msix(info->dev, info->msix_entries, nvectors);
+		if (err) {
+			printk(KERN_INFO "no MSI (%d). Back to INTx.\n", err);
+			goto error;
+		}
+	}
+
+	if (err)
+		goto error;
+
+	info->nvectors = nvectors;
+
+	for (i = 0; i < info->nvectors; i++) {
+
+		snprintf(info->msix_names[i], MSIX_NAMEBUF_SIZE,
+			"%s-config", IVRING_DEVNAME);
+
+		err = request_irq(info->msix_entries[i].vector,
+			ivring_msix_handler, 0,
+			info->msix_names[i], info);
+
+		if (err)
+			goto error_free_irq;
+	}
+
+	return 0;
+
+error_free_irq:
+	while (i--)
+		free_irq(info->msix_entries[i].vector, info);
+
+	pci_disable_msix(info->dev);
+	info->nvectors = 0;
+error:
+	kfree(info->msix_entries);
+	info->msix_entries = NULL;
+	kfree(info->msix_names);
+	info->msix_names = NULL;
+	return err;
+
+}
+
+static int __devinit ivring_pci_probe(struct pci_dev *dev,
+					const struct pci_device_id *id)
+{
+	struct ivring_info *info;
+	int ret;
+
+	info = kzalloc(sizeof(struct ivring_info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	if (pci_enable_device(dev)) {
+		printk(KERN_ERR "IVRing: Failed to probe device.\n");
+		goto out_free;
+	}
+
+	if (pci_request_regions(dev, IVRING_DEVNAME)) {
+		printk(KERN_ERR "IVRing: Failed (disable).\n");
+		goto out_disable;
+	}
+
+	/* Init control memory region */
+	info->mem[0].addr = pci_resource_start(dev, 0);
+	if (!info->mem[0].addr) {
+		printk(KERN_ERR "IVRing: Failed (release).\n");
+		goto out_release;
+	}
+
+	info->mem[0].size = pci_resource_len(dev, 0);
+	info->mem[0].ioaddr = pci_ioremap_bar(dev, 0);
+	if (!info->mem[0].ioaddr) {
+		printk(KERN_ERR "IVRing: Failed (release).\n");
+		goto out_release;
+	}
+
+	/* Init shmem region */
+	info->mem[1].addr = pci_resource_start(dev, 2);
+	if (!info->mem[1].addr) {
+		printk(KERN_INFO "failed to get addr\n");
+		printk(KERN_ERR "IVRing: Failed (unmap).\n");
+		goto out_unmap;
+	}
+
+	info->mem[1].size = pci_resource_len(dev, 2);
+	info->mem[1].ioaddr = ioremap_cache(info->mem[1].addr,
+						info->mem[1].size);
+	if (!info->mem[1].ioaddr) {
+		printk(KERN_INFO "failed to map addr\n");
+		printk(KERN_ERR "IVRing: Failed (unmap).\n");
+		goto out_unmap;
+	}
+
+	info->dev = dev;
+
+	/* Init interrupt vectors */
+	if (request_msix_vectors(info, DEFAULT_NR_VECTORS) != 0) {
+		info->irq = dev->irq;
+		ret = request_irq(info->irq, ivring_irq_handler, IRQF_SHARED,
+				IVRING_DEVNAME, info);
+		if (ret < 0) {
+			printk(KERN_ERR "IVRing: Failed (unmap2).\n");
+			goto out_unmap2;
+		}
+
+		printk(KERN_INFO "Regular IRQs enabled\n");
+		ivring_write_intrmask(info, 0xffffffff);
+	} else {
+		printk(KERN_INFO "MSI-X enabled\n");
+		info->irq = -1;
+		ivring_write_intrmask(info, 0xffffffff);
+	}
+
+	pci_set_drvdata(dev, info);
+
+	ivring_register_channel(info);
+
+	return 0;
+
+out_unmap2:
+	iounmap(info->mem[1].ioaddr);
+out_unmap:
+	iounmap(info->mem[0].ioaddr);
+out_release:
+	pci_release_regions(dev);
+out_disable:
+	pci_disable_device(dev);
+out_free:
+	kfree(info);
+	return -ENODEV;
+}
+
+static void ivring_pci_remove(struct pci_dev *dev)
+{
+	struct ivring_info *info = pci_get_drvdata(dev);
+
+	ivring_unregister_channel(info);
+
+	if (info->irq != -1)
+		free_irq(info->irq, info);
+	else
+		free_msix_vectors(info);
+
+	pci_release_regions(dev);
+	pci_disable_device(dev);
+	pci_set_drvdata(dev, NULL);
+	iounmap(info->mem[1].ioaddr);
+	iounmap(info->mem[0].ioaddr);
+
+	kfree(info);
+}
+
+static struct pci_device_id ivring_pci_ids[] __devinitdata = {
+	{
+		.vendor =	0x1af4,
+		.device =	0x1110,
+		.subvendor =	PCI_ANY_ID,
+		.subdevice =	PCI_ANY_ID,
+	},
+	{ 0, }
+};
+
+static struct pci_driver ivring_pci_driver = {
+	.name = "ivring",
+	.id_table = ivring_pci_ids,
+	.probe = ivring_pci_probe,
+	.remove = ivring_pci_remove,
+};
+
+static int __init ivring_init_module(void)
+{
+	ivring_init_locks();
+	return pci_register_driver(&ivring_pci_driver);
+}
+
+static void __exit ivring_exit_module(void)
+{
+	pci_unregister_driver(&ivring_pci_driver);
+}
+
+module_init(ivring_init_module);
+module_exit(ivring_exit_module);
+
+MODULE_DEVICE_TABLE(pci, ivring_pci_ids);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Hitachi, Ltd.");
diff --git a/drivers/ivshmem/ivring.h b/drivers/ivshmem/ivring.h
new file mode 100644
index 0000000..2fe9b46
--- /dev/null
+++ b/drivers/ivshmem/ivring.h
@@ -0,0 +1,77 @@ 
+#ifndef __IVRING_H__
+#define __IVRING_H__
+
+/* ivshmem ring buffer header */
+#ifdef __KERNEL__
+#include <linux/device.h>
+#include <linux/module.h>
+#else
+#include <stdbool.h>
+#include <bits/types.h>
+typedef int32_t		s32;
+typedef uint32_t	u32;
+typedef uint64_t	u64;
+#endif
+
+/* control structure of ivshmem ring buffer */
+struct ivring_hdr {
+	char magic[4];		/* Magic ID */
+	u32  version;		/* IVRing version number */
+	s32  reader;		/* reader ID */
+	s32  writer;		/* writer ID */
+	u32  total_mask;	/* bit mask of whole memory size */
+	u32  total_bits;	/* bits of whole memory size */
+	u64  pos;		/* writing position */
+	u64  threshold;		/* threshold value for notification */
+};
+
+#define IVRING_MAGIC	"RING"
+#define IVRING_VERSION	1
+#define IVRING_OFFSET	4096	/* This page is for the header */
+#define IVRING_VECTOR	0	/* Doorbell Number */
+#define IVRING_STARTPOS	IVRING_OFFSET
+#define IVRING_INIT_THRESHOLD	(~0ULL)
+#define IVRING_READ_MARGIN	4096
+
+static inline void *ivring_start_addr(struct ivring_hdr *hdr)
+{
+	return (char *)hdr + IVRING_OFFSET;
+}
+
+static inline void *ivring_end_addr(struct ivring_hdr *hdr)
+{
+	return (char *)hdr + (1 << hdr->total_bits);
+}
+
+static inline void *ivring_pos_addr(struct ivring_hdr *hdr, u32 pos)
+{
+	return (char *)hdr + pos;
+}
+
+static inline void *ivring_pos64_addr(struct ivring_hdr *hdr, u64 pos)
+{
+	u32 pos32;
+	pos32 = (u32)pos & hdr->total_mask;
+	return ivring_pos_addr(hdr, pos32);
+}
+
+static inline bool ivring_verify_pos(struct ivring_hdr *hdr, u32 pos)
+{
+	if (pos < IVRING_OFFSET ||
+	    pos >= (1 << hdr->total_bits))
+		return false;
+	return true;
+}
+
+#ifdef __KERNEL__
+/* Kernel ringbuffer(writer) APIs */
+
+/* Get an IVRing channel */
+extern bool ivring_ready(int id);
+
+/* Record data to IVRing */
+extern int ivring_write(int id, void *buf, size_t size);
+
+#endif
+
+#endif	/* __IVRING_H__ */
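
For readers who want to inspect the shared region from the host side, the
following userspace sketch shows one way a reader might validate the header and
locate the current writing position. It is not part of this patch: the struct
layout and constants are copied from ivring.h above, while ring_hdr_valid() and
ring_write_offset() are hypothetical helpers, and the extra consistency checks
are assumptions rather than anything the driver requires.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_MAGIC	"RING"
#define RING_VERSION	1u
#define RING_OFFSET	4096u	/* header page, as IVRING_OFFSET */

/* Userspace mirror of struct ivring_hdr. */
struct ring_hdr {
	char     magic[4];
	uint32_t version;
	int32_t  reader;
	int32_t  writer;
	uint32_t total_mask;
	uint32_t total_bits;
	uint64_t pos;
	uint64_t threshold;
};

/* True when the region starts with a header this reader understands. */
static bool ring_hdr_valid(const struct ring_hdr *hdr)
{
	if (memcmp(hdr->magic, RING_MAGIC, 4) != 0)
		return false;
	if (hdr->version != RING_VERSION)
		return false;
	/* The region must be larger than the header page and fit in 32 bits. */
	if (hdr->total_bits <= 12 || hdr->total_bits >= 32)
		return false;
	return hdr->total_mask == (1u << hdr->total_bits) - 1;
}

/*
 * Offset (from the start of the region) of the next byte the writer will
 * fill, or 0 when the position points into the header page.
 */
static uint32_t ring_write_offset(const struct ring_hdr *hdr)
{
	uint32_t off = (uint32_t)hdr->pos & hdr->total_mask;

	if (off < RING_OFFSET)
		return 0;
	return off;
}
```

A reader would call ring_hdr_valid() once after mapping the region and then
poll ring_write_offset() to see how far the writer has advanced.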