[2/2] dm log writes: add support for DAX

Message ID 20171020052404.13762-2-ross.zwisler@linux.intel.com
State New
Headers show
Series
  • [1/2] dm log writes: Add support for inline data buffers
Related show

Commit Message

Ross Zwisler Oct. 20, 2017, 5:24 a.m.
Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.  Unfortunately we can't easily track data that has been
written via mmap() now that the dax_flush() abstraction was removed by this
commit:

commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")

Otherwise we could just treat each flush as a big write, and store the data
that is being synced to media.  It may be worthwhile to add the dax_flush()
entry point back, just as a notifier so we can do this logging.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---

Here's a link to Jan's latest MAP_SYNC set, which can be used for the
fstest:

https://www.spinics.net/lists/linux-xfs/msg11852.html

MAP_SYNC is not needed for basic DAX+dm-log-writes functionality.

---
 drivers/md/dm-log-writes.c | 90 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

Comments

Josef Bacik Oct. 23, 2017, 5:34 p.m. | #1
On Thu, Oct 19, 2017 at 11:24:04PM -0600, Ross Zwisler wrote:
> Now that we have the ability log filesystem writes using a flat buffer, add
> support for DAX.  Unfortunately we can't easily track data that has been
> written via mmap() now that the dax_flush() abstraction was removed by this
> commit:
> 
> commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")
> 
> Otherwise we could just treat each flush as a big write, and store the data
> that is being synced to media.  It may be worthwhile to add the dax_flush()
> entry point back, just as a notifier so we can do this logging.
> 
> The motivation for this support is the need for an xfstest that can test
> the new MAP_SYNC DAX flag.  By logging the filesystem activity with
> dm-log-writes we can show that the MAP_SYNC page faults are writing out
> their metadata as they happen, instead of requiring an explicit
> msync/fsync.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---

Ok this is just my ignorance of how DAX works shining through, but do we need a
new flag to indicate this is DAX data?  You are logging it like it's just normal
data going to a certain sector, is that good enough?  If it is then hooray this
looks fine to me, I'm just slightly confused.  Thanks,

Josef
Ross Zwisler Oct. 23, 2017, 6:59 p.m. | #2
On Mon, Oct 23, 2017 at 01:34:09PM -0400, Josef Bacik wrote:
> On Thu, Oct 19, 2017 at 11:24:04PM -0600, Ross Zwisler wrote:
> > Now that we have the ability log filesystem writes using a flat buffer, add
> > support for DAX.  Unfortunately we can't easily track data that has been
> > written via mmap() now that the dax_flush() abstraction was removed by this
> > commit:
> > 
> > commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")
> > 
> > Otherwise we could just treat each flush as a big write, and store the data
> > that is being synced to media.  It may be worthwhile to add the dax_flush()
> > entry point back, just as a notifier so we can do this logging.
> > 
> > The motivation for this support is the need for an xfstest that can test
> > the new MAP_SYNC DAX flag.  By logging the filesystem activity with
> > dm-log-writes we can show that the MAP_SYNC page faults are writing out
> > their metadata as they happen, instead of requiring an explicit
> > msync/fsync.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > ---
> 
> Ok this is just my ignorance of how DAX works shining through, but do we need a
> new flag to indicate this is DAX data?  You are logging it like it's just normal
> data going to a certain sector, is that good enough?  If it is then hooray this
> looks fine to me, I'm just slightly confused.  Thanks,
> 
> Josef

I don't think we need a special flag to specify that it's DAX.  Really it's
just the same as a normal filesystem write, except that we actually do the
work of writing the data via the FS DAX iomap code instead of bubbling it all
the way down to the block driver.
Mike Snitzer Oct. 24, 2017, 7:22 p.m. | #3
On Fri, Oct 20 2017 at  1:24am -0400,
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:

> Now that we have the ability log filesystem writes using a flat buffer, add
> support for DAX.  Unfortunately we can't easily track data that has been
> written via mmap() now that the dax_flush() abstraction was removed by this
> commit:
> 
> commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")
> 
> Otherwise we could just treat each flush as a big write, and store the data
> that is being synced to media.  It may be worthwhile to add the dax_flush()
> entry point back, just as a notifier so we can do this logging.
> 
> The motivation for this support is the need for an xfstest that can test
> the new MAP_SYNC DAX flag.  By logging the filesystem activity with
> dm-log-writes we can show that the MAP_SYNC page faults are writing out
> their metadata as they happen, instead of requiring an explicit
> msync/fsync.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>

I've picked this up, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.15/dm&id=ae613bbb0144e84cb3c0ebfa9f4fd4d1507c2f0e

I tweaked the header and tweaked a couple whitespace nits.  Also
switched version bump from 1.0.1 to 1.1.0.

Thanks,
Mike
Ross Zwisler Oct. 24, 2017, 7:30 p.m. | #4
On Tue, Oct 24, 2017 at 03:22:23PM -0400, Mike Snitzer wrote:
> On Fri, Oct 20 2017 at  1:24am -0400,
> Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> 
> > Now that we have the ability log filesystem writes using a flat buffer, add
> > support for DAX.  Unfortunately we can't easily track data that has been
> > written via mmap() now that the dax_flush() abstraction was removed by this
> > commit:
> > 
> > commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction")
> > 
> > Otherwise we could just treat each flush as a big write, and store the data
> > that is being synced to media.  It may be worthwhile to add the dax_flush()
> > entry point back, just as a notifier so we can do this logging.
> > 
> > The motivation for this support is the need for an xfstest that can test
> > the new MAP_SYNC DAX flag.  By logging the filesystem activity with
> > dm-log-writes we can show that the MAP_SYNC page faults are writing out
> > their metadata as they happen, instead of requiring an explicit
> > msync/fsync.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> I've picked this up, please see:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.15/dm&id=ae613bbb0144e84cb3c0ebfa9f4fd4d1507c2f0e
> 
> I tweaked the header and tweaked a couple whitespace nits.  Also
> switched version bump from 1.0.1 to 1.1.0.
> 
> Thanks,
> Mike

Sure, your tweaks look fine.  Thanks!

Patch

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index c65f9d1..6a8d352 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -10,9 +10,11 @@ 
 #include <linux/init.h>
 #include <linux/blkdev.h>
 #include <linux/bio.h>
+#include <linux/dax.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/uio.h>
 
 #define DM_MSG_PREFIX "log-writes"
 
@@ -609,6 +611,50 @@  static int log_mark(struct log_writes_c *lc, char *data)
 	return 0;
 }
 
+static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
+		struct iov_iter *i)
+{
+	struct pending_block *block;
+
+	if (!bytes)
+		return 0;
+
+	block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
+	if (!block) {
+		DMERR("Error allocating dax pending block");
+		return -ENOMEM;
+	}
+
+	block->data = kzalloc(bytes, GFP_KERNEL);
+	if (!block->data) {
+		DMERR("Error allocating dax data space");
+		kfree(block);
+		return -ENOMEM;
+	}
+
+	/* write data provided via the iterator */
+	if (!copy_from_iter(block->data, bytes, i)) {
+		DMERR("Error copying dax data");
+		kfree(block->data);
+		kfree(block);
+		return -EIO;
+	}
+
+	/* rewind the iterator so that the block driver can use it */
+	iov_iter_revert(i, bytes);
+
+	block->datalen = bytes;
+	block->sector = bio_to_dev_sectors(lc, sector);
+	block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
+
+	atomic_inc(&lc->pending_blocks);
+	spin_lock_irq(&lc->blocks_lock);
+	list_add_tail(&block->list, &lc->unflushed_blocks);
+	spin_unlock_irq(&lc->blocks_lock);
+	wake_up_process(lc->log_kthread);
+	return 0;
+}
+
 static void log_writes_dtr(struct dm_target *ti)
 {
 	struct log_writes_c *lc = ti->private;
@@ -874,9 +920,49 @@  static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
 	limits->io_min = limits->physical_block_size;
 }
 
+static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+		long nr_pages, void **kaddr, pfn_t *pfn)
+{
+	struct log_writes_c *lc = ti->private;
+	struct block_device *bdev = lc->dev->bdev;
+	struct dax_device *dax_dev = lc->dev->dax_dev;
+	sector_t sector = pgoff * PAGE_SECTORS;
+	int ret;
+
+	ret = bdev_dax_pgoff(bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
+	if (ret)
+		return ret;
+	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+}
+
+static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
+		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+{
+	struct log_writes_c *lc = ti->private;
+	struct block_device *bdev = lc->dev->bdev;
+	struct dax_device *dax_dev = lc->dev->dax_dev;
+	sector_t sector = pgoff * PAGE_SECTORS;
+	int err;
+
+	if (bdev_dax_pgoff(bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
+		return 0;
+
+	/* Don't bother doing anything if logging has been disabled */
+	if (!lc->logging_enabled)
+		goto dax_copy;
+
+	err = log_dax(lc, sector, bytes, i);
+	if (err) {
+		DMWARN("Error %d logging DAX write", err);
+		return 0;
+	}
+dax_copy:
+	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+}
+
 static struct target_type log_writes_target = {
 	.name   = "log-writes",
-	.version = {1, 0, 0},
+	.version = {1, 0, 1},
 	.module = THIS_MODULE,
 	.ctr    = log_writes_ctr,
 	.dtr    = log_writes_dtr,
@@ -887,6 +973,8 @@  static struct target_type log_writes_target = {
 	.message = log_writes_message,
 	.iterate_devices = log_writes_iterate_devices,
 	.io_hints = log_writes_io_hints,
+	.direct_access = log_writes_dax_direct_access,
+	.dax_copy_from_iter = log_writes_dax_copy_from_iter,
 };
 
 static int __init dm_log_writes_init(void)