
[3/3] block: Introduce blkdev_issue_zeroout_discard() function

Message ID 1415336894-15327-4-git-send-email-martin.petersen@oracle.com
State Not Applicable
Delegated to: David Miller

Commit Message

Martin K. Petersen Nov. 7, 2014, 5:08 a.m. UTC
blkdev_issue_zeroout() will zero a given block range on disk. This is
done by way of either WRITE SAME or regular WRITE, i.e. the blocks on
disk will be written and thus provisioned.

There are use cases where the desired behavior is to zero the blocks but
unprovision them if possible. The blocks must deterministically contain
zeroes when they are subsequently read back.

This patch introduces a blkdev_issue_zeroout_discard() call that
provides this functionality. If a block device guarantees
discard_zeroes_data, the new function will use discard to clear the block
range. If the device does not support discard_zeroes_data or if the
discard request fails, we will fall back to blkdev_issue_zeroout() to
ensure predictable results.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/blkdev.h |  2 ++
 2 files changed, 44 insertions(+), 2 deletions(-)
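The fallback order described in the commit message can be modelled in ordinary user-space C. This is a minimal sketch, not kernel code. `struct fake_bdev`, `fake_discard()` and `fake_zeroout()` are hypothetical stand-ins for the request queue limits, blkdev_issue_discard() and blkdev_issue_zeroout(). The sector range arguments are left out to keep the decision logic visible.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a block device and its queue limits. */
struct fake_bdev {
	bool supports_discard;    /* models blk_queue_discard(q) */
	bool discard_zeroes_data; /* models q->limits.discard_zeroes_data */
	bool discard_fails;       /* forces the DISCARD path to fail */
	int discards;             /* number of DISCARDs issued */
	int zeroouts;             /* number of explicit zeroouts issued */
};

/* Models blkdev_issue_discard(): 0 on success, negative errno on failure. */
static int fake_discard(struct fake_bdev *b)
{
	b->discards++;
	return b->discard_fails ? -EIO : 0;
}

/* Models blkdev_issue_zeroout(): writes zeroes and succeeds in this sketch. */
static int fake_zeroout(struct fake_bdev *b)
{
	b->zeroouts++;
	return 0;
}

/*
 * Discard only when later reads are guaranteed to return zeroes.
 * Otherwise, or if the discard fails, zero the range explicitly.
 */
static int zeroout_discard(struct fake_bdev *b)
{
	if (b->supports_discard && b->discard_zeroes_data) {
		if (!fake_discard(b))
			return 0;
		fprintf(stderr, "DISCARD failed. Manually zeroing.\n");
	}
	return fake_zeroout(b);
}
```

A device without the zeroing guarantee never sees a DISCARD. A failed DISCARD costs one extra command before the explicit zeroout.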

Comments

Christoph Hellwig Nov. 7, 2014, 8:26 a.m. UTC | #1
On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote:
> blkdev_issue_discard() will zero a given block range on disk. This is
> done by way of either WRITE SAME or regular WRITE. I.e. the blocks on
> disk will be written and thus provisioned.
> 
> There are use cases where the desired behavior is to zero the blocks but
> unprovision them if possible. The blocks must deterministically contain
> zeroes when they are subsequently read back.
> 
> This patch introduces a blkdev_issue_zeroout_discard() call that
> provides this functionality. If a block device guarantees
> discard_zeroes_data the new function will use discard to clear the block
> range. If the device does not support discard_zeroes_data or if the
> discard request fails we will fall back to blkdev_issue_zeroout() to
> ensure predictable results.

I'm not a fan of adding another function here and would prefer a flag,
but it looks correct, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen Nov. 7, 2014, 3:42 p.m. UTC | #2
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> I'm not a fan of adding another function here and would
Christoph> prefer a flag, but it looks correct, 

That was my original approach too, but I didn't want to stomp over all
the existing callers. Although there are only a few.

Ted: Which would you prefer?
Theodore Ts'o Nov. 7, 2014, 4:20 p.m. UTC | #3
On Fri, Nov 07, 2014 at 10:42:24AM -0500, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
> 
> Christoph> I'm not a fan of adding another function here and would
> Christoph> prefer a flag, but it looks correct, 
> 
> That was my original approach too but I didn't want to stomp over all
> the existing callers. Although there only are few.
> 
> Ted: Which would you prefer?

There are *very* few users of blkdev_issue_zeroout(), and aside from a
single drbd caller, they are all in the block layer.  It would only start
affecting ext4 when we plumb that flag through to sb_issue_zeroout
(which your patch doesn't currently do), at which point it will affect
4 call sites in ext4, and a call site in gfs2 and hpfs2.

So I'd be in favor of adding a flag to blkdev_issue_zeroout(), and
I would have a slight preference for also modifying sb_issue_zeroout
so the flag gets plumbed all the way through to the fs-level callers.

Cheers,

							- Ted
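For comparison, the flag-based interface suggested here could look roughly like the following user-space sketch. Everything in it is an assumption made for illustration: the `bool discard` parameter, the in-memory `struct mock_bdev` and the `mock_*` helpers. Existing callers would pass `false` and keep today's allocate-and-zero behaviour. A matching sb_issue_zeroout() would simply forward the flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSECT     16
#define SECT_SIZE 512

/* Hypothetical in-memory device: sector data plus a "provisioned" bit. */
struct mock_bdev {
	bool discard_zeroes_data;
	unsigned char data[NSECT][SECT_SIZE];
	bool provisioned[NSECT];
};

/* Hypothetical DISCARD: deallocates sectors, which then read as zeroes. */
static int mock_discard(struct mock_bdev *b, uint64_t sector, uint64_t nr)
{
	for (uint64_t i = sector; i < sector + nr; i++) {
		memset(b->data[i], 0, SECT_SIZE);
		b->provisioned[i] = false;
	}
	return 0;
}

/* Hypothetical WRITE of zeroes: the sectors end up provisioned. */
static int mock_write_zeroes(struct mock_bdev *b, uint64_t sector, uint64_t nr)
{
	for (uint64_t i = sector; i < sector + nr; i++) {
		memset(b->data[i], 0, SECT_SIZE);
		b->provisioned[i] = true;
	}
	return 0;
}

/* One entry point; the flag says whether unprovisioning is acceptable. */
static int mock_issue_zeroout(struct mock_bdev *b, uint64_t sector,
			      uint64_t nr, bool discard)
{
	if (sector > NSECT || nr > NSECT - sector)
		return -EINVAL; /* range runs past the end of the device */
	if (discard && b->discard_zeroes_data && !mock_discard(b, sector, nr))
		return 0;
	return mock_write_zeroes(b, sector, nr);
}
```

Because the flag only widens what the caller permits, the result is the same in both cases: the range reads back as zeroes. Only the provisioning state differs.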
Martin K. Petersen Nov. 7, 2014, 4:27 p.m. UTC | #4
>>>>> "Ted" == Theodore Ts'o <tytso@mit.edu> writes:

Ted> So I'd be in favor of adding a flag to to blkdev_issue_zeroout(),
Ted> and I would have a slight preference for also modifying
Ted> sb_issue_zeroout so the flag gets plumbed all the way through to
Ted> the fs-level callers.

OK. I'll do that, then. I always liked the flag better. I was just
trying to minimize the impact.

What would you prefer as the default for the ext4 use case? To allocate
or to discard?
Darrick Wong Nov. 11, 2014, 12:04 a.m. UTC | #5
On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote:
> blkdev_issue_discard() will zero a given block range on disk. This is
> done by way of either WRITE SAME or regular WRITE. I.e. the blocks on
> disk will be written and thus provisioned.
> 
> There are use cases where the desired behavior is to zero the blocks but
> unprovision them if possible. The blocks must deterministically contain
> zeroes when they are subsequently read back.
> 
> This patch introduces a blkdev_issue_zeroout_discard() call that
> provides this functionality. If a block device guarantees
> discard_zeroes_data the new function will use discard to clear the block
> range. If the device does not support discard_zeroes_data or if the
> discard request fails we will fall back to blkdev_issue_zeroout() to
> ensure predictable results.

Can this be plumbed into a BLK* ioctl too?  I'll write a patch, if this is ok
with everyone:

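#include <linux/ioctl.h>
#include <linux/types.h>

struct blkzeroout_t {
	__u64 start;
	__u64 end;
	__u32 flags;	/* BLKZEROOUT_* flags */
	__u32 pad;	/* explicit padding: same size on 32- and 64-bit ABIs */
};
#define BLKZEROOUT_DISCARD_OK	1	/* discarding is fine if it zeroes data */

/* user space passes the struct in, hence _IOW; the macro applies sizeof() */
#define BLKZEROOUT_V2		_IOW(0x12, 127, struct blkzeroout_t)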

...and make it zap the page cache per earlier discussion.  This seems to be a
good fit with what we've been discussing for mke2fs.

--D
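The following sketch shows how user space might fill in the proposed argument structure. Several details are assumptions that this thread does not settle:

- the range is taken as a byte range with an exclusive `end`;
- the 512-byte alignment rule is borrowed from the existing BLKZEROOUT ioctl;
- `struct zeroout_range` and `pack_zeroout_range()` are hypothetical names.

The actual ioctl() call is omitted because BLKZEROOUT_V2 was only a proposal.

```c
#include <errno.h>
#include <stdint.h>

#define ZEROOUT_DISCARD_OK 1u /* mirrors the proposed BLKZEROOUT_DISCARD_OK */

/* Hypothetical copy of the proposed layout, padded to a fixed 24 bytes. */
struct zeroout_range {
	uint64_t start; /* first byte to zero (assumed) */
	uint64_t end;   /* one past the last byte (assumed) */
	uint32_t flags; /* ZEROOUT_* flags */
	uint32_t pad;   /* must be zero */
};

/* Fill *r from an offset/length pair; reject unaligned or wrapping ranges. */
static int pack_zeroout_range(struct zeroout_range *r, uint64_t offset,
			      uint64_t len, int allow_discard)
{
	if ((offset | len) & 511)
		return -EINVAL; /* assumed 512-byte alignment, as in BLKZEROOUT */
	if (offset + len < offset)
		return -EINVAL; /* end would wrap past UINT64_MAX */
	r->start = offset;
	r->end = offset + len;
	r->flags = allow_discard ? ZEROOUT_DISCARD_OK : 0;
	r->pad = 0;
	return 0;
}
```

Validating in user space does not replace the kernel-side checks. It only surfaces obvious mistakes before the ioctl is issued.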

Martin K. Petersen Nov. 11, 2014, 2:33 a.m. UTC | #6
>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Can this be plumbed into a BLK* ioctl too?  I'll write a patch,
Darrick> if this is ok with everyone:

Darrick> ...and make it zap the page cache per earlier discussion.  This
Darrick> seems to be a good fit with what we've been discussing for
Darrick> mke2fs.

That sounds good to me. I'll get the updated patch out tomorrow.

Patch

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8411be3c19d3..2ffec6a01c71 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -278,14 +278,18 @@  static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 }
 
 /**
- * blkdev_issue_zeroout - zero-fill a block range
+ * blkdev_issue_zeroout - zero-fill and provision a block range
  * @bdev:	blockdev to write
  * @sector:	start sector
  * @nr_sects:	number of sectors to write
  * @gfp_mask:	memory allocation flags (for bio_alloc)
  *
  * Description:
- *  Generate and issue number of bios with zerofiled pages.
+ *  Zero-fill a block range. The blocks will be provisioned
+ *  (allocated/anchored) and are guaranteed to return zeroes when read
+ *  back. This function will attempt to use WRITE SAME to optimize the
+ *  process if the block device supports it. Otherwise it will fall back
+ *  to zeroing the blocks using regular WRITE calls.
  */
 
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
@@ -305,3 +309,39 @@  int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zeroout_discard - zero-fill and attempt to discard block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to write
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Zero-fill a block range. In contrast to blkdev_issue_zeroout() this
+ *  function will attempt to deprovision (deallocate/discard) the blocks
+ *  in question. It will only do so if the underlying device guarantees
+ *  that subsequent READ operations to the block range in question will
+ *  return zeroes. If the device does not provide hard guarantees or if
+ *  the DISCARD attempt should fail, the block range will be explicitly
+ *  zeroed using blkdev_issue_zeroout().
+ */
+
+int blkdev_issue_zeroout_discard(struct block_device *bdev, sector_t sector,
+				 sector_t nr_sects, gfp_t gfp_mask)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (blk_queue_discard(q) && q->limits.discard_zeroes_data) {
+		char bdn[BDEVNAME_SIZE];
+
+		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
+			return 0;
+
+		bdevname(bdev, bdn);
+		pr_err("%s: DISCARD failed. Manually zeroing.\n", bdn);
+	}
+
+	return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+}
+EXPORT_SYMBOL(blkdev_issue_zeroout_discard);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aac0f9ea952a..078b6e5f488a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1164,6 +1164,8 @@  extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			sector_t nr_sects, gfp_t gfp_mask);
+extern int blkdev_issue_zeroout_discard(struct block_device *bdev,
+			sector_t sector, sector_t nr_sects, gfp_t gfp_mask);
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
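The updated kernel-doc for blkdev_issue_zeroout() in this patch describes a two-step strategy: use WRITE SAME with a zeroed payload when the device supports it, otherwise write zero-filled buffers with regular WRITEs. The user-space sketch below models that order on an in-memory device. `struct mem_dev`, `mem_write_same()` and `mem_write()` are hypothetical. The 8-sector chunk size is an arbitrary stand-in for the per-bio size limits the real code has to respect.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512
#define DEV_SECTORS   64
#define CHUNK_SECTORS 8 /* arbitrary stand-in for per-bio size limits */

struct mem_dev {
	bool has_write_same; /* models bdev_write_same(bdev) != 0 */
	unsigned char buf[DEV_SECTORS * SECTOR_SIZE];
	int write_same_cmds;
	int write_cmds;
};

/* WRITE SAME: one command replicates a single sector-sized payload. */
static int mem_write_same(struct mem_dev *d, uint64_t sector, uint64_t nr,
			  const unsigned char *payload)
{
	if (!d->has_write_same)
		return -EOPNOTSUPP;
	for (uint64_t i = 0; i < nr; i++)
		memcpy(d->buf + (sector + i) * SECTOR_SIZE, payload, SECTOR_SIZE);
	d->write_same_cmds++;
	return 0;
}

/* Regular WRITE: the full data buffer travels with the command. */
static int mem_write(struct mem_dev *d, uint64_t sector, uint64_t nr,
		     const unsigned char *data)
{
	memcpy(d->buf + sector * SECTOR_SIZE, data, nr * SECTOR_SIZE);
	d->write_cmds++;
	return 0;
}

/* Try WRITE SAME first; fall back to chunked WRITEs of zero pages. */
static int mem_zeroout(struct mem_dev *d, uint64_t sector, uint64_t nr)
{
	static const unsigned char zero_sector[SECTOR_SIZE];
	static const unsigned char zero_chunk[CHUNK_SECTORS * SECTOR_SIZE];

	if (sector > DEV_SECTORS || nr > DEV_SECTORS - sector)
		return -EINVAL;
	if (d->has_write_same && !mem_write_same(d, sector, nr, zero_sector))
		return 0;
	while (nr) {
		uint64_t n = nr < CHUNK_SECTORS ? nr : CHUNK_SECTORS;

		mem_write(d, sector, n, zero_chunk);
		sector += n;
		nr -= n;
	}
	return 0;
}
```

The model shows why WRITE SAME is the preferred path. One small command covers the whole range, while the fallback needs one full-sized data transfer per chunk.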