diff mbox

ext4: Return the length of a hole from get_block

Message ID 1435936511-17705-1-git-send-email-matthew.r.wilcox@intel.com
State Accepted, archived
Headers show

Commit Message

Matthew Wilcox July 3, 2015, 3:15 p.m. UTC
From: Matthew Wilcox <willy@linux.intel.com>

Currently, if ext4's get_block encounters a hole, it does not modify the
buffer_head.  That's fine for many callers, but for DAX, it's useful to
know how large the hole is.  XFS already returns the length of the hole,
so this improvement should not confuse any callers.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/ext4/ext4.h  | 6 ++++--
 fs/ext4/inode.c | 6 +++++-
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Jan Kara July 13, 2015, 3:16 p.m. UTC | #1
On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Currently, if ext4's get_block encounters a hole, it does not modify the
> buffer_head.  That's fine for many callers, but for DAX, it's useful to
> know how large the hole is.  XFS already returns the length of the hole,
> so this improvement should not confuse any callers.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
set? I can see the XFS sets it in some cases as well but the use of the
flag isn't really clear to me...

								Honza

> ---
>  fs/ext4/ext4.h  | 6 ++++--
>  fs/ext4/inode.c | 6 +++++-
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9a83f14..7b7c097 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -154,12 +154,14 @@ struct ext4_allocation_request {
>   * well as to store the information returned by ext4_map_blocks().  It
>   * takes less room on the stack than a struct buffer_head.
>   */
> +#define EXT4_MAP_UPTODATE	(1 << BH_Uptodate)
>  #define EXT4_MAP_NEW		(1 << BH_New)
>  #define EXT4_MAP_MAPPED		(1 << BH_Mapped)
>  #define EXT4_MAP_UNWRITTEN	(1 << BH_Unwritten)
>  #define EXT4_MAP_BOUNDARY	(1 << BH_Boundary)
> -#define EXT4_MAP_FLAGS		(EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
> -				 EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
> +#define EXT4_MAP_FLAGS		(EXT4_MAP_UPTODATE | EXT4_MAP_NEW | \
> +				 EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | \
> +				 EXT4_MAP_BOUNDARY)
>  
>  struct ext4_map_blocks {
>  	ext4_fsblk_t m_pblk;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9b46f6f..0fc49ac 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -494,7 +494,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  				retval = map->m_len;
>  			map->m_len = retval;
>  		} else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
> -			retval = 0;
> +			map->m_flags |= EXT4_MAP_UPTODATE;
> +			retval = es.es_len - (map->m_lblk - es.es_lblk);
> +			if (retval > map->m_len)
> +				retval = map->m_len;
> +			map->m_len = retval;
>  		} else {
>  			BUG_ON(1);
>  		}
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Wilcox July 13, 2015, 3:26 p.m. UTC | #2
On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > From: Matthew Wilcox <willy@linux.intel.com>
> > 
> > Currently, if ext4's get_block encounters a hole, it does not modify the
> > buffer_head.  That's fine for many callers, but for DAX, it's useful to
> > know how large the hole is.  XFS already returns the length of the hole,
> > so this improvement should not confuse any callers.
> > 
> > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> 
> So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> set? I can see the XFS sets it in some cases as well but the use of the
> flag isn't really clear to me...

No clue.  I'm just following the documentation in buffer.c:

 * NOTE! All mapped/uptodate combinations are valid:
 *
 *      Mapped  Uptodate        Meaning
 *
 *      No      No              "unknown" - must do get_block()
 *      No      Yes             "hole" - zero-filled
 *      Yes     No              "allocated" - allocated on disk, not read in
 *      Yes     Yes             "valid" - allocated and up-to-date in memory.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara July 14, 2015, 9:02 a.m. UTC | #3
On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <willy@linux.intel.com>
> > > 
> > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > buffer_head.  That's fine for many callers, but for DAX, it's useful to
> > > know how large the hole is.  XFS already returns the length of the hole,
> > > so this improvement should not confuse any callers.
> > > 
> > > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > 
> > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > set? I can see the XFS sets it in some cases as well but the use of the
> > flag isn't really clear to me...
> 
> No clue.  I'm just following the documentation in buffer.c:
> 
>  * NOTE! All mapped/uptodate combinations are valid:
>  *
>  *      Mapped  Uptodate        Meaning
>  *
>  *      No      No              "unknown" - must do get_block()
>  *      No      Yes             "hole" - zero-filled
>  *      Yes     No              "allocated" - allocated on disk, not read in
>  *      Yes     Yes             "valid" - allocated and up-to-date in memory.

OK, but that speaks about buffer head attached to a page. get_block()
callback gets a temporary bh (at least in some cases) only so that it can
communicate result of block mapping. And BH_Uptodate should be set only if
data in the buffer is properly filled (which cannot be the case for
temporary bh which doesn't have *any* data) and it simply isn't the case
even for bh attached to a page because ext4 get_block() functions don't
touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
at all..

									Honza
Matthew Wilcox July 14, 2015, 1:48 p.m. UTC | #4
On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > From: Matthew Wilcox <willy@linux.intel.com>
> > > > 
> > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > buffer_head.  That's fine for many callers, but for DAX, it's useful to
> > > > know how large the hole is.  XFS already returns the length of the hole,
> > > > so this improvement should not confuse any callers.
> > > > 
> > > > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > 
> > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > set? I can see the XFS sets it in some cases as well but the use of the
> > > flag isn't really clear to me...
> > 
> > No clue.  I'm just following the documentation in buffer.c:
> > 
> >  * NOTE! All mapped/uptodate combinations are valid:
> >  *
> >  *      Mapped  Uptodate        Meaning
> >  *
> >  *      No      No              "unknown" - must do get_block()
> >  *      No      Yes             "hole" - zero-filled
> >  *      Yes     No              "allocated" - allocated on disk, not read in
> >  *      Yes     Yes             "valid" - allocated and up-to-date in memory.
> 
> OK, but that speaks about buffer head attached to a page. get_block()
> callback gets a temporary bh (at least in some cases) only so that it can
> communicate result of block mapping. And BH_Uptodate should be set only if
> data in the buffer is properly filled (which cannot be the case for
> temporary bh which doesn't have *any* data) and it simply isn't the case
> even for bh attached to a page because ext4 get_block() functions don't
> touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> at all..

OK, but how should DAX then distinguish between an old-style filesystem
(like current ext4) which reports "unknown" and leaves b_size untouched
when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
this patch) which wants to report the size of a hole in b_size?  The use
of Uptodate currently distinguishes the two cases.

Plus, why would you want bh's to be treated differently, depending on
whether they're stack-based or attached to a page?  That seems even more
confusing than bh's already are.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner July 14, 2015, 10:24 p.m. UTC | #5
On Tue, Jul 14, 2015 at 09:48:51AM -0400, Matthew Wilcox wrote:
> On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > From: Matthew Wilcox <willy@linux.intel.com>
> > > > > 
> > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > buffer_head.  That's fine for many callers, but for DAX, it's useful to
> > > > > know how large the hole is.  XFS already returns the length of the hole,
> > > > > so this improvement should not confuse any callers.
> > > > > 
> > > > > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > > 
> > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > flag isn't really clear to me...
> > > 
> > > No clue.  I'm just following the documentation in buffer.c:
> > > 
> > >  * NOTE! All mapped/uptodate combinations are valid:
> > >  *
> > >  *      Mapped  Uptodate        Meaning
> > >  *
> > >  *      No      No              "unknown" - must do get_block()
> > >  *      No      Yes             "hole" - zero-filled
> > >  *      Yes     No              "allocated" - allocated on disk, not read in
> > >  *      Yes     Yes             "valid" - allocated and up-to-date in memory.
> > 
> > OK, but that speaks about buffer head attached to a page. get_block()
> > callback gets a temporary bh (at least in some cases) only so that it can
> > communicate result of block mapping. And BH_Uptodate should be set only if
> > data in the buffer is properly filled (which cannot be the case for
> > temporary bh which doesn't have *any* data) and it simply isn't the case
> > even for bh attached to a page because ext4 get_block() functions don't
> > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > at all..
> 
> OK, but how should DAX then distinguish between an old-style filesystem
> (like current ext4) which reports "unknown" and leaves b_size untouched
> when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> this patch) which wants to report the size of a hole in b_size?  The use
> of Uptodate currently distinguishes the two cases.
> 
> Plus, why would you want bh's to be treated differently, depending on
> whether they're stack-based or attached to a page?  That seems even more
> confusing than bh's already are.

The best solution to this is to kill get_block() and move to an
iomap() interface using a struct iomap to pass the mapped region
back to the caller. We're already moving this way (*) and when I
remove buffer heads from XFS I'll be moving it to an iomap based
infrastructure and so I'll want to convert the DAX code at the same
time.  Also, ISTR Christoph directed the GFS2 folk to implementing
the iomap interface to solve this same get_block hole problem the
are having with fiemap(?).

IMO we should just stop abusing bufferheads for this function and
add an iomap method that has sane, clear semantics that aren't
entangled with something carried on a page to track it's state....

(*) See https://lkml.org/lkml/2013/7/23/809 for an example of
multiple page write contexts using ->iomap callouts, and note how
similar that interface is to the PNFS ->map_blocks export operation
in include/linux/exportfs.h.

Cheers,

Dave.
Jan Kara July 15, 2015, 9:59 a.m. UTC | #6
On Tue 14-07-15 09:48:51, Matthew Wilcox wrote:
> On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > From: Matthew Wilcox <willy@linux.intel.com>
> > > > > 
> > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > buffer_head.  That's fine for many callers, but for DAX, it's useful to
> > > > > know how large the hole is.  XFS already returns the length of the hole,
> > > > > so this improvement should not confuse any callers.
> > > > > 
> > > > > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > > 
> > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > flag isn't really clear to me...
> > > 
> > > No clue.  I'm just following the documentation in buffer.c:
> > > 
> > >  * NOTE! All mapped/uptodate combinations are valid:
> > >  *
> > >  *      Mapped  Uptodate        Meaning
> > >  *
> > >  *      No      No              "unknown" - must do get_block()
> > >  *      No      Yes             "hole" - zero-filled
> > >  *      Yes     No              "allocated" - allocated on disk, not read in
> > >  *      Yes     Yes             "valid" - allocated and up-to-date in memory.
> > 
> > OK, but that speaks about buffer head attached to a page. get_block()
> > callback gets a temporary bh (at least in some cases) only so that it can
> > communicate result of block mapping. And BH_Uptodate should be set only if
> > data in the buffer is properly filled (which cannot be the case for
> > temporary bh which doesn't have *any* data) and it simply isn't the case
> > even for bh attached to a page because ext4 get_block() functions don't
> > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > at all..
> 
> OK, but how should DAX then distinguish between an old-style filesystem
> (like current ext4) which reports "unknown" and leaves b_size untouched
> when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> this patch) which wants to report the size of a hole in b_size?  The use
> of Uptodate currently distinguishes the two cases.
> 
> Plus, why would you want bh's to be treated differently, depending on
> whether they're stack-based or attached to a page?  That seems even more
> confusing than bh's already are.

Well, you may want to treat them differently because they *are* different.
For example touching b_size of page-attached buffer_head is a no-go.
get_block() interface is abusing buffer_head structure for historical
reasons.

Seeing you have hit issues with using buffer_head for passing mapping
information I agree with Dave that we should convert DAX code to use
iomaps instead of cluttering get_block() via buffer_head further. You can
lift struct iomap from include/linux/exportfs.h (and related constant
definitions) and use it for passing map information. It should be quite
straightforward and simple now that DAX doesn't have many users. We will
have:

typedef int (iomap_fn_t)(struct inode *inode, loff_t offset, u64 length,
			 bool create, struct iomap *iomap);

and DAX functions will take this instead of get_block_t. Adding a wrapper
to ext4_map_blocks() to work as iomap_fn_t is pretty straightforward as
well. I'm sorry we didn't come up with this immediately when you started
implementing DAX...

								Honza
diff mbox

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9a83f14..7b7c097 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -154,12 +154,14 @@  struct ext4_allocation_request {
  * well as to store the information returned by ext4_map_blocks().  It
  * takes less room on the stack than a struct buffer_head.
  */
+#define EXT4_MAP_UPTODATE	(1 << BH_Uptodate)
 #define EXT4_MAP_NEW		(1 << BH_New)
 #define EXT4_MAP_MAPPED		(1 << BH_Mapped)
 #define EXT4_MAP_UNWRITTEN	(1 << BH_Unwritten)
 #define EXT4_MAP_BOUNDARY	(1 << BH_Boundary)
-#define EXT4_MAP_FLAGS		(EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
-				 EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
+#define EXT4_MAP_FLAGS		(EXT4_MAP_UPTODATE | EXT4_MAP_NEW | \
+				 EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | \
+				 EXT4_MAP_BOUNDARY)
 
 struct ext4_map_blocks {
 	ext4_fsblk_t m_pblk;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9b46f6f..0fc49ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -494,7 +494,11 @@  int ext4_map_blocks(handle_t *handle, struct inode *inode,
 				retval = map->m_len;
 			map->m_len = retval;
 		} else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
-			retval = 0;
+			map->m_flags |= EXT4_MAP_UPTODATE;
+			retval = es.es_len - (map->m_lblk - es.es_lblk);
+			if (retval > map->m_len)
+				retval = map->m_len;
+			map->m_len = retval;
 		} else {
 			BUG_ON(1);
 		}