diff mbox

[5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA

Message ID 20170629135420.21357-6-hch@lst.de
State Not Applicable
Headers show

Commit Message

Christoph Hellwig June 29, 2017, 1:54 p.m. UTC
Switch to the iomap_seek_hole and iomap_seek_data helpers for
implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
code that isn't needed any more.

Note that with this patch ext4 will now always depend on the iomap
code instead of only when CONFIG_DAX is enabled, and it requires
adding a call into the extent status tree for iomap_begin as well
to properly deal with delalloc extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/Kconfig |   1 +
 fs/ext4/ext4.h  |   3 -
 fs/ext4/file.c  | 264 +++-----------------------------------------------------
 fs/ext4/inode.c |  96 ++++++---------------
 4 files changed, 36 insertions(+), 328 deletions(-)

Comments

Andreas Gruenbacher June 30, 2017, 11:51 a.m. UTC | #1
On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
> Switch to the iomap_seek_hole and iomap_seek_data helpers for
> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
> code that isn't needed any more.
>
> Note that with this patch ext4 will now always depend on the iomap
> code instead of only when CONFIG_DAX is enabled, and it requires
> adding a call into the extent status tree for iomap_begin as well
> to properly deal with delalloc extents.

This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.

Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
to be added back for consistency between reading i_size and walking
the file extents.

Thanks,
Andreas
Andreas Gruenbacher June 30, 2017, 12:11 p.m. UTC | #2
On Fri, Jun 30, 2017 at 1:51 PM, Andreas Gruenbacher
<agruenba@redhat.com> wrote:
> On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Switch to the iomap_seek_hole and iomap_seek_data helpers for
>> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
>> code that isn't needed any more.
>>
>> Note that with this patch ext4 will now always depend on the iomap
>> code instead of only when CONFIG_DAX is enabled, and it requires
>> adding a call into the extent status tree for iomap_begin as well
>> to properly deal with delalloc extents.
>
> This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.
>
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

Same on xfs, btw.

Thanks,
Andreas
Christoph Hellwig June 30, 2017, 5:37 p.m. UTC | #3
On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

At least for XFS we never had such a consistency as we never took
the iolock (aka i_rwsem).
Darrick Wong July 1, 2017, 7:03 a.m. UTC | #4
On Fri, Jun 30, 2017 at 07:37:40PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > to be added back for consistency between reading i_size and walking
> > the file extents.
> 
> At least for XFS we never had such a consistency as we never took
> the iolock (aka i_rwsem).

Do we need it?  It seems strange to me that someone else could be
changing the file size while we're trying to use it to scan for holes or
data, but OTOH there are plenty of other places where the kernel
effectively requires that userspace either has synchronized writer
processess with the seeker process or is prepared to deal with the
inconsistent results.  If that's the intended behavior for the
SEEK_{HOLE,DATA} then I guess I can put my fears to rest about XFS /not/
taking the IOLOCK/i_rwsem.

The non-ext4 mechanical parts look ok to me (and test out ok), but I
want to be sure that we don't create a locking mess.  Dave complained in
an earlier thread about lockdep problems because the old code took the
ilock and then started locking pages; since we take the ILOCK
during ->iomap_begin, drop it, and only take page locks during
page_cache_seek_hole_data (which is called from the iomap actor) I think
that particular problem goes away.

(As for the ext4 patch that involves deeper surgery to the ext4 iomap
implementation so you'll have to take that up with Ted...)

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 2, 2017, 3:24 p.m. UTC | #5
On Sat, Jul 01, 2017 at 12:03:06AM -0700, Darrick J. Wong wrote:
> On Fri, Jun 30, 2017 at 07:37:40PM +0200, Christoph Hellwig wrote:
> > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > > Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > > to be added back for consistency between reading i_size and walking
> > > the file extents.
> > 
> > At least for XFS we never had such a consistency as we never took
> > the iolock (aka i_rwsem).
> 
> Do we need it?

For XFS I'm pretty sure we don't.  The lseek is fundamentally safe
without it due to the ilock.

> The non-ext4 mechanical parts look ok to me (and test out ok), but I
> want to be sure that we don't create a locking mess.  Dave complained in
> an earlier thread about lockdep problems because the old code took the
> ilock and then started locking pages; since we take the ILOCK
> during ->iomap_begin, drop it, and only take page locks during
> page_cache_seek_hole_data (which is called from the iomap actor) I think
> that particular problem goes away.

The old code took the ilook in the seek hole/data helper, which had
two problems:  it double locked the ilock as we take it in iomap,
and it hols the ilock over the page cache calls.  None of which happen
with this code.
Andreas Gruenbacher July 3, 2017, 3:03 p.m. UTC | #6
On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
>> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
>> to be added back for consistency between reading i_size and walking
>> the file extents.
>
> At least for XFS we never had such a consistency as we never took
> the iolock (aka i_rwsem).

What else does this piece of code from mainline xfs_seek_hole_data()
do instead then?

        lock = xfs_ilock_data_map_shared(ip);

        end = i_size_read(inode);
        offset = __xfs_seek_hole_data(inode, start, end, whence);

Thanks,
Andreas
Darrick Wong July 3, 2017, 4:21 p.m. UTC | #7
On Mon, Jul 03, 2017 at 05:03:54PM +0200, Andreas Gruenbacher wrote:
> On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> >> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> >> to be added back for consistency between reading i_size and walking
> >> the file extents.
> >
> > At least for XFS we never had such a consistency as we never took
> > the iolock (aka i_rwsem).
> 
> What else does this piece of code from mainline xfs_seek_hole_data()
> do instead then?
> 
>         lock = xfs_ilock_data_map_shared(ip);

To avoid confusion, I'll start with ilock != iolock.

If I'm reading everything correctly, there are three levels of inode
locks that must be taken in XFS: First, the IOLOCK (aka i_rwsem) to
protect against concurrent IO when we need it, then the MMAPLOCK (istr
this was created to handle dax page faults, which don't take i_rwsem?),
and finally the ILOCK for inode core/extent map updates.  I think page
locks are supposed to happen before ILOCK.

xfs_ilock_data_map_shared() is a helper that takes the ILOCK in shared
or exclusive mode depending on whether or not all the inode metadata
have already been cached in memory.

The ILOCK must be held before reading or writing the extent map.

--D

> 
>         end = i_size_read(inode);
>         offset = __xfs_seek_hole_data(inode, start, end, whence);
> 
> Thanks,
> Andreas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner July 3, 2017, 10:58 p.m. UTC | #8
On Mon, Jul 03, 2017 at 09:21:45AM -0700, Darrick J. Wong wrote:
> On Mon, Jul 03, 2017 at 05:03:54PM +0200, Andreas Gruenbacher wrote:
> > On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> > > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > >> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > >> to be added back for consistency between reading i_size and walking
> > >> the file extents.
> > >
> > > At least for XFS we never had such a consistency as we never took
> > > the iolock (aka i_rwsem).
> > 
> > What else does this piece of code from mainline xfs_seek_hole_data()
> > do instead then?
> > 
> >         lock = xfs_ilock_data_map_shared(ip);
> 
> To avoid confusion, I'll start with ilock != iolock.
> 
> If I'm reading everything correctly, there are three levels of inode
> locks that must be taken in XFS: First, the IOLOCK (aka i_rwsem) to
> protect against concurrent IO when we need it, then the MMAPLOCK (istr
> this was created to handle dax page faults, which don't take i_rwsem?),
> and finally the ILOCK for inode core/extent map updates.  I think page
> locks are supposed to happen before ILOCK.

IOLOCK and MMAPLOCK are equivalent - there are two locks because we
can't take the IOLOCK in page fault context because page faults can
occur while we are holding the IOLOCK.

Locking order for IO path is:

IOLOCK -> page lock -> ILOCK

Locking order for page faults is:

mmap_sem -> MMAPLOCK -> page lock -> ILOCK

> xfs_ilock_data_map_shared() is a helper that takes the ILOCK in shared
> or exclusive mode depending on whether or not all the inode metadata
> have already been cached in memory.
> 
> The ILOCK must be held before reading or writing the extent map.

More simply:

	- IOLOCK/MMAPLOCK is for access control to file data
	- ILOCK is for access control to inode metadata

Cheers,

Dave.
Andreas Gruenbacher July 7, 2017, 9:27 p.m. UTC | #9
On Fri, Jun 30, 2017 at 1:51 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Switch to the iomap_seek_hole and iomap_seek_data helpers for
>> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
>> code that isn't needed any more.
>>
>> Note that with this patch ext4 will now always depend on the iomap
>> code instead of only when CONFIG_DAX is enabled, and it requires
>> adding a call into the extent status tree for iomap_begin as well
>> to properly deal with delalloc extents.
>
> This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.
>
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

Here are some possible fixes on top of this patch.

One problem with iomap is that is uses a sector number for the physical offset
instead of a byte offset which prevents iomap from being used for inline data
(ext4) / stuffed inodes (gfs2).  If we convert that into a byte offset,
iomap_fiemap and iomap_seek_{hole,data} will just work for those kinds of files.

Andreas

Andreas Gruenbacher (3):
  ext4: Add missing locking around iomap_seek_{hole,data}
  iomap: Switch from blkno to physical offset
  ext4: Add IOMAP_REPORT support for inline data

 fs/buffer.c           |  2 +-
 fs/dax.c              |  2 +-
 fs/ext2/inode.c       |  4 ++--
 fs/ext4/ext4.h        |  4 ++++
 fs/ext4/file.c        |  4 ++++
 fs/ext4/inline.c      | 33 +++++++++++++++++++++++++++++++++
 fs/ext4/inode.c       | 13 +++++++++----
 fs/iomap.c            | 14 ++++++++------
 fs/nfsd/blocklayout.c |  4 ++--
 fs/xfs/xfs_iomap.c    |  6 +++---
 include/linux/iomap.h |  5 +++--
 11 files changed, 70 insertions(+), 21 deletions(-)
Jan Kara July 25, 2017, 12:45 p.m. UTC | #10
On Thu 29-06-17 06:54:20, Christoph Hellwig wrote:
> @@ -3359,6 +3358,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  	unsigned long first_block = offset >> blkbits;
>  	unsigned long last_block = (offset + length - 1) >> blkbits;
>  	struct ext4_map_blocks map;
> +	bool delalloc = false;
>  	int ret;
>  
>  	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
> @@ -3369,6 +3369,27 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  
>  	if (!(flags & IOMAP_WRITE)) {
>  		ret = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (!ret) {
> +			struct extent_status es = {};
> +
> +			ext4_es_find_delayed_extent_range(inode, map.m_lblk,
> +					map.m_lblk + map.m_len - 1, &es);
> +			/* Is delalloc data before next block in extent tree? */
> +			if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
> +				ext4_lblk_t offset = 0;
> +
> +				if (es.es_lblk < map.m_lblk)
> +					offset = map.m_lblk - es.es_lblk;
> +				map.m_lblk = es.es_lblk + offset;
> +				map.m_pblk = ext4_es_pblock(&es) + offset;

This is wrong for delalloc extent - adding offset to some magic block
value...

> +				map.m_len = es.es_len - offset;
> +				if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
> +					map.m_flags |= EXT4_MAP_UNWRITTEN;
> +				if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
> +					delalloc = true;
> +				ret = 1;
> +			}
> +		}

Also the delalloc handling seems to be wrong that if we have file as:

HD

where H is hole, D is delalloc block, and iomap is called at offset 0, we
should be getting back "hole at 0" back but instead you will return
"delalloc at 1".

We deal with a similar situation in ext4_da_map_blocks() so it would be
good if we could reuse that one rather than reimplementing it here. But
factoring the necessary functionality out isn't that easy.

								Honza
Andreas Gruenbacher Aug. 29, 2017, 1:46 p.m. UTC | #11
Jan,

On Tue, Jul 25, 2017 at 2:45 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 29-06-17 06:54:20, Christoph Hellwig wrote:
>> @@ -3359,6 +3358,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>       unsigned long first_block = offset >> blkbits;
>>       unsigned long last_block = (offset + length - 1) >> blkbits;
>>       struct ext4_map_blocks map;
>> +     bool delalloc = false;
>>       int ret;
>>
>>       if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
>> @@ -3369,6 +3369,27 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>
>>       if (!(flags & IOMAP_WRITE)) {
>>               ret = ext4_map_blocks(NULL, inode, &map, 0);
>> +             if (!ret) {
>> +                     struct extent_status es = {};
>> +
>> +                     ext4_es_find_delayed_extent_range(inode, map.m_lblk,
>> +                                     map.m_lblk + map.m_len - 1, &es);
>> +                     /* Is delalloc data before next block in extent tree? */
>> +                     if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
>> +                             ext4_lblk_t offset = 0;
>> +
>> +                             if (es.es_lblk < map.m_lblk)
>> +                                     offset = map.m_lblk - es.es_lblk;
>> +                             map.m_lblk = es.es_lblk + offset;
>> +                             map.m_pblk = ext4_es_pblock(&es) + offset;
>
> This is wrong for delalloc extent - adding offset to some magic block
> value...

Right.

>> +                             map.m_len = es.es_len - offset;
>> +                             if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
>> +                                     map.m_flags |= EXT4_MAP_UNWRITTEN;
>> +                             if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
>> +                                     delalloc = true;
>> +                             ret = 1;
>> +                     }
>> +             }
>
> Also the delalloc handling seems to be wrong that if we have file as:
>
> HD
>
> where H is hole, D is delalloc block, and iomap is called at offset 0, we
> should be getting back "hole at 0" back but instead you will return
> "delalloc at 1".
>
> We deal with a similar situation in ext4_da_map_blocks() so it would be
> good if we could reuse that one rather than reimplementing it here. But
> factoring the necessary functionality out isn't that easy.

I've tried to figure out where the issue might be, and I really don't see any.

It may be confusing that ext4_iomap_begin doesn't split up
IOMAP_UNWRITTEN maps into holes and data as ext4_find_unwritten_pgoff
did. This splitting is instead done generically in
iomap_seek_{hole,data} for IOMAP_UNWRITTEN maps using
page_cache_seek_hole_data.

I'll post an improved version of this patch queue shortly.

Thanks,
Andreas
diff mbox

Patch

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index e38039fd96ff..73b850f5659c 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -37,6 +37,7 @@  config EXT4_FS
 	select CRC16
 	select CRYPTO
 	select CRYPTO_CRC32C
+	select FS_IOMAP
 	help
 	  This is the next generation of the ext3 filesystem.
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 32191548abed..eb0a1f221af3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2490,9 +2490,6 @@  extern void ext4_da_update_reserve_space(struct inode *inode,
 					int used, int quota_claim);
 extern int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk,
 			      ext4_fsblk_t pblk, ext4_lblk_t len);
-extern int ext4_get_next_extent(struct inode *inode, ext4_lblk_t lblk,
-				unsigned int map_len,
-				struct extent_status *result);
 
 /* indirect.c */
 extern int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 02ce7e7bbdf5..02bbf2ce7517 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -20,6 +20,7 @@ 
 
 #include <linux/time.h>
 #include <linux/fs.h>
+#include <linux/iomap.h>
 #include <linux/mount.h>
 #include <linux/path.h>
 #include <linux/dax.h>
@@ -439,253 +440,6 @@  static int ext4_file_open(struct inode * inode, struct file * filp)
 }
 
 /*
- * Here we use ext4_map_blocks() to get a block mapping for a extent-based
- * file rather than ext4_ext_walk_space() because we can introduce
- * SEEK_DATA/SEEK_HOLE for block-mapped and extent-mapped file at the same
- * function.  When extent status tree has been fully implemented, it will
- * track all extent status for a file and we can directly use it to
- * retrieve the offset for SEEK_DATA/SEEK_HOLE.
- */
-
-/*
- * When we retrieve the offset for SEEK_DATA/SEEK_HOLE, we would need to
- * lookup page cache to check whether or not there has some data between
- * [startoff, endoff] because, if this range contains an unwritten extent,
- * we determine this extent as a data or a hole according to whether the
- * page cache has data or not.
- */
-static int ext4_find_unwritten_pgoff(struct inode *inode,
-				     int whence,
-				     ext4_lblk_t end_blk,
-				     loff_t *offset)
-{
-	struct pagevec pvec;
-	unsigned int blkbits;
-	pgoff_t index;
-	pgoff_t end;
-	loff_t endoff;
-	loff_t startoff;
-	loff_t lastoff;
-	int found = 0;
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	startoff = *offset;
-	lastoff = startoff;
-	endoff = (loff_t)end_blk << blkbits;
-
-	index = startoff >> PAGE_SHIFT;
-	end = (endoff - 1) >> PAGE_SHIFT;
-
-	pagevec_init(&pvec, 0);
-	do {
-		int i, num;
-		unsigned long nr_pages;
-
-		num = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
-		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
-					  (pgoff_t)num);
-		if (nr_pages == 0)
-			break;
-
-		for (i = 0; i < nr_pages; i++) {
-			struct page *page = pvec.pages[i];
-			struct buffer_head *bh, *head;
-
-			/*
-			 * If current offset is smaller than the page offset,
-			 * there is a hole at this offset.
-			 */
-			if (whence == SEEK_HOLE && lastoff < endoff &&
-			    lastoff < page_offset(pvec.pages[i])) {
-				found = 1;
-				*offset = lastoff;
-				goto out;
-			}
-
-			if (page->index > end)
-				goto out;
-
-			lock_page(page);
-
-			if (unlikely(page->mapping != inode->i_mapping)) {
-				unlock_page(page);
-				continue;
-			}
-
-			if (!page_has_buffers(page)) {
-				unlock_page(page);
-				continue;
-			}
-
-			if (page_has_buffers(page)) {
-				lastoff = page_offset(page);
-				bh = head = page_buffers(page);
-				do {
-					if (buffer_uptodate(bh) ||
-					    buffer_unwritten(bh)) {
-						if (whence == SEEK_DATA)
-							found = 1;
-					} else {
-						if (whence == SEEK_HOLE)
-							found = 1;
-					}
-					if (found) {
-						*offset = max_t(loff_t,
-							startoff, lastoff);
-						unlock_page(page);
-						goto out;
-					}
-					lastoff += bh->b_size;
-					bh = bh->b_this_page;
-				} while (bh != head);
-			}
-
-			lastoff = page_offset(page) + PAGE_SIZE;
-			unlock_page(page);
-		}
-
-		/* The no. of pages is less than our desired, we are done. */
-		if (nr_pages < num)
-			break;
-
-		index = pvec.pages[i - 1]->index + 1;
-		pagevec_release(&pvec);
-	} while (index <= end);
-
-	if (whence == SEEK_HOLE && lastoff < endoff) {
-		found = 1;
-		*offset = lastoff;
-	}
-out:
-	pagevec_release(&pvec);
-	return found;
-}
-
-/*
- * ext4_seek_data() retrieves the offset for SEEK_DATA.
- */
-static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct extent_status es;
-	ext4_lblk_t start, last, end;
-	loff_t dataoff, isize;
-	int blkbits;
-	int ret;
-
-	inode_lock(inode);
-
-	isize = i_size_read(inode);
-	if (offset >= isize) {
-		inode_unlock(inode);
-		return -ENXIO;
-	}
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	start = offset >> blkbits;
-	last = start;
-	end = isize >> blkbits;
-	dataoff = offset;
-
-	do {
-		ret = ext4_get_next_extent(inode, last, end - last + 1, &es);
-		if (ret <= 0) {
-			/* No extent found -> no data */
-			if (ret == 0)
-				ret = -ENXIO;
-			inode_unlock(inode);
-			return ret;
-		}
-
-		last = es.es_lblk;
-		if (last != start)
-			dataoff = (loff_t)last << blkbits;
-		if (!ext4_es_is_unwritten(&es))
-			break;
-
-		/*
-		 * If there is a unwritten extent at this offset,
-		 * it will be as a data or a hole according to page
-		 * cache that has data or not.
-		 */
-		if (ext4_find_unwritten_pgoff(inode, SEEK_DATA,
-					      es.es_lblk + es.es_len, &dataoff))
-			break;
-		last += es.es_len;
-		dataoff = (loff_t)last << blkbits;
-		cond_resched();
-	} while (last <= end);
-
-	inode_unlock(inode);
-
-	if (dataoff > isize)
-		return -ENXIO;
-
-	return vfs_setpos(file, dataoff, maxsize);
-}
-
-/*
- * ext4_seek_hole() retrieves the offset for SEEK_HOLE.
- */
-static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct extent_status es;
-	ext4_lblk_t start, last, end;
-	loff_t holeoff, isize;
-	int blkbits;
-	int ret;
-
-	inode_lock(inode);
-
-	isize = i_size_read(inode);
-	if (offset >= isize) {
-		inode_unlock(inode);
-		return -ENXIO;
-	}
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	start = offset >> blkbits;
-	last = start;
-	end = isize >> blkbits;
-	holeoff = offset;
-
-	do {
-		ret = ext4_get_next_extent(inode, last, end - last + 1, &es);
-		if (ret < 0) {
-			inode_unlock(inode);
-			return ret;
-		}
-		/* Found a hole? */
-		if (ret == 0 || es.es_lblk > last) {
-			if (last != start)
-				holeoff = (loff_t)last << blkbits;
-			break;
-		}
-		/*
-		 * If there is a unwritten extent at this offset,
-		 * it will be as a data or a hole according to page
-		 * cache that has data or not.
-		 */
-		if (ext4_es_is_unwritten(&es) &&
-		    ext4_find_unwritten_pgoff(inode, SEEK_HOLE,
-					      last + es.es_len, &holeoff))
-			break;
-
-		last += es.es_len;
-		holeoff = (loff_t)last << blkbits;
-		cond_resched();
-	} while (last <= end);
-
-	inode_unlock(inode);
-
-	if (holeoff > isize)
-		holeoff = isize;
-
-	return vfs_setpos(file, holeoff, maxsize);
-}
-
-/*
  * ext4_llseek() handles both block-mapped and extent-mapped maxbytes values
  * by calling generic_file_llseek_size() with the appropriate maxbytes
  * value for each.
@@ -701,18 +455,20 @@  loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
 		maxbytes = inode->i_sb->s_maxbytes;
 
 	switch (whence) {
-	case SEEK_SET:
-	case SEEK_CUR:
-	case SEEK_END:
+	default:
 		return generic_file_llseek_size(file, offset, whence,
 						maxbytes, i_size_read(inode));
-	case SEEK_DATA:
-		return ext4_seek_data(file, offset, maxbytes);
 	case SEEK_HOLE:
-		return ext4_seek_hole(file, offset, maxbytes);
+		offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops);
+		break;
+	case SEEK_DATA:
+		offset = iomap_seek_data(inode, offset, &ext4_iomap_ops);
+		break;
 	}
 
-	return -EINVAL;
+	if (offset < 0)
+		return offset;
+	return vfs_setpos(file, offset, maxbytes);
 }
 
 const struct file_operations ext4_file_operations = {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cf82d03968c..56a3b042b0ce 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3350,7 +3350,6 @@  static int ext4_releasepage(struct page *page, gfp_t wait)
 		return try_to_free_buffers(page);
 }
 
-#ifdef CONFIG_FS_DAX
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
@@ -3359,6 +3358,7 @@  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	unsigned long first_block = offset >> blkbits;
 	unsigned long last_block = (offset + length - 1) >> blkbits;
 	struct ext4_map_blocks map;
+	bool delalloc = false;
 	int ret;
 
 	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
@@ -3369,6 +3369,27 @@  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	if (!(flags & IOMAP_WRITE)) {
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
+		if (!ret) {
+			struct extent_status es = {};
+
+			ext4_es_find_delayed_extent_range(inode, map.m_lblk,
+					map.m_lblk + map.m_len - 1, &es);
+			/* Is delalloc data before next block in extent tree? */
+			if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
+				ext4_lblk_t offset = 0;
+
+				if (es.es_lblk < map.m_lblk)
+					offset = map.m_lblk - es.es_lblk;
+				map.m_lblk = es.es_lblk + offset;
+				map.m_pblk = ext4_es_pblock(&es) + offset;
+				map.m_len = es.es_len - offset;
+				if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
+					map.m_flags |= EXT4_MAP_UNWRITTEN;
+				if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
+					delalloc = true;
+				ret = 1;
+			}
+		}
 	} else {
 		int dio_credits;
 		handle_t *handle;
@@ -3436,7 +3457,9 @@  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->blkno = IOMAP_NULL_BLOCK;
 		iomap->length = (u64)map.m_len << blkbits;
 	} else {
-		if (map.m_flags & EXT4_MAP_MAPPED) {
+		if (delalloc) {
+			iomap->type = IOMAP_DELALLOC;
+		} else if (map.m_flags & EXT4_MAP_MAPPED) {
 			iomap->type = IOMAP_MAPPED;
 		} else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
 			iomap->type = IOMAP_UNWRITTEN;
@@ -3511,8 +3534,6 @@  const struct iomap_ops ext4_iomap_ops = {
 	.iomap_end		= ext4_iomap_end,
 };
 
-#endif
-
 static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 			    ssize_t size, void *private)
 {
@@ -5994,70 +6015,3 @@  int ext4_filemap_fault(struct vm_fault *vmf)
 
 	return err;
 }
-
-/*
- * Find the first extent at or after @lblk in an inode that is not a hole.
- * Search for @map_len blocks at most. The extent is returned in @result.
- *
- * The function returns 1 if we found an extent. The function returns 0 in
- * case there is no extent at or after @lblk and in that case also sets
- * @result->es_len to 0. In case of error, the error code is returned.
- */
-int ext4_get_next_extent(struct inode *inode, ext4_lblk_t lblk,
-			 unsigned int map_len, struct extent_status *result)
-{
-	struct ext4_map_blocks map;
-	struct extent_status es = {};
-	int ret;
-
-	map.m_lblk = lblk;
-	map.m_len = map_len;
-
-	/*
-	 * For non-extent based files this loop may iterate several times since
-	 * we do not determine full hole size.
-	 */
-	while (map.m_len > 0) {
-		ret = ext4_map_blocks(NULL, inode, &map, 0);
-		if (ret < 0)
-			return ret;
-		/* There's extent covering m_lblk? Just return it. */
-		if (ret > 0) {
-			int status;
-
-			ext4_es_store_pblock(result, map.m_pblk);
-			result->es_lblk = map.m_lblk;
-			result->es_len = map.m_len;
-			if (map.m_flags & EXT4_MAP_UNWRITTEN)
-				status = EXTENT_STATUS_UNWRITTEN;
-			else
-				status = EXTENT_STATUS_WRITTEN;
-			ext4_es_store_status(result, status);
-			return 1;
-		}
-		ext4_es_find_delayed_extent_range(inode, map.m_lblk,
-						  map.m_lblk + map.m_len - 1,
-						  &es);
-		/* Is delalloc data before next block in extent tree? */
-		if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
-			ext4_lblk_t offset = 0;
-
-			if (es.es_lblk < lblk)
-				offset = lblk - es.es_lblk;
-			result->es_lblk = es.es_lblk + offset;
-			ext4_es_store_pblock(result,
-					     ext4_es_pblock(&es) + offset);
-			result->es_len = es.es_len - offset;
-			ext4_es_store_status(result, ext4_es_status(&es));
-
-			return 1;
-		}
-		/* There's a hole at m_lblk, advance us after it */
-		map.m_lblk += map.m_len;
-		map_len -= map.m_len;
-		map.m_len = map_len;
-		cond_resched();
-	}
-	result->es_len = 0;
-	return 0;
-}