diff mbox series

[RFC,1/1] Remove buffered failover for ext4 and block fops direct writes.

Message ID 20240501231533.3128797-2-bongiojp@gmail.com
State New
Headers show
Series Change failover behavior for DIRECT writes in ext4/block fops | expand

Commit Message

Jeremy Bongio May 1, 2024, 11:15 p.m. UTC
From: Jeremy Bongio <jbongio@google.com>

ext4 and block fops would both failover to syncronous, buffered writes if
the direct IO results in a short write where only a portion of the request
was completed.

This patch changes the behavior to simply return the number of bytes
written if the direct write is short.
---
 block/fops.c   |  3 ---
 fs/ext4/file.c | 27 ---------------------------
 2 files changed, 30 deletions(-)

Comments

Christoph Hellwig May 2, 2024, 5:45 a.m. UTC | #1
On Wed, May 01, 2024 at 04:15:33PM -0700, Jeremy Bongio wrote:
> From: Jeremy Bongio <jbongio@google.com>
> 
> ext4 and block fops would both failover to syncronous, buffered writes if
> the direct IO results in a short write where only a portion of the request
> was completed.
> 
> This patch changes the behavior to simply return the number of bytes
> written if the direct write is short.

Please don't combine ext4 and block changes in a single patch.  Please
also explain why you want to change things.

AFAIK this is simply the historic behavior of the old direct I/O code
that's been around forever.  I think the XFS semantics make a lot more
sense, but people might rely on this one way or another.
Theodore Ts'o May 2, 2024, 2:01 p.m. UTC | #2
On Wed, May 01, 2024 at 10:45:06PM -0700, Christoph Hellwig wrote:
> 
> Please don't combine ext4 and block changes in a single patch.  Please
> also explain why you want to change things.
> 
> AFAIK this is simply the historic behavior of the old direct I/O code
> that's been around forever.  I think the XFS semantics make a lot more
> sense, but people might rely on this one way or another.

I agree that the ext4 and block I/O change should be split into two
separate patches.

As for the rest, we discussed this at the weekly ext4 conference call
last week and at the, I had indicated that this was indeed the
historical Direct I/O behavior.  Darrick mentioned that XFS is only
falling back to buffered I/O in one circumstances, which is when there
is direct I/O to a file which is reflinked, which since the
application wouldn't know that this might be the case, falling back to
buffered I/O was the best of not-so-great alternatives.

It might be a good idea if we could agree on a unfied set of standard
semantics for Direct I/O, including what should happen if there is an
I/O error in the middle of a DIO request; should the kernel return a
short write?  Should it silently fallback to buffered I/O?  Given that
XFS has had a fairly strict "never fall back to buffered" practice,
and there haven't been users screaming bloody murder, perhaps it is
time that we can leave the old historical Direct I/O semantics behind,
and we should just be more strict.

Ext4 can make a decision about what to do on its own, but if we want
to unify behavior across all file systems and all of the direct I/O
implications in the kernels, then this is a discussion that would need
to take place on linux-fsdevel, linux-block, and/or LSF/MM.

With that context, what are folks' thiking about the proposal that we
unify Linux's Direct I/O semantics?  I think it would be good if it
was (a) clearly documented, and (b) not be surprising for userspace
application which they switch beteween file systems, or between a file
system and a raw block device.  (Which for certain enterprise
database, is mostly only use for benchmarketing, on the back cover of
Business Week, but sometimes there might be users who decide to
squeeze that last 1% of performance by going to a raw block device,
and it might be nice if they see the same behaviour when they make
that change.)

Cheers,

					- Ted
Darrick J. Wong May 2, 2024, 2:33 p.m. UTC | #3
On Thu, May 02, 2024 at 10:01:39AM -0400, Theodore Ts'o wrote:
> On Wed, May 01, 2024 at 10:45:06PM -0700, Christoph Hellwig wrote:
> > 
> > Please don't combine ext4 and block changes in a single patch.  Please
> > also explain why you want to change things.
> > 
> > AFAIK this is simply the historic behavior of the old direct I/O code
> > that's been around forever.  I think the XFS semantics make a lot more
> > sense, but people might rely on this one way or another.
> 
> I agree that the ext4 and block I/O change should be split into two
> separate patches.
> 
> As for the rest, we discussed this at the weekly ext4 conference call
> last week and at the, I had indicated that this was indeed the
> historical Direct I/O behavior.  Darrick mentioned that XFS is only
> falling back to buffered I/O in one circumstances, which is when there
> is direct I/O to a file which is reflinked, which since the

fsblock unaligned directio writes to a reflinked file, specifically.

> application wouldn't know that this might be the case, falling back to
> buffered I/O was the best of not-so-great alternatives.
> 
> It might be a good idea if we could agree on a unfied set of standard
> semantics for Direct I/O, including what should happen if there is an
> I/O error in the middle of a DIO request; should the kernel return a
> short write?

Given the attitude of "if you use directio you're supposed to know what
you're doing", I think it's fine to return a short write.

>               Should it silently fallback to buffered I/O?  Given that
> XFS has had a fairly strict "never fall back to buffered" practice,
> and there haven't been users screaming bloody murder, perhaps it is
> time that we can leave the old historical Direct I/O semantics behind,
> and we should just be more strict.

The other thing I've heard, mostly from willy is that directio could be
done through the pagecache when it is already caching the data.  I've
also heard about other operating systems <cough> where the mode could
bleed through to other fds (er...).

> Ext4 can make a decision about what to do on its own, but if we want
> to unify behavior across all file systems and all of the direct I/O
> implications in the kernels, then this is a discussion that would need
> to take place on linux-fsdevel, linux-block, and/or LSF/MM.
> 
> With that context, what are folks' thiking about the proposal that we
> unify Linux's Direct I/O semantics?  I think it would be good if it
> was (a) clearly documented, and (b) not be surprising for userspace
> application which they switch beteween file systems, or between a file
> system and a raw block device.  (Which for certain enterprise
> database, is mostly only use for benchmarketing, on the back cover of
> Business Week, but sometimes there might be users who decide to
> squeeze that last 1% of performance by going to a raw block device,
> and it might be nice if they see the same behaviour when they make
> that change.)

Possibly a good idea but how much of LSFMM do we want to spend
relitigating old {,non-}decisions? ;)

--D

> Cheers,
> 
> 					- Ted
>
diff mbox series

Patch

diff --git a/block/fops.c b/block/fops.c
index 0cf8cf72cdfa..d32574ba9d71 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -704,9 +704,6 @@  static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
 		ret = blkdev_direct_write(iocb, from);
-		if (ret >= 0 && iov_iter_count(from))
-			ret = direct_write_fallback(iocb, from, ret,
-					blkdev_buffered_write(iocb, from));
 	} else {
 		ret = blkdev_buffered_write(iocb, from);
 	}
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 54d6ff22585c..d0760452a11f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -595,32 +595,6 @@  static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	else
 		inode_unlock(inode);
 
-	if (ret >= 0 && iov_iter_count(from)) {
-		ssize_t err;
-		loff_t endbyte;
-
-		offset = iocb->ki_pos;
-		err = ext4_buffered_write_iter(iocb, from);
-		if (err < 0)
-			return err;
-
-		/*
-		 * We need to ensure that the pages within the page cache for
-		 * the range covered by this I/O are written to disk and
-		 * invalidated. This is in attempt to preserve the expected
-		 * direct I/O semantics in the case we fallback to buffered I/O
-		 * to complete off the I/O request.
-		 */
-		ret += err;
-		endbyte = offset + err - 1;
-		err = filemap_write_and_wait_range(iocb->ki_filp->f_mapping,
-						   offset, endbyte);
-		if (!err)
-			invalidate_mapping_pages(iocb->ki_filp->f_mapping,
-						 offset >> PAGE_SHIFT,
-						 endbyte >> PAGE_SHIFT);
-	}
-
 	return ret;
 }
 
@@ -958,4 +932,3 @@  const struct inode_operations ext4_file_inode_operations = {
 	.fileattr_get	= ext4_fileattr_get,
 	.fileattr_set	= ext4_fileattr_set,
 };
-