
ext4: refuse O_DIRECT opens for mode where DIO doesn't work

Message ID 1461472078-20104-1-git-send-email-tytso@mit.edu
State Rejected, archived

Commit Message

Theodore Ts'o April 24, 2016, 4:27 a.m. UTC
Certain ext4 modes (encryption, data=journal, inline data) cause
Direct I/O to be a no-op.  Instead of making DIO fail silently, make
the open with the O_DIRECT flag fail with EINVAL.

This will avoid surprises to application programs, and also signal to
xfstests not to try O_DIRECT tests for file system modes where it
doesn't work (and could result in test failures).

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/file.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Dmitry Monakhov April 25, 2016, 9:35 a.m. UTC | #1
Theodore Ts'o <tytso@mit.edu> writes:

> Certain ext4 modes (encryption, data=journal, inline data) cause
> Direct I/O to be a no-op.  Instead of making DIO fail silently, make
> the open with the O_DIRECT flag fail with EINVAL.
>
> This will avoid surprises to application programs, and also signal to
> xfstests not to try O_DIRECT tests for file system modes where it
> doesn't work (and could result in test failures).
>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  fs/ext4/file.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index fa2208b..4113676 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -372,7 +372,12 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
>  			return -EACCES;
>  		if (ext4_encryption_info(inode) == NULL)
>  			return -ENOKEY;
> +		if (filp->f_flags & O_DIRECT)
> +			return -EINVAL;
>  	}
> +	if ((ext4_should_journal_data(inode) || ext4_has_inline_data(inode)) &&
> +	    (filp->f_flags & O_DIRECT))
Hmm...
__ext4_new_inode sets EXT4_STATE_MAY_INLINE_DATA for each inode if
ext4_has_feature_inline_data(sb) is true.
So this may result in complaints from users who want the inline data
optimization for small files, but also want O_DIRECT to work.
IMHO it is reasonable to convert inline inodes to regular ones if the
user opens them for WRITE with O_DIRECT.
> +		return -EINVAL;
>  
>  	dir = dget_parent(file_dentry(filp));
>  	if (ext4_encrypted_inode(d_inode(dir)) &&
> -- 
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner April 25, 2016, 11:49 p.m. UTC | #2
On Mon, Apr 25, 2016 at 12:35:02PM +0300, Dmitry Monakhov wrote:
> Theodore Ts'o <tytso@mit.edu> writes:
> 
> > Certain ext4 modes (encryption, data=journal, inline data) cause
> > Direct I/O to be a no-op.  Instead of making DIO fail silently, make
> > the open with the O_DIRECT flag fail with EINVAL.
> >
> > This will avoid surprises to application programs, and also signal to
> > xfstests not to try O_DIRECT tests for file system modes where it
> > doesn't work (and could result in test failures).
> >
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > ---
> >  fs/ext4/file.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index fa2208b..4113676 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -372,7 +372,12 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
> >  			return -EACCES;
> >  		if (ext4_encryption_info(inode) == NULL)
> >  			return -ENOKEY;
> > +		if (filp->f_flags & O_DIRECT)
> > +			return -EINVAL;
> >  	}
> > +	if ((ext4_should_journal_data(inode) || ext4_has_inline_data(inode)) &&
> > +	    (filp->f_flags & O_DIRECT))
> Hmm...
> __ext4_new_inode sets EXT4_STATE_MAY_INLINE_DATA for each inode if
> ext4_has_feature_inline_data(sb) is true.
> So this may result in complaints from users who want the inline data
> optimization for small files, but also want O_DIRECT to work.
> IMHO it is reasonable to convert inline inodes to regular ones if the
> user opens them for WRITE with O_DIRECT.

Why not just transparently fall back to buffered IO if direct IO
cannot be done? Saves people from wondering why applications fail
on one ext4 filesystem and not another....
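
For comparison, the same fallback can be done by the application itself. The following is only a minimal userspace sketch, assuming a Linux/glibc system; open_prefer_direct() is a hypothetical helper name, not something from the patch or from any library. It treats EINVAL on the O_DIRECT open as "direct I/O not available here" and retries buffered.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): try O_DIRECT first and,
 * if the open is rejected with EINVAL, retry as plain buffered I/O.
 * *direct reports which mode the caller actually got.
 * O_CREAT is not handled here because no mode argument is passed.
 */
static int open_prefer_direct(const char *path, int flags, bool *direct)
{
	int fd;

	assert(path != NULL && direct != NULL);

	fd = open(path, flags | O_DIRECT);
	if (fd >= 0) {
		*direct = true;
		return fd;
	}
	if (errno != EINVAL)
		return -1;	/* unrelated failure; errno is left as set by open() */

	*direct = false;
	return open(path, flags & ~O_DIRECT);
}
```

A caller that needs real direct I/O can check *direct and refuse to continue. Note that a successful O_DIRECT open does not prove the file system will not buffer later I/O.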

Cheers,

Dave.
Theodore Ts'o April 26, 2016, 12:20 a.m. UTC | #3
On Tue, Apr 26, 2016 at 09:49:46AM +1000, Dave Chinner wrote:
> Why not just transparently fall back to buffered IO if direct IO
> cannot be done? Saves people from wondering why applications fail
> on one ext4 filesystem and not another....

Some file systems return EINVAL if they don't support O_DIRECT.  In
fact, _require_odirect in xfstests relies on this fact.  The question
of whether it's better to silently fall back to buffered I/O and fail
to provide the O_DIRECT functionality requested by the user, or
whether it's better to fail with EINVAL is an interesting one.  It is
possible that applications (or xfstests tests) that expect O_DIRECT
functionality but don't get it can end up failing, sometimes in subtle
ways that might be tricky to debug.
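
The kind of open-time check described above can be sketched in C roughly as follows. This is an illustration only, not the xfstests implementation; probe_odirect_open() and the scratch file name are made up for the example. It can only tell whether the open is rejected with EINVAL. It cannot tell whether a file system that accepts the open silently falls back to buffered I/O.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical probe: create a scratch file in dir and try to open it
 * with O_DIRECT.  Returns 1 if the open succeeds, 0 if it is rejected
 * with EINVAL, and -1 on any other error (errno describes the failure).
 */
static int probe_odirect_open(const char *dir)
{
	char path[4096];
	int fd, dfd, saved, ret, len;

	assert(dir != NULL);

	len = snprintf(path, sizeof(path), "%s/odirect-probe-XXXXXX", dir);
	if (len < 0 || (size_t)len >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = mkstemp(path);		/* creates the scratch file */
	if (fd < 0)
		return -1;
	close(fd);

	dfd = open(path, O_RDWR | O_DIRECT);
	saved = errno;
	if (dfd >= 0) {
		close(dfd);
		ret = 1;
	} else {
		ret = (saved == EINVAL) ? 0 : -1;
	}

	unlink(path);
	errno = saved;			/* report the open() error, not unlink()'s */
	return ret;
}
```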

It would be nice if there was some way to answer the question, "does
O_DIRECT work or is it a placebo", perhaps via an extension to
statfs(2), but we don't have that today.

I ran into this because I was investigating an xfstests failure, and
it was because we were claiming to support O_DIRECT when in fact we
weren't, and this was causing a test failure.  OTOH, after I applied
this patch, there were a handful of tests that are using O_DIRECT
without using the _require_odirect assertion, and we were passing
those tests essentially by luck.

So depending how we decide to deal with this case, I can send patches
that add the missing _require_odirect to those tests, and/or send
tests that add more ext4-specific checks to _require_odirect (e.g.,
don't try to use O_DIRECT in data=journal, or if inline_data is
enabled, or if encryption is enabled).

What do folks think?

					- Ted
Christoph Hellwig April 26, 2016, 8:14 a.m. UTC | #4
On Tue, Apr 26, 2016 at 09:49:46AM +1000, Dave Chinner wrote:
> Why not just transparently fall back to buffered IO if direct IO
> cannot be done? Saves people from wondering why applications fail
> on one ext4 filesystem and not another....

I've been doing an audit of our direct I/O implementations, and most
of them do some form of transparent fallback, including some that
only pretend to support O_DIRECT but don't do anything special for it at
all, while at the same time we go to great efforts to check that a file
system actually supports direct I/O, leading to nasty no-op ->direct_IO
implementations as we even got that abstraction wrong.

At this point I wonder if we should simply treat O_DIRECT as a hint
and always allow it, and just let the file system optimize for it
(skip buffering, require alignment, relaxed POSIX atomicity requirements)
if it is set.
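
From the application side, treating O_DIRECT as a hint while still meeting the alignment requirement could look like the sketch below. It is an assumption-laden illustration: read_head_direct_hint() is a hypothetical helper, and the 4096-byte DIO_ALIGN is a guess that is often sufficient but is not guaranteed for every file system or device.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096	/* assumed alignment; the real requirement is fs/device specific */

/*
 * Hypothetical helper: read up to len bytes from the start of path into
 * out.  O_DIRECT is requested with an aligned buffer and an aligned
 * length; if the open or the read is rejected with EINVAL, the read is
 * retried through the page cache.  Returns bytes copied or -1.
 */
static ssize_t read_head_direct_hint(const char *path, void *out, size_t len)
{
	void *buf = NULL;
	size_t want;
	ssize_t n;
	int fd, saved;

	assert(path != NULL && out != NULL);
	if (len == 0)
		return 0;

	/* Round the request up to a multiple of the assumed alignment. */
	want = (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
	if (posix_memalign(&buf, DIO_ALIGN, want) != 0) {
		errno = ENOMEM;
		return -1;
	}

	fd = open(path, O_RDONLY | O_DIRECT);
	n = (fd >= 0) ? pread(fd, buf, want, 0) : -1;
	if (n < 0 && errno == EINVAL) {
		/* Direct I/O was refused: fall back to buffered I/O. */
		if (fd >= 0)
			close(fd);
		fd = open(path, O_RDONLY);
		n = (fd >= 0) ? pread(fd, buf, want, 0) : -1;
	}

	saved = errno;
	if (n > 0) {
		if ((size_t)n > len)
			n = (ssize_t)len;
		memcpy(out, buf, (size_t)n);
	}
	if (fd >= 0)
		close(fd);
	free(buf);
	errno = saved;
	return n;
}
```

The fallback on EINVAL covers both a rejected open and a read whose alignment the file system does not accept.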
Mike Marshall April 26, 2016, 3:07 p.m. UTC | #5
We have Orangefs users who have applications they
can't run because the applications open with
O_DIRECT. They are happy when I show them
how to "pretend" to support O_DIRECT - the
way they did it in NFS back in the 2.6 era...

I was thinking of adding it to the upstream version,
maybe as a mount option... so I like this "hint" idea...

-Mike

On Tue, Apr 26, 2016 at 4:14 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Apr 26, 2016 at 09:49:46AM +1000, Dave Chinner wrote:
>> Why not just transparently fall back to buffered IO if direct IO
>> cannot be done? Saves people from wondering why applications fail
>> on one ext4 filesystem and not another....
>
> I've been doing an audit of our direct I/O implementations, and most
> of them do some form of transparent fallback, including some that
> only pretend to support O_DIRECT but don't do anything special for it at
> all, while at the same time we go to great efforts to check that a file
> system actually supports direct I/O, leading to nasty no-op ->direct_IO
> implementations as we even got that abstraction wrong.
>
> At this point I wonder if we should simply treat O_DIRECT as a hint
> and always allow it, and just let the file system optimize for it
> (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> if it is set.
Theodore Ts'o April 27, 2016, 2:16 a.m. UTC | #6
On Tue, Apr 26, 2016 at 01:14:51AM -0700, Christoph Hellwig wrote:
> I've been doing an audit of our direct I/O implementations, and most
> of them do some form of transparent fallback, including some that
> only pretend to support O_DIRECT but don't do anything special for it at
> all, while at the same time we go to great efforts to check that a file
> system actually supports direct I/O, leading to nasty no-op ->direct_IO
> implementations as we even got that abstraction wrong.
> 
> At this point I wonder if we should simply treat O_DIRECT as a hint
> and always allow it, and just let the file system optimize for it
> (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> if it is set.

That's fine with me, but there ought to be some way for a program to
query whether a particular file / file system is one where DIO is
supported, and if so, what the alignment requirements would be.  That
way applications who care can get the information they need (and we
can use it for xfstests's _require_odirect :-).

						- Ted
Eric Sandeen April 27, 2016, 2:22 a.m. UTC | #7
On 4/26/16 9:16 PM, Theodore Ts'o wrote:
> On Tue, Apr 26, 2016 at 01:14:51AM -0700, Christoph Hellwig wrote:
>> I've been doing an audit of our direct I/O implementations, and most
>> of them do some form of transparent fallback, including some that
>> only pretend to support O_DIRECT but don't do anything special for it at
>> all, while at the same time we go to great efforts to check that a file
>> system actually supports direct I/O, leading to nasty no-op ->direct_IO
>> implementations as we even got that abstraction wrong.
>>
>> At this point I wonder if we should simply treat O_DIRECT as a hint
>> and always allow it, and just let the file system optimize for it
>> (skip buffering, require alignment, relaxed POSIX atomicity requirements)
>> if it is set.
> 
> That's fine with me, but there ought to be some way for a program to
> query whether a particular file / file system is one where DIO is
> supported, and if so, what the alignment requirements would be.  That
> way applications who care can get the information they need (and we
> can use it for xfstests's _require_odirect :-).
> 
> 						- Ted

Well, we have xfs's XFS_IOC_DIOINFO which gives memory alignment and
io size/alignment constraints.

That could pretty easily be "hoisted" to the vfs so that any fs 
could return the same info based on internal requirements, or EOPNOTSUPP
or something if dio is never possible.
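
For reference, querying the existing XFS interface from userspace looks roughly like the sketch below. The struct dioattr layout and the XFS_IOC_DIOINFO number are local copies of what, as far as I know, xfs_fs.h defines; real code should include the xfsprogs header instead. query_dio_geometry() is a hypothetical wrapper name. On a file system that does not implement the ioctl the call is expected to fail (typically with ENOTTY today; the EOPNOTSUPP behaviour is only the proposal above).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Local copies (assumption: these mirror xfs_fs.h).  Prefer the real
 * header from xfsprogs when it is available.
 */
struct dioattr {
	uint32_t d_mem;		/* required memory alignment of the data buffer */
	uint32_t d_miniosz;	/* minimum direct I/O transfer size */
	uint32_t d_maxiosz;	/* maximum direct I/O transfer size */
};
#ifndef XFS_IOC_DIOINFO
#define XFS_IOC_DIOINFO	_IOR('X', 30, struct dioattr)
#endif

/*
 * Hypothetical wrapper: fill *da with the direct I/O constraints of
 * path.  Returns 0 on success, -1 if the file cannot be opened or the
 * file system does not answer the ioctl (errno is preserved).
 */
static int query_dio_geometry(const char *path, struct dioattr *da)
{
	int fd, ret, saved;

	assert(path != NULL && da != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, XFS_IOC_DIOINFO, da);
	saved = errno;
	close(fd);
	errno = saved;
	return ret == 0 ? 0 : -1;
}
```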

We seemed to be talking about adding this to xstat, but I guess I
like the idea of a purpose-built interface, not an all-singing,
all-dancing file/filesystem query ...

-Eric

Dave Chinner April 27, 2016, 2:25 a.m. UTC | #8
On Tue, Apr 26, 2016 at 10:16:49PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 26, 2016 at 01:14:51AM -0700, Christoph Hellwig wrote:
> > I've been doing an audit of our direct I/O implementations, and most
> > of them do some form of transparent fallback, including some that
> > only pretend to support O_DIRECT but don't do anything special for it at
> > all, while at the same time we go to great efforts to check that a file
> > system actually supports direct I/O, leading to nasty no-op ->direct_IO
> > implementations as we even got that abstraction wrong.
> > 
> > At this point I wonder if we should simply treat O_DIRECT as a hint
> > and always allow it, and just let the file system optimize for it
> > (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> > if it is set.
> 
> That's fine with me, but there ought to be some way for a program to
> query whether a particular file / file system is one where DIO is
> supported, and if so, what the alignment requirements would be.

Yes, that's called XFS_IOC_DIOINFO. We've been saying that this
should be promoted to the VFS for some time, though it might be
better to re-implement it with a different structure that includes
padding and a flags field....

> That
> way applications who care can get the information they need (and we
> can use it for xfstests's _require_odirect :-).

Just return EOPNOTSUPP to XFS_IOC_DIOINFO if direct io is not
supported?

Cheers,

Dave.
Dave Chinner April 27, 2016, 2:27 a.m. UTC | #9
On Tue, Apr 26, 2016 at 01:14:51AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 26, 2016 at 09:49:46AM +1000, Dave Chinner wrote:
> > Why not just transparently fall back to buffered IO if direct IO
> > cannot be done? Saves people from wondering why applications fail
> > on one ext4 filesystem and not another....
> 
> I've been doing an audit of our direct I/O implementations, and most
> of them do some form of transparent fallback, including some that
> only pretend to support O_DIRECT but don't do anything special for it at
> all, while at the same time we go to great efforts to check that a file
> system actually supports direct I/O, leading to nasty no-op ->direct_IO
> implementations as we even got that abstraction wrong.
> 
> At this point I wonder if we should simply treat O_DIRECT as a hint
> and always allow it, and just let the file system optimize for it
> (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> if it is set.

I thought that's how most filesystems treated it, anyway. i.e.
anything they can't do via direct IO, they fell back to buffered IO
to complete (e.g. for allocation or append writes, etc). Hence why I
suggested the fallback rather than erroring out....

Cheers,

Dave.
Theodore Ts'o April 27, 2016, 3:25 a.m. UTC | #10
On Wed, Apr 27, 2016 at 12:27:46PM +1000, Dave Chinner wrote:
> > At this point I wonder if we should simply treat O_DIRECT as a hint
> > and always allow it, and just let the file system optimize for it
> > (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> > if it is set.
> 
> I thought that's how most filesystems treated it, anyway. i.e.
> anything they can't do via direct IO, they fell back to buffered IO
> to complete (e.g. for allocation or append writes, etc). Hence why I
> suggested the fallback rather than erroring out....

No, some file systems return EINVAL on the open.  In fact that's what
the _require_odirect test in xfstests relies upon....

    		     	     	      	     - Ted
Dave Chinner April 27, 2016, 3:37 a.m. UTC | #11
On Tue, Apr 26, 2016 at 11:25:26PM -0400, Theodore Ts'o wrote:
> On Wed, Apr 27, 2016 at 12:27:46PM +1000, Dave Chinner wrote:
> > > At this point I wonder if we should simply treat O_DIRECT as a hint
> > > and always allow it, and just let the file system optimize for it
> > > (skip buffering, require alignment, relaxed POSIX atomicity requirements)
> > > if it is set.
> > 
> > I thought that's how most filesystems treated it, anyway. i.e.
> > anything they can't do via direct IO, they fell back to buffered IO
> > to complete (e.g. for allocation or append writes, etc). Hence why I
> > suggested the fallback rather than erroring out....
> 
> No, some file systems return EINVAL on the open.  In fact that's what
> the _require_odirect test in xfstests relies upon....

Sure, but that doesn't change the fact that many of the filesystems
that "support O_DIRECT" don't always do O_DIRECT - they
transparently do buffered IO instead and hence are treating O_DIRECT
as a hint once the file has been opened.

Cheers,

Dave.

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index fa2208b..4113676 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -372,7 +372,12 @@  static int ext4_file_open(struct inode * inode, struct file * filp)
 			return -EACCES;
 		if (ext4_encryption_info(inode) == NULL)
 			return -ENOKEY;
+		if (filp->f_flags & O_DIRECT)
+			return -EINVAL;
 	}
+	if ((ext4_should_journal_data(inode) || ext4_has_inline_data(inode)) &&
+	    (filp->f_flags & O_DIRECT))
+		return -EINVAL;
 
 	dir = dget_parent(file_dentry(filp));
 	if (ext4_encrypted_inode(d_inode(dir)) &&