
9pfs: don't ignore O_DIRECT flag in the 9pfs server

Message ID 5A126CCB.3010602@huawei.com
State New
Headers show
Series 9pfs: don't ignore O_DIRECT flag in the 9pfs server

Commit Message

jiangyiwen Nov. 20, 2017, 5:48 a.m. UTC
The Linux v9fs client already supports O_DIRECT (v9fs_direct_IO). When a
guest user opens a file with the O_DIRECT flag and the open succeeds, the
user expects data not to pass through the page cache. However, 9pfs in
QEMU ignores the direct disk access hint and uses the host page cache,
which does not match O_DIRECT semantics. So we should not ignore O_DIRECT
in 9pfs unless the Linux v9fs client does not support direct I/O.

If the server filesystem does not support O_DIRECT, the user will receive
-EINVAL and know that the filesystem does not support it.

So, to ensure semantic consistency, do not ignore direct disk access in
9pfs.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
 hw/9pfs/9p.c | 4 ----
 1 file changed, 4 deletions(-)

--

Comments

Greg Kurz Nov. 20, 2017, 10:13 a.m. UTC | #1
On Mon, 20 Nov 2017 13:48:59 +0800
jiangyiwen <jiangyiwen@huawei.com> wrote:

> The Linux v9fs client already supports O_DIRECT (v9fs_direct_IO). When a
> guest user opens a file with the O_DIRECT flag and the open succeeds, the
> user expects data not to pass through the page cache. However, 9pfs in
> QEMU ignores the direct disk access hint and uses the host page cache,
> which does not match O_DIRECT semantics. So we should not ignore O_DIRECT
> in 9pfs unless the Linux v9fs client does not support direct I/O.
> 
> If the server filesystem does not support O_DIRECT, the user will receive
> -EINVAL and know that the filesystem does not support it.
> 
> So, to ensure semantic consistency, do not ignore direct disk access in
> 9pfs.
> 

There are good reasons for us to ignore O_DIRECT. AFAIK, O_DIRECT requires
the application to take care of the alignment of the buffers with either the
logical block size of the filesystem (linux <= 2.4) or the logical block size
of the underlying storage... I don't really see how you can achieve that since
the linux 9p client only cares to break up requests into msize-chunks, and
ignores alignment. ie, you're likely to not ensure semantic consistency on the
host side anyway and linux will fallback to cached I/O.
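
To illustrate the constraint, here is a minimal host-side sketch of what a
well-formed O_DIRECT write has to look like (not part of the patch; the
512-byte alignment is only an assumption, the real requirement depends on
the underlying storage):

#define _GNU_SOURCE           /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Minimal sketch: buffer address, transfer length and file offset must
 * all be aligned to the logical block size for O_DIRECT to work. */
static int direct_write_example(const char *path)
{
    const size_t align = 512;   /* assumed logical block size */
    const size_t len = 4096;    /* a multiple of the block size */
    void *buf;
    int fd, ret = -1;

    if (posix_memalign(&buf, align, len) != 0) {
        return -1;
    }
    memset(buf, 0xab, len);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd >= 0) {
        /* offset 0 is aligned; an unaligned buffer, length or offset
         * typically fails with EINVAL */
        ret = (pwrite(fd, buf, len, 0) == (ssize_t)len) ? 0 : -1;
        close(fd);
    }
    free(buf);
    return ret;
}

The 9p requests the server receives give no such guarantee, which is the
problem described above.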

Also, this change would silently break existing users of O_DIRECT, which isn't
acceptable... it would require some fsdev property to enable this new behavior.
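
For illustration, a rough sketch of what such an opt-in could look like in
get_dotl_openflags(); the V9FS_DIRECT_IO export flag and the fsdev property
that would set it are hypothetical, nothing with these names exists today:

/* Hypothetical: V9FS_DIRECT_IO would be a new bit in ctx->export_flags,
 * set by a new opt-in fsdev property. */
static int get_dotl_openflags(V9fsState *s, int oflags)
{
    int flags;

    flags = dotl_to_open_flags(oflags);
    flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
    /*
     * Keep ignoring the direct disk access hint unless the user
     * explicitly asked for the new behavior.
     */
    if (!(s->ctx.export_flags & V9FS_DIRECT_IO)) {
        flags &= ~O_DIRECT;
    }
    return flags;
}

That way existing setups keep the current behavior unless they explicitly
ask for the new one.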

> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>  hw/9pfs/9p.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 52d4663..5ea01c4 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -155,10 +155,6 @@ static int get_dotl_openflags(V9fsState *s, int oflags)
>       */
>      flags = dotl_to_open_flags(oflags);
>      flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
> -    /*
> -     * Ignore direct disk access hint until the server supports it.
> -     */
> -    flags &= ~O_DIRECT;
>      return flags;
>  }
>
jiangyiwen Nov. 21, 2017, 3:28 a.m. UTC | #2
On 2017/11/20 18:13, Greg Kurz wrote:
> On Mon, 20 Nov 2017 13:48:59 +0800
> jiangyiwen <jiangyiwen@huawei.com> wrote:
> 
>> The Linux v9fs client already supports O_DIRECT (v9fs_direct_IO). When a
>> guest user opens a file with the O_DIRECT flag and the open succeeds, the
>> user expects data not to pass through the page cache. However, 9pfs in
>> QEMU ignores the direct disk access hint and uses the host page cache,
>> which does not match O_DIRECT semantics. So we should not ignore O_DIRECT
>> in 9pfs unless the Linux v9fs client does not support direct I/O.
>>
>> If the server filesystem does not support O_DIRECT, the user will receive
>> -EINVAL and know that the filesystem does not support it.
>>
>> So, to ensure semantic consistency, do not ignore direct disk access in
>> 9pfs.
>>
> 
> There are good reasons for us to ignore O_DIRECT. AFAIK, O_DIRECT requires
> the application to take care of the alignment of the buffers with either the
> logical block size of the filesystem (linux <= 2.4) or the logical block size
> of the underlying storage... I don't really see how you can achieve that since
> the linux 9p client only cares to break up requests into msize-chunks, and
> ignores alignment. ie, you're likely to not ensure semantic consistency on the
> host side anyway and linux will fallback to cached I/O.
> 
> Also, this change would silently break existing users of O_DIRECT, which isn't
> acceptable... it would require some fsdev property to enable this new behavior.
> 

Thank you very much, I understand what you mean: the Linux v9fs client does
not take care of alignment, it only splits requests into msize-sized chunks.
Then there is another question: should v9fs keep supporting direct I/O, or
should the direct I/O ops be removed from v9fs?

Thanks,
Yiwen.
>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>  hw/9pfs/9p.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
>> index 52d4663..5ea01c4 100644
>> --- a/hw/9pfs/9p.c
>> +++ b/hw/9pfs/9p.c
>> @@ -155,10 +155,6 @@ static int get_dotl_openflags(V9fsState *s, int oflags)
>>       */
>>      flags = dotl_to_open_flags(oflags);
>>      flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
>> -    /*
>> -     * Ignore direct disk access hint until the server supports it.
>> -     */
>> -    flags &= ~O_DIRECT;
>>      return flags;
>>  }
>>
> 
> 
> .
>
Greg Kurz Nov. 27, 2017, 10:46 a.m. UTC | #3
On Tue, 21 Nov 2017 11:28:29 +0800
jiangyiwen <jiangyiwen@huawei.com> wrote:

> On 2017/11/20 18:13, Greg Kurz wrote:
> > On Mon, 20 Nov 2017 13:48:59 +0800
> > jiangyiwen <jiangyiwen@huawei.com> wrote:
> >   
> >> The Linux v9fs client already supports O_DIRECT (v9fs_direct_IO). When a
> >> guest user opens a file with the O_DIRECT flag and the open succeeds, the
> >> user expects data not to pass through the page cache. However, 9pfs in
> >> QEMU ignores the direct disk access hint and uses the host page cache,
> >> which does not match O_DIRECT semantics. So we should not ignore O_DIRECT
> >> in 9pfs unless the Linux v9fs client does not support direct I/O.
> >>
> >> If the server filesystem does not support O_DIRECT, the user will receive
> >> -EINVAL and know that the filesystem does not support it.
> >>
> >> So, to ensure semantic consistency, do not ignore direct disk access in
> >> 9pfs.
> >>  
> > 
> > There are good reasons for us to ignore O_DIRECT. AFAIK, O_DIRECT requires
> > the application to take care of the alignment of the buffers with either the
> > logical block size of the filesystem (linux <= 2.4) or the logical block size
> > of the underlying storage... I don't really see how you can achieve that since
> > the linux 9p client only cares to break up requests into msize-chunks, and
> > ignores alignment. ie, you're likely to not ensure semantic consistency on the
> > host side anyway and linux will fallback to cached I/O.
> > 
> > Also, this change would silently break existing users of O_DIRECT, which isn't
> > acceptable... it would require some fsdev property to enable this new behavior.
> >   
> 
> Thank you very much, I understand what you mean: the Linux v9fs client does
> not take care of alignment, it only splits requests into msize-sized chunks.
> Then there is another question: should v9fs keep supporting direct I/O, or
> should the direct I/O ops be removed from v9fs?
> 

The linux open(2) manual page says:

"O_DIRECT (since Linux 2.4.10)
        Try to minimize cache effects of the I/O to and from this  file."

ie, it doesn't ensure you necessarily skip any cache that may exist between
the application and the ultimate storage. The manual page also elaborates
how direct I/O behaves with NFS:

"The  behavior  of O_DIRECT with NFS will differ from local filesystems.  Older
 kernels, or kernels configured in certain ways, may not support this combination.
 The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O
 will bypass the page cache only on the client; the server may still cache the I/O.
 The client asks the server to make the I/O synchronous to preserve the synchronous
 semantics of O_DIRECT.  Some servers will perform poorly under these circumstances,
 especially if the I/O size is small.  Some servers may also be configured to lie to
 clients about the I/O having reached stable storage; this will avoid the performance
 penalty at some risk to data integrity in the event of server power failure. The
 Linux NFS client places no alignment restrictions on O_DIRECT I/O."

So, Direct I/O in v9fs basically means the page cache is skipped in the client,
which looks acceptable IMHO... And, again, removing this may break existing
setups, so I'm not sure we should do that.

Cheers,

--
Greg

> Thanks,
> Yiwen.
> >> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> >> ---
> >>  hw/9pfs/9p.c | 4 ----
> >>  1 file changed, 4 deletions(-)
> >>
> >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> >> index 52d4663..5ea01c4 100644
> >> --- a/hw/9pfs/9p.c
> >> +++ b/hw/9pfs/9p.c
> >> @@ -155,10 +155,6 @@ static int get_dotl_openflags(V9fsState *s, int oflags)
> >>       */
> >>      flags = dotl_to_open_flags(oflags);
> >>      flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
> >> -    /*
> >> -     * Ignore direct disk access hint until the server supports it.
> >> -     */
> >> -    flags &= ~O_DIRECT;
> >>      return flags;
> >>  }
> >>  
> > 
> > 
> > .
> >   
> 
>

Patch

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 52d4663..5ea01c4 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -155,10 +155,6 @@  static int get_dotl_openflags(V9fsState *s, int oflags)
      */
     flags = dotl_to_open_flags(oflags);
     flags &= ~(O_NOCTTY | O_ASYNC | O_CREAT);
-    /*
-     * Ignore direct disk access hint until the server supports it.
-     */
-    flags &= ~O_DIRECT;
     return flags;
 }