diff mbox series

[1/2,SRU,Disco] UBUNTU: SAUCE: shiftfs: add O_DIRECT support

Message ID 20190719155047.5885-2-christian.brauner@ubuntu.com
State New
Headers show
Series shiftfs: bugfix and O_DIRECT support | expand

Commit Message

Christian Brauner July 19, 2019, 3:50 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1837223

This enabled O_DIRECT support for shiftfs if the underlay supports it.

Currently shiftfs does not handle O_DIRECT if the underlay supports it.
This is blocking dqlite - an essential part of LXD - from profiting from
the performance benefits of O_DIRECT on suitable filesystems when used
with async io such as aio or io_uring.
Overlayfs cannot support this directly since the upper filesystem in
overlay can be any filesystem. So if the upper filesystem does not
support O_DIRECT but the lower filesystem does you're out of luck.
Shiftfs does not suffer from the same problem since there is not concept
of an upper filesystem in the same way that overlayfs has it.
Essentially, shiftfs is a transparent shim relaying everything to the
underlay while overlayfs' upper layer is not (completely).

Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 fs/shiftfs.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Seth Forshee July 19, 2019, 8:02 p.m. UTC | #1
On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> BugLink: https://bugs.launchpad.net/bugs/1837223
> 
> This enabled O_DIRECT support for shiftfs if the underlay supports it.
> 
> Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> This is blocking dqlite - an essential part of LXD - from profiting from
> the performance benefits of O_DIRECT on suitable filesystems when used
> with async io such as aio or io_uring.
> Overlayfs cannot support this directly since the upper filesystem in
> overlay can be any filesystem. So if the upper filesystem does not
> support O_DIRECT but the lower filesystem does you're out of luck.
> Shiftfs does not suffer from the same problem since there is not concept
> of an upper filesystem in the same way that overlayfs has it.
> Essentially, shiftfs is a transparent shim relaying everything to the
> underlay while overlayfs' upper layer is not (completely).

I get concerned any time shiftfs just copies up some non-trivial data
from the lower filesystem, that shiftfs is going to get bypassed and
some important metadata will not get propoerly updated in shiftfs. The
question that immediately comes to mind in this case is whether i_size
in the shiftfs indoe gets updated following an O_DIRECT write, and I
suspect tha there are other similar cases to worry about. Have you
considered this possibility and confirmed that no such issues exist?

Thanks,
Seth
Christian Brauner July 22, 2019, 3:05 p.m. UTC | #2
On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1837223
> > 
> > This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > 
> > Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > This is blocking dqlite - an essential part of LXD - from profiting from
> > the performance benefits of O_DIRECT on suitable filesystems when used
> > with async io such as aio or io_uring.
> > Overlayfs cannot support this directly since the upper filesystem in
> > overlay can be any filesystem. So if the upper filesystem does not
> > support O_DIRECT but the lower filesystem does you're out of luck.
> > Shiftfs does not suffer from the same problem since there is not concept
> > of an upper filesystem in the same way that overlayfs has it.
> > Essentially, shiftfs is a transparent shim relaying everything to the
> > underlay while overlayfs' upper layer is not (completely).
> 
> I get concerned any time shiftfs just copies up some non-trivial data
> from the lower filesystem, that shiftfs is going to get bypassed and
> some important metadata will not get propoerly updated in shiftfs. The
> question that immediately comes to mind in this case is whether i_size
> in the shiftfs indoe gets updated following an O_DIRECT write, and I
> suspect tha there are other similar cases to worry about. Have you

I'm not following, what are "other similar cases to worry about" and are
those already exposed?

Re: i_size for O_DIRECT
I have not seen prior art for filesystems that use iomap for O_DIRECT
that would fiddle with i_size so I didn't want to get into that game.
That is to say, shiftfs only supports O_DIRECT if the underlying
filesystem is iomap based. This was the same approach that overlayfs
wanted to use.

Christian
Seth Forshee July 25, 2019, 8:26 p.m. UTC | #3
On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> > On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > > BugLink: https://bugs.launchpad.net/bugs/1837223
> > > 
> > > This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > > 
> > > Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > > This is blocking dqlite - an essential part of LXD - from profiting from
> > > the performance benefits of O_DIRECT on suitable filesystems when used
> > > with async io such as aio or io_uring.
> > > Overlayfs cannot support this directly since the upper filesystem in
> > > overlay can be any filesystem. So if the upper filesystem does not
> > > support O_DIRECT but the lower filesystem does you're out of luck.
> > > Shiftfs does not suffer from the same problem since there is not concept
> > > of an upper filesystem in the same way that overlayfs has it.
> > > Essentially, shiftfs is a transparent shim relaying everything to the
> > > underlay while overlayfs' upper layer is not (completely).
> > 
> > I get concerned any time shiftfs just copies up some non-trivial data
> > from the lower filesystem, that shiftfs is going to get bypassed and
> > some important metadata will not get propoerly updated in shiftfs. The
> > question that immediately comes to mind in this case is whether i_size
> > in the shiftfs indoe gets updated following an O_DIRECT write, and I
> > suspect tha there are other similar cases to worry about. Have you
> 
> I'm not following, what are "other similar cases to worry about" and are
> those already exposed?

I just mean anything else which might change in the underlay as a result
of I/O, e.g. timestamps.

> Re: i_size for O_DIRECT
> I have not seen prior art for filesystems that use iomap for O_DIRECT
> that would fiddle with i_size so I didn't want to get into that game.
> That is to say, shiftfs only supports O_DIRECT if the underlying
> filesystem is iomap based. This was the same approach that overlayfs
> wanted to use.

Are you saying that an O_DIRECT write that extends a file does not
result in i_size being updated? That doesn't sound right ...
Christian Brauner July 26, 2019, 8:29 a.m. UTC | #4
On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> > On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> > > On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > > > BugLink: https://bugs.launchpad.net/bugs/1837223
> > > > 
> > > > This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > > > 
> > > > Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > > > This is blocking dqlite - an essential part of LXD - from profiting from
> > > > the performance benefits of O_DIRECT on suitable filesystems when used
> > > > with async io such as aio or io_uring.
> > > > Overlayfs cannot support this directly since the upper filesystem in
> > > > overlay can be any filesystem. So if the upper filesystem does not
> > > > support O_DIRECT but the lower filesystem does you're out of luck.
> > > > Shiftfs does not suffer from the same problem since there is not concept
> > > > of an upper filesystem in the same way that overlayfs has it.
> > > > Essentially, shiftfs is a transparent shim relaying everything to the
> > > > underlay while overlayfs' upper layer is not (completely).
> > > 
> > > I get concerned any time shiftfs just copies up some non-trivial data
> > > from the lower filesystem, that shiftfs is going to get bypassed and
> > > some important metadata will not get propoerly updated in shiftfs. The
> > > question that immediately comes to mind in this case is whether i_size
> > > in the shiftfs indoe gets updated following an O_DIRECT write, and I
> > > suspect tha there are other similar cases to worry about. Have you
> > 
> > I'm not following, what are "other similar cases to worry about" and are
> > those already exposed?
> 
> I just mean anything else which might change in the underlay as a result
> of I/O, e.g. timestamps.
> 
> > Re: i_size for O_DIRECT
> > I have not seen prior art for filesystems that use iomap for O_DIRECT
> > that would fiddle with i_size so I didn't want to get into that game.
> > That is to say, shiftfs only supports O_DIRECT if the underlying
> > filesystem is iomap based. This was the same approach that overlayfs
> > wanted to use.
> 
> Are you saying that an O_DIRECT write that extends a file does not
> result in i_size being updated? That doesn't sound right ...

For example, when overlayfs intended to introduce support for this
(which they didn't because of their version of upper and lower) they did
not mess with i_size. That's one of the reasons I didn't. Do you prefer
to copy up attributes to make sure we're not missing anything?

Christian
Seth Forshee July 26, 2019, 12:59 p.m. UTC | #5
On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
> > On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> > > On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> > > > On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > > > > BugLink: https://bugs.launchpad.net/bugs/1837223
> > > > > 
> > > > > This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > > > > 
> > > > > Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > > > > This is blocking dqlite - an essential part of LXD - from profiting from
> > > > > the performance benefits of O_DIRECT on suitable filesystems when used
> > > > > with async io such as aio or io_uring.
> > > > > Overlayfs cannot support this directly since the upper filesystem in
> > > > > overlay can be any filesystem. So if the upper filesystem does not
> > > > > support O_DIRECT but the lower filesystem does you're out of luck.
> > > > > Shiftfs does not suffer from the same problem since there is not concept
> > > > > of an upper filesystem in the same way that overlayfs has it.
> > > > > Essentially, shiftfs is a transparent shim relaying everything to the
> > > > > underlay while overlayfs' upper layer is not (completely).
> > > > 
> > > > I get concerned any time shiftfs just copies up some non-trivial data
> > > > from the lower filesystem, that shiftfs is going to get bypassed and
> > > > some important metadata will not get propoerly updated in shiftfs. The
> > > > question that immediately comes to mind in this case is whether i_size
> > > > in the shiftfs indoe gets updated following an O_DIRECT write, and I
> > > > suspect tha there are other similar cases to worry about. Have you
> > > 
> > > I'm not following, what are "other similar cases to worry about" and are
> > > those already exposed?
> > 
> > I just mean anything else which might change in the underlay as a result
> > of I/O, e.g. timestamps.
> > 
> > > Re: i_size for O_DIRECT
> > > I have not seen prior art for filesystems that use iomap for O_DIRECT
> > > that would fiddle with i_size so I didn't want to get into that game.
> > > That is to say, shiftfs only supports O_DIRECT if the underlying
> > > filesystem is iomap based. This was the same approach that overlayfs
> > > wanted to use.
> > 
> > Are you saying that an O_DIRECT write that extends a file does not
> > result in i_size being updated? That doesn't sound right ...
> 
> For example, when overlayfs intended to introduce support for this
> (which they didn't because of their version of upper and lower) they did
> not mess with i_size. That's one of the reasons I didn't. Do you prefer
> to copy up attributes to make sure we're not missing anything?

I think that something like this:

  fd = open(path, O_RDWR|O_DIRECT);
  lseek(fd, 0, SEEK_END);
  write(fd, data, size);
  fstat(fd, &stat);
  printf("%ld\n", (long)stat.st_size);

Should print out the correct size (and similarly correct data for other
attributes). Maybe it will with this patch, I'm just not certain that it
will, so really I'm just asking whether you've tested this sort of
thing. If it doesn't work, then it seems like we probably do need to
copy up the attributes.

Seth
Stefan Bader Aug. 12, 2019, 12:31 p.m. UTC | #6
On 26.07.19 14:59, Seth Forshee wrote:
> On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
>> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
>>> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
>>>> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
>>>>> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
>>>>>> BugLink: https://bugs.launchpad.net/bugs/1837223
>>>>>>
>>>>>> This enabled O_DIRECT support for shiftfs if the underlay supports it.
>>>>>>
>>>>>> Currently shiftfs does not handle O_DIRECT if the underlay supports it.
>>>>>> This is blocking dqlite - an essential part of LXD - from profiting from
>>>>>> the performance benefits of O_DIRECT on suitable filesystems when used
>>>>>> with async io such as aio or io_uring.
>>>>>> Overlayfs cannot support this directly since the upper filesystem in
>>>>>> overlay can be any filesystem. So if the upper filesystem does not
>>>>>> support O_DIRECT but the lower filesystem does you're out of luck.
>>>>>> Shiftfs does not suffer from the same problem since there is not concept
>>>>>> of an upper filesystem in the same way that overlayfs has it.
>>>>>> Essentially, shiftfs is a transparent shim relaying everything to the
>>>>>> underlay while overlayfs' upper layer is not (completely).
>>>>>
>>>>> I get concerned any time shiftfs just copies up some non-trivial data
>>>>> from the lower filesystem, that shiftfs is going to get bypassed and
>>>>> some important metadata will not get propoerly updated in shiftfs. The
>>>>> question that immediately comes to mind in this case is whether i_size
>>>>> in the shiftfs indoe gets updated following an O_DIRECT write, and I
>>>>> suspect tha there are other similar cases to worry about. Have you
>>>>
>>>> I'm not following, what are "other similar cases to worry about" and are
>>>> those already exposed?
>>>
>>> I just mean anything else which might change in the underlay as a result
>>> of I/O, e.g. timestamps.
>>>
>>>> Re: i_size for O_DIRECT
>>>> I have not seen prior art for filesystems that use iomap for O_DIRECT
>>>> that would fiddle with i_size so I didn't want to get into that game.
>>>> That is to say, shiftfs only supports O_DIRECT if the underlying
>>>> filesystem is iomap based. This was the same approach that overlayfs
>>>> wanted to use.
>>>
>>> Are you saying that an O_DIRECT write that extends a file does not
>>> result in i_size being updated? That doesn't sound right ...
>>
>> For example, when overlayfs intended to introduce support for this
>> (which they didn't because of their version of upper and lower) they did
>> not mess with i_size. That's one of the reasons I didn't. Do you prefer
>> to copy up attributes to make sure we're not missing anything?
> 
> I think that something like this:
> 
>   fd = open(path, O_RDWR|O_DIRECT);
>   lseek(fd, 0, SEEK_END);
>   write(fd, data, size);
>   fstat(fd, &stat);
>   printf("%ld\n", (long)stat.st_size);
> 
> Should print out the correct size (and similarly correct data for other
> attributes). Maybe it will with this patch, I'm just not certain that it
> will, so really I'm just asking whether you've tested this sort of
> thing. If it doesn't work, then it seems like we probably do need to
> copy up the attributes.
> 
> Seth
> 
With the long discussion seemingly stopping unresolved I am not sure this
patchset is ready for SRU or not.

-Stefan
Seth Forshee Aug. 12, 2019, 2:45 p.m. UTC | #7
On Mon, Aug 12, 2019 at 02:31:38PM +0200, Stefan Bader wrote:
> On 26.07.19 14:59, Seth Forshee wrote:
> > On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
> >> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
> >>> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> >>>> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> >>>>> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> >>>>>> BugLink: https://bugs.launchpad.net/bugs/1837223
> >>>>>>
> >>>>>> This enabled O_DIRECT support for shiftfs if the underlay supports it.
> >>>>>>
> >>>>>> Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> >>>>>> This is blocking dqlite - an essential part of LXD - from profiting from
> >>>>>> the performance benefits of O_DIRECT on suitable filesystems when used
> >>>>>> with async io such as aio or io_uring.
> >>>>>> Overlayfs cannot support this directly since the upper filesystem in
> >>>>>> overlay can be any filesystem. So if the upper filesystem does not
> >>>>>> support O_DIRECT but the lower filesystem does you're out of luck.
> >>>>>> Shiftfs does not suffer from the same problem since there is not concept
> >>>>>> of an upper filesystem in the same way that overlayfs has it.
> >>>>>> Essentially, shiftfs is a transparent shim relaying everything to the
> >>>>>> underlay while overlayfs' upper layer is not (completely).
> >>>>>
> >>>>> I get concerned any time shiftfs just copies up some non-trivial data
> >>>>> from the lower filesystem, that shiftfs is going to get bypassed and
> >>>>> some important metadata will not get propoerly updated in shiftfs. The
> >>>>> question that immediately comes to mind in this case is whether i_size
> >>>>> in the shiftfs indoe gets updated following an O_DIRECT write, and I
> >>>>> suspect tha there are other similar cases to worry about. Have you
> >>>>
> >>>> I'm not following, what are "other similar cases to worry about" and are
> >>>> those already exposed?
> >>>
> >>> I just mean anything else which might change in the underlay as a result
> >>> of I/O, e.g. timestamps.
> >>>
> >>>> Re: i_size for O_DIRECT
> >>>> I have not seen prior art for filesystems that use iomap for O_DIRECT
> >>>> that would fiddle with i_size so I didn't want to get into that game.
> >>>> That is to say, shiftfs only supports O_DIRECT if the underlying
> >>>> filesystem is iomap based. This was the same approach that overlayfs
> >>>> wanted to use.
> >>>
> >>> Are you saying that an O_DIRECT write that extends a file does not
> >>> result in i_size being updated? That doesn't sound right ...
> >>
> >> For example, when overlayfs intended to introduce support for this
> >> (which they didn't because of their version of upper and lower) they did
> >> not mess with i_size. That's one of the reasons I didn't. Do you prefer
> >> to copy up attributes to make sure we're not missing anything?
> > 
> > I think that something like this:
> > 
> >   fd = open(path, O_RDWR|O_DIRECT);
> >   lseek(fd, 0, SEEK_END);
> >   write(fd, data, size);
> >   fstat(fd, &stat);
> >   printf("%ld\n", (long)stat.st_size);
> > 
> > Should print out the correct size (and similarly correct data for other
> > attributes). Maybe it will with this patch, I'm just not certain that it
> > will, so really I'm just asking whether you've tested this sort of
> > thing. If it doesn't work, then it seems like we probably do need to
> > copy up the attributes.
> > 
> > Seth
> > 
> With the long discussion seemingly stopping unresolved I am not sure this
> patchset is ready for SRU or not.

I think we are still waiting for Christian to get back with some test
results here, so these patches should wait.
Christian Brauner Aug. 13, 2019, 5:58 a.m. UTC | #8
After the sprint I was on vacation.
I'll try to get around to it soon.

Christian

On August 12, 2019 4:45:19 PM GMT+02:00, Seth Forshee <seth.forshee@canonical.com> wrote:
>On Mon, Aug 12, 2019 at 02:31:38PM +0200, Stefan Bader wrote:
>> On 26.07.19 14:59, Seth Forshee wrote:
>> > On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
>> >> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
>> >>> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner
>wrote:
>> >>>> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
>> >>>>> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner
>wrote:
>> >>>>>> BugLink: https://bugs.launchpad.net/bugs/1837223
>> >>>>>>
>> >>>>>> This enabled O_DIRECT support for shiftfs if the underlay
>supports it.
>> >>>>>>
>> >>>>>> Currently shiftfs does not handle O_DIRECT if the underlay
>supports it.
>> >>>>>> This is blocking dqlite - an essential part of LXD - from
>profiting from
>> >>>>>> the performance benefits of O_DIRECT on suitable filesystems
>when used
>> >>>>>> with async io such as aio or io_uring.
>> >>>>>> Overlayfs cannot support this directly since the upper
>filesystem in
>> >>>>>> overlay can be any filesystem. So if the upper filesystem does
>not
>> >>>>>> support O_DIRECT but the lower filesystem does you're out of
>luck.
>> >>>>>> Shiftfs does not suffer from the same problem since there is
>not concept
>> >>>>>> of an upper filesystem in the same way that overlayfs has it.
>> >>>>>> Essentially, shiftfs is a transparent shim relaying everything
>to the
>> >>>>>> underlay while overlayfs' upper layer is not (completely).
>> >>>>>
>> >>>>> I get concerned any time shiftfs just copies up some
>non-trivial data
>> >>>>> from the lower filesystem, that shiftfs is going to get
>bypassed and
>> >>>>> some important metadata will not get propoerly updated in
>shiftfs. The
>> >>>>> question that immediately comes to mind in this case is whether
>i_size
>> >>>>> in the shiftfs indoe gets updated following an O_DIRECT write,
>and I
>> >>>>> suspect tha there are other similar cases to worry about. Have
>you
>> >>>>
>> >>>> I'm not following, what are "other similar cases to worry about"
>and are
>> >>>> those already exposed?
>> >>>
>> >>> I just mean anything else which might change in the underlay as a
>result
>> >>> of I/O, e.g. timestamps.
>> >>>
>> >>>> Re: i_size for O_DIRECT
>> >>>> I have not seen prior art for filesystems that use iomap for
>O_DIRECT
>> >>>> that would fiddle with i_size so I didn't want to get into that
>game.
>> >>>> That is to say, shiftfs only supports O_DIRECT if the underlying
>> >>>> filesystem is iomap based. This was the same approach that
>overlayfs
>> >>>> wanted to use.
>> >>>
>> >>> Are you saying that an O_DIRECT write that extends a file does
>not
>> >>> result in i_size being updated? That doesn't sound right ...
>> >>
>> >> For example, when overlayfs intended to introduce support for this
>> >> (which they didn't because of their version of upper and lower)
>they did
>> >> not mess with i_size. That's one of the reasons I didn't. Do you
>prefer
>> >> to copy up attributes to make sure we're not missing anything?
>> > 
>> > I think that something like this:
>> > 
>> >   fd = open(path, O_RDWR|O_DIRECT);
>> >   lseek(fd, 0, SEEK_END);
>> >   write(fd, data, size);
>> >   fstat(fd, &stat);
>> >   printf("%ld\n", (long)stat.st_size);
>> > 
>> > Should print out the correct size (and similarly correct data for
>other
>> > attributes). Maybe it will with this patch, I'm just not certain
>that it
>> > will, so really I'm just asking whether you've tested this sort of
>> > thing. If it doesn't work, then it seems like we probably do need
>to
>> > copy up the attributes.
>> > 
>> > Seth
>> > 
>> With the long discussion seemingly stopping unresolved I am not sure
>this
>> patchset is ready for SRU or not.
>
>I think we are still waiting for Christian to get back with some test
>results here, so these patches should wait.
Christian Brauner Aug. 13, 2019, 11:48 a.m. UTC | #9
On Mon, Aug 12, 2019 at 09:45:19AM -0500, Seth Forshee wrote:
> On Mon, Aug 12, 2019 at 02:31:38PM +0200, Stefan Bader wrote:
> > On 26.07.19 14:59, Seth Forshee wrote:
> > > On Fri, Jul 26, 2019 at 10:29:38AM +0200, Christian Brauner wrote:
> > >> On Thu, Jul 25, 2019 at 03:26:01PM -0500, Seth Forshee wrote:
> > >>> On Mon, Jul 22, 2019 at 05:05:08PM +0200, Christian Brauner wrote:
> > >>>> On Fri, Jul 19, 2019 at 03:02:21PM -0500, Seth Forshee wrote:
> > >>>>> On Fri, Jul 19, 2019 at 05:50:46PM +0200, Christian Brauner wrote:
> > >>>>>> BugLink: https://bugs.launchpad.net/bugs/1837223
> > >>>>>>
> > >>>>>> This enabled O_DIRECT support for shiftfs if the underlay supports it.
> > >>>>>>
> > >>>>>> Currently shiftfs does not handle O_DIRECT if the underlay supports it.
> > >>>>>> This is blocking dqlite - an essential part of LXD - from profiting from
> > >>>>>> the performance benefits of O_DIRECT on suitable filesystems when used
> > >>>>>> with async io such as aio or io_uring.
> > >>>>>> Overlayfs cannot support this directly since the upper filesystem in
> > >>>>>> overlay can be any filesystem. So if the upper filesystem does not
> > >>>>>> support O_DIRECT but the lower filesystem does you're out of luck.
> > >>>>>> Shiftfs does not suffer from the same problem since there is not concept
> > >>>>>> of an upper filesystem in the same way that overlayfs has it.
> > >>>>>> Essentially, shiftfs is a transparent shim relaying everything to the
> > >>>>>> underlay while overlayfs' upper layer is not (completely).
> > >>>>>
> > >>>>> I get concerned any time shiftfs just copies up some non-trivial data
> > >>>>> from the lower filesystem, that shiftfs is going to get bypassed and
> > >>>>> some important metadata will not get propoerly updated in shiftfs. The
> > >>>>> question that immediately comes to mind in this case is whether i_size
> > >>>>> in the shiftfs indoe gets updated following an O_DIRECT write, and I
> > >>>>> suspect tha there are other similar cases to worry about. Have you
> > >>>>
> > >>>> I'm not following, what are "other similar cases to worry about" and are
> > >>>> those already exposed?
> > >>>
> > >>> I just mean anything else which might change in the underlay as a result
> > >>> of I/O, e.g. timestamps.
> > >>>
> > >>>> Re: i_size for O_DIRECT
> > >>>> I have not seen prior art for filesystems that use iomap for O_DIRECT
> > >>>> that would fiddle with i_size so I didn't want to get into that game.
> > >>>> That is to say, shiftfs only supports O_DIRECT if the underlying
> > >>>> filesystem is iomap based. This was the same approach that overlayfs
> > >>>> wanted to use.
> > >>>
> > >>> Are you saying that an O_DIRECT write that extends a file does not
> > >>> result in i_size being updated? That doesn't sound right ...
> > >>
> > >> For example, when overlayfs intended to introduce support for this
> > >> (which they didn't because of their version of upper and lower) they did
> > >> not mess with i_size. That's one of the reasons I didn't. Do you prefer
> > >> to copy up attributes to make sure we're not missing anything?
> > > 
> > > I think that something like this:
> > > 
> > >   fd = open(path, O_RDWR|O_DIRECT);
> > >   lseek(fd, 0, SEEK_END);
> > >   write(fd, data, size);
> > >   fstat(fd, &stat);
> > >   printf("%ld\n", (long)stat.st_size);
> > > 
> > > Should print out the correct size (and similarly correct data for other
> > > attributes). Maybe it will with this patch, I'm just not certain that it
> > > will, so really I'm just asking whether you've tested this sort of
> > > thing. If it doesn't work, then it seems like we probably do need to
> > > copy up the attributes.
> > > 
> > > Seth
> > > 
> > With the long discussion seemingly stopping unresolved I am not sure this
> > patchset is ready for SRU or not.
> 
> I think we are still waiting for Christian to get back with some test
> results here, so these patches should wait.

The test outlined above shows that the correct file size is reported.

Christian
diff mbox series

Patch

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 49f6714e9f95..addaa6e21e57 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -1126,6 +1126,9 @@  static int shiftfs_open(struct inode *inode, struct file *file)
 	}
 
 	file->private_data = file_info;
+	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO. */
+	file->f_mapping = realfile->f_mapping;
+
 	file_info->realfile = realfile;
 	return 0;
 }