[v4,00/10] fscache: Replace and remove old I/O API

Message ID 163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk

Message

David Howells Oct. 29, 2021, 2:08 p.m. UTC
Here's a set of patches that removes the old fscache I/O API by the following
means:

 (1) A simple fallback API is added that can read or write a single page
     synchronously.  The functions for this have "fallback" in their names
     as they have to be removed at some point.

 (2) An implementation of this is provided in cachefiles.  It creates a kiocb
     to use DIO to the backing file rather than calling ->readpage() on the
     backing filesystem and then snooping the page wait queue (a rough sketch
     of the idea follows this list).

 (3) NFS is switched to use the fallback API.

 (4) CIFS is also switched to use the fallback API for the moment.

 (5) 9P is switched to using netfslib.

 (6) The old I/O API is removed from fscache and the page snooping
     implementation is removed from cachefiles.
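
To illustrate what (2) means in practice, here's a minimal sketch of reading
one page from the backing file with a synchronous DIO kiocb.  This is not the
code from this series - the function name, the error handling and the
assumptions that pos is page-aligned and that the backing file can take DIO
are all illustrative - it just shows the shape of the kiocb/iov_iter approach:

	/*
	 * Illustrative only: synchronously read one page from the backing
	 * file with DIO rather than calling ->readpage() on the backing fs
	 * and snooping the page wait queue.
	 */
	static int fallback_read_page_sketch(struct file *backer,
					     struct page *page, loff_t pos)
	{
		struct bio_vec bv = { .bv_page = page, .bv_len = PAGE_SIZE };
		struct iov_iter iter;
		struct kiocb kiocb;
		ssize_t ret;

		iov_iter_bvec(&iter, READ, &bv, 1, PAGE_SIZE);

		init_sync_kiocb(&kiocb, backer);
		kiocb.ki_pos = pos;
		kiocb.ki_flags |= IOCB_DIRECT;	/* DIO to/from the backing file */

		ret = vfs_iocb_iter_read(backer, &kiocb, &iter);
		if (ret < 0)
			return ret;
		if (ret < PAGE_SIZE)
			zero_user_segment(page, ret, PAGE_SIZE); /* short read */
		return 0;
	}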

The reasons for doing this are:

 (A) Using a kiocb to do asynchronous DIO from/to the pages of the backing
     file is now a possibility that didn't exist when cachefiles was created.
     This is much simpler than the snooping mechanism, has a proper callback
     path and requires fewer copies and less memory.

 (B) We have to stop using bmap() or SEEK_DATA/SEEK_HOLE to work out which
     blocks are present in the backing file.  Doing that is dangerous and can
     lead to data corruption if the backing filesystem can insert or remove
     blocks of zeros arbitrarily in order to optimise its extent list[1].

     Whilst this patchset doesn't fix that yet, it does simplify the code,
     and the fix can be made in a subsequent patchset.

 (C) In order to fix (B), the cache will need to keep track of what data is
     present itself.  To make this easier to manage, the intention is to
     increase the cache block granularity to, say, 256KiB - importantly, a
     size that will span multiple pages - which means the single-page
     interface will have to go away.  netfslib is designed to deal with
     that on behalf of a filesystem, though a filesystem could use raw
     cache calls instead and manage things itself.  A rough sketch of the
     granule rounding this implies follows this list.
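
As a rough sketch of the rounding (C) implies - the 256KiB figure and the
helper below are purely illustrative, not something defined by these patches -
a request for an arbitrary byte range has to be expanded to whole cache
granules before the cache does any I/O:

	#define CACHE_GRANULE_SHIFT	18			/* 256KiB */
	#define CACHE_GRANULE_SIZE	(1UL << CACHE_GRANULE_SHIFT)

	/* Round a byte range outwards to whole cache granules. */
	static void cache_expand_to_granules(loff_t start, size_t len,
					     loff_t *gstart, size_t *glen)
	{
		loff_t end = start + len;

		*gstart = round_down(start, CACHE_GRANULE_SIZE);
		*glen   = round_up(end, CACHE_GRANULE_SIZE) - *gstart;
	}

So a single 4KiB page read at file offset 0x41000, say, becomes a cache
request covering 0x40000-0x7ffff, which is why a single-page interface can't
express what the cache needs.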

These patches can also be found on:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-remove-old-io

David

Changes
=======
ver #4:
  - Fixed cifs to not check PG_fscache before writing a page to the cache.

ver #3:
  - Added a patch to make fscache_cookie_enabled() handle a NULL cookie.
  - Added a patch to make cachefiles_prepare_read() indicate that the
    caller should fill a block with zeros if it's beyond EOF rather than
    risking the read getting marked invalid and triggering a warning.
  - Merged fixes from Dave Wysochanski for the NFS patch[4].

ver #2:
  - Changed "deprecated" to "fallback" in the new function names[2].
  - Cleaned up some kernel test robot warnings[3].
  - Made the netfs read helpers use NETFS_READ_HOLE_* flags.


References
==========

Link: https://lore.kernel.org/r/YO17ZNOcq+9PajfQ@mit.edu [1]
Link: https://lore.kernel.org/r/CAHk-=wiVK+1CyEjW8u71zVPK8msea=qPpznX35gnX+s8sXnJTg@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/202109150420.QX7dDzSE-lkp@intel.com/ [3]
Link: https://listman.redhat.com/archives/linux-cachefs/2021-October/msg00011.html [4]

Older postings
==============

Link: https://lore.kernel.org/r/163162767601.438332.9017034724960075707.stgit@warthog.procyon.org.uk/ # rfc v1
Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # rfc v2

Note that some of this was seen in previous patchsets too:

# [RFC PATCH 00/61] fscache, cachefiles: Rewrite the I/O interface in terms of kiocb/iov_iter
Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
# [PATCH 00/14] fscache: Rewrite 1: Disable and clean in preparation for rewrite
Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
# [RFC PATCH 00/76] fscache: Modernisation
Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/

---
David Howells (10):
      fscache: Generalise the ->begin_read_operation method
      fscache: Fix fscache_cookie_enabled() to handle NULL cookie
      cachefiles: Always indicate we should fill a post-EOF page with zeros
      fscache: Implement a fallback I/O interface to replace the old API
      nfs: Move to using the alternate fallback fscache I/O API
      9p: Convert to using the netfs helper lib to do reads and caching
      cifs: Move to using the alternate fallback fscache I/O API
      fscache: Remove the old I/O API
      fscache: Remove stats that are no longer used
      fscache: Update the documentation to reflect I/O API changes


 .../filesystems/caching/backend-api.rst       |  138 +--
 .../filesystems/caching/netfs-api.rst         |  385 +-----
 fs/9p/Kconfig                                 |    1 +
 fs/9p/cache.c                                 |  137 ---
 fs/9p/cache.h                                 |   98 +-
 fs/9p/v9fs.h                                  |    9 +
 fs/9p/vfs_addr.c                              |  195 ++-
 fs/9p/vfs_file.c                              |   21 +-
 fs/cachefiles/Makefile                        |    1 -
 fs/cachefiles/interface.c                     |   15 -
 fs/cachefiles/internal.h                      |   38 -
 fs/cachefiles/io.c                            |   57 +-
 fs/cachefiles/main.c                          |    1 -
 fs/cachefiles/rdwr.c                          |  972 ---------------
 fs/cifs/file.c                                |   64 +-
 fs/cifs/fscache.c                             |  105 +-
 fs/cifs/fscache.h                             |   76 +-
 fs/fscache/cache.c                            |    6 -
 fs/fscache/cookie.c                           |   10 -
 fs/fscache/internal.h                         |   58 +-
 fs/fscache/io.c                               |  137 ++-
 fs/fscache/object.c                           |    2 -
 fs/fscache/page.c                             | 1066 -----------------
 fs/fscache/stats.c                            |   73 +-
 fs/netfs/read_helper.c                        |    8 +-
 fs/nfs/file.c                                 |   14 +-
 fs/nfs/fscache-index.c                        |   26 -
 fs/nfs/fscache.c                              |  170 +--
 fs/nfs/fscache.h                              |   84 +-
 fs/nfs/read.c                                 |   25 +-
 fs/nfs/write.c                                |    7 +-
 include/linux/fscache-cache.h                 |  131 --
 include/linux/fscache.h                       |  445 ++-----
 include/linux/netfs.h                         |   17 +-
 34 files changed, 569 insertions(+), 4023 deletions(-)
 delete mode 100644 fs/cachefiles/rdwr.c

Comments

Linus Torvalds Oct. 29, 2021, 4:49 p.m. UTC | #1
On Fri, Oct 29, 2021 at 7:09 AM David Howells <dhowells@redhat.com> wrote:
>
>  (1) A simple fallback API is added that can read or write a single page
>      synchronously.  The functions for this have "fallback" in their names
>      as they have to be removed at some point.

David, I still don't understand WHY.

I read the explanations in the commits, and that didn't help either.

Why, oh why, do you insist on adding this intermediate interface that
is already documented as "must be removed" at the point it is even
added?

What's the point of adding garbage that is useless in the long run?

Why is the first step not just "remove fscache"?

Why is there this addition of the "deprecated" interface - that you
have now renamed "fallback"?

I agree that "fallback" is a less annoying name, so that renaming is
an improvement, but WHY?

I absolutely detested your whole "move old garbage around before
removal", and I also detest this "add new garbage that will be
removed".

What's the point? Why isn't the fix just "remove CONFIG_FSCACHE and
all the code"?

You already *HAVE* the "fallback" code - it's all that

    #else /* CONFIG_NFS_FSCACHE */
    static inline int nfs_fscache_register(void) { return 0; }
    static inline void nfs_fscache_unregister(void) {}
    ...

stuff in <nfs/fscache.h> and friends. So why do you need _new_
fallback code, when CONFIG_FSCACHE already exists and already has a
"this disables fscache"?

Maybe there is some really good reason, but that really good reason
sure as hell isn't documented anywhere, and I really don't see the
point.

So let me say this again:

 - it would be much better if you could incrementally just improve the
existing FSCACHE so that it just _works_ all the time, and fixes the
problems in it, and a bisection works, and there is no flag-day.

 - but dammit, if you have to have a flag-day, then there is NO POINT
in all this "move the old code around before moving it", or "add a
fallback interface before removing it again".

Oh, I can understand wanting to keep the header files around in case
the interfaces end up being similar enough in the end that it all
matters.

But I don't understand why you do this kind of crud:

 fs/cachefiles/io.c      |   28 ++++++++-
 fs/fscache/io.c         |  137 +++++++++++++++++++++++++++++++++++++++------

when neither of those directories will ever even be *compiled* if
CONFIG_FSCACHE isn't true (because CACHEFILES has a "depends on
FSCACHE").

See my argument? If FSCACHE isn't usable during the transition, then
don't make these kinds of pointless code-movement or code-creation changes
that are just dead.

There's absolutely no point in having some "fallback" interface that
allows people to test some configuration that simply isn't relevant.
It doesn't help anything. It just adds more noise and more
configurations, and no actual value that I can see.

It doesn't help bisectability: if some bug ever bisects to the
fallback code, what does that tell us? Nothing. Sure, it trivially
tells us the fallback code was buggy, but since the fallback code has
been removed afterwards, the _value_ of that information is nil,
zilch, nada. It's not "information", it's just "worthless data".

And hey, maybe there's some issue that I don't understand, and I don't
see. But if there is some subtle value here, it should have been
documented.

So I say exactly the same thing I said last time: if the old fscache
code is not usable, and you can't incrementally fix it so that it
works all the time, then JUST REMOVE IT ALL. Moving it elsewhere
before the removal is only pointless noise. But adding some fallback
intermediate code before removal is ALSO just pointless noise.

Doing a flag-day with "switch from A to B" is already painful and
wrong. I don't like it. But I like it even _less_, if it's a "switch
from A to B to C".

If you do want to have a "halfway state", the only halfway state that
makes sense to me is something like

 (a) make all the changes to the old FSCACHE - keeping it all
_working_ during this phase - to make it have the same _interfaces_ as
the new fscache will have.

 (b) then remove the old FSCACHE entirely

 (c) then plop in the new FSCACHE

But note how there was no "fallback" stage anywhere. No code that lies
around dead at any point. At each point it was either all working old
or all working new (or nothing at all).

Yes, in this case that "step (a)" is extra work and you're basically
modifying code that you know will be removed, but the advantage now is

 -  at least the fscache _users_ are being modified while the old and
tested world is still working, and the interface change is
"bisectable" in that sense. That's useful in itself.

 - if it turns out that people have problems with the new generation
FSCACHE, they can reverse steps (b) and (c) without having to touch
and revert all the other filesystems changes.

IOW, if a "same interfaces" state exists, that's fine. But for it to
make sense, those same interfaces have to be actually _useful_, not
some fallback code that is neither the old nor the new.

And maybe you can't do that "step (a)" because the interfaces are part
of the fundamental problem with the old FSCACHE. But if you drop (a),
then don't add some stage between (b) and (c), because it's not
helpful.

And again, maybe I'm missing something. But really, I don't see why
this "remove old FSCACHE" stage should *ever* make any modifications
to fs/fscache/* and fs/cachefiles/* when disabling the config option
means that it just won't get built at all.

               Linus
David Howells Oct. 29, 2021, 5:55 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> wrote:

>  - it would be much better if you could incrementally just improve the
> existing FSCACHE so that it just _works_ all the time, and fixes the
> problems in it, and a bisection works, and there is no flag-day.

As I stated in the cover letters, the advent of the kiocb and tmpfile
interfaces provides a way to massively simplify the way fscache and
cachefiles work - and, as a result, to remove over five thousand lines of code.

With the changes I'm looking at making, I can get rid of the object state
machine as it stands and the operation scheduling - which means all of the
code in:

	fs/fscache/object.c
	fs/fscache/operation.c

gets deleted along with:

	fs/fscache/page.c
	fs/cachefiles/rdwr.c

when I remove the deprecated I/O code (which is what this patchset does).

A large chunk of code in fs/fscache/cookie.c gets removed and replaced too.

Further, I've been looking at simplifying the index management with an eye to
completely replacing the cache backend with something more akin to a tagged
cache with fixed-size storage units.  This allows the cachefiles code to be
simplified a bit too.

This means bisection is of limited value, which is why I'm looking at a 'flag day'.


There is one particular problem that *this* patchset was intended to address:
I want to convert the filesystems that are going to be accessing fscache to
use netfslib so that there's one common body of code that handles the VM
interface, cache I/O and content crypto (eg. fscrypt) - but they're not all
there yet.  Jeff
and I have converted afs and ceph already and I've got a conversion for 9p in
this patchset.  I have a partial patch for cifs/smb and have been discussing
that with Steve and Shyam.  Dave Wysochanski has a set of patches for nfs, but
there's pushback on it from the nfs maintainers.

This particular patchset is intended to enable removal of the old I/O routines
by changing nfs and cifs to use a "fallback" method to use the new kiocb-using
API and thus allow me to get on with the rest of it.

However, if you would rather I just removed all of fscache and (most of[*])
cachefiles, that I can do.  I have the patches arrayed in a way that makes
them easier to explain logically and, hopefully, easier to review.  It would,
however, leave any netfs that isn't converted to use netfslib unable to use
caching until converted.

David

[*] The code that deals with the cachefilesd UAPI will be mostly unchanged.
Linus Torvalds Oct. 29, 2021, 6:17 p.m. UTC | #3
On Fri, Oct 29, 2021 at 10:55 AM David Howells <dhowells@redhat.com> wrote:
>
> This means bisection is of limited value, which is why I'm looking at a 'flag day'.

So I'm kind of resigned to that by now, I just wanted to again clarify
that the rest of my comments are about "if we have to deal with a flag
day anyway, then make it as simple and straightforward as possible,
rather than adding extra steps that are only noise".

> [ Snip explanation of netfslib ]
> This particular patchset is intended to enable removal of the old I/O routines
> by changing nfs and cifs to use a "fallback" method to use the new kiocb-using
> API and thus allow me to get on with the rest of it.

Ok, at least that explains that part.

But:

> However, if you would rather I just removed all of fscache and (most of[*])
> cachefiles, that I can do.

I assume and think that if you just do that part first, then the
"convert to netfslib" of afs and ceph at that later stage will mean
that the fallback code will never be needed?

So I would much prefer that streamlined model over one that adds that
temporary intermediate stage only for it to be deleted.

             Linus
David Howells Oct. 29, 2021, 6:51 p.m. UTC | #4
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> But:
> 
> > However, if you would rather I just removed all of fscache and (most of[*])
> > cachefiles, that I can do.
> 
> I assume and think that if you just do that part first, then the
> "convert to netfslib" of afs and ceph at that later stage will mean
> that the fallback code will never be needed?

The netfslib conversions for afs and ceph are already in your tree and I have a
patch here to do that for 9p (if you're willing to take that in the upcoming
merge window?).

The issue is cifs[*] and nfs.  I could leave caching in those disabled,
pending approved patches for those filesystems.  This would mean that I
wouldn't need the fallback code.

An alternative is that I could move the "fallback" code into fs/nfs/fscache.c
and fs/cifs/fscache.c if that would be easier and merge it into the functions
there.  The problem will come when the cache wants to do I/O in larger units
than page size to suit its own block size[**].

David

[*] As it happens, cifs seems to have a bug in it that
causes the entire cache for a superblock to be discarded each time that
superblock is mounted.

[**] At some point the cache *has* to start keeping track of what data it is
holding rather than relying on bmap/SEEK_DATA/SEEK_HOLE to get round the
extent-bridging problem.  I'm trying to take a leaf out of the book of other
caching filesystems and use larger block sizes (e.g. 256K) to reduce the
overhead of cache metadata.
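
To make [**] concrete, the sort of bookkeeping being talked about might look
like the sketch below - the structure and helpers are purely hypothetical, not
from any posted patch - where the cache records for itself which 256KiB
granules of an object hold data instead of asking the backing filesystem via
bmap() or SEEK_DATA/SEEK_HOLE:

	struct cache_object_map {
		unsigned long	*granules;	/* one bit per 256KiB granule */
		pgoff_t		nr_granules;
	};

	static bool cache_granule_present(const struct cache_object_map *map,
					  loff_t pos)
	{
		pgoff_t g = pos >> 18;		/* which 256KiB granule? */

		return g < map->nr_granules && test_bit(g, map->granules);
	}

	static void cache_granule_mark_present(struct cache_object_map *map,
					       loff_t pos)
	{
		pgoff_t g = pos >> 18;

		if (g < map->nr_granules)
			set_bit(g, map->granules);
	}

Such a map would have to be stored persistently alongside the cache file, and
keeping the granule size large (256KiB rather than one page) is what keeps
that metadata small.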