[v1,00/14] DFS failover

Message ID 20181115142103.24617-1-aaptel@suse.com

Aurélien Aptel Nov. 15, 2018, 2:20 p.m. UTC
Hi linux-cifs,

This patchset adds

* a DFS cache so that DFS links can be resolved even when hosts are down
* DFS failover so that if the DFS target we are connected to is down
  cifs.ko will try to reconnect to a different target if there are
  any.

This is 90% Paulo's work. I gave him the task and a general roadmap to
go about it, thinking it would not be too hard, and it turned into this
massive, intricate 2k-line beast as we both slowly realized all the
ins and outs and subtleties of the problem and the resulting solutions.
So congrats to him.

 * * *

* What is DFS?

DFS is basically symbolic links across servers. You can have links
that point to UNC paths of directories on other servers. The client
has to resolve the link, get the target from the result, and connect
to the target. There is no proxying involved.

A share that can take resolving requests and respond to them is called
a DFS root. A host with such a share is in a "standalone" setup.

A domain can be set up so that when accessing a share, the share
*itself* is redirected to a DFS root on a separate server. Essentially
the share is a symbolic link itself and the domain doesn't host it,
even though you typed \\domain\dfsshare to access it. This is called a
"domain-based" setup.

Each of those links can have multiple targets so that if one fails you
can try another.

So how do you know whether a file is a link or not? Well, if you try to
access it the server will reply with the error STATUS_PATH_NOT_COVERED.
Then you are supposed to issue a DFS Referral Request on that path and
you get the targets in the response. You are then supposed to connect
to one of the targets and repeat.
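
The resolve-and-repeat loop described above can be sketched in plain
userspace C. This is only an illustration and not cifs.ko code: the
status enum, the toy namespace, and the helpers try_open(),
get_referral() and resolve() are all hypothetical, and the sketch
treats the whole path as the link instead of splicing the unconsumed
remainder onto the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NTGT 2

/* Hypothetical status codes for this sketch, not real NTSTATUS values. */
enum status { ST_OK, ST_PATH_NOT_COVERED, ST_UNREACHABLE };

/* One DFS link: its path and the targets a referral would return. */
struct link {
	const char *path;
	const char *targets[NTGT];
};

/* Toy namespace: \\ROOT\dfs\a is a link with two targets. */
static const struct link links[] = {
	{ "\\\\ROOT\\dfs\\a", { "\\\\T1\\share", "\\\\T2\\share" } },
};

/* Simulated access: host T1 is down, links answer PATH_NOT_COVERED. */
static enum status try_open(const char *path)
{
	size_t i;

	if (strncmp(path, "\\\\T1\\", 5) == 0)
		return ST_UNREACHABLE;
	for (i = 0; i < sizeof(links) / sizeof(links[0]); i++)
		if (strcmp(path, links[i].path) == 0)
			return ST_PATH_NOT_COVERED;
	return ST_OK;
}

/* Simulated DFS Referral Request: returns the link or NULL. */
static const struct link *get_referral(const char *path)
{
	size_t i;

	for (i = 0; i < sizeof(links) / sizeof(links[0]); i++)
		if (strcmp(path, links[i].path) == 0)
			return &links[i];
	return NULL;
}

/* Follow links until a path opens; returns the final path or NULL. */
static const char *resolve(const char *path, int max_hops)
{
	assert(path != NULL);

	while (max_hops-- > 0) {
		enum status st = try_open(path);
		const struct link *ref;
		const char *next = NULL;
		size_t i;

		if (st == ST_OK)
			return path;
		if (st != ST_PATH_NOT_COVERED)
			return NULL;

		ref = get_referral(path);
		if (!ref)
			return NULL;

		/* Pick the first target whose host answers at all. */
		for (i = 0; i < NTGT; i++) {
			if (ref->targets[i] &&
			    try_open(ref->targets[i]) != ST_UNREACHABLE) {
				next = ref->targets[i];
				break;
			}
		}
		if (!next)
			return NULL;
		path = next;	/* the target may itself be a link: repeat */
	}
	return NULL;
}
```

With T1 simulated as down, resolving \\ROOT\dfs\a in this sketch ends
up on \\T2\share, the second target of the referral.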

Microsoft has a dedicated document about it, [MS-DFSC], which I suggest
you take a look at.

* How does DFS work in cifs.ko?

There are 2 entry points for DFS currently:

 - cifs_mount(): the "static" codepath
 - dentry automount: the dynamic codepath

The static code path is when you mount a UNC path \\FOO\d1\d2\d3.
cifs_mount() will follow all the links until you reach the final host
and mount that as a regular mount point.

The dynamic code path is when you mount a DFS root (= a share that has
links) and, *after* mounting, access one of those links. In this case
cifs.ko will set an AUTOMOUNT flag on the inode & dentry of the file,
which is a Linux VFS thing that instructs the VFS upper layer to
lazily call the d_automount() dentry operation when needed.

That operation does a VFS sub-mount on that dentry. It ends up calling
cifs_mount() and it will get its own superblock and everything.

The problem is we pass the resolved path when mounting which means if
it fails, we cannot resolve again and use a different target.
Note that the original mount path can have multiple nested links.

This is why we need a cache to store results to do failover properly.

* What this patch adds

 - a DFS cache so that DFS links can be resolved even when hosts are down
 - DFS failover so that if the DFS target we are connected to is down
   cifs.ko will try to reconnect to a different target if there are
   any

          +-  cifs: Refactor out cifs_mount()
          |   cifs: Skip any trailing backslashes from UNC
 refactor |   cifs: Fix separator when building path from dentry
 &bugfix  |   cifs: Make devname param optional in cifs_compose_mount_options()
          |   cifs: Respect -EAGAIN when querying paths
          |   cifs: Save TTL value when parsing DFS referrals
          +-  cifs: auto disable 'serverino' in dfs mounts
 new impl,|   cifs: Add DFS cache routines <------ main new code
 replace, |   cifs: Make use of DFS cache to get new DFS referrals
 reco     |   cifs: Add support for failover in cifs_mount()
   &      |   cifs: Add support for failover in cifs_reconnect()
 failover |   cifs: start DFS cache refresher in cifs_mount()
          |   cifs: Add support for failover in smb2_reconnect()
          +-  cifs: Add support for failover in cifs_reconnect_tcon()

** The DFS cache

The DFS cache is a hashtable that maps UNC paths to cache entries.

A cache entry contains:
 - the UNC path it is mapped on
 - how much of the UNC path the entry consumes
 - flags
 - a Time-To-Live after which the entry expires
 - a list of possible targets (a linked list of UNC paths)
 - a "hint target" pointing to the last known working target, or to the
   first target if none were tried. This hint lets cifs.ko remember and
   try working targets first.
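
To make the field list above concrete, here is a minimal userspace
sketch of such an entry. The struct layout and the names tgt,
cache_entry, entry_init() and entry_set_hint() are assumptions made for
illustration; the actual definitions live in fs/cifs/dfs_cache.c and
will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One referral target; targets form a singly linked list. */
struct tgt {
	const char *unc;
	struct tgt *next;
};

/* Hypothetical entry layout mirroring the fields listed above. */
struct cache_entry {
	const char *path;	/* UNC path this entry is mapped on */
	size_t path_consumed;	/* how much of the path the entry consumes */
	unsigned int flags;
	long expires;		/* absolute expiry in seconds (now + TTL) */
	struct tgt *targets;	/* all possible targets */
	struct tgt *hint;	/* last known working target */
};

static void entry_init(struct cache_entry *e, const char *path,
		       size_t consumed, long now, long ttl,
		       struct tgt *targets)
{
	assert(e != NULL && targets != NULL);

	e->path = path;
	e->path_consumed = consumed;
	e->flags = 0;
	e->expires = now + ttl;
	e->targets = targets;
	e->hint = targets;	/* nothing tried yet: hint is first target */
}

/* Remember that the target named unc worked; -1 if it is not listed. */
static int entry_set_hint(struct cache_entry *e, const char *unc)
{
	struct tgt *t;

	for (t = e->targets; t; t = t->next) {
		if (strcmp(t->unc, unc) == 0) {
			e->hint = t;
			return 0;
		}
	}
	return -1;
}
```

A caller would start trying targets at e->hint rather than at the head
of the list, which is what lets a known-good target be tried first.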

* Looking for an entry in the cache is done with dfs_cache_find()
  - if no valid entries are found, a DFS query is made, stored in the
    cache and returned
  - the full target list can be copied and returned to avoid race
    conditions, and looped on with the help of the
    dfs_cache_tgt_iterator

* Updating the target hint to the next target is done with
  dfs_cache_update_tgthint()

These functions have a dfs_cache_noreq_XXX() version that doesn't
fetch referrals if no entries are found. These versions don't
require the tcp/ses/tcon/cifs_sb parameters as a result.
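
The difference between the fetching and the "noreq" lookups can be
sketched as follows. This is a toy userspace model, not the real API:
sketch_find(), sketch_noreq_find(), fetch_referral() and the
fixed-size, overwrite-on-collision table are hypothetical stand-ins,
and a single target string replaces the target list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8

struct entry {
	char path[64];
	char target[64];
	int valid;
};

static struct entry table[SLOTS];
static int referral_requests;	/* counts simulated network requests */

static unsigned int hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % SLOTS;
}

/* Stand-in for a DFS referral request sent over a live connection. */
static int fetch_referral(const char *path, char *target, size_t len)
{
	referral_requests++;
	if (strcmp(path, "\\\\ROOT\\dfs\\a") != 0)
		return -1;
	strncpy(target, "\\\\T1\\share", len - 1);
	target[len - 1] = '\0';
	return 0;
}

/* "noreq" flavour: consult the cache only, never touch the network. */
static const char *sketch_noreq_find(const char *path)
{
	struct entry *e = &table[hash(path)];

	if (e->valid && strcmp(e->path, path) == 0)
		return e->target;
	return NULL;
}

/* Fetching flavour: on a miss, query, store the result, return it. */
static const char *sketch_find(const char *path)
{
	const char *hit = sketch_noreq_find(path);
	struct entry *e = &table[hash(path)];
	char target[sizeof(e->target)];

	if (hit)
		return hit;
	assert(strlen(path) < sizeof(e->path));
	if (fetch_referral(path, target, sizeof(target)) != 0)
		return NULL;
	strcpy(e->path, path);
	strcpy(e->target, target);
	e->valid = 1;
	return e->target;
}
```

Only sketch_find() needs something to send the request with, which is
why the cache-only variant can drop the connection-related parameters.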

** Refreshing expired cache entries

Expired entries cannot be used, and they have a pretty short TTL. For
them to stay useful for failover, the DFS cache adds a delayed work
that is run periodically to keep them fresh.

Since we might not have available connections to issue the referral
request when refreshing, we need to store volume_info structs with
credentials and other needed info to be able to connect to the right
server.
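
One pass of such a refresher could look like the sketch below. It is a
simplified userspace model under stated assumptions: the entry struct,
entry_expired() and refresh_pass() are hypothetical names, time is a
plain number of seconds, and "re-fetching" is reduced to bumping the
expiry by the entry's TTL instead of issuing a real referral request.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	long ttl;	/* TTL from the referral, in seconds */
	long expires;	/* absolute time at which the entry expires */
	int refreshes;	/* how many times the entry was re-fetched */
};

static int entry_expired(const struct entry *e, long now)
{
	return now >= e->expires;
}

/* One refresher pass: re-fetch every expired entry, return the count. */
static size_t refresh_pass(struct entry *v, size_t n, long now)
{
	size_t i, refreshed = 0;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!entry_expired(&v[i], now))
			continue;
		/* A real refresher would issue a referral request here. */
		v[i].expires = now + v[i].ttl;
		v[i].refreshes++;
		refreshed++;
	}
	return refreshed;
}
```

In the kernel this pass would be driven by the periodically rescheduled
delayed work rather than by an explicit call with a "now" argument.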

** Mount failover

The static and dynamic codepaths were patched to use the DFS cache to
try alternative targets.

We store the initial user-provided mount path in the superblock as
origin_fullpath.

** Reconnect failover

As you know the reconnect logic isn't the simplest to follow and we
had to tweak some things:

Since we might try to reconnect to multiple targets, and we do this
sequentially, threads waiting for the tcp reconnection should wait for
the socket timeout multiplied by the number of targets.
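
As a small worked example of that rule: with a 10 second socket timeout
and 3 targets, a waiter should allow 30 seconds. The helper below is a
hypothetical illustration (reconnect_wait_secs() is not a cifs.ko
function), and treating "no targets" as a single plain server is an
assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Upper bound, in seconds, on how long a waiter should block. */
static long reconnect_wait_secs(long sock_timeout_secs,
				unsigned int nr_targets)
{
	assert(sock_timeout_secs >= 0);

	/* Assumption: no DFS targets means one ordinary server. */
	if (nr_targets == 0)
		nr_targets = 1;

	/* Saturate instead of overflowing on absurd inputs. */
	if (sock_timeout_secs > LONG_MAX / (long)nr_targets)
		return LONG_MAX;

	return sock_timeout_secs * (long)nr_targets;
}
```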

** Server file id

When following a DFS link you connect to a different server with a
different set of file ids. Those 2 sets of ids can overlap as a
result. Similarly, if you fail over to a different server, you will get
a different set of file ids than the ones you initially got from your
original server. We decided to disable server inode numbers in case of
failover and had to tweak the logic of dentry revalidation in order to
not return -ESTALE on syscalls.

** Remaining issues

We sometimes hit a problem: we suspect that if the reconnect codepath
is triggered *while* mounting a DFS link, you hit a NULL-ptr deref in
the reconnection code. This may be an already existing bug.

* Testing

This was tested in various ways:

 - static and dynamic path at mount time with random target initially down
 - dropping packets from an already mounted connection and waiting for
   IO timeout or echo-thread timeout
 - soft and hard mount (hard means only fail to userspace if there is no other way)

We have a little testsuite that tries mounting every weird
combination of links and paths in the static or dynamic code path, plus
a simple reconnect test. We used iptables on the client to drop
packets to/from the server to simulate failure. All of our testing was
on very short-lived sessions though, and we need more testing on
long-lived ones.

Our test setups were the following:

* a 3 VM Windows Server "domain-based" setup (SMB1, SMB3):

          (share link)
DOM/dfstest -> DFSROOT1/dfstest [files and links to] -> {DFSROOT1/share1,DFSROOT2/share2}
            -> DFSROOT2/dfstest [files and links to] -> {DFSROOT1/share1,DFSROOT2/share2}

* a 3 VM "standalone" samba setup (SMB1+unix extension, SMB3):

ROOT/dfstest [files and links to] -> {TARGET1/share1, TARGET2/share2}

 * * *

Paulo Alcantara (14):
  cifs: Refactor out cifs_mount()
  cifs: Skip any trailing backslashes from UNC
  cifs: Fix separator when building path from dentry
  cifs: Make devname param optional in cifs_compose_mount_options()
  cifs: Respect -EAGAIN when querying paths
  cifs: Save TTL value when parsing DFS referrals
  cifs: auto disable 'serverino' in dfs mounts
  cifs: Add DFS cache routines
  cifs: Make use of DFS cache to get new DFS referrals
  cifs: Add support for failover in cifs_mount()
  cifs: Add support for failover in cifs_reconnect()
  cifs: start DFS cache refresher in cifs_mount()
  cifs: Add support for failover in smb2_reconnect()
  cifs: Add support for failover in cifs_reconnect_tcon()

 fs/cifs/Makefile       |    2 +-
 fs/cifs/cifs_debug.c   |   12 +
 fs/cifs/cifs_dfs_ref.c |  141 +++--
 fs/cifs/cifs_fs_sb.h   |    9 +
 fs/cifs/cifsfs.c       |   17 +-
 fs/cifs/cifsglob.h     |   14 +-
 fs/cifs/cifsproto.h    |   28 +-
 fs/cifs/cifssmb.c      |   88 ++-
 fs/cifs/connect.c      |  889 +++++++++++++++++++++++--------
 fs/cifs/dfs_cache.c    | 1379 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/dfs_cache.h    |   97 ++++
 fs/cifs/dir.c          |    2 +-
 fs/cifs/inode.c        |   49 +-
 fs/cifs/misc.c         |   34 +-
 fs/cifs/smb1ops.c      |   15 +-
 fs/cifs/smb2ops.c      |   23 +-
 fs/cifs/smb2pdu.c      |   88 ++-
 17 files changed, 2565 insertions(+), 322 deletions(-)
 create mode 100644 fs/cifs/dfs_cache.c
 create mode 100644 fs/cifs/dfs_cache.h