diff mbox series

fix netfs/folios regression

Message ID CAH2r5msJQGww+MAJLpA9qNw_jDt9ymiHO+bcpTkGMJpJdVc=gA@mail.gmail.com
State New
Headers show
Series fix netfs/folios regression | expand

Commit Message

Steve French Feb. 6, 2024, 11:14 p.m. UTC
A case was recently reported where an old (SMB1) server negotiated a
maximum write size that was not a multiple of PAGE_SIZE, and it caused
easy to reproduce data corruptions on sequential writes (e.g. cp) for
files that were bigger than maximum write size.   This could also be
reproduced by setting the optional mount parm ("wsize") to a
non-standard value that was not a multiple of 4096.  The problem was
introduced in 6.3-rc1 or rc2 probably by patch:
"cifs: Change the I/O paths to use an iterator rather than a page list"

The code in question is a little hard to follow, and may eventually
get rewritten by later folio/netfs patches from David Howells but the
problem is in
cifs_write_back_from_locked_folio() and cifs_writepages_region() where
after the write (of maximum write size) completes, the next write
skips to the beginning of the next page (leaving the tail end of the
previous page unwritten).  This is not an issue with typical servers
and typical wsize values because those will almost always be a
multiple of 4096, but in the bug report the server in question was old
and had sent a value for maximum write size that was not a multiple of
4096.

This can be a temporary fix, that can be removed as netfs/folios
implementation improves here - but in the short term the easiest way
to fix this seems to be to round the negotiated maximum_write_size
down if not a multiple of 4096, to be a multiple of 4096 (this can be
removed in the future when the folios code is found which caused
this), and also warn the user if they pick a wsize that is not
recommended, not a multiple of 4096.

Comments

Matthew Wilcox Feb. 6, 2024, 11:18 p.m. UTC | #1
On Tue, Feb 06, 2024 at 05:14:42PM -0600, Steve French wrote:
> The code in question is a little hard to follow, and may eventually
> get rewritten by later folio/netfs patches from David Howells but the
> problem is in
> cifs_write_back_from_locked_folio() and cifs_writepages_region() where
> after the write (of maximum write size) completes, the next write
> skips to the beginning of the next page (leaving the tail end of the
> previous page unwritten).  This is not an issue with typical servers
> and typical wsize values because those will almost always be a
> multiple of 4096, but in the bug report the server in question was old
> and had sent a value for maximum write size that was not a multiple of
> 4096.
> 
> This can be a temporary fix, that can be removed as netfs/folios
> implementation improves here - but in the short term the easiest way
> to fix this seems to be to round the negotiated maximum_write_size
> down if not a multiple of 4096, to be a multiple of 4096 (this can be
> removed in the future when the folios code is found which caused
> this), and also warn the user if they pick a wsize that is not
> recommended, not a multiple of 4096.

Seems like a sensible stopgap, but probably the patch should use
PAGE_SIZE rather than plain 4096 (what about
Alpha/Sparc/powerpc-64k/arm64-{16,64}k?)

Also, what if the server says its max-write-size is 2048 bytes?
Also, does the code work well if the max-write-size is, say, 20480
bytes?  (ie an odd multiple of PAGE_SIZE is fine; it doesn't need to be
a power-of-two?)
Steve French Feb. 7, 2024, 1:38 a.m. UTC | #2
> Also, what if the server says its max-write-size is 2048 bytes?

A good question but realistically the worst case I have seen is 16K
and those servers were ancient (most SMB1 were 64K but for any
reasonably current server the worst case is typically write size of
1MB (Azure) or 4MB (Windows and Samba) while some will increase max
write size (to 8MB) not reduce it.

You are right though: setting "smb2 max write" size in Samba
/etc/samba/smb.conf to 4000 (less than 4096) would cause writes to
hang (since it would now round down to write size of 0) - so we can't
set it less than 4K.

When David's patches for netfs/folios integration rewrite are ready I
do want to see if they fix this bug so we can avoid the game of
rounding down max write size (fortunately extremely rare when it is
needed but without this temporary patch we do risk data corruption in
this strange configurations)



On Tue, Feb 6, 2024 at 5:18 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 06, 2024 at 05:14:42PM -0600, Steve French wrote:
> > The code in question is a little hard to follow, and may eventually
> > get rewritten by later folio/netfs patches from David Howells but the
> > problem is in
> > cifs_write_back_from_locked_folio() and cifs_writepages_region() where
> > after the write (of maximum write size) completes, the next write
> > skips to the beginning of the next page (leaving the tail end of the
> > previous page unwritten).  This is not an issue with typical servers
> > and typical wsize values because those will almost always be a
> > multiple of 4096, but in the bug report the server in question was old
> > and had sent a value for maximum write size that was not a multiple of
> > 4096.
> >
> > This can be a temporary fix, that can be removed as netfs/folios
> > implementation improves here - but in the short term the easiest way
> > to fix this seems to be to round the negotiated maximum_write_size
> > down if not a multiple of 4096, to be a multiple of 4096 (this can be
> > removed in the future when the folios code is found which caused
> > this), and also warn the user if they pick a wsize that is not
> > recommended, not a multiple of 4096.
>
> Seems like a sensible stopgap, but probably the patch should use
> PAGE_SIZE rather than plain 4096 (what about
> Alpha/Sparc/powerpc-64k/arm64-{16,64}k?)
>
> Also, what if the server says its max-write-size is 2048 bytes?
> Also, does the code work well if the max-write-size is, say, 20480
> bytes?  (ie an odd multiple of PAGE_SIZE is fine; it doesn't need to be
> a power-of-two?)
>
diff mbox series

Patch

From 115f9424e2899269084069e88296b6481a0250a5 Mon Sep 17 00:00:00 2001
From: Steve French <stfrench@microsoft.com>
Date: Tue, 6 Feb 2024 16:34:22 -0600
Subject: [PATCH] smb: Fix regression in writes when non-standard maximum write
 size negotiated

The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096).  When negotiated write size is not a
multiple of 4096 the netfs code can skip the end of the final
page when doing large sequential writes causing data corruption.

This section of code is being rewritten/removed due to a large
netfs change, but until that point (from 6.3 kernel until now)
we can not support non-standard maximum write sizes.

Add a warning if a user specifies a wsize on mount that is not
a multiple of 4096, and also add a change where we round down the
maximum write size if the server negotiates a value that is not
a multiple of 4096.

Reported-by: R. Diez" <rdiez-2006@rd10.de>
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Suggested-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: stable@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
---
 fs/smb/client/connect.c    | 2 +-
 fs/smb/client/fs_context.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index c5cf88de32b7..9ceb3b2c614b 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3441,7 +3441,7 @@  int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx)
 	 */
 	if ((cifs_sb->ctx->wsize == 0) ||
 	    (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx)))
-		cifs_sb->ctx->wsize = server->ops->negotiate_wsize(tcon, ctx);
+		cifs_sb->ctx->wsize = round_down(server->ops->negotiate_wsize(tcon, ctx), 4096);
 	if ((cifs_sb->ctx->rsize == 0) ||
 	    (cifs_sb->ctx->rsize > server->ops->negotiate_rsize(tcon, ctx)))
 		cifs_sb->ctx->rsize = server->ops->negotiate_rsize(tcon, ctx);
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index 82eafe0815dc..55157778e553 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -1141,6 +1141,8 @@  static int smb3_fs_context_parse_param(struct fs_context *fc,
 	case Opt_wsize:
 		ctx->wsize = result.uint_32;
 		ctx->got_wsize = true;
+		if (round_up(ctx->wsize, 4096) != ctx->wsize)
+			cifs_dbg(VFS, "wsize should be a multiple of 4096\n");
 		break;
 	case Opt_acregmax:
 		ctx->acregmax = HZ * result.uint_32;
-- 
2.40.1