diff mbox series

[v2] sock: allow reading and changing sk_userlocks with setsockopt

Message ID 20210730160708.6544-1-ptikhomirov@virtuozzo.com
State New
Headers show
Series [v2] sock: allow reading and changing sk_userlocks with setsockopt | expand

Commit Message

Pavel Tikhomirov July 30, 2021, 4:07 p.m. UTC
SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
tcp_sndbuf_expand()). If we've just created a new socket this adjustment
is enabled on it, but if one changes the socket buffer size by
setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.

CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
restore as it first needs to increase buffer sizes for packet queues
restore and second it needs to restore back original buffer sizes. So
after CRIU restore all sockets become non-auto-adjustable, which can
decrease network performance of restored applications significantly.

CRIU need to be able to restore sockets with enabled/disabled adjustment
to the same state it was before dump, so let's add special setsockopt
for it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
Here is a corresponding CRIU commits using these new feature to fix slow
download speed problem after migration:
https://github.com/checkpoint-restore/criu/pull/1568

Origin of the problem:

We have a customer in Virtuozzo who mentioned that nginx server becomes
slower after container migration. Especially it is easy to mention when
you wget some big file via localhost from the same container which was
just migrated.

By strace-ing all nginx processes I see that nginx worker process before
c/r sends data to local wget with big chunks ~1.5Mb, but after c/r it
only succeeds to send by small chunks ~64Kb.

Before:
sendfile(12, 13, [7984974] => [9425600], 11479629) = 1440626 <0.000180>

After:
sendfile(8, 13, [1507275] => [1568768], 17957328) = 61493 <0.000675>

Smaller buffer can explain the decrease in download speed. So as a POC I
just commented out all buffer setting manipulations and that helped.

v2: define SOCK_BUF_LOCK_MASK mask
---
 arch/alpha/include/uapi/asm/socket.h  | 2 ++
 arch/mips/include/uapi/asm/socket.h   | 2 ++
 arch/parisc/include/uapi/asm/socket.h | 2 ++
 arch/sparc/include/uapi/asm/socket.h  | 2 ++
 include/net/sock.h                    | 2 ++
 include/uapi/asm-generic/socket.h     | 2 ++
 net/core/sock.c                       | 9 +++++++++
 7 files changed, 21 insertions(+)

Comments

Jakub Kicinski July 30, 2021, 4:46 p.m. UTC | #1
On Fri, 30 Jul 2021 19:07:08 +0300 Pavel Tikhomirov wrote:
> SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
> buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
> tcp_sndbuf_expand()). If we've just created a new socket this adjustment
> is enabled on it, but if one changes the socket buffer size by
> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
> 
> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
> restore as it first needs to increase buffer sizes for packet queues
> restore and second it needs to restore back original buffer sizes. So
> after CRIU restore all sockets become non-auto-adjustable, which can
> decrease network performance of restored applications significantly.
> 
> CRIU need to be able to restore sockets with enabled/disabled adjustment
> to the same state it was before dump, so let's add special setsockopt
> for it.
> 
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

The patchwork bot is struggling to ingest this, please double check it
applies cleanly to net-next.
Pavel Tikhomirov Aug. 2, 2021, 8:26 a.m. UTC | #2
On 30.07.2021 19:46, Jakub Kicinski wrote:
> On Fri, 30 Jul 2021 19:07:08 +0300 Pavel Tikhomirov wrote:
>> SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
>> buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
>> tcp_sndbuf_expand()). If we've just created a new socket this adjustment
>> is enabled on it, but if one changes the socket buffer size by
>> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
>>
>> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
>> restore as it first needs to increase buffer sizes for packet queues
>> restore and second it needs to restore back original buffer sizes. So
>> after CRIU restore all sockets become non-auto-adjustable, which can
>> decrease network performance of restored applications significantly.
>>
>> CRIU need to be able to restore sockets with enabled/disabled adjustment
>> to the same state it was before dump, so let's add special setsockopt
>> for it.
>>
>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> 
> The patchwork bot is struggling to ingest this, please double check it
> applies cleanly to net-next.

I checked that it applies cleanly to net-next:

[snorch@fedora linux]$ git am 
~/Downloads/patches/ptikhomirov/setsockopt-sk_userlocks/\[PATCH\ v2\]\ 
sock\:\ allow\ reading\ and\ changing\ sk_userlocks\ with\ setsockopt.eml

[snorch@fedora linux]$ git log --oneline
c339520aadd5 (HEAD -> net-next) sock: allow reading and changing 
sk_userlocks with setsockopt

d39e8b92c341 (net-next/master) Merge 
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next


Probably it was some temporary problem and now it's OK? 
https://patchwork.kernel.org/project/netdevbpf/patch/20210730160708.6544-1-ptikhomirov@virtuozzo.com/

>
Jakub Kicinski Aug. 2, 2021, 4:11 p.m. UTC | #3
On Mon, 2 Aug 2021 11:26:09 +0300 Pavel Tikhomirov wrote:
> On 30.07.2021 19:46, Jakub Kicinski wrote:
> > On Fri, 30 Jul 2021 19:07:08 +0300 Pavel Tikhomirov wrote:  
> >> SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
> >> buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
> >> tcp_sndbuf_expand()). If we've just created a new socket this adjustment
> >> is enabled on it, but if one changes the socket buffer size by
> >> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
> >>
> >> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
> >> restore as it first needs to increase buffer sizes for packet queues
> >> restore and second it needs to restore back original buffer sizes. So
> >> after CRIU restore all sockets become non-auto-adjustable, which can
> >> decrease network performance of restored applications significantly.
> >>
> >> CRIU need to be able to restore sockets with enabled/disabled adjustment
> >> to the same state it was before dump, so let's add special setsockopt
> >> for it.
> >>
> >> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>  
> > 
> > The patchwork bot is struggling to ingest this, please double check it
> > applies cleanly to net-next.  
> 
> I checked that it applies cleanly to net-next:
> 
> [snorch@fedora linux]$ git am 
> ~/Downloads/patches/ptikhomirov/setsockopt-sk_userlocks/\[PATCH\ v2\]\ 
> sock\:\ allow\ reading\ and\ changing\ sk_userlocks\ with\ setsockopt.eml
> 
> [snorch@fedora linux]$ git log --oneline
> c339520aadd5 (HEAD -> net-next) sock: allow reading and changing 
> sk_userlocks with setsockopt
> 
> d39e8b92c341 (net-next/master) Merge 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
> 
> Probably it was some temporary problem and now it's OK? 
> https://patchwork.kernel.org/project/netdevbpf/patch/20210730160708.6544-1-ptikhomirov@virtuozzo.com/

Indeed, must have been resolved by the merge of net into net-next which
happened on Saturday? Regardless, would you mind reposting? There is no
way for me to retry the patchwork checks.

And one more thing..

> +	case SO_BUF_LOCK:
> +		sk->sk_userlocks = (sk->sk_userlocks & ~SOCK_BUF_LOCK_MASK) |
> +				   (val & SOCK_BUF_LOCK_MASK);

What's the thinking on silently ignoring unsupported flags on set
rather than rejecting? I feel like these days we lean towards explicit
rejects.

> +	case SO_BUF_LOCK:
> +		v.val = sk->sk_userlocks & (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK);
> +		break;

The mask could you be used here.

Just to double check - is the expectation that the value returned is
completely opaque to the user space? The defines in question are not
part of uAPI.
Pavel Tikhomirov Aug. 3, 2021, 11:04 a.m. UTC | #4
On 02.08.2021 19:11, Jakub Kicinski wrote:
> On Mon, 2 Aug 2021 11:26:09 +0300 Pavel Tikhomirov wrote:
>> On 30.07.2021 19:46, Jakub Kicinski wrote:
>>> On Fri, 30 Jul 2021 19:07:08 +0300 Pavel Tikhomirov wrote:
>>>> SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
>>>> buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
>>>> tcp_sndbuf_expand()). If we've just created a new socket this adjustment
>>>> is enabled on it, but if one changes the socket buffer size by
>>>> setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
>>>>
>>>> CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
>>>> restore as it first needs to increase buffer sizes for packet queues
>>>> restore and second it needs to restore back original buffer sizes. So
>>>> after CRIU restore all sockets become non-auto-adjustable, which can
>>>> decrease network performance of restored applications significantly.
>>>>
>>>> CRIU need to be able to restore sockets with enabled/disabled adjustment
>>>> to the same state it was before dump, so let's add special setsockopt
>>>> for it.
>>>>
>>>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>>
>>> The patchwork bot is struggling to ingest this, please double check it
>>> applies cleanly to net-next.
>>
>> I checked that it applies cleanly to net-next:
>>
>> [snorch@fedora linux]$ git am
>> ~/Downloads/patches/ptikhomirov/setsockopt-sk_userlocks/\[PATCH\ v2\]\
>> sock\:\ allow\ reading\ and\ changing\ sk_userlocks\ with\ setsockopt.eml
>>
>> [snorch@fedora linux]$ git log --oneline
>> c339520aadd5 (HEAD -> net-next) sock: allow reading and changing
>> sk_userlocks with setsockopt
>>
>> d39e8b92c341 (net-next/master) Merge
>> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
>>
>> Probably it was some temporary problem and now it's OK?
>> https://patchwork.kernel.org/project/netdevbpf/patch/20210730160708.6544-1-ptikhomirov@virtuozzo.com/
> 
> Indeed, must have been resolved by the merge of net into net-next which
> happened on Saturday? Regardless, would you mind reposting? There is no
> way for me to retry the patchwork checks.
> 
> And one more thing..
> 
>> +	case SO_BUF_LOCK:
>> +		sk->sk_userlocks = (sk->sk_userlocks & ~SOCK_BUF_LOCK_MASK) |
>> +				   (val & SOCK_BUF_LOCK_MASK);
> 
> What's the thinking on silently ignoring unsupported flags on set
> rather than rejecting? I feel like these days we lean towards explicit
> rejects.

Will do.

> 
>> +	case SO_BUF_LOCK:
>> +		v.val = sk->sk_userlocks & (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK);
>> +		break;
> 
> The mask could you be used here.

Sure, missed it...

> 
> Just to double check - is the expectation that the value returned is
> completely opaque to the user space? The defines in question are not
> part of uAPI.

Sorry, didn't though about it initially. For criu we don't care about 
the actual bits we restore same what we've dumped. Buf if some real 
users would like to use this interface to restore default autoadjustment 
on their sockets we should probably export SOCK_SNDBUF_LOCK and 
SOCK_RCVBUF_LOCK to uAPI.

>
Jakub Kicinski Aug. 3, 2021, 12:42 p.m. UTC | #5
On Tue, 3 Aug 2021 14:04:39 +0300 Pavel Tikhomirov wrote:
> > Just to double check - is the expectation that the value returned is
> > completely opaque to the user space? The defines in question are not
> > part of uAPI.  
> 
> Sorry, didn't though about it initially. For criu we don't care about 
> the actual bits we restore same what we've dumped. Buf if some real 
> users would like to use this interface to restore default autoadjustment 
> on their sockets we should probably export SOCK_SNDBUF_LOCK and 
> SOCK_RCVBUF_LOCK to uAPI.

Just to be sure - please mention this in the commit message.
diff mbox series

Patch

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 6b3daba60987..1dd9baf4a6c2 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -129,6 +129,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index cdf404a831b2..1eaf6a1ca561 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -140,6 +140,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 5b5351cdcb33..8baaad52d799 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -121,6 +121,8 @@ 
 
 #define SO_NETNS_COOKIE		0x4045
 
+#define SO_BUF_LOCK		0x4046
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 92675dc380fa..e80ee8641ac3 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -122,6 +122,8 @@ 
 
 #define SO_NETNS_COOKIE          0x0050
 
+#define SO_BUF_LOCK              0x0051
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/net/sock.h b/include/net/sock.h
index f23cb259b0e2..a6fb02f39cc4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1443,6 +1443,8 @@  static inline int __sk_prot_rehash(struct sock *sk)
 #define SOCK_BINDADDR_LOCK	4
 #define SOCK_BINDPORT_LOCK	8
 
+#define SOCK_BUF_LOCK_MASK (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK)
+
 struct socket_alloc {
 	struct socket socket;
 	struct inode vfs_inode;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index d588c244ec2f..1f0a2b4864e4 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -124,6 +124,8 @@ 
 
 #define SO_NETNS_COOKIE		71
 
+#define SO_BUF_LOCK		72
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index a3eea6e0b30a..c8c994e48b76 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1357,6 +1357,11 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 		ret = sock_bindtoindex_locked(sk, val);
 		break;
 
+	case SO_BUF_LOCK:
+		sk->sk_userlocks = (sk->sk_userlocks & ~SOCK_BUF_LOCK_MASK) |
+				   (val & SOCK_BUF_LOCK_MASK);
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1719,6 +1724,10 @@  int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val64 = sock_net(sk)->net_cookie;
 		break;
 
+	case SO_BUF_LOCK:
+		v.val = sk->sk_userlocks & (SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK);
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).