Message ID | 20190817042622.91497-1-edumazet@google.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [net] tcp: make sure EPOLLOUT wont be missed |
On Sat, Aug 17, 2019 at 12:26 AM Eric Dumazet <edumazet@google.com> wrote:
>
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> [...]
>
> Reported-by: Vladimir Rutsky <rutsky@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for the fix!
On 8/17/19 12:26 AM, Eric Dumazet wrote:
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> However, Jason patch had a bug, because the 'nonblocking' status
> as far as sk_stream_wait_memory() is concerned is governed
> by MSG_DONTWAIT flag passed at sendmsg() time :
>
>     long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>
> So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
> and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
> cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
> value.

Is MSG_DONTWAIT not set in this case? The original patch was intended
only for the explicit non-blocking case. The epoll manpage says:
"EPOLLET flag should use nonblocking file descriptors". So the original
intention was not to impact the blocking case. This seems to me like
a different use-case.

Thanks,

-Jason

> [...]
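Jason's question turns on how the effective timeout is derived for a single sendmsg() call. The sketch below is a minimal user-space model, not kernel code: the `model_*` names are hypothetical, `model_sock_sndtimeo()` mirrors the `noblock ? 0 : sk->sk_sndtimeo` logic of the kernel helper, and it assumes (as the socket layer does on Linux) that an O_NONBLOCK file is folded into MSG_DONTWAIT before TCP sees the flags. It shows that the pre-fix `noblock` variable was only true when the call started with a zero timeout, so a blocking socket with a small, non-zero SO_SNDTIMEO never took the branch that set SOCK_NOSPACE.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <sys/socket.h>   /* MSG_DONTWAIT */

/* Model of the default send timeout (the kernel uses LONG_MAX too). */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical mirror of sock_sndtimeo(): a per-call "noblock" request
 * wins, otherwise the socket's SO_SNDTIMEO value is used. */
static long model_sock_sndtimeo(long sk_sndtimeo, bool noblock)
{
    return noblock ? 0 : sk_sndtimeo;
}

/* Effective timeout for one send, assuming O_NONBLOCK is translated into
 * MSG_DONTWAIT by the socket layer before TCP runs. */
static long model_send_timeo(bool fd_nonblock, int msg_flags, long sk_sndtimeo)
{
    if (fd_nonblock)
        msg_flags |= MSG_DONTWAIT;
    return model_sock_sndtimeo(sk_sndtimeo, msg_flags & MSG_DONTWAIT);
}

/* The pre-fix 'noblock' test: sampled once, on entry. */
static bool model_old_noblock(long initial_timeo)
{
    return initial_timeo ? false : true;
}
```

With a blocking socket and SO_SNDTIMEO set to a few ticks, `model_send_timeo(false, 0, 5)` is 5, so `model_old_noblock()` is false even though the call can still end in -EAGAIN once those ticks run out.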
On 8/17/19 4:19 PM, Jason Baron wrote:
>
>
> On 8/17/19 12:26 AM, Eric Dumazet wrote:
>> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
>> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
>> when needed.
>>
>> [...]
>
> Is MSG_DONTWAIT not set in this case? The original patch was intended
> only for the explicit non-blocking case. The epoll manpage says:
> "EPOLLET flag should use nonblocking file descriptors". So the original
> intention was not to impact the blocking case. This seems to me like
> a different use-case.
>

I guess the problem is how we define 'non-blocking' ...

SO_SNDTIMEO can be used by application to implement a variation of non-blocking,
by waiting for a socket event with a short timeout, to maybe recover
from memory pressure conditions in a more efficient way than simply looping.

Note that the man page for epoll() only _suggests_ to use nonblocking file descriptors.

<quote>
  The suggested way to use epoll as an edge-triggered (EPOLLET)
  interface is as follows:

    i   with nonblocking file descriptors; and

    ii  by waiting for an event only after read(2) or
        write(2) return EAGAIN.
</quote>
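Eric's SO_SNDTIMEO pattern can be sketched from user space. This is an illustration under stated assumptions, not code from the thread: it uses an AF_UNIX socketpair as a local stand-in for a TCP connection (so it does not exercise sk_stream_wait_memory() itself), keeps the descriptor in blocking mode with a 10 ms SO_SNDTIMEO instead of O_NONBLOCK, and arms an edge-triggered EPOLLOUT wait only after send() fails with EAGAIN, i.e. step ii of the man page without step i. The helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Send on a blocking fd with a short SO_SNDTIMEO until the kernel gives
 * up with EAGAIN/EWOULDBLOCK. Returns bytes queued, or -1 on other errors. */
static long fill_until_eagain(int fd)
{
    char buf[4096];
    long total = 0;

    memset(buf, 'x', sizeof(buf));
    for (;;) {
        ssize_t n = send(fd, buf, sizeof(buf), 0);   /* no MSG_DONTWAIT */

        if (n > 0)
            total += n;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;                /* timed out with no room left */
        else if (n < 0 && errno == EINTR)
            continue;
        else
            return -1;
    }
}

/* Returns the epoll event mask seen after the peer drains the data,
 * or 0 if nothing arrived within wait_ms. */
static unsigned int sndtimeo_epollet_demo(int wait_ms)
{
    struct timeval tv = { .tv_sec = 0, .tv_usec = 10000 };   /* 10 ms */
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
    struct epoll_event out;
    unsigned int mask = 0;
    char sink[65536];
    int sv[2];
    int ep = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        goto out;

    /* Wait for an event only after the write side reported EAGAIN. */
    if (fill_until_eagain(sv[0]) < 0)
        goto out;

    ep = epoll_create1(0);
    ev.data.fd = sv[0];
    if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto out;

    /* The peer consumes everything, freeing send-buffer space. */
    while (recv(sv[1], sink, sizeof(sink), MSG_DONTWAIT) > 0)
        ;

    if (epoll_wait(ep, &out, 1, wait_ms) == 1)
        mask = out.events;
out:
    if (ep >= 0)
        close(ep);
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

For a TCP socket, the fix in this patch is what makes such an edge-triggered EPOLLOUT wait reliable after an EAGAIN that came from an expired SO_SNDTIMEO.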
On Sat, Aug 17, 2019 at 12:26 AM Eric Dumazet <edumazet@google.com> wrote:
>
> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> [...]
>
>  net/core/stream.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal
On 8/17/19 12:26 PM, Eric Dumazet wrote:
>
>
> On 8/17/19 4:19 PM, Jason Baron wrote:
>> [...]
>> Is MSG_DONTWAIT not set in this case? The original patch was intended
>> only for the explicit non-blocking case. The epoll manpage says:
>> "EPOLLET flag should use nonblocking file descriptors". So the original
>> intention was not to impact the blocking case. This seems to me like
>> a different use-case.
>>
>
> I guess the problem is how we define 'non-blocking' ...
>
> SO_SNDTIMEO can be used by application to implement a variation of non-blocking,
> by waiting for a socket event with a short timeout, to maybe recover
> from memory pressure conditions in a more efficient way than simply looping.
>
> Note that the man page for epoll() only _suggests_ to use nonblocking file descriptors.
>
> [...]
>

Ok, seems reasonable:

Acked-by: Jason Baron <jbaron@akamai.com>

I found a similar pattern in net/smc/smc_tx.c, which I also just sent a
patch for.

Thanks,

-Jason
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 16 Aug 2019 21:26:22 -0700

> As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
> under memory pressure"), it is crucial we properly set SOCK_NOSPACE
> when needed.
>
> [...]
>
> Reported-by: Vladimir Rutsky <rutsky@google.com>

Applied and queued up for -stable.
```diff
diff --git a/net/core/stream.c b/net/core/stream.c
index e94bb02a56295ec2db34ab423a8c7c890df0a696..4f1d4aa5fb38d989a9c81f32dfce3f31bbc1fa47 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -120,7 +120,6 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 	int err = 0;
 	long vm_wait = 0;
 	long current_timeo = *timeo_p;
-	bool noblock = (*timeo_p ? false : true);
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
 	if (sk_stream_memory_free(sk))
@@ -133,11 +132,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 
 		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 			goto do_error;
-		if (!*timeo_p) {
-			if (noblock)
-				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-			goto do_nonblock;
-		}
+		if (!*timeo_p)
+			goto do_eagain;
 		if (signal_pending(current))
 			goto do_interrupted;
 		sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
@@ -169,7 +165,13 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 do_error:
 	err = -EPIPE;
 	goto out;
-do_nonblock:
+do_eagain:
+	/* Make sure that whenever EAGAIN is returned, EPOLLOUT event can
+	 * be generated later.
+	 * When TCP receives ACK packets that make room, tcp_check_space()
+	 * only calls tcp_new_space() if SOCK_NOSPACE is set.
+	 */
+	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 	err = -EAGAIN;
 	goto out;
 do_interrupted:
```
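The comment added at `do_eagain` relies on a gating rule on the ACK path. Below is a deliberately simplified model of that rule; the struct and function names are hypothetical, and the real tcp_new_space() also decides whether to grow the send buffer and only wakes waiters once the queue is actually writeable. The point it shows is the one from the comment: if SOCK_NOSPACE is clear when room is made, no wake-up (and so no EPOLLOUT edge) is generated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not the kernel's layout. */
struct model_sock {
    bool queue_shrunk;   /* stands in for SOCK_QUEUE_SHRUNK */
    bool nospace;        /* stands in for SOCK_NOSPACE */
    int  wakeups;        /* EPOLLOUT wake-ups delivered so far */
};

/* Stand-in for the write-space callback: clear the flag, wake waiters. */
static void model_new_space(struct model_sock *sk)
{
    sk->nospace = false;
    sk->wakeups++;
}

/* ACK freed send-queue space: only wake if a sender asked for it. */
static void model_check_space(struct model_sock *sk)
{
    if (!sk->queue_shrunk)
        return;
    sk->queue_shrunk = false;
    if (sk->nospace)
        model_new_space(sk);
}
```

A socket that returned -EAGAIN with `nospace == false` sees `wakeups` stay at 0 here, which is the missed-EPOLLOUT case the patch closes.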
As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason's patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by the MSG_DONTWAIT flag passed at sendmsg() time:

    long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
---
 net/core/stream.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
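The failure described above can be replayed with a small, hypothetical model of the wait loop. It assumes one plausible interleaving: the loop sets SOCK_NOSPACE before each sleep (assumed here to mirror the part of sk_stream_wait_memory() not shown in the diff), an ACK's write-space callback clears the flag during the sleep while memory pressure still keeps the sender from making progress, and the timeout then runs out. Timeouts are abstract ticks, and `fixed` selects post-fix behaviour; none of these names exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_state {
    long timeo;    /* remaining send timeout, in abstract ticks */
    bool nospace;  /* stands in for SOCK_NOSPACE */
};

/* One sleep: a tick elapses and a write-space callback clears the flag,
 * but (by assumption) the sender still cannot get memory. */
static void model_wait_event(struct model_state *s)
{
    s->timeo--;
    s->nospace = false;
}

static int model_wait_memory(struct model_state *s, bool fixed)
{
    bool noblock = (s->timeo ? false : true);   /* pre-fix: sampled on entry */

    for (;;) {
        if (!s->timeo) {
            /* Pre-fix: only set when the call started non-blocking.
             * Post-fix: always set before returning -EAGAIN. */
            if (fixed || noblock)
                s->nospace = true;
            return -EAGAIN;
        }
        s->nospace = true;      /* set before sleeping */
        model_wait_event(s);
    }
}
```

Starting from `timeo = 3`, the pre-fix path returns -EAGAIN with `nospace` false, so under the gating in tcp_check_space() no later EPOLLOUT is generated; the post-fix path returns with it set.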