Message ID | 9b8502f9cec31c971e480ee2281f5cd7088b50df.1552077823.git.gnault@redhat.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [net] tcp: handle inet_csk_reqsk_queue_add() failures |
On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> Commit 7716682cc58e ("tcp/dccp: fix another race at listener
> dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
> {tcp,dccp}_check_req() accordingly. However, TFO and syncookies
> weren't modified, thus leaking allocated resources on error.
>
> Contrary to tcp_check_req(), in both syncookies and TFO cases,
> we need to drop the request socket. Also, since the child socket is
> created with inet_csk_clone_lock(), we have to unlock it and drop an
> extra reference (->sk_refcount is initially set to 2 and
> inet_csk_reqsk_queue_add() drops only one ref).
>
> For TFO, we also need to revert the work done by tcp_try_fastopen()
> (with reqsk_fastopen_remove()).
>
> Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>
> Note for stable backports: this patch relies on da8ab57863ed
> ("tcp/dccp: remove reqsk_put() from inet_child_forget()"), to prevent
> inet_child_forget() from dropping a reference from the request socket.
>
> Therefore, for trees older than 4.14, commit da8ab57863ed has to be
> backported before this patch.
>

Thanks for working on this issue (it was on my radar as well)

>
>  net/ipv4/syncookies.c | 7 ++++++-
>  net/ipv4/tcp_input.c  | 8 +++++++-
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 606f868d9f3f..e531344611a0 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>  		refcount_set(&req->rsk_refcnt, 1);
>  		tcp_sk(child)->tsoffset = tsoff;
>  		sock_rps_save_rxhash(child, skb);
> -		inet_csk_reqsk_queue_add(sk, req, child);
> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> +			bh_unlock_sock(child);
> +			sock_put(child);
> +			child = NULL;
> +			reqsk_put(req);

Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
here as well ?

I suggest the following maybe :

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 606f868d9f3fde1c3140aa7eecde87d2ec32b5f2..8b28fb66a8fcefba27a2f5e371e9469d4d7e3650 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -216,11 +216,14 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 		refcount_set(&req->rsk_refcnt, 1);
 		tcp_sk(child)->tsoffset = tsoff;
 		sock_rps_save_rxhash(child, skb);
-		inet_csk_reqsk_queue_add(sk, req, child);
-	} else {
-		reqsk_free(req);
+		if (likely(inet_csk_reqsk_queue_add(sk, req, child)))
+			return child;
+		bh_unlock_sock(child);
+		sock_put(child);
 	}
-	return child;
+
+	reqsk_free(req);
+	return NULL;
 }
 EXPORT_SYMBOL(tcp_get_cookie_sock);

> +		}
>  	} else {
>  		reqsk_free(req);
>  	}
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4eb0c8ca3c60..5def3c48870e 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6498,7 +6498,13 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  		af_ops->send_synack(fastopen_sk, dst, &fl, req,
>  				    &foc, TCP_SYNACK_FASTOPEN);
>  		/* Add the child socket directly into the accept queue */
> -		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
> +		if (!inet_csk_reqsk_queue_add(sk, req, fastopen_sk)) {
> +			reqsk_fastopen_remove(fastopen_sk, req, false);
> +			bh_unlock_sock(fastopen_sk);
> +			sock_put(fastopen_sk);
> +			reqsk_put(req);
> +			goto drop;

These two lines can be replaced by :

	goto drop_and_free;

> +		}
>  		sk->sk_data_ready(sk);
>  		bh_unlock_sock(fastopen_sk);
>  		sock_put(fastopen_sk);
>
On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
>
>
> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> > @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
> >  		refcount_set(&req->rsk_refcnt, 1);
> >  		tcp_sk(child)->tsoffset = tsoff;
> >  		sock_rps_save_rxhash(child, skb);
> > -		inet_csk_reqsk_queue_add(sk, req, child);
> > +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> > +			bh_unlock_sock(child);
> > +			sock_put(child);
> > +			child = NULL;
> > +			reqsk_put(req);
>
> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
> here as well ?
>
That was my first approach, but reqsk_free() doesn't like it:

static inline void reqsk_free(struct request_sock *req)
{
	/* temporary debugging */
	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
	...
}

> I suggest the following maybe :
>
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 606f868d9f3fde1c3140aa7eecde87d2ec32b5f2..8b28fb66a8fcefba27a2f5e371e9469d4d7e3650 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -216,11 +216,14 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>  		refcount_set(&req->rsk_refcnt, 1);
>  		tcp_sk(child)->tsoffset = tsoff;
>  		sock_rps_save_rxhash(child, skb);
> -		inet_csk_reqsk_queue_add(sk, req, child);
> -	} else {
> -		reqsk_free(req);
> +		if (likely(inet_csk_reqsk_queue_add(sk, req, child)))
> +			return child;
> +		bh_unlock_sock(child);
> +		sock_put(child);
>  	}
> -	return child;
> +
> +	reqsk_free(req);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(tcp_get_cookie_sock);
>
>
I prefer this form as well, but I'm not sure if removing the "temporary"
WARN() is appropriate for -net. If it is, I'll resubmit. Otherwise I can
refactor it after net-next reopens.

Any opinion?

Guillaume
On 03/08/2019 02:22 PM, Guillaume Nault wrote:
> On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
>>
>>
>> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
>>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>>>  		refcount_set(&req->rsk_refcnt, 1);
>>>  		tcp_sk(child)->tsoffset = tsoff;
>>>  		sock_rps_save_rxhash(child, skb);
>>> -		inet_csk_reqsk_queue_add(sk, req, child);
>>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
>>> +			bh_unlock_sock(child);
>>> +			sock_put(child);
>>> +			child = NULL;
>>> +			reqsk_put(req);
>>
>> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
>> here as well ?
>>
> That was my first approach, but reqsk_free() doesn't like it:
>
> static inline void reqsk_free(struct request_sock *req)
> {
> 	/* temporary debugging */
> 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
> 	...
> }

Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
to inet_csk_reqsk_queue_add(sk, req, child);

So just change the TFO case only :)
On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
>
>
> On 03/08/2019 02:22 PM, Guillaume Nault wrote:
> > On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
> >>
> >>
> >> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> >>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
> >>>  		refcount_set(&req->rsk_refcnt, 1);
> >>>  		tcp_sk(child)->tsoffset = tsoff;
> >>>  		sock_rps_save_rxhash(child, skb);
> >>> -		inet_csk_reqsk_queue_add(sk, req, child);
> >>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> >>> +			bh_unlock_sock(child);
> >>> +			sock_put(child);
> >>> +			child = NULL;
> >>> +			reqsk_put(req);
> >>
> >> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
> >> here as well ?
> >>
> > That was my first approach, but reqsk_free() doesn't like it:
> >
> > static inline void reqsk_free(struct request_sock *req)
> > {
> > 	/* temporary debugging */
> > 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
> > 	...
> > }
>
> Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
> to inet_csk_reqsk_queue_add(sk, req, child);
>
> So just change the TFO case only :)
>
Well.. refcount is 1 in the TFO case too.

Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
probably remove the comment.
On 03/08/2019 02:40 PM, Guillaume Nault wrote:
> On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
>>
>>
>> On 03/08/2019 02:22 PM, Guillaume Nault wrote:
>>> On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
>>>>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>>>>>  		refcount_set(&req->rsk_refcnt, 1);
>>>>>  		tcp_sk(child)->tsoffset = tsoff;
>>>>>  		sock_rps_save_rxhash(child, skb);
>>>>> -		inet_csk_reqsk_queue_add(sk, req, child);
>>>>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
>>>>> +			bh_unlock_sock(child);
>>>>> +			sock_put(child);
>>>>> +			child = NULL;
>>>>> +			reqsk_put(req);
>>>>
>>>> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
>>>> here as well ?
>>>>
>>> That was my first approach, but reqsk_free() doesn't like it:
>>>
>>> static inline void reqsk_free(struct request_sock *req)
>>> {
>>> 	/* temporary debugging */
>>> 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
>>> 	...
>>> }
>>
>> Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
>> to inet_csk_reqsk_queue_add(sk, req, child);
>>
>> So just change the TFO case only :)
>>
> Well.. refcount is 1 in the TFO case too.

Arg...

>
> Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
> probably remove the comment.

We want to keep the warning.

We do not have a way to tell if the req was ever inserted in a hash table, so better play safe.

Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks !
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 8 Mar 2019 15:47:25 -0800

> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.
On Fri, Mar 08, 2019 at 03:47:25PM -0800, Eric Dumazet wrote:
>
> On 03/08/2019 02:40 PM, Guillaume Nault wrote:
> > On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
> >
> > Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
> > probably remove the comment.
>
> We want to keep the warning.
>
> We do not have a way to tell if the req was ever inserted in a hash table, so better play safe.
>
Then I'm going to remove the /* temporary debugging */ line, so that
nobody will be tempted to drop the test.

Thanks for your feedback.

Guillaume
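The reqsk_put() versus reqsk_free() point discussed in this thread can be illustrated with a small userspace sketch. It is only a toy model under stated assumptions: `struct toy_req`, `toy_warnings` and every `toy_*` function are hypothetical stand-ins invented for illustration, not kernel code, and the only behaviour modelled is what the thread states (reqsk_free() warns when the refcount is not zero, and the refcount is set to 1 before inet_csk_reqsk_queue_add() is called). That reqsk_put() frees the request once the last reference is dropped is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy userspace model; NOT the kernel's struct request_sock. */
struct toy_req {
	int refcnt;	/* stands in for req->rsk_refcnt */
};

static int toy_warnings;	/* counts simulated WARN_ON_ONCE() hits */

/* Models reqsk_free(): it expects that no reference is left. */
static void toy_reqsk_free(struct toy_req *req)
{
	if (req->refcnt != 0)
		toy_warnings++;	/* the kernel would emit a warning here */
	free(req);
}

/* Models reqsk_put() (assumed): drop one ref, free on the last one. */
static void toy_reqsk_put(struct toy_req *req)
{
	assert(req->refcnt > 0);
	if (--req->refcnt == 0)
		toy_reqsk_free(req);
}

/* Allocates a toy request holding the given number of references. */
static struct toy_req *toy_req_alloc(int refcnt)
{
	struct toy_req *req = malloc(sizeof(*req));

	assert(req != NULL);
	req->refcnt = refcnt;
	return req;
}

/*
 * Releases a request whose refcount was set to 1, either with the
 * put-style helper or with the free-style helper, and returns how
 * many simulated warnings that release triggered.
 */
static int toy_release_with_one_ref(bool use_put)
{
	struct toy_req *req = toy_req_alloc(1);

	toy_warnings = 0;
	if (use_put)
		toy_reqsk_put(req);
	else
		toy_reqsk_free(req);
	return toy_warnings;
}
```

In this model, `toy_release_with_one_ref(true)` releases the request without tripping the check, while `toy_release_with_one_ref(false)` trips it once, which mirrors why the error paths in the patch call reqsk_put() rather than reqsk_free().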
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 606f868d9f3f..e531344611a0 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 		refcount_set(&req->rsk_refcnt, 1);
 		tcp_sk(child)->tsoffset = tsoff;
 		sock_rps_save_rxhash(child, skb);
-		inet_csk_reqsk_queue_add(sk, req, child);
+		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
+			bh_unlock_sock(child);
+			sock_put(child);
+			child = NULL;
+			reqsk_put(req);
+		}
 	} else {
 		reqsk_free(req);
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4eb0c8ca3c60..5def3c48870e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6498,7 +6498,13 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		af_ops->send_synack(fastopen_sk, dst, &fl, req,
 				    &foc, TCP_SYNACK_FASTOPEN);
 		/* Add the child socket directly into the accept queue */
-		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
+		if (!inet_csk_reqsk_queue_add(sk, req, fastopen_sk)) {
+			reqsk_fastopen_remove(fastopen_sk, req, false);
+			bh_unlock_sock(fastopen_sk);
+			sock_put(fastopen_sk);
+			reqsk_put(req);
+			goto drop;
+		}
 		sk->sk_data_ready(sk);
 		bh_unlock_sock(fastopen_sk);
 		sock_put(fastopen_sk);
Commit 7716682cc58e ("tcp/dccp: fix another race at listener
dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
{tcp,dccp}_check_req() accordingly. However, TFO and syncookies
weren't modified, thus leaking allocated resources on error.

Contrary to tcp_check_req(), in both syncookies and TFO cases,
we need to drop the request socket. Also, since the child socket is
created with inet_csk_clone_lock(), we have to unlock it and drop an
extra reference (->sk_refcount is initially set to 2 and
inet_csk_reqsk_queue_add() drops only one ref).

For TFO, we also need to revert the work done by tcp_try_fastopen()
(with reqsk_fastopen_remove()).

Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---

Note for stable backports: this patch relies on da8ab57863ed
("tcp/dccp: remove reqsk_put() from inet_child_forget()"), to prevent
inet_child_forget() from dropping a reference from the request socket.

Therefore, for trees older than 4.14, commit da8ab57863ed has to be
backported before this patch.

 net/ipv4/syncookies.c | 7 ++++++-
 net/ipv4/tcp_input.c  | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)
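The child-socket accounting described in the commit message can be sketched in userspace C. This is a toy model under stated assumptions, not kernel code: `struct toy_sock` and all `toy_*` names are hypothetical, the caller without the fix is simplified to one that just ignores the failure, and the numbers (the child starts locked with two references, a failed queue add drops only one) are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; NOT the kernel's struct sock. */
struct toy_sock {
	int refcnt;	/* stands in for the child's reference count */
	bool locked;	/* stands in for the lock taken at clone time */
};

/* Models inet_csk_clone_lock(): child starts locked, with 2 refs. */
static void toy_clone_lock(struct toy_sock *child)
{
	child->refcnt = 2;
	child->locked = true;
}

/* Models sock_put(): drop one reference. */
static void toy_sock_put(struct toy_sock *child)
{
	assert(child->refcnt > 0);
	child->refcnt--;
}

/* Models bh_unlock_sock(). */
static void toy_unlock(struct toy_sock *child)
{
	child->locked = false;
}

/*
 * Models inet_csk_reqsk_queue_add(). With a live listener the child
 * is queued and true is returned. On failure, per the commit message,
 * only one of the child's two references is dropped.
 */
static bool toy_queue_add(bool listener_alive, struct toy_sock *child)
{
	if (listener_alive)
		return true;
	toy_sock_put(child);
	return false;
}

/*
 * Runs the failure path and reports whether the child is leaked,
 * i.e. still locked or still holding a reference afterwards.
 */
static bool toy_failed_add_leaks(bool apply_fix)
{
	struct toy_sock child;

	toy_clone_lock(&child);
	if (!toy_queue_add(false, &child) && apply_fix) {
		toy_unlock(&child);	/* bh_unlock_sock(child) */
		toy_sock_put(&child);	/* sock_put(child): the extra ref */
	}
	return child.locked || child.refcnt != 0;
}
```

In this model, `toy_failed_add_leaks(false)` reports a leak and `toy_failed_add_leaks(true)` does not, matching the unlock plus extra sock_put() that the patch adds to both error paths. The request socket and the TFO-specific reqsk_fastopen_remove() step are deliberately left out of the sketch.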