
[net] tcp: handle inet_csk_reqsk_queue_add() failures

Message ID 9b8502f9cec31c971e480ee2281f5cd7088b50df.1552077823.git.gnault@redhat.com
State Accepted
Delegated to: David Miller
Series [net] tcp: handle inet_csk_reqsk_queue_add() failures

Commit Message

Guillaume Nault March 8, 2019, 9:09 p.m. UTC
Commit 7716682cc58e ("tcp/dccp: fix another race at listener
dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
{tcp,dccp}_check_req() accordingly. However, TFO and syncookies
weren't modified, thus leaking allocated resources on error.

Contrary to tcp_check_req(), in both syncookies and TFO cases,
we need to drop the request socket. Also, since the child socket is
created with inet_csk_clone_lock(), we have to unlock it and drop an
extra reference (->sk_refcount is initially set to 2 and
inet_csk_reqsk_queue_add() drops only one ref).
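The reference accounting in the paragraph above can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code. `struct model_sock`, `model_clone_lock()`, `model_queue_add()` and `model_sock_put()` are hypothetical stand-ins for the child socket, inet_csk_clone_lock(), inet_csk_reqsk_queue_add() and sock_put(). Only the lock flag and the reference count are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a child socket, reduced to the state that the
 * error path has to clean up. This is not the kernel's struct sock. */
struct model_sock {
	int refcnt;   /* stands in for ->sk_refcount */
	bool locked;  /* stands in for the lock taken when cloning */
	bool freed;   /* set when the last reference is dropped */
};

/* Like inet_csk_clone_lock() as described above: locked, two refs. */
static void model_clone_lock(struct model_sock *child)
{
	child->refcnt = 2;
	child->locked = true;
	child->freed = false;
}

static void model_sock_put(struct model_sock *sk)
{
	assert(sk->refcnt > 0);  /* dropping a ref we do not hold is a bug */
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Like a failing inet_csk_reqsk_queue_add(): drops exactly one child
 * reference and returns NULL when the listener is no longer listening. */
static struct model_sock *model_queue_add(bool listening,
					  struct model_sock *child)
{
	if (!listening) {
		model_sock_put(child);
		return NULL;
	}
	return child;
}

/* Reference count left to the caller after a failed queue add. */
static int model_refs_after_failed_add(void)
{
	struct model_sock child;

	model_clone_lock(&child);
	model_queue_add(false, &child);
	return child.refcnt;
}

/* The fixed error path: unlock, then drop the extra reference.
 * Returns true if the child ends up unlocked and freed. */
static bool model_failure_path_releases_child(void)
{
	struct model_sock child;

	model_clone_lock(&child);
	if (!model_queue_add(false, &child)) {
		child.locked = false;        /* bh_unlock_sock(child) */
		model_sock_put(&child);      /* sock_put(child) */
	}
	return !child.locked && child.freed;
}
```

In this model, a caller that stopped right after the failed add would still hold the lock and one reference (`model_refs_after_failed_add()` returns 1). That is the kind of leak the commit message describes. The unlock plus extra sock_put() brings the count to zero.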

For TFO, we also need to revert the work done by tcp_try_fastopen()
(with reqsk_fastopen_remove()).
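Reverting tcp_try_fastopen() follows the usual pattern of undoing setup side effects on the error path. The sketch below is a simplified, hypothetical model. It assumes the relevant bookkeeping reduces to a per-listener count of pending Fast Open requests and a back-pointer from the child to its request. `tfo_pending`, `fastopen_req` and the `model_*` functions are illustrative names, with `model_fastopen_remove()` playing the role of reqsk_fastopen_remove().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the bookkeeping done by tcp_try_fastopen() is
 * reduced to a pending-TFO counter on the listener and a back-pointer
 * from the child to its request. Field names are illustrative only. */
struct model_req { int id; };
struct model_listener { int tfo_pending; };
struct model_child { struct model_req *fastopen_req; };

/* Stand-in for the setup side (tcp_try_fastopen()). */
static void model_try_fastopen(struct model_listener *l,
			       struct model_child *c, struct model_req *req)
{
	c->fastopen_req = req;
	l->tfo_pending++;
}

/* Stand-in for the undo side (reqsk_fastopen_remove()). */
static void model_fastopen_remove(struct model_listener *l,
				  struct model_child *c)
{
	assert(l->tfo_pending > 0);
	c->fastopen_req = NULL;
	l->tfo_pending--;
}

/* Set up, fail to queue, undo. Returns true if the listener and the
 * child are back in their initial state. */
static bool model_tfo_undo_restores_state(void)
{
	struct model_listener l = { .tfo_pending = 0 };
	struct model_child c = { .fastopen_req = NULL };
	struct model_req req = { .id = 1 };
	bool queued = false;  /* queue add fails: listener is gone */

	model_try_fastopen(&l, &c, &req);
	if (!queued)
		model_fastopen_remove(&l, &c);
	return l.tfo_pending == 0 && c.fastopen_req == NULL;
}
```

After this undo step, the patch releases the child and the request with the same unlock, sock_put() and reqsk_put() sequence as the syncookies path.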

Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---

Note for stable backports: this patch relies on da8ab57863ed
("tcp/dccp: remove reqsk_put() from inet_child_forget()"), to prevent
inet_child_forget() from dropping a reference from the request socket.

Therefore, for trees older than 4.14, commit da8ab57863ed has to be
backported before this patch.
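The backport note is about counting references on the request socket. In the hypothetical model below, the request starts with one reference (as after refcount_set(&req->rsk_refcnt, 1)). A failed queue add goes through inet_child_forget(), and this patch adds one reqsk_put(). The `forget_puts_req` flag is an assumption standing in for trees without da8ab57863ed, where inet_child_forget() still called reqsk_put(). The model records a put on an already freed request instead of touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the request socket's reference count on the
 * syncookies/TFO failure path. Not the kernel's struct request_sock. */
struct model_req {
	int refcnt;
	bool freed;
	bool put_after_free;  /* in the kernel this would be a use after free */
};

static void model_reqsk_put(struct model_req *req)
{
	if (req->freed) {
		req->put_after_free = true;
		return;
	}
	if (--req->refcnt == 0)
		req->freed = true;
}

/* Returns true if the request is released exactly once.
 * forget_puts_req: true models trees lacking da8ab57863ed. */
static bool model_failure_is_clean(bool forget_puts_req)
{
	struct model_req req = { .refcnt = 1 };

	/* Failed inet_csk_reqsk_queue_add() -> inet_child_forget() */
	if (forget_puts_req)
		model_reqsk_put(&req);

	/* The reqsk_put(req) added by this patch. */
	model_reqsk_put(&req);

	return req.freed && !req.put_after_free;
}
```

`model_failure_is_clean(false)` is true, while `model_failure_is_clean(true)` is false: without da8ab57863ed, the request would be released twice on this path.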


 net/ipv4/syncookies.c | 7 ++++++-
 net/ipv4/tcp_input.c  | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

Comments

Eric Dumazet March 8, 2019, 9:33 p.m. UTC | #1
On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> Commit 7716682cc58e ("tcp/dccp: fix another race at listener
> dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
> {tcp,dccp}_check_req() accordingly. However, TFO and syncookies
> weren't modified, thus leaking allocated resources on error.
> 
> Contrary to tcp_check_req(), in both syncookies and TFO cases,
> we need to drop the request socket. Also, since the child socket is
> created with inet_csk_clone_lock(), we have to unlock it and drop an
> extra reference (->sk_refcount is initially set to 2 and
> inet_csk_reqsk_queue_add() drops only one ref).
> 
> For TFO, we also need to revert the work done by tcp_try_fastopen()
> (with reqsk_fastopen_remove()).
> 
> Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> 
> Note for stable backports: this patch relies on da8ab57863ed
> ("tcp/dccp: remove reqsk_put() from inet_child_forget()"), to prevent
> inet_child_forget() from dropping a reference from the request socket.
> 
> Therefore, for trees older than 4.14, commit da8ab57863ed has to be
> backported before this patch.
> 

Thanks for working on this issue (it was on my radar as well)

> 
>  net/ipv4/syncookies.c | 7 ++++++-
>  net/ipv4/tcp_input.c  | 8 +++++++-
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 606f868d9f3f..e531344611a0 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>  		refcount_set(&req->rsk_refcnt, 1);
>  		tcp_sk(child)->tsoffset = tsoff;
>  		sock_rps_save_rxhash(child, skb);
> -		inet_csk_reqsk_queue_add(sk, req, child);
> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> +			bh_unlock_sock(child);
> +			sock_put(child);
> +			child = NULL;
> +			reqsk_put(req);

Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
here as well ?

I suggest the following maybe :

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 606f868d9f3fde1c3140aa7eecde87d2ec32b5f2..8b28fb66a8fcefba27a2f5e371e9469d4d7e3650 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -216,11 +216,14 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
                refcount_set(&req->rsk_refcnt, 1);
                tcp_sk(child)->tsoffset = tsoff;
                sock_rps_save_rxhash(child, skb);
-               inet_csk_reqsk_queue_add(sk, req, child);
-       } else {
-               reqsk_free(req);
+               if (likely(inet_csk_reqsk_queue_add(sk, req, child)))
+                       return child;
+               bh_unlock_sock(child);
+               sock_put(child);
        }
-       return child;
+
+       reqsk_free(req);
+       return NULL;
 }
 EXPORT_SYMBOL(tcp_get_cookie_sock);
 


> +		}
>  	} else {
>  		reqsk_free(req);
>  	}
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4eb0c8ca3c60..5def3c48870e 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6498,7 +6498,13 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  		af_ops->send_synack(fastopen_sk, dst, &fl, req,
>  				    &foc, TCP_SYNACK_FASTOPEN);
>  		/* Add the child socket directly into the accept queue */
> -		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
> +		if (!inet_csk_reqsk_queue_add(sk, req, fastopen_sk)) {
> +			reqsk_fastopen_remove(fastopen_sk, req, false);
> +			bh_unlock_sock(fastopen_sk);
> +			sock_put(fastopen_sk);
> +			reqsk_put(req);
> +			goto drop;

These two lines can be replaced by :

	goto drop_and_free;

> +		}
>  		sk->sk_data_ready(sk);
>  		bh_unlock_sock(fastopen_sk);
>  		sock_put(fastopen_sk);
>
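Eric's suggestion relies on the error labels at the end of tcp_conn_request() falling through from one to the next. The sketch below assumes the tail reads `drop_and_free: reqsk_free(req);` followed by `drop: tcp_listendrop(sk);`. `model_error_tail()` and `struct cleanup_trace` are hypothetical and only record which steps run. Structurally, both forms release the request and account the drop. Whether reqsk_free() can stand in for reqsk_put() here depends on the request's reference count, which the following replies address.

```c
#include <stdbool.h>

/* Hypothetical trace of the cleanup steps reached on the error path. */
struct cleanup_trace {
	bool req_released;  /* the request was released */
	bool listen_drop;   /* the drop was accounted on the listener */
};

/* Models an error tail whose labels fall through, assumed here to be
 * "drop_and_free: reqsk_free(req); drop: tcp_listendrop(sk);".
 * use_drop_and_free selects the suggested form over the posted one. */
static int model_error_tail(bool use_drop_and_free, struct cleanup_trace *t)
{
	t->req_released = false;
	t->listen_drop = false;

	if (use_drop_and_free)
		goto drop_and_free;
	t->req_released = true;   /* posted patch: reqsk_put(req) */
	goto drop;

drop_and_free:
	t->req_released = true;   /* reqsk_free(req) */
drop:
	t->listen_drop = true;    /* tcp_listendrop(sk) */
	return 0;
}

/* True if both forms reach the same cleanup steps. */
static bool model_paths_equivalent(void)
{
	struct cleanup_trace posted, suggested;

	model_error_tail(false, &posted);
	model_error_tail(true, &suggested);
	return posted.req_released == suggested.req_released &&
	       posted.listen_drop == suggested.listen_drop;
}
```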
Guillaume Nault March 8, 2019, 10:22 p.m. UTC | #2
On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
> 
> 
> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> > @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
> >  		refcount_set(&req->rsk_refcnt, 1);
> >  		tcp_sk(child)->tsoffset = tsoff;
> >  		sock_rps_save_rxhash(child, skb);
> > -		inet_csk_reqsk_queue_add(sk, req, child);
> > +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> > +			bh_unlock_sock(child);
> > +			sock_put(child);
> > +			child = NULL;
> > +			reqsk_put(req);
> 
> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
> here as well ?
> 
That was my first approach, but reqsk_free() doesn't like it:

static inline void reqsk_free(struct request_sock *req)
{
        /* temporary debugging */
	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
...
}

> I suggest the following maybe :
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 606f868d9f3fde1c3140aa7eecde87d2ec32b5f2..8b28fb66a8fcefba27a2f5e371e9469d4d7e3650 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -216,11 +216,14 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>                 refcount_set(&req->rsk_refcnt, 1);
>                 tcp_sk(child)->tsoffset = tsoff;
>                 sock_rps_save_rxhash(child, skb);
> -               inet_csk_reqsk_queue_add(sk, req, child);
> -       } else {
> -               reqsk_free(req);
> +               if (likely(inet_csk_reqsk_queue_add(sk, req, child)))
> +                       return child;
> +               bh_unlock_sock(child);
> +               sock_put(child);
>         }
> -       return child;
> +
> +       reqsk_free(req);
> +       return NULL;
>  }
>  EXPORT_SYMBOL(tcp_get_cookie_sock);
>  
> 
I prefer this form as well, but I'm not sure if removing the
"temporary" WARN() is appropriate for -net. If it is, I'll resubmit.
Otherwise I can refactor it after net-next reopens. Any opinion?

Guillaume
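The disagreement comes down to the difference between the two release helpers. reqsk_put() drops one reference and calls reqsk_free() only when the count reaches zero. reqsk_free() frees directly and, through the WARN_ON_ONCE() quoted above, complains if references remain. In the hypothetical model below, `model_reqsk_put()`, `model_reqsk_free()` and the `warned` flag are illustrative names. The request starts with one reference, as it does after refcount_set(&req->rsk_refcnt, 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a request socket and its two release helpers. */
struct model_req {
	int refcnt;
	bool freed;
	bool warned;
};

/* Frees unconditionally, flagging callers that still hold references,
 * like WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0). */
static void model_reqsk_free(struct model_req *req)
{
	if (req->refcnt != 0)
		req->warned = true;
	req->freed = true;
}

/* Drops one reference; frees only when the count reaches zero. */
static void model_reqsk_put(struct model_req *req)
{
	assert(req->refcnt > 0);
	if (--req->refcnt == 0)
		model_reqsk_free(req);  /* count is 0 here: no warning */
}

/* Error path with one outstanding reference. Returns true if releasing
 * the request with the chosen helper triggers the warning. */
static bool model_error_path_warns(bool use_free)
{
	struct model_req req = { .refcnt = 1 };

	if (use_free)
		model_reqsk_free(&req);
	else
		model_reqsk_put(&req);
	assert(req.freed);
	return req.warned;
}
```

In the model, releasing with the free helper trips the warning while the put helper does not, which matches Guillaume's explanation for using reqsk_put() on the error path.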
Eric Dumazet March 8, 2019, 10:34 p.m. UTC | #3
On 03/08/2019 02:22 PM, Guillaume Nault wrote:
> On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
>>
>>
>> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
>>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>>>  		refcount_set(&req->rsk_refcnt, 1);
>>>  		tcp_sk(child)->tsoffset = tsoff;
>>>  		sock_rps_save_rxhash(child, skb);
>>> -		inet_csk_reqsk_queue_add(sk, req, child);
>>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
>>> +			bh_unlock_sock(child);
>>> +			sock_put(child);
>>> +			child = NULL;
>>> +			reqsk_put(req);
>>
>> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
>> here as well ?
>>
> That was my first approach, but reqsk_free() doesn't like it:
> 
> static inline void reqsk_free(struct request_sock *req)
> {
>         /* temporary debugging */
> 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
> ...
> }

Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
to inet_csk_reqsk_queue_add(sk, req, child);

So just change the TFO case only :)
Guillaume Nault March 8, 2019, 10:40 p.m. UTC | #4
On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
> 
> 
> On 03/08/2019 02:22 PM, Guillaume Nault wrote:
> > On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
> >>
> >>
> >> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
> >>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
> >>>  		refcount_set(&req->rsk_refcnt, 1);
> >>>  		tcp_sk(child)->tsoffset = tsoff;
> >>>  		sock_rps_save_rxhash(child, skb);
> >>> -		inet_csk_reqsk_queue_add(sk, req, child);
> >>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
> >>> +			bh_unlock_sock(child);
> >>> +			sock_put(child);
> >>> +			child = NULL;
> >>> +			reqsk_put(req);
> >>
> >> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
> >> here as well ?
> >>
> > That was my first approach, but reqsk_free() doesn't like it:
> > 
> > static inline void reqsk_free(struct request_sock *req)
> > {
> >         /* temporary debugging */
> > 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
> > ...
> > }
> 
> Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
> to inet_csk_reqsk_queue_add(sk, req, child);
> 
> So just change the TFO case only :)
> 
Well.. refcount is 1 in the TFO case too.

Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
probably remove the comment.
Eric Dumazet March 8, 2019, 11:47 p.m. UTC | #5
On 03/08/2019 02:40 PM, Guillaume Nault wrote:
> On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
>>
>>
>> On 03/08/2019 02:22 PM, Guillaume Nault wrote:
>>> On Fri, Mar 08, 2019 at 01:33:02PM -0800, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 03/08/2019 01:09 PM, Guillaume Nault wrote:
>>>>> @@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
>>>>>  		refcount_set(&req->rsk_refcnt, 1);
>>>>>  		tcp_sk(child)->tsoffset = tsoff;
>>>>>  		sock_rps_save_rxhash(child, skb);
>>>>> -		inet_csk_reqsk_queue_add(sk, req, child);
>>>>> +		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
>>>>> +			bh_unlock_sock(child);
>>>>> +			sock_put(child);
>>>>> +			child = NULL;
>>>>> +			reqsk_put(req);
>>>>
>>>> Since we use reqsk_free(req) in the same function, we can use reqsk_free(req)
>>>> here as well ?
>>>>
>>> That was my first approach, but reqsk_free() doesn't like it:
>>>
>>> static inline void reqsk_free(struct request_sock *req)
>>> {
>>>         /* temporary debugging */
>>> 	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
>>> ...
>>> }
>>
>> Oh right, there is this refcount_set(&req->rsk_refcnt, 1) before the call
>> to inet_csk_reqsk_queue_add(sk, req, child);
>>
>> So just change the TFO case only :)
>>
> Well.. refcount is 1 in the TFO case too.


Arg...

> 
> Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
> probably remove the comment.

We want to keep the warning.

We do not have a way to tell if the req was ever inserted in a hash table, so better play safe.

Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks !
David Miller March 9, 2019, 12:06 a.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 8 Mar 2019 15:47:25 -0800

> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.
Guillaume Nault March 9, 2019, 9:02 a.m. UTC | #7
On Fri, Mar 08, 2019 at 03:47:25PM -0800, Eric Dumazet wrote:
> 
> On 03/08/2019 02:40 PM, Guillaume Nault wrote:
> > On Fri, Mar 08, 2019 at 02:34:07PM -0800, Eric Dumazet wrote:
> > 
> > Long term, do we want to keep the WARN_ON_ONCE()? If so, we should
> > probably remove the comment.
> 
> We want to keep the warning.
> 
> We do not have a way to tell if the req was ever inserted in a hash table, so better play safe.
> 
Then I'm going to remove the /* temporary debugging */ line, so that
nobody will be tempted to drop the test.

Thanks for your feedback.

Guillaume

Patch

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 606f868d9f3f..e531344611a0 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -216,7 +216,12 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 		refcount_set(&req->rsk_refcnt, 1);
 		tcp_sk(child)->tsoffset = tsoff;
 		sock_rps_save_rxhash(child, skb);
-		inet_csk_reqsk_queue_add(sk, req, child);
+		if (!inet_csk_reqsk_queue_add(sk, req, child)) {
+			bh_unlock_sock(child);
+			sock_put(child);
+			child = NULL;
+			reqsk_put(req);
+		}
 	} else {
 		reqsk_free(req);
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4eb0c8ca3c60..5def3c48870e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6498,7 +6498,13 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		af_ops->send_synack(fastopen_sk, dst, &fl, req,
 				    &foc, TCP_SYNACK_FASTOPEN);
 		/* Add the child socket directly into the accept queue */
-		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
+		if (!inet_csk_reqsk_queue_add(sk, req, fastopen_sk)) {
+			reqsk_fastopen_remove(fastopen_sk, req, false);
+			bh_unlock_sock(fastopen_sk);
+			sock_put(fastopen_sk);
+			reqsk_put(req);
+			goto drop;
+		}
 		sk->sk_data_ready(sk);
 		bh_unlock_sock(fastopen_sk);
 		sock_put(fastopen_sk);