[net] tcp: avoid creating multiple req socks with the same tuples

Message ID: 20190604145543.61624-1-maowenan@huawei.com
State: Changes Requested
Delegated to: David Miller
Series: [net] tcp: avoid creating multiple req socks with the same tuples

Commit Message

maowenan June 4, 2019, 2:55 p.m. UTC
There is an issue with bonding mode BOND_MODE_BROADCAST when the
two slaves have different CPU affinity, so packets are handled by
different CPUs. These are the two pre-conditions for this case.

When the two slaves receive the same SYN packet at the same time,
two request socks (reqsk) are created if the following happens:
1. syn1 arrives at tcp_conn_request, creates reqsk1, and has not yet
called inet_csk_reqsk_queue_hash_add.
2. syn2 arrives at tcp_v4_rcv; it goes to tcp_conn_request and creates
reqsk2 because it cannot find reqsk1 via __inet_lookup_skb.

Then reqsk1 and reqsk2 are both added to the established hash table, and
two SYN-ACKs with different sequence numbers (seq1 and seq2) are sent to
the client. When the TCP ACK arrives, it is processed in tcp_v4_rcv and
tcp_check_req; if __inet_lookup_skb finds reqsk2 while the ACK's ack_seq
corresponds to seq1, the check
TCP_SKB_CB(skb)->ack_seq != tcp_rsk(req)->snt_isn + 1
fails, a TCP RST is sent to the client, and the connection is closed.

To fix this, do a lookup before calling inet_csk_reqsk_queue_hash_add
to add reqsk2 to the hash table; if the lookup finds the existing reqsk1
with the same four-tuple, drop reqsk2 and do not send a SYN-ACK to the
client.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 net/ipv4/tcp_input.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Eric Dumazet June 4, 2019, 3:24 p.m. UTC | #1
On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@huawei.com> wrote:
>
> There is an issue with bonding mode BOND_MODE_BROADCAST when the
> two slaves have different CPU affinity, so packets are handled by
> different CPUs. These are the two pre-conditions for this case.
>
> When the two slaves receive the same SYN packet at the same time,
> two request socks (reqsk) are created if the following happens:
> 1. syn1 arrives at tcp_conn_request, creates reqsk1, and has not yet
> called inet_csk_reqsk_queue_hash_add.
> 2. syn2 arrives at tcp_v4_rcv; it goes to tcp_conn_request and creates
> reqsk2 because it cannot find reqsk1 via __inet_lookup_skb.
>
> Then reqsk1 and reqsk2 are both added to the established hash table, and
> two SYN-ACKs with different sequence numbers (seq1 and seq2) are sent to
> the client. When the TCP ACK arrives, it is processed in tcp_v4_rcv and
> tcp_check_req; if __inet_lookup_skb finds reqsk2 while the ACK's ack_seq
> corresponds to seq1, the check
> TCP_SKB_CB(skb)->ack_seq != tcp_rsk(req)->snt_isn + 1
> fails, a TCP RST is sent to the client, and the connection is closed.
>
> To fix this, do a lookup before calling inet_csk_reqsk_queue_hash_add
> to add reqsk2 to the hash table; if the lookup finds the existing reqsk1
> with the same four-tuple, drop reqsk2 and do not send a SYN-ACK to the
> client.
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  net/ipv4/tcp_input.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 08a477e74cf3..c75eeb1fe098 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6569,6 +6569,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>                 bh_unlock_sock(fastopen_sk);
>                 sock_put(fastopen_sk);
>         } else {
> +               struct sock *sk1 = req_to_sk(req);
> +               struct sock *sk2 = NULL;
> +               sk2 = __inet_lookup_established(sock_net(sk1), &tcp_hashinfo,
> +                                                                       sk1->sk_daddr, sk1->sk_dport,
> +                                                                       sk1->sk_rcv_saddr, sk1->sk_num,
> +                                                                       inet_iif(skb), inet_sdif(skb));
> +               if (sk2 != NULL)
> +                       goto drop_and_release;
> +
>                 tcp_rsk(req)->tfo_listener = false;
>                 if (!want_cookie)
>                         inet_csk_reqsk_queue_hash_add(sk, req,

This issue was discussed last year.

I am afraid your patch does not solve all races.

The lookup you add is lockless, so this is racy.

Really the only way to solve this is to make sure that _when_ the
bucket lock is held,
we do not insert a request socket if the 4-tuple is already in the
chain (probably in inet_ehash_insert())

This needs more tricky changes than your patch.
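
To make that direction concrete, below is a minimal sketch of such a
duplicate check done while the ehash bucket lock is held. It is an
illustration only, not a merged fix: the found_dup out-parameter is an
assumption, and same_four_tuple() is a simplified stand-in for the
kernel's INET_MATCH machinery, not a real helper.

bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup)
{
	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
	struct inet_ehash_bucket *head;
	struct hlist_nulls_head *list;
	const struct hlist_nulls_node *node;
	struct sock *esk;
	spinlock_t *lock;
	bool ret = true;

	*found_dup = false;
	sk->sk_hash = sk_ehashfn(sk);
	head = inet_ehash_bucket(hashinfo, sk->sk_hash);
	list = &head->chain;
	lock = inet_ehash_lockp(hashinfo, sk->sk_hash);

	spin_lock(lock);
	/* Walk the chain while the bucket lock is held: any entry with
	 * the same 4-tuple (other than osk, the request sock this
	 * socket may be replacing) means a concurrent insert won.
	 */
	sk_nulls_for_each(esk, node, list) {
		if (esk != osk && esk->sk_hash == sk->sk_hash &&
		    same_four_tuple(esk, sk)) {
			*found_dup = true;
			spin_unlock(lock);
			return false;
		}
	}
	if (osk) {
		WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
		ret = sk_nulls_del_node_init_rcu(osk);
	}
	if (ret)
		__sk_nulls_add_node_rcu(sk, list);
	spin_unlock(lock);
	return ret;
}

Skipping osk in the walk matters: request socks live in the same ehash
table, so a child socket created from its reqsk would otherwise match
its own request sock and be refused.
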
maowenan June 5, 2019, 2:06 a.m. UTC | #2
On 2019/6/4 23:24, Eric Dumazet wrote:
> On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@huawei.com> wrote:
>>
>> There is an issue with bonding mode BOND_MODE_BROADCAST when the
>> two slaves have different CPU affinity, so packets are handled by
>> different CPUs. These are the two pre-conditions for this case.
>>
>> When the two slaves receive the same SYN packet at the same time,
>> two request socks (reqsk) are created if the following happens:
>> 1. syn1 arrives at tcp_conn_request, creates reqsk1, and has not yet
>> called inet_csk_reqsk_queue_hash_add.
>> 2. syn2 arrives at tcp_v4_rcv; it goes to tcp_conn_request and creates
>> reqsk2 because it cannot find reqsk1 via __inet_lookup_skb.
>>
>> Then reqsk1 and reqsk2 are both added to the established hash table, and
>> two SYN-ACKs with different sequence numbers (seq1 and seq2) are sent to
>> the client. When the TCP ACK arrives, it is processed in tcp_v4_rcv and
>> tcp_check_req; if __inet_lookup_skb finds reqsk2 while the ACK's ack_seq
>> corresponds to seq1, the check
>> TCP_SKB_CB(skb)->ack_seq != tcp_rsk(req)->snt_isn + 1
>> fails, a TCP RST is sent to the client, and the connection is closed.
>>
>> To fix this, do a lookup before calling inet_csk_reqsk_queue_hash_add
>> to add reqsk2 to the hash table; if the lookup finds the existing reqsk1
>> with the same four-tuple, drop reqsk2 and do not send a SYN-ACK to the
>> client.
>>
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> ---
>>  net/ipv4/tcp_input.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 08a477e74cf3..c75eeb1fe098 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -6569,6 +6569,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>>                 bh_unlock_sock(fastopen_sk);
>>                 sock_put(fastopen_sk);
>>         } else {
>> +               struct sock *sk1 = req_to_sk(req);
>> +               struct sock *sk2 = NULL;
>> +               sk2 = __inet_lookup_established(sock_net(sk1), &tcp_hashinfo,
>> +                                                                       sk1->sk_daddr, sk1->sk_dport,
>> +                                                                       sk1->sk_rcv_saddr, sk1->sk_num,
>> +                                                                       inet_iif(skb), inet_sdif(skb));
>> +               if (sk2 != NULL)
>> +                       goto drop_and_release;
>> +
>>                 tcp_rsk(req)->tfo_listener = false;
>>                 if (!want_cookie)
>>                         inet_csk_reqsk_queue_hash_add(sk, req,
> 
> This issue was discussed last year.
Can you share the discussion link?

> 
> I am afraid your patch does not solve all races.
> 
> The lookup you add is lockless, so this is racy.
You're right, the lookup is already inside the race window.
> 
> Really the only way to solve this is to make sure that _when_ the
> bucket lock is held,
> we do not insert a request socket if the 4-tuple is already in the
> chain (probably in inet_ehash_insert())
> 

Is it OK to put the lookup under the spin_lock() in inet_ehash_insert(), like this?
Will it affect performance?

in inet_ehash_insert():
...
        spin_lock(lock);
+       reqsk = __inet_lookup_established(sock_net(sk), &tcp_hashinfo,
+                                                       sk->sk_daddr, sk->sk_dport,
+                                                       sk->sk_rcv_saddr, sk->sk_num,
+                                                       sk_bound_dev_if, sk_bound_dev_if);
+       if (reqsk) {
+               spin_unlock(lock);
+               return ret;
+       }
+
        if (osk) {
                WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
                ret = sk_nulls_del_node_init_rcu(osk);
        }
        if (ret)
                __sk_nulls_add_node_rcu(sk, list);
        spin_unlock(lock);
...

> This needs more tricky changes than your patch.
> 
Eric Dumazet June 5, 2019, 3:16 a.m. UTC | #3
On Tue, Jun 4, 2019 at 7:07 PM maowenan <maowenan@huawei.com> wrote:
>
>
>
> On 2019/6/4 23:24, Eric Dumazet wrote:
> > On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@huawei.com> wrote:
> >>
> >> There is an issue with bonding mode BOND_MODE_BROADCAST when the
> >> two slaves have different CPU affinity, so packets are handled by
> >> different CPUs. These are the two pre-conditions for this case.
> >>
> >> When the two slaves receive the same SYN packet at the same time,
> >> two request socks (reqsk) are created if the following happens:
> >> 1. syn1 arrives at tcp_conn_request, creates reqsk1, and has not yet
> >> called inet_csk_reqsk_queue_hash_add.
> >> 2. syn2 arrives at tcp_v4_rcv; it goes to tcp_conn_request and creates
> >> reqsk2 because it cannot find reqsk1 via __inet_lookup_skb.
> >>
> >> Then reqsk1 and reqsk2 are both added to the established hash table, and
> >> two SYN-ACKs with different sequence numbers (seq1 and seq2) are sent to
> >> the client. When the TCP ACK arrives, it is processed in tcp_v4_rcv and
> >> tcp_check_req; if __inet_lookup_skb finds reqsk2 while the ACK's ack_seq
> >> corresponds to seq1, the check
> >> TCP_SKB_CB(skb)->ack_seq != tcp_rsk(req)->snt_isn + 1
> >> fails, a TCP RST is sent to the client, and the connection is closed.
> >>
> >> To fix this, do a lookup before calling inet_csk_reqsk_queue_hash_add
> >> to add reqsk2 to the hash table; if the lookup finds the existing reqsk1
> >> with the same four-tuple, drop reqsk2 and do not send a SYN-ACK to the
> >> client.
> >>
> >> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> >> ---
> >>  net/ipv4/tcp_input.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> >> index 08a477e74cf3..c75eeb1fe098 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -6569,6 +6569,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> >>                 bh_unlock_sock(fastopen_sk);
> >>                 sock_put(fastopen_sk);
> >>         } else {
> >> +               struct sock *sk1 = req_to_sk(req);
> >> +               struct sock *sk2 = NULL;
> >> +               sk2 = __inet_lookup_established(sock_net(sk1), &tcp_hashinfo,
> >> +                                                                       sk1->sk_daddr, sk1->sk_dport,
> >> +                                                                       sk1->sk_rcv_saddr, sk1->sk_num,
> >> +                                                                       inet_iif(skb), inet_sdif(skb));
> >> +               if (sk2 != NULL)
> >> +                       goto drop_and_release;
> >> +
> >>                 tcp_rsk(req)->tfo_listener = false;
> >>                 if (!want_cookie)
> >>                         inet_csk_reqsk_queue_hash_add(sk, req,
> >
> > This issue was discussed last year.
> Can you share the discussion link?


https://www.spinics.net/lists/netdev/msg507423.html


>
> >
> > I am afraid your patch does not solve all races.
> >
> > The lookup you add is lockless, so this is racy.
> You're right, the lookup is already inside the race window.
> >
> > Really the only way to solve this is to make sure that _when_ the
> > bucket lock is held,
> > we do not insert a request socket if the 4-tuple is already in the
> > chain (probably in inet_ehash_insert())
> >
>
> Is it OK to put the lookup under the spin_lock() in inet_ehash_insert(), like this?
> Will it affect performance?
>
> in inet_ehash_insert():
> ...
>         spin_lock(lock);
> +       reqsk = __inet_lookup_established(sock_net(sk), &tcp_hashinfo,
> +                                                       sk->sk_daddr, sk->sk_dport,
> +                                                       sk->sk_rcv_saddr, sk->sk_num,
> +                                                       sk_bound_dev_if, sk_bound_dev_if);
> +       if (reqsk) {

You should test this before asking :)


> +               spin_unlock(lock);
> +               return ret;
> +       }
> +
>         if (osk) {
>                 WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
>                 ret = sk_nulls_del_node_init_rcu(osk);
>         }
>         if (ret)
>                 __sk_nulls_add_node_rcu(sk, list);
>         spin_unlock(lock);
> ...
>
> > This needs more tricky changes than your patch.
> >
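
(Two problems such a test would likely surface: __inet_lookup_established()
takes a reference on any socket it returns, which the early-return path
never drops; and since request socks sit in the same ehash table, the
lookup can find the very osk the caller is about to replace, refusing
legitimate inserts.)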
Zhiqiang Liu June 5, 2019, 8:52 a.m. UTC | #4
On 2019/6/4 23:24, Eric Dumazet wrote:
> On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@huawei.com> wrote:
>>
>> There is an issue with bonding mode BOND_MODE_BROADCAST when the
>> two slaves have different CPU affinity, so packets are handled by
>> different CPUs. These are the two pre-conditions for this case.

>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> --
> 
> This issue was discussed last year.
> 
> I am afraid your patch does not solve all races.
> 
> The lookup you add is lockless, so this is racy.
> 
> Really the only way to solve this is to make sure that _when_ the
> bucket lock is held,
> we do not insert a request socket if the 4-tuple is already in the
> chain (probably in inet_ehash_insert())
> 
> This needs more tricky changes than your patch.
> 

This kind of case is rare, and the conditions for triggering the issue are strict.
If we add the lookup before or inside the inet_ehash_insert func for each reqsk,
overall performance will be affected.

We may solve this low-probability issue with a trick in tcp_v4_rcv.
If the ACK is judged invalid by the tcp_check_req func, the req can be dropped,
and we then goto the lookup to search for another available reqsk. In this way,
performance is not affected in the normal path.

The patch is given as follows:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a2896944aa37..9d0491587ed2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1874,8 +1874,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
                        goto discard_and_relse;
                }
                if (nsk == sk) {
-                       reqsk_put(req);
+                       inet_csk_reqsk_queue_drop_and_put(sk, req);
                        tcp_v4_restore_cb(skb);
+                       sock_put(sk);
+                       goto lookup;
                } else if (tcp_child_process(sk, nsk, skb)) {
                        tcp_v4_send_reset(nsk, skb);
                        goto discard_and_relse;
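
In this sketch, when tcp_check_req() hands the packet back to the
listener (nsk == sk), the reqsk is dropped from the queue and processing
restarts at the lookup label, in the hope of finding the sibling reqsk
whose snt_isn matches the ACK.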
Eric Dumazet June 5, 2019, 2:18 p.m. UTC | #5
On 6/5/19 1:52 AM, Zhiqiang Liu wrote:
> 
> 
> On 2019/6/4 23:24, Eric Dumazet wrote:
>> On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@huawei.com> wrote:
>>>
>>> There is an issue with bonding mode BOND_MODE_BROADCAST when the
>>> two slaves have different CPU affinity, so packets are handled by
>>> different CPUs. These are the two pre-conditions for this case.
> 
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> --
>>
>> This issue was discussed last year.
>>
>> I am afraid your patch does not solve all races.
>>
>> The lookup you add is lockless, so this is racy.
>>
>> Really the only way to solve this is to make sure that _when_ the
>> bucket lock is held,
>> we do not insert a request socket if the 4-tuple is already in the
>> chain (probably in inet_ehash_insert())
>>
>> This needs more tricky changes than your patch.
>>
> 
> This kind of case is rare, and the conditions for triggering the issue are strict.
> If we add the lookup before or inside the inet_ehash_insert func for each reqsk,
> overall performance will be affected.
> 
> We may solve this low-probability issue with a trick in tcp_v4_rcv.
> If the ACK is judged invalid by the tcp_check_req func, the req can be dropped,
> and we then goto the lookup to search for another available reqsk. In this way,
> performance is not affected in the normal path.
> 
> The patch is given as follows:
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a2896944aa37..9d0491587ed2 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1874,8 +1874,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
>                         goto discard_and_relse;
>                 }
>                 if (nsk == sk) {
> -                       reqsk_put(req);
> +                       inet_csk_reqsk_queue_drop_and_put(sk, req);
>                         tcp_v4_restore_cb(skb);
> +                       sock_put(sk);
> +                       goto lookup;
>                 } else if (tcp_child_process(sk, nsk, skb)) {
>                         tcp_v4_send_reset(nsk, skb);
>                         goto discard_and_relse;
> 

This does not solve the race.

Please read my prior emails again.

If you want to work on this issue, you have to fix it for good.

Thanks.
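
One way to read the objection: with broadcast bonding the client's ACK
also arrives twice, on two CPUs, so the drop-and-retry in tcp_v4_rcv can
itself interleave with the other copy's processing, and the duplicate
reqsks (plus any children they create) still coexist. Only refusing the
second insert while the bucket lock is held removes the window entirely.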

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 08a477e74cf3..c75eeb1fe098 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6569,6 +6569,15 @@  int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		bh_unlock_sock(fastopen_sk);
 		sock_put(fastopen_sk);
 	} else {
+		struct sock *sk1 = req_to_sk(req);
+		struct sock *sk2 = NULL;
+		sk2 = __inet_lookup_established(sock_net(sk1), &tcp_hashinfo,
+									sk1->sk_daddr, sk1->sk_dport,
+									sk1->sk_rcv_saddr, sk1->sk_num,
+									inet_iif(skb), inet_sdif(skb));
+		if (sk2 != NULL)
+			goto drop_and_release;
+
 		tcp_rsk(req)->tfo_listener = false;
 		if (!want_cookie)
 			inet_csk_reqsk_queue_hash_add(sk, req,