diff mbox series

[net,4/9] net/mlx5e: Fix refcount leak on kTLS RX resync

Message ID 20201103191830.60151-5-saeedm@nvidia.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net,1/9] net/mlx5e: Fix modify header actions memory leak | expand

Checks

Context Check Description
jkicinski/cover_letter warning Series does not have a cover letter
jkicinski/fixes_present success Link
jkicinski/patch_count success Link
jkicinski/tree_selection success Clearly marked for net
jkicinski/subject_prefix success Link
jkicinski/source_inline success Was 0 now: 0
jkicinski/verify_signedoff success Link
jkicinski/module_param success Was 0 now: 0
jkicinski/build_32bit success Errors and warnings before: 0 this patch: 0
jkicinski/kdoc success Errors and warnings before: 0 this patch: 0
jkicinski/verify_fixes success Link
jkicinski/checkpatch success total: 0 errors, 0 warnings, 0 checks, 27 lines checked
jkicinski/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
jkicinski/header_inline success Link
jkicinski/stable success Stable not CCed

Commit Message

Saeed Mahameed Nov. 3, 2020, 7:18 p.m. UTC
From: Maxim Mikityanskiy <maximmi@mellanox.com>

On resync, the driver calls inet_lookup_established
(__inet6_lookup_established) that increases sk_refcnt of the socket. To
decrease it, the driver set skb->destructor to sock_edemux. However, it
didn't work well, because the TCP stack also sets this destructor for
early demux, and the refcount gets decreased only once, while increased
two times (in mlx5e and in the TCP stack). It leads to a socket leak, a
TLS context leak, which in the end leads to calling tls_dev_del twice:
on socket close and on driver unload, which in turn leads to a crash.

This commit fixes the refcount leak by calling sock_gen_put right away
after using the socket, thus fixing all the subsequent issues.

Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c  | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Jakub Kicinski Nov. 4, 2020, 10:59 p.m. UTC | #1
On Tue, 3 Nov 2020 11:18:25 -0800 Saeed Mahameed wrote:
> From: Maxim Mikityanskiy <maximmi@mellanox.com>
> 
> On resync, the driver calls inet_lookup_established
> (__inet6_lookup_established) that increases sk_refcnt of the socket. To
> decrease it, the driver set skb->destructor to sock_edemux. However, it
> didn't work well, because the TCP stack also sets this destructor for
> early demux, and the refcount gets decreased only once

Why is the stack doing early_demux if there is already a socket
assigned? Or is it not early_demux but something else?
Can you point us at the code?

IPv4:
	if (net->ipv4.sysctl_ip_early_demux &&
	    !skb_dst(skb) &&
	    !skb->sk &&                              <============
	    !ip_is_fragment(iph)) {
		const struct net_protocol *ipprot;
		int protocol = iph->protocol;

		ipprot = rcu_dereference(inet_protos[protocol]);
		if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) {
			err = INDIRECT_CALL_2(edemux, tcp_v4_early_demux,
					      udp_v4_early_demux, skb);
			if (unlikely(err))
				goto drop_error;
			/* must reload iph, skb->head might have changed */
			iph = ip_hdr(skb);
		}
	}

IPv6:
	if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) {
                                                                ~~~~~~~~~~~~~~~
		const struct inet6_protocol *ipprot;

		ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]);
		if (ipprot && (edemux = READ_ONCE(ipprot->early_demux)))
			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
					udp_v6_early_demux, skb);
	}
Maxim Mikityanskiy Nov. 5, 2020, 10:15 a.m. UTC | #2
On 2020-11-05 00:59, Jakub Kicinski wrote:
> On Tue, 3 Nov 2020 11:18:25 -0800 Saeed Mahameed wrote:
>> From: Maxim Mikityanskiy <maximmi@mellanox.com>
>>
>> On resync, the driver calls inet_lookup_established
>> (__inet6_lookup_established) that increases sk_refcnt of the socket. To
>> decrease it, the driver set skb->destructor to sock_edemux. However, it
>> didn't work well, because the TCP stack also sets this destructor for
>> early demux, and the refcount gets decreased only once
> 
> Why is the stack doing early_demux if there is already a socket
> assigned? Or is it not early_demux but something else?
> Can you point us at the code?

I thought this was the code that was in conflict with setting 
skb->destructor in the driver:

void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
         skb_orphan(skb);
         skb->sk = sk;
#ifdef CONFIG_INET
         if (unlikely(!sk_fullsock(sk))) {
                 skb->destructor = sock_edemux;
                 sock_hold(sk);
                 return;
         }
#endif

However, after taking another look, it seems that the root cause is 
somewhere else. This piece of code actually calls skb_orphan before 
reassigning the destructor.

I'll debug it further to try to find where the destructor is replaced or 
just not called.

> IPv4:
> 	if (net->ipv4.sysctl_ip_early_demux &&
> 	    !skb_dst(skb) &&
> 	    !skb->sk &&                              <============
> 	    !ip_is_fragment(iph)) {
> 		const struct net_protocol *ipprot;
> 		int protocol = iph->protocol;
> 
> 		ipprot = rcu_dereference(inet_protos[protocol]);
> 		if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) {
> 			err = INDIRECT_CALL_2(edemux, tcp_v4_early_demux,
> 					      udp_v4_early_demux, skb);
> 			if (unlikely(err))
> 				goto drop_error;
> 			/* must reload iph, skb->head might have changed */
> 			iph = ip_hdr(skb);
> 		}
> 	}
> 
> IPv6:
> 	if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) {
>                                                                  ~~~~~~~~~~~~~~~
> 		const struct inet6_protocol *ipprot;
> 
> 		ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]);
> 		if (ipprot && (edemux = READ_ONCE(ipprot->early_demux)))
> 			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
> 					udp_v6_early_demux, skb);
> 	}
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index 7f6221b8b1f7..6a1d82503ef8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -476,19 +476,22 @@  static void resync_update_sn(struct mlx5e_rq *rq, struct sk_buff *skb)
 
 	depth += sizeof(struct tcphdr);
 
-	if (unlikely(!sk || sk->sk_state == TCP_TIME_WAIT))
+	if (unlikely(!sk))
 		return;
 
-	if (unlikely(!resync_queue_get_psv(sk)))
-		return;
+	if (unlikely(sk->sk_state == TCP_TIME_WAIT))
+		goto unref;
 
-	skb->sk = sk;
-	skb->destructor = sock_edemux;
+	if (unlikely(!resync_queue_get_psv(sk)))
+		goto unref;
 
 	seq = th->seq;
 	datalen = skb->len - depth;
 	tls_offload_rx_resync_async_request_start(sk, seq, datalen);
 	rq->stats->tls_resync_req_start++;
+
+unref:
+	sock_gen_put(sk);
 }
 
 void mlx5e_ktls_rx_resync(struct net_device *netdev, struct sock *sk,