diff mbox series

[bpf] bpf, sockmap: RCU splat with TLS redirect and strparser

Message ID 159303296342.360.5487181450879978407.stgit@john-XPS-13-9370
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series [bpf] bpf, sockmap: RCU splat with TLS redirect and strparser | expand

Commit Message

John Fastabend June 24, 2020, 9:09 p.m. UTC
Redirect on non-TLS sockmap side has RCU lock held from sockmap code
path but when called from TLS this is no longer true. The RCU section
is needed because we use rcu dereference to fetch the psock of the
socket we are redirecting to.

To fix the below splat wrap the redirect usage in rcu_read_lock,
rcu_read_unlock.

[ 1095.937597] WARNING: suspicious RCU usage
[ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G        W
[ 1095.944363] -----------------------------
[ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
[ 1095.950866]
[ 1095.950866] other info that might help us debug this:
[ 1095.950866]
[ 1095.957146]
[ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
[ 1095.961482] 1 lock held by test_sockmap/15970:
[ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
[ 1095.968568]
[ 1095.968568] stack backtrace:
[ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f1ca79 #1
[ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1095.980519] Call Trace:
[ 1095.982191]  dump_stack+0x8f/0xd0
[ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
[ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
[ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/skmsg.c |    3 +++
 1 file changed, 3 insertions(+)

Comments

Martin KaFai Lau June 25, 2020, 6:22 a.m. UTC | #1
On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote:
> Redirect on non-TLS sockmap side has RCU lock held from sockmap code
> path but when called from TLS this is no longer true. The RCU section
> is needed because we use rcu dereference to fetch the psock of the
> socket we are redirecting to.
sk_psock_verdict_apply() is also called by sk_psock_strp_read() after
rcu_read_unlock().  This issue should not be limited to tls?
John Fastabend June 25, 2020, 4:46 p.m. UTC | #2
Martin KaFai Lau wrote:
> On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote:
> > Redirect on non-TLS sockmap side has RCU lock held from sockmap code
> > path but when called from TLS this is no longer true. The RCU section
> > is needed because we use rcu dereference to fetch the psock of the
> > socket we are redirecting to.
> sk_psock_verdict_apply() is also called by sk_psock_strp_read() after
> rcu_read_unlock().  This issue should not be limited to tls?

The base case is covered because the non-TLS case is wrapped in
rcu_read_lock/unlock here,

 static void sk_psock_strp_data_ready(struct sock *sk)
 {
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (likely(psock)) {
		if (tls_sw_has_ctx_rx(sk)) {
			psock->parser.saved_data_ready(sk);
		} else {
			write_lock_bh(&sk->sk_callback_lock);
			strp_data_ready(&psock->parser.strp);
			write_unlock_bh(&sk->sk_callback_lock);
		}
	}
	rcu_read_unlock();
 }

There is a case that has existed for awhile where if a skb_clone()
fails or alloc_skb_for_msg() fails when building a merged skb. We
could call back into sk_psock_strp_read() from a workqueue in
strparser that would not be covered by above sk_psock_strp_data_ready().
This would hit the sk_psock_verdict_apply() you caught above.

We don't actually see this from selftests because in selftests we
always return skb->len indicating a msg is a single skb. In our
use cases this is all we've ever used to date so we've not actually
hit the case you call out. Another case that might hit this, based
on code review, is some of the zero copy TX cases.

To fix the above case I think its best to submit another patch
because the Fixes tag will be different. Sound OK? I could push
them as a two patch series if that is easier to understand.

Also I should have a patch shortly to allow users to skip setting
up a parse_msg hook for the strparser. This helps because the
vast majority of use cases I've seen just use skb->len as the
message deliminator. It also bypasses above concern.

Thanks,
John
John Fastabend June 25, 2020, 5:07 p.m. UTC | #3
John Fastabend wrote:
> Martin KaFai Lau wrote:
> > On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote:
> > > Redirect on non-TLS sockmap side has RCU lock held from sockmap code
> > > path but when called from TLS this is no longer true. The RCU section
> > > is needed because we use rcu dereference to fetch the psock of the
> > > socket we are redirecting to.
> > sk_psock_verdict_apply() is also called by sk_psock_strp_read() after
> > rcu_read_unlock().  This issue should not be limited to tls?
> 
> The base case is covered because the non-TLS case is wrapped in
> rcu_read_lock/unlock here,
> 
>  static void sk_psock_strp_data_ready(struct sock *sk)
>  {
> 	struct sk_psock *psock;
> 
> 	rcu_read_lock();
> 	psock = sk_psock(sk);
> 	if (likely(psock)) {
> 		if (tls_sw_has_ctx_rx(sk)) {
> 			psock->parser.saved_data_ready(sk);
> 		} else {
> 			write_lock_bh(&sk->sk_callback_lock);
> 			strp_data_ready(&psock->parser.strp);
> 			write_unlock_bh(&sk->sk_callback_lock);
> 		}
> 	}
> 	rcu_read_unlock();
>  }
> 
> There is a case that has existed for awhile where if a skb_clone()
> fails or alloc_skb_for_msg() fails when building a merged skb. We
> could call back into sk_psock_strp_read() from a workqueue in
> strparser that would not be covered by above sk_psock_strp_data_ready().
> This would hit the sk_psock_verdict_apply() you caught above.
> 
> We don't actually see this from selftests because in selftests we
> always return skb->len indicating a msg is a single skb. In our
> use cases this is all we've ever used to date so we've not actually
> hit the case you call out. Another case that might hit this, based
> on code review, is some of the zero copy TX cases.
> 
> To fix the above case I think its best to submit another patch
> because the Fixes tag will be different. Sound OK? I could push
> them as a two patch series if that is easier to understand.

Sorry not enough coffee this morning the fix here is enough for
both cases I'll update the commit message.

> 
> Also I should have a patch shortly to allow users to skip setting
> up a parse_msg hook for the strparser. This helps because the
> vast majority of use cases I've seen just use skb->len as the
> message deliminator. It also bypasses above concern.
> 
> Thanks,
> John
diff mbox series

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 351afbf6bfba..070553fa3900 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -694,9 +694,11 @@  static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb)
 		kfree_skb(skb);
 		return;
 	}
+	rcu_read_lock();
 	psock_other = sk_psock(sk_other);
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD) ||
 	    !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
+		rcu_read_unlock();
 		kfree_skb(skb);
 		return;
 	}
@@ -713,6 +715,7 @@  static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb)
 	} else {
 		kfree_skb(skb);
 	}
+	rcu_read_unlock();
 }
 
 static void sk_psock_tls_verdict_apply(struct sk_psock *psock,