Message ID | 159303296342.360.5487181450879978407.stgit@john-XPS-13-9370 |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [bpf] bpf, sockmap: RCU splat with TLS redirect and strparser | expand |
On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote: > Redirect on non-TLS sockmap side has RCU lock held from sockmap code > path but when called from TLS this is no longer true. The RCU section > is needed because we use rcu dereference to fetch the psock of the > socket we are redirecting to. sk_psock_verdict_apply() is also called by sk_psock_strp_read() after rcu_read_unlock(). This issue should not be limited to tls?
Martin KaFai Lau wrote: > On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote: > > Redirect on non-TLS sockmap side has RCU lock held from sockmap code > > path but when called from TLS this is no longer true. The RCU section > > is needed because we use rcu dereference to fetch the psock of the > > socket we are redirecting to. > sk_psock_verdict_apply() is also called by sk_psock_strp_read() after > rcu_read_unlock(). This issue should not be limited to tls? The base case is covered because the non-TLS case is wrapped in rcu_read_lock/unlock here, static void sk_psock_strp_data_ready(struct sock *sk) { struct sk_psock *psock; rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { if (tls_sw_has_ctx_rx(sk)) { psock->parser.saved_data_ready(sk); } else { write_lock_bh(&sk->sk_callback_lock); strp_data_ready(&psock->parser.strp); write_unlock_bh(&sk->sk_callback_lock); } } rcu_read_unlock(); } There is a case that has existed for awhile where if a skb_clone() fails or alloc_skb_for_msg() fails when building a merged skb. We could call back into sk_psock_strp_read() from a workqueue in strparser that would not be covered by above sk_psock_strp_data_ready(). This would hit the sk_psock_verdict_apply() you caught above. We don't actually see this from selftests because in selftests we always return skb->len indicating a msg is a single skb. In our use cases this is all we've ever used to date so we've not actually hit the case you call out. Another case that might hit this, based on code review, is some of the zero copy TX cases. To fix the above case I think its best to submit another patch because the Fixes tag will be different. Sound OK? I could push them as a two patch series if that is easier to understand. Also I should have a patch shortly to allow users to skip setting up a parse_msg hook for the strparser. This helps because the vast majority of use cases I've seen just use skb->len as the message deliminator. It also bypasses above concern. Thanks, John
John Fastabend wrote: > Martin KaFai Lau wrote: > > On Wed, Jun 24, 2020 at 02:09:23PM -0700, John Fastabend wrote: > > > Redirect on non-TLS sockmap side has RCU lock held from sockmap code > > > path but when called from TLS this is no longer true. The RCU section > > > is needed because we use rcu dereference to fetch the psock of the > > > socket we are redirecting to. > > sk_psock_verdict_apply() is also called by sk_psock_strp_read() after > > rcu_read_unlock(). This issue should not be limited to tls? > > The base case is covered because the non-TLS case is wrapped in > rcu_read_lock/unlock here, > > static void sk_psock_strp_data_ready(struct sock *sk) > { > struct sk_psock *psock; > > rcu_read_lock(); > psock = sk_psock(sk); > if (likely(psock)) { > if (tls_sw_has_ctx_rx(sk)) { > psock->parser.saved_data_ready(sk); > } else { > write_lock_bh(&sk->sk_callback_lock); > strp_data_ready(&psock->parser.strp); > write_unlock_bh(&sk->sk_callback_lock); > } > } > rcu_read_unlock(); > } > > There is a case that has existed for awhile where if a skb_clone() > fails or alloc_skb_for_msg() fails when building a merged skb. We > could call back into sk_psock_strp_read() from a workqueue in > strparser that would not be covered by above sk_psock_strp_data_ready(). > This would hit the sk_psock_verdict_apply() you caught above. > > We don't actually see this from selftests because in selftests we > always return skb->len indicating a msg is a single skb. In our > use cases this is all we've ever used to date so we've not actually > hit the case you call out. Another case that might hit this, based > on code review, is some of the zero copy TX cases. > > To fix the above case I think its best to submit another patch > because the Fixes tag will be different. Sound OK? I could push > them as a two patch series if that is easier to understand. Sorry not enough coffee this morning the fix here is enough for both cases I'll update the commit message. > > Also I should have a patch shortly to allow users to skip setting > up a parse_msg hook for the strparser. This helps because the > vast majority of use cases I've seen just use skb->len as the > message deliminator. It also bypasses above concern. > > Thanks, > John
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 351afbf6bfba..070553fa3900 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -694,9 +694,11 @@ static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb) kfree_skb(skb); return; } + rcu_read_lock(); psock_other = sk_psock(sk_other); if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { + rcu_read_unlock(); kfree_skb(skb); return; } @@ -713,6 +715,7 @@ static void sk_psock_skb_redirect(struct sk_psock *psock, struct sk_buff *skb) } else { kfree_skb(skb); } + rcu_read_unlock(); } static void sk_psock_tls_verdict_apply(struct sk_psock *psock,
Redirect on non-TLS sockmap side has RCU lock held from sockmap code path but when called from TLS this is no longer true. The RCU section is needed because we use rcu dereference to fetch the psock of the socket we are redirecting to. To fix the below splat wrap the redirect usage in rcu_read_lock, rcu_read_unlock. [ 1095.937597] WARNING: suspicious RCU usage [ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G W [ 1095.944363] ----------------------------- [ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage! [ 1095.950866] [ 1095.950866] other info that might help us debug this: [ 1095.950866] [ 1095.957146] [ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1 [ 1095.961482] 1 lock held by test_sockmap/15970: [ 1095.964501] #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls] [ 1095.968568] [ 1095.968568] stack backtrace: [ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G W 5.7.0-rc7-02911-g463bac5f1ca79 #1 [ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 1095.980519] Call Trace: [ 1095.982191] dump_stack+0x8f/0xd0 [ 1095.984040] sk_psock_skb_redirect+0xa6/0xf0 [ 1095.986073] sk_psock_tls_strp_read+0x1d8/0x250 [ 1095.988095] tls_sw_recvmsg+0x714/0x840 [tls] Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> --- net/core/skmsg.c | 3 +++ 1 file changed, 3 insertions(+)