mbox series

[bpf,v4,00/14] sockmap/tls fixes

Message ID 20190719172927.18181-1-jakub.kicinski@netronome.com
Headers show
Series sockmap/tls fixes | expand

Message

Jakub Kicinski July 19, 2019, 5:29 p.m. UTC
John says:

Resolve a series of splats discovered by syzbot and an unhash
TLS issue noted by Eric Dumazet.

The main issues revolved around interaction between TLS and
sockmap tear down. TLS and sockmap could both reset sk->prot
ops creating a condition where a close or unhash op could be
called forever. A rare race condition resulting from a missing
rcu sync operation was causing a use after free. Then on the
TLS side dropping the sock lock and re-acquiring it during the
close op could hang. Finally, sockmap must be deployed before
tls for current stack assumptions to be met. This is enforced
now. A feature series can enable it.

To fix this first refactor TLS code so the lock is held for the
entire teardown operation. Then add an unhash callback to ensure
TLS can not transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. Then apply three fixes to sockmap to fix up races
on tear down around map free and close. Finally, if sockmap
is destroyed before TLS we add a new ULP op update to inform
the TLS stack it should not call sockmap ops. This last one
appears to be the most commonly found issue from syzbot.

v4:
 - fix some use after frees;
 - disable disconnect work for offload (ctx lifetime is much
   more complex);
 - remove some of the dead code which made it hard to understand
   (for me) that things work correctly (e.g. the checks TLS is
   the top ULP);
 - add selftets.

Jakub Kicinski (7):
  net/tls: don't arm strparser immediately in tls_set_sw_offload()
  net/tls: don't call tls_sk_proto_close for hw record offload
  selftests/tls: add a test for ULP but no keys
  selftests/tls: test error codes around TLS ULP installation
  selftests/tls: add a bidirectional test
  selftests/tls: close the socket with open record
  selftests/tls: add shutdown tests

John Fastabend (7):
  net/tls: remove close callback sock unlock/lock around TX work flush
  net/tls: remove sock unlock/lock around strp_done()
  net/tls: fix transition through disconnect with close
  bpf: sockmap, sock_map_delete needs to use xchg
  bpf: sockmap, synchronize_rcu before free'ing map
  bpf: sockmap, only create entry if ulp is not already enabled
  bpf: sockmap/tls, close can race with map free

 Documentation/networking/tls-offload.rst |   6 +
 include/linux/skmsg.h                    |   8 +-
 include/net/tcp.h                        |   3 +
 include/net/tls.h                        |  15 +-
 net/core/skmsg.c                         |   4 +-
 net/core/sock_map.c                      |  19 ++-
 net/ipv4/tcp_ulp.c                       |  13 ++
 net/tls/tls_main.c                       | 142 +++++++++++++----
 net/tls/tls_sw.c                         |  83 +++++++---
 tools/testing/selftests/net/tls.c        | 194 +++++++++++++++++++++++
 10 files changed, 419 insertions(+), 68 deletions(-)

Comments

Jakub Kicinski July 19, 2019, 5:37 p.m. UTC | #1
On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> John says:
> 
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.

Sorry for the delay, this code is quite tricky. According to my testing
TLS SW and HW should now work, I hope I didn't regress things on the
sockmap side.

This is not solving all the issues (ugh), apart from HW needing the
unhash/shutdown treatment, as discussed we may have a sender stuck in
wmem wait while we free context underneath. That's a "minor" UAF for
another day..
Daniel Borkmann July 22, 2019, 2:22 p.m. UTC | #2
On 7/19/19 7:37 PM, Jakub Kicinski wrote:
> On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
>> John says:
>>
>> Resolve a series of splats discovered by syzbot and an unhash
>> TLS issue noted by Eric Dumazet.
> 
> Sorry for the delay, this code is quite tricky. According to my testing
> TLS SW and HW should now work, I hope I didn't regress things on the
> sockmap side.

Applied, thanks everyone!
John Fastabend July 22, 2019, 3:46 p.m. UTC | #3
Jakub Kicinski wrote:
> On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> > John says:
> > 
> > Resolve a series of splats discovered by syzbot and an unhash
> > TLS issue noted by Eric Dumazet.
> 
> Sorry for the delay, this code is quite tricky. According to my testing
> TLS SW and HW should now work, I hope I didn't regress things on the
> sockmap side.

I'll run it through our CI as well but looks good to me. Thanks a lot
for getting this finished up.

> 
> This is not solving all the issues (ugh), apart from HW needing the
> unhash/shutdown treatment, as discussed we may have a sender stuck in
> wmem wait while we free context underneath. That's a "minor" UAF for
> another day..

Agreed. But this should solve most of the syzbot issues and a few
crashes we saw in testing real workloads.

Thanks,
John
John Fastabend July 22, 2019, 3:48 p.m. UTC | #4
Daniel Borkmann wrote:
> On 7/19/19 7:37 PM, Jakub Kicinski wrote:
> > On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> >> John says:
> >>
> >> Resolve a series of splats discovered by syzbot and an unhash
> >> TLS issue noted by Eric Dumazet.
> > 
> > Sorry for the delay, this code is quite tricky. According to my testing
> > TLS SW and HW should now work, I hope I didn't regress things on the
> > sockmap side.
> 
> Applied, thanks everyone!

Thanks Jakub, for the patches without my signed-off already

Acked-by: John Fastabend <john.fastabend@gmail.com>