[bpf,v2,3/6] bpf: sockhash fix omitted bucket lock in sock_close

Message ID: 20180614164457.24994.33710.stgit@john-Precision-Tower-5810
State: Changes Requested, archived
Delegated to: BPF Maintainers
Series: BPF fixes for sockhash

Commit Message

John Fastabend June 14, 2018, 4:44 p.m. UTC
First, in tcp_close, reduce the scope of sk_callback_lock(): the lock is
only needed to protect smap_release_sock(); the ingress and cork lists
are protected by the sock lock. Holding the lock over a wider scope is
harmless, but may confuse a reader into inferring it is in fact needed
there.

Next, in sock_hash_delete_elem() the pattern is as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table, which we cannot do unless we hold the
bucket lock. Once we have the bucket lock and the psock element, the
element is deleted from the hashmap so that any other path doing a
lookup will fail. Finally, the refcnt is decremented and, if it reaches
zero, the psock is destroyed.
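
For reference, a minimal C sketch of that delete-side ordering, pieced
together from the helpers visible in the patch below (__select_bucket(),
lookup_elem_raw(), smap_release_sock(), free_htab_elem()). It is a
simplified illustration with a made-up function name, not the exact
sock_hash_delete_elem() in the tree:

  /* Sketch only: bucket lock -> lookup -> unlink -> drop psock ref. */
  static int sock_hash_delete_elem_sketch(struct bpf_map *map, void *key)
  {
          struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
          u32 hash = htab_map_hash(key, map->key_size);
          struct bucket *b = __select_bucket(htab, hash);
          struct htab_elem *l;
          int err = -ENOENT;

          raw_spin_lock_bh(&b->lock);            /* bucket lock first */
          l = lookup_elem_raw(&b->head, hash, key, map->key_size);
          if (l) {
                  struct sock *sk = l->sk;
                  struct smap_psock *psock = smap_psock_sk(sk);

                  hlist_del_rcu(&l->hash_node);  /* hide from other lookups */
                  write_lock_bh(&sk->sk_callback_lock);
                  smap_release_sock(psock, sk);  /* dec refcnt, destroy at 0 */
                  write_unlock_bh(&sk->sk_callback_lock);
                  free_htab_elem(htab, l);       /* kfree_rcu() the element */
                  err = 0;
          }
          raw_spin_unlock_bh(&b->lock);
          return err;
  }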

In parallel with the above (or with freeing the map), a TCP close event
may trigger tcp_close(), which at the moment omits the bucket lock
altogether (oops!). The flow there looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and as demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in the
bucket lock, but that would create a lock inversion problem: to follow
the existing locking rules, the maps list needs the sk_callback_lock,
but we need the bucket lock to do the hlist_del_rcu(). To resolve the
inversion, note that when bpf_tcp_close() is called no updates can
happen in parallel, due to the ESTABLISHED state check in the update
logic, so pop the head of the list repeatedly and drop the reference
until none are left. If a delete happens in parallel from the BPF API
that is OK as well, because it will do a similar action: look up the
sock in the map/hash, delete it from the map/hash, and decrement the
refcnt. We check for this case before destroying the psock to ensure
we don't have two threads tearing down the same psock. The new logic
is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with sk_callback_lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     // only get here if the element was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally, for all the above to work, add the missing sk_callback_lock
around smap_list_remove() in sock_hash_ctx_update_elem(). Otherwise a
delete and an update may corrupt the maps list.

(As an aside, the sk_callback_lock serves two purposes. The first is to
 protect updates to the sock callbacks sk_data_ready, sk_write_space,
 etc. The second is to protect the psock 'maps' list. The 'maps' list is
 used (as shown above) to delete all map/hash references to a sock when
 the sock is closed.)

(If we did not have the ESTABLISHED state guarantee from tcp_close then
 we could not ensure completion, because updates could happen forever
 and pin the thread in the delete loop.)
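
For context, the ESTABLISHED check relied on above sits in the map/hash
update paths and looks roughly like the following (paraphrased for
illustration, not part of this diff):

  /* The update paths refuse sockets that are not ESTABLISHED, so once
   * tcp_close() has moved the sock out of that state no new map/hash
   * entries can be added for it.
   */
  if (skops->sk->sk_state != TCP_ESTABLISHED)
          return -EOPNOTSUPP;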

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 0 files changed

Comments

Martin KaFai Lau June 15, 2018, 5:41 a.m. UTC | #1
On Thu, Jun 14, 2018 at 09:44:57AM -0700, John Fastabend wrote:
> First in tcp_close, reduce scope of sk_callback_lock() the lock is
> only needed for protecting smap_release_sock() the ingress and cork
> lists are protected by sock lock. Having the lock in wider scope is
> harmless but may confuse the reader who may infer it is in fact
> needed.
> 
> Next, in sock_hash_delete_elem() the pattern is as follows,
> 
>   sock_hash_delete_elem()
>      [...]
>      spin_lock(bucket_lock)
>      l = lookup_elem_raw()
>      if (l)
>         hlist_del_rcu()
>         write_lock(sk_callback_lock)
>          .... destroy psock ...
>         write_unlock(sk_callback_lock)
>      spin_unlock(bucket_lock)
> 
> The ordering is necessary because we only know the {p}sock after
> dereferencing the hash table which we can't do unless we have the
> bucket lock held. Once we have the bucket lock and the psock element
> it is deleted from the hashmap to ensure any other path doing a lookup
> will fail. Finally, the refcnt is decremented and if zero the psock
> is destroyed.
> 
> In parallel with the above (or free'ing the map) a tcp close event
> may trigger tcp_close(). Which at the moment omits the bucket lock
> altogether (oops!) where the flow looks like this,
> 
>   bpf_tcp_close()
>      [...]
>      write_lock(sk_callback_lock)
>      for each psock->maps // list of maps this sock is part of
>          hlist_del_rcu(ref_hash_node);
>          .... destroy psock ...
>      write_unlock(sk_callback_lock)
> 
> Obviously, and demonstrated by syzbot, this is broken because
> we can have multiple threads deleting entries via hlist_del_rcu().
> 
> To fix this we might be tempted to wrap the hlist operation in a
> bucket lock but that would create a lock inversion problem. In
> summary to follow locking rules maps needs the sk_callback_lock but we
> need the bucket lock to do the hlist_del_rcu. To resolve the lock
> inversion problem note that when bpf_tcp_close is called no updates
> can happen in parallel, due to ESTABLISH state check in update logic,
> so pop the head of the list repeatedly and remove the reference until
> no more are left. If a delete happens in parallel from the BPF API
> that is OK as well because it will do a similar action, lookup the
> sock in the map/hash, delete it from the map/hash, and dec the refcnt.
> We check for this case before doing a destroy on the psock to ensure
> we don't have two threads tearing down a psock. The new logic is
> as follows,
> 
>   bpf_tcp_close()
>   e = psock_map_pop(psock->maps) // done with sk_callback_lock
>   bucket_lock() // lock hash list bucket
>   l = lookup_elem_raw(head, hash, key, key_size);
>   if (l) {
>      //only get here if elmnt was not already removed
>      hlist_del_rcu()
>      ... destroy psock...
>   }
>   bucket_unlock()
> 
> And finally for all the above to work add missing sk_callback_lock
> around smap_list_remove in sock_hash_ctx_update_elem(). Otherwise
> delete and update may corrupt maps list.
> 
> (As an aside the sk_callback_lock serves two purposes. The
>  first, is to update the sock callbacks sk_data_ready, sk_write_space,
>  etc. The second is to protect the psock 'maps' list. The 'maps' list
>  is used to (as shown above) to delete all map/hash references to a
>  sock when the sock is closed)
> 
> (If we did not have the ESTABLISHED state guarantee from tcp_close
>  then we could not ensure completion because updates could happen
>  forever and pin thread in delete loop.)
> 
> Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
> Fixes: 81110384441a ("bpf: sockmap, add hash map support")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  0 files changed
> 
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index f1ab52d..04764f5 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -258,16 +258,54 @@ static void bpf_tcp_release(struct sock *sk)
>  	rcu_read_unlock();
>  }
>  
> +static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
> +					 u32 hash, void *key, u32 key_size)
> +{
> +	struct htab_elem *l;
> +
> +	hlist_for_each_entry_rcu(l, head, hash_node) {
> +		if (l->hash == hash && !memcmp(&l->key, key, key_size))
> +			return l;
> +	}
> +
> +	return NULL;
> +}
> +
> +static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> +{
> +	return &htab->buckets[hash & (htab->n_buckets - 1)];
> +}
> +
> +static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
> +{
> +	return &__select_bucket(htab, hash)->head;
> +}
> +
>  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
>  {
>  	atomic_dec(&htab->count);
>  	kfree_rcu(l, rcu);
>  }
>  
> +struct smap_psock_map_entry *psock_map_pop(struct sock *sk,
> +					   struct smap_psock *psock)
> +{
> +	struct smap_psock_map_entry *e;
> +
> +	write_lock_bh(&sk->sk_callback_lock);
> +	e = list_first_entry_or_null(&psock->maps,
> +				     struct smap_psock_map_entry,
> +				     list);
> +	if (e)
> +		list_del(&e->list);
> +	write_unlock_bh(&sk->sk_callback_lock);
> +	return e;
> +}
> +
>  static void bpf_tcp_close(struct sock *sk, long timeout)
>  {
>  	void (*close_fun)(struct sock *sk, long timeout);
> -	struct smap_psock_map_entry *e, *tmp;
> +	struct smap_psock_map_entry *e;
>  	struct sk_msg_buff *md, *mtmp;
>  	struct smap_psock *psock;
>  	struct sock *osk;
> @@ -286,7 +324,6 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
>  	 */
>  	close_fun = psock->save_close;
>  
> -	write_lock_bh(&sk->sk_callback_lock);
>  	if (psock->cork) {
>  		free_start_sg(psock->sock, psock->cork);
>  		kfree(psock->cork);
> @@ -299,20 +336,48 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
>  		kfree(md);
>  	}
>  
> -	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
> +	/* Sock is in TCP_CLOSE state so any concurrent adds or updates will be
> +	 * blocked by ESTABLISHED check. However, tcp_close() + delete + free
> +	 * can all run at the same time. If a tcp_close + delete happens each
> +	 * code path will remove the entry for the map/hash before deleting it.
> +	 * In the map case a xchg and then check to verify we have a sk protects
> +	 * two paths from tearing down the same object. For hash map we lock the
> +	 * bucket and remove the object from the hash map before destroying to
> +	 * ensure that only one reference exists. By pulling object off the head
> +	 * of the list with (with sk_callback_lock) if multiple deleters are
> +	 * running we avoid duplicate references.
> +	 */
> +	e = psock_map_pop(sk, psock);
> +	while (e) {
>  		if (e->entry) {
>  			osk = cmpxchg(e->entry, sk, NULL);
>  			if (osk == sk) {
> -				list_del(&e->list);
>  				smap_release_sock(psock, sk);
>  			}
>  		} else {
> -			hlist_del_rcu(&e->hash_link->hash_node);
> -			smap_release_sock(psock, e->hash_link->sk);
> -			free_htab_elem(e->htab, e->hash_link);
> +			struct htab_elem *link = e->hash_link;
> +			struct hlist_head *head;
> +			struct htab_elem *l;
> +			struct bucket *b;
> +
> +			b = __select_bucket(e->htab, link->hash);
> +			head = &b->head;
> +			raw_spin_lock_bh(&b->lock);
> +			l = lookup_elem_raw(head,
> +					    link->hash, link->key,
> +					    e->htab->elem_size);
> +			/* If another thread deleted this object skip deletion.
> +			 * The refcnt on psock may or may not be zero.
> +			 */
> +			if (l) {
> +				hlist_del_rcu(&e->hash_link->hash_node);
> +				smap_release_sock(psock, e->hash_link->sk);
> +				free_htab_elem(e->htab, e->hash_link);
> +			}
> +			raw_spin_unlock_bh(&b->lock);
>  		}
> +		e = psock_map_pop(sk, psock);
>  	}
> -	write_unlock_bh(&sk->sk_callback_lock);
>  	rcu_read_unlock();
>  	close_fun(sk, timeout);
>  }
> @@ -2088,16 +2153,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
>  	return ERR_PTR(err);
>  }
>  
> -static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> -{
> -	return &htab->buckets[hash & (htab->n_buckets - 1)];
> -}
> -
> -static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
> -{
> -	return &__select_bucket(htab, hash)->head;
> -}
> -
>  static void sock_hash_free(struct bpf_map *map)
>  {
>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> @@ -2114,10 +2169,13 @@ static void sock_hash_free(struct bpf_map *map)
>  	 */
>  	rcu_read_lock();
>  	for (i = 0; i < htab->n_buckets; i++) {
> -		struct hlist_head *head = select_bucket(htab, i);
> +		struct bucket *b = __select_bucket(htab, i);
> +		struct hlist_head *head;
>  		struct hlist_node *n;
>  		struct htab_elem *l;
>  
> +		raw_spin_lock_bh(&b->lock);
There is a synchronize_rcu() at the beginning of sock_hash_free().
Taking a bucket lock of the map being freed at this point is a bit
suspicious. What may still access the map after synchronize_rcu()?


> +		head = &b->head;
>  		hlist_for_each_entry_safe(l, n, head, hash_node) {
>  			struct sock *sock = l->sk;
>  			struct smap_psock *psock;
> @@ -2137,6 +2195,7 @@ static void sock_hash_free(struct bpf_map *map)
>  			write_unlock_bh(&sock->sk_callback_lock);
>  			kfree(l);
>  		}
> +		raw_spin_unlock_bh(&b->lock);
>  	}
>  	rcu_read_unlock();
>  	bpf_map_area_free(htab->buckets);
> @@ -2167,19 +2226,6 @@ static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
>  	return l_new;
>  }
>  
> -static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
> -					 u32 hash, void *key, u32 key_size)
> -{
> -	struct htab_elem *l;
> -
> -	hlist_for_each_entry_rcu(l, head, hash_node) {
> -		if (l->hash == hash && !memcmp(&l->key, key, key_size))
> -			return l;
> -	}
> -
> -	return NULL;
> -}
> -
>  static inline u32 htab_map_hash(const void *key, u32 key_len)
>  {
>  	return jhash(key, key_len, 0);
> @@ -2307,8 +2353,10 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
>  		psock = smap_psock_sk(l_old->sk);
>  
>  		hlist_del_rcu(&l_old->hash_node);
> +		write_lock_bh(&l_old->sk->sk_callback_lock);
>  		smap_list_remove(psock, NULL, l_old);
>  		smap_release_sock(psock, l_old->sk);
> +		write_unlock_bh(&l_old->sk->sk_callback_lock);
>  		free_htab_elem(htab, l_old);
>  	}
>  	raw_spin_unlock_bh(&b->lock);
>
John Fastabend June 15, 2018, 3:23 p.m. UTC | #2
On 06/14/2018 10:41 PM, Martin KaFai Lau wrote:
> On Thu, Jun 14, 2018 at 09:44:57AM -0700, John Fastabend wrote:
>> First in tcp_close, reduce scope of sk_callback_lock() the lock is
>> only needed for protecting smap_release_sock() the ingress and cork
>> lists are protected by sock lock. Having the lock in wider scope is
>> harmless but may confuse the reader who may infer it is in fact
>> needed.
>>
>> Next, in sock_hash_delete_elem() the pattern is as follows,
>>
>>   sock_hash_delete_elem()
>>      [...]
>>      spin_lock(bucket_lock)
>>      l = lookup_elem_raw()
>>      if (l)
>>         hlist_del_rcu()
>>         write_lock(sk_callback_lock)
>>          .... destroy psock ...
>>         write_unlock(sk_callback_lock)
>>      spin_unlock(bucket_lock)
>>
>> The ordering is necessary because we only know the {p}sock after
>> dereferencing the hash table which we can't do unless we have the
>> bucket lock held. Once we have the bucket lock and the psock element
>> it is deleted from the hashmap to ensure any other path doing a lookup
>> will fail. Finally, the refcnt is decremented and if zero the psock
>> is destroyed.
>>
>> In parallel with the above (or free'ing the map) a tcp close event
>> may trigger tcp_close(). Which at the moment omits the bucket lock
>> altogether (oops!) where the flow looks like this,
>>
>>   bpf_tcp_close()
>>      [...]
>>      write_lock(sk_callback_lock)
>>      for each psock->maps // list of maps this sock is part of
>>          hlist_del_rcu(ref_hash_node);
>>          .... destroy psock ...
>>      write_unlock(sk_callback_lock)
>>
>> Obviously, and demonstrated by syzbot, this is broken because
>> we can have multiple threads deleting entries via hlist_del_rcu().
>>
>> To fix this we might be tempted to wrap the hlist operation in a
>> bucket lock but that would create a lock inversion problem. In
>> summary to follow locking rules maps needs the sk_callback_lock but we
>> need the bucket lock to do the hlist_del_rcu. To resolve the lock
>> inversion problem note that when bpf_tcp_close is called no updates
>> can happen in parallel, due to ESTABLISH state check in update logic,
>> so pop the head of the list repeatedly and remove the reference until
>> no more are left. If a delete happens in parallel from the BPF API
>> that is OK as well because it will do a similar action, lookup the
>> sock in the map/hash, delete it from the map/hash, and dec the refcnt.
>> We check for this case before doing a destroy on the psock to ensure
>> we don't have two threads tearing down a psock. The new logic is
>> as follows,
>>
>>   bpf_tcp_close()
>>   e = psock_map_pop(psock->maps) // done with sk_callback_lock
>>   bucket_lock() // lock hash list bucket
>>   l = lookup_elem_raw(head, hash, key, key_size);
>>   if (l) {
>>      //only get here if elmnt was not already removed
>>      hlist_del_rcu()
>>      ... destroy psock...
>>   }
>>   bucket_unlock()
>>
>> And finally for all the above to work add missing sk_callback_lock
>> around smap_list_remove in sock_hash_ctx_update_elem(). Otherwise
>> delete and update may corrupt maps list.
>>
>> (As an aside the sk_callback_lock serves two purposes. The
>>  first, is to update the sock callbacks sk_data_ready, sk_write_space,
>>  etc. The second is to protect the psock 'maps' list. The 'maps' list
>>  is used to (as shown above) to delete all map/hash references to a
>>  sock when the sock is closed)
>>
>> (If we did not have the ESTABLISHED state guarantee from tcp_close
>>  then we could not ensure completion because updates could happen
>>  forever and pin thread in delete loop.)
>>
>> Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
>> Fixes: 81110384441a ("bpf: sockmap, add hash map support")
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  0 files changed
>>

^^^^ Will fix this '0 files changed' as well.

>>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>> @@ -2114,10 +2169,13 @@ static void sock_hash_free(struct bpf_map *map)
>>  	 */
>>  	rcu_read_lock();
>>  	for (i = 0; i < htab->n_buckets; i++) {
>> -		struct hlist_head *head = select_bucket(htab, i);
>> +		struct bucket *b = __select_bucket(htab, i);
>> +		struct hlist_head *head;
>>  		struct hlist_node *n;
>>  		struct htab_elem *l;
>>  
>> +		raw_spin_lock_bh(&b->lock);
> There is a synchronize_rcu() at the beginning of sock_hash_free().
> Taking the bucket's lock of the free-ing map at this point is a bit
> suspicious.  What may still access the map after synchronize_rcu()?
> 

tcp_close() may be called while the map is being freed. The sync_rcu
will only sync the BPF side.

.John
Martin KaFai Lau June 15, 2018, 3:45 p.m. UTC | #3
On Fri, Jun 15, 2018 at 08:23:14AM -0700, John Fastabend wrote:
> On 06/14/2018 10:41 PM, Martin KaFai Lau wrote:
> > On Thu, Jun 14, 2018 at 09:44:57AM -0700, John Fastabend wrote:
> >> First in tcp_close, reduce scope of sk_callback_lock() the lock is
> >> only needed for protecting smap_release_sock() the ingress and cork
> >> lists are protected by sock lock. Having the lock in wider scope is
> >> harmless but may confuse the reader who may infer it is in fact
> >> needed.
> >>
> >> Next, in sock_hash_delete_elem() the pattern is as follows,
> >>
> >>   sock_hash_delete_elem()
> >>      [...]
> >>      spin_lock(bucket_lock)
> >>      l = lookup_elem_raw()
> >>      if (l)
> >>         hlist_del_rcu()
> >>         write_lock(sk_callback_lock)
> >>          .... destroy psock ...
> >>         write_unlock(sk_callback_lock)
> >>      spin_unlock(bucket_lock)
> >>
> >> The ordering is necessary because we only know the {p}sock after
> >> dereferencing the hash table which we can't do unless we have the
> >> bucket lock held. Once we have the bucket lock and the psock element
> >> it is deleted from the hashmap to ensure any other path doing a lookup
> >> will fail. Finally, the refcnt is decremented and if zero the psock
> >> is destroyed.
> >>
> >> In parallel with the above (or free'ing the map) a tcp close event
> >> may trigger tcp_close(). Which at the moment omits the bucket lock
> >> altogether (oops!) where the flow looks like this,
> >>
> >>   bpf_tcp_close()
> >>      [...]
> >>      write_lock(sk_callback_lock)
> >>      for each psock->maps // list of maps this sock is part of
> >>          hlist_del_rcu(ref_hash_node);
> >>          .... destroy psock ...
> >>      write_unlock(sk_callback_lock)
> >>
> >> Obviously, and demonstrated by syzbot, this is broken because
> >> we can have multiple threads deleting entries via hlist_del_rcu().
> >>
> >> To fix this we might be tempted to wrap the hlist operation in a
> >> bucket lock but that would create a lock inversion problem. In
> >> summary to follow locking rules maps needs the sk_callback_lock but we
> >> need the bucket lock to do the hlist_del_rcu. To resolve the lock
> >> inversion problem note that when bpf_tcp_close is called no updates
> >> can happen in parallel, due to ESTABLISH state check in update logic,
> >> so pop the head of the list repeatedly and remove the reference until
> >> no more are left. If a delete happens in parallel from the BPF API
> >> that is OK as well because it will do a similar action, lookup the
> >> sock in the map/hash, delete it from the map/hash, and dec the refcnt.
> >> We check for this case before doing a destroy on the psock to ensure
> >> we don't have two threads tearing down a psock. The new logic is
> >> as follows,
> >>
> >>   bpf_tcp_close()
> >>   e = psock_map_pop(psock->maps) // done with sk_callback_lock
> >>   bucket_lock() // lock hash list bucket
> >>   l = lookup_elem_raw(head, hash, key, key_size);
> >>   if (l) {
> >>      //only get here if elmnt was not already removed
> >>      hlist_del_rcu()
> >>      ... destroy psock...
> >>   }
> >>   bucket_unlock()
> >>
> >> And finally for all the above to work add missing sk_callback_lock
> >> around smap_list_remove in sock_hash_ctx_update_elem(). Otherwise
> >> delete and update may corrupt maps list.
> >>
> >> (As an aside the sk_callback_lock serves two purposes. The
> >>  first, is to update the sock callbacks sk_data_ready, sk_write_space,
> >>  etc. The second is to protect the psock 'maps' list. The 'maps' list
> >>  is used to (as shown above) to delete all map/hash references to a
> >>  sock when the sock is closed)
> >>
> >> (If we did not have the ESTABLISHED state guarantee from tcp_close
> >>  then we could not ensure completion because updates could happen
> >>  forever and pin thread in delete loop.)
> >>
> >> Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
> >> Fixes: 81110384441a ("bpf: sockmap, add hash map support")
> >> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> >> ---
> >>  0 files changed
> >>
> 
> ^^^^ Will fix this 0 files changes as well.
> 
> >>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> >> @@ -2114,10 +2169,13 @@ static void sock_hash_free(struct bpf_map *map)
> >>  	 */
> >>  	rcu_read_lock();
> >>  	for (i = 0; i < htab->n_buckets; i++) {
> >> -		struct hlist_head *head = select_bucket(htab, i);
> >> +		struct bucket *b = __select_bucket(htab, i);
> >> +		struct hlist_head *head;
> >>  		struct hlist_node *n;
> >>  		struct htab_elem *l;
> >>  
> >> +		raw_spin_lock_bh(&b->lock);
> > There is a synchronize_rcu() at the beginning of sock_hash_free().
> > Taking the bucket's lock of the free-ing map at this point is a bit
> > suspicious.  What may still access the map after synchronize_rcu()?
> > 
> 
> tcp_close() may be called while the map is being free. The sync_rcu will
> only sync the BPF side.
Thanks for the explanation.

hmm... so bpf_tcp_close(), which runs under rcu_read_lock(), may still
have a reference to this hash. Should it wait for another RCU grace
period before freeing htab->buckets and htab?
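
For illustration, the kind of change being suggested here would amount
to waiting for a second grace period after the bucket walk, so that any
bpf_tcp_close() still inside its rcu_read_lock() section (holding
e->htab) has finished before the bucket array goes away. A sketch only,
built from the helpers in the diff below; it is not something this
series does:

  	/* ...after the locked bucket walk in sock_hash_free()... */

  	/* A bpf_tcp_close() that raced with the walk above may still be
  	 * inside its rcu_read_lock() section with a pointer to this
  	 * htab, so wait one more grace period before the backing memory
  	 * is freed.
  	 */
  	synchronize_rcu();
  	bpf_map_area_free(htab->buckets);
  	kfree(htab);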

Patch

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index f1ab52d..04764f5 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -258,16 +258,54 @@  static void bpf_tcp_release(struct sock *sk)
 	rcu_read_unlock();
 }
 
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
+					 u32 hash, void *key, u32 key_size)
+{
+	struct htab_elem *l;
+
+	hlist_for_each_entry_rcu(l, head, hash_node) {
+		if (l->hash == hash && !memcmp(&l->key, key, key_size))
+			return l;
+	}
+
+	return NULL;
+}
+
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &__select_bucket(htab, hash)->head;
+}
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	atomic_dec(&htab->count);
 	kfree_rcu(l, rcu);
 }
 
+struct smap_psock_map_entry *psock_map_pop(struct sock *sk,
+					   struct smap_psock *psock)
+{
+	struct smap_psock_map_entry *e;
+
+	write_lock_bh(&sk->sk_callback_lock);
+	e = list_first_entry_or_null(&psock->maps,
+				     struct smap_psock_map_entry,
+				     list);
+	if (e)
+		list_del(&e->list);
+	write_unlock_bh(&sk->sk_callback_lock);
+	return e;
+}
+
 static void bpf_tcp_close(struct sock *sk, long timeout)
 {
 	void (*close_fun)(struct sock *sk, long timeout);
-	struct smap_psock_map_entry *e, *tmp;
+	struct smap_psock_map_entry *e;
 	struct sk_msg_buff *md, *mtmp;
 	struct smap_psock *psock;
 	struct sock *osk;
@@ -286,7 +324,6 @@  static void bpf_tcp_close(struct sock *sk, long timeout)
 	 */
 	close_fun = psock->save_close;
 
-	write_lock_bh(&sk->sk_callback_lock);
 	if (psock->cork) {
 		free_start_sg(psock->sock, psock->cork);
 		kfree(psock->cork);
@@ -299,20 +336,48 @@  static void bpf_tcp_close(struct sock *sk, long timeout)
 		kfree(md);
 	}
 
-	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
+	/* Sock is in TCP_CLOSE state so any concurrent adds or updates will be
+	 * blocked by ESTABLISHED check. However, tcp_close() + delete + free
+	 * can all run at the same time. If a tcp_close + delete happens each
+	 * code path will remove the entry for the map/hash before deleting it.
+	 * In the map case a xchg and then check to verify we have a sk protects
+	 * two paths from tearing down the same object. For hash map we lock the
+	 * bucket and remove the object from the hash map before destroying to
+	 * ensure that only one reference exists. By pulling object off the head
+	 * of the list with (with sk_callback_lock) if multiple deleters are
+	 * running we avoid duplicate references.
+	 */
+	e = psock_map_pop(sk, psock);
+	while (e) {
 		if (e->entry) {
 			osk = cmpxchg(e->entry, sk, NULL);
 			if (osk == sk) {
-				list_del(&e->list);
 				smap_release_sock(psock, sk);
 			}
 		} else {
-			hlist_del_rcu(&e->hash_link->hash_node);
-			smap_release_sock(psock, e->hash_link->sk);
-			free_htab_elem(e->htab, e->hash_link);
+			struct htab_elem *link = e->hash_link;
+			struct hlist_head *head;
+			struct htab_elem *l;
+			struct bucket *b;
+
+			b = __select_bucket(e->htab, link->hash);
+			head = &b->head;
+			raw_spin_lock_bh(&b->lock);
+			l = lookup_elem_raw(head,
+					    link->hash, link->key,
+					    e->htab->elem_size);
+			/* If another thread deleted this object skip deletion.
+			 * The refcnt on psock may or may not be zero.
+			 */
+			if (l) {
+				hlist_del_rcu(&e->hash_link->hash_node);
+				smap_release_sock(psock, e->hash_link->sk);
+				free_htab_elem(e->htab, e->hash_link);
+			}
+			raw_spin_unlock_bh(&b->lock);
 		}
+		e = psock_map_pop(sk, psock);
 	}
-	write_unlock_bh(&sk->sk_callback_lock);
 	rcu_read_unlock();
 	close_fun(sk, timeout);
 }
@@ -2088,16 +2153,6 @@  static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
-{
-	return &htab->buckets[hash & (htab->n_buckets - 1)];
-}
-
-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
-{
-	return &__select_bucket(htab, hash)->head;
-}
-
 static void sock_hash_free(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
@@ -2114,10 +2169,13 @@  static void sock_hash_free(struct bpf_map *map)
 	 */
 	rcu_read_lock();
 	for (i = 0; i < htab->n_buckets; i++) {
-		struct hlist_head *head = select_bucket(htab, i);
+		struct bucket *b = __select_bucket(htab, i);
+		struct hlist_head *head;
 		struct hlist_node *n;
 		struct htab_elem *l;
 
+		raw_spin_lock_bh(&b->lock);
+		head = &b->head;
 		hlist_for_each_entry_safe(l, n, head, hash_node) {
 			struct sock *sock = l->sk;
 			struct smap_psock *psock;
@@ -2137,6 +2195,7 @@  static void sock_hash_free(struct bpf_map *map)
 			write_unlock_bh(&sock->sk_callback_lock);
 			kfree(l);
 		}
+		raw_spin_unlock_bh(&b->lock);
 	}
 	rcu_read_unlock();
 	bpf_map_area_free(htab->buckets);
@@ -2167,19 +2226,6 @@  static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
 	return l_new;
 }
 
-static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
-					 u32 hash, void *key, u32 key_size)
-{
-	struct htab_elem *l;
-
-	hlist_for_each_entry_rcu(l, head, hash_node) {
-		if (l->hash == hash && !memcmp(&l->key, key, key_size))
-			return l;
-	}
-
-	return NULL;
-}
-
 static inline u32 htab_map_hash(const void *key, u32 key_len)
 {
 	return jhash(key, key_len, 0);
@@ -2307,8 +2353,10 @@  static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		psock = smap_psock_sk(l_old->sk);
 
 		hlist_del_rcu(&l_old->hash_node);
+		write_lock_bh(&l_old->sk->sk_callback_lock);
 		smap_list_remove(psock, NULL, l_old);
 		smap_release_sock(psock, l_old->sk);
+		write_unlock_bh(&l_old->sk->sk_callback_lock);
 		free_htab_elem(htab, l_old);
 	}
 	raw_spin_unlock_bh(&b->lock);