From patchwork Thu Jun 14 16:44:46 2018
Subject: [bpf PATCH v2 1/6] bpf: sockmap, fix crash when ipv6 sock is added
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:44:46 -0700
Message-ID: <20180614164446.24994.22118.stgit@john-Precision-Tower-5810>

This fixes a crash where we assign tcp_prot to IPv6 sockets instead of
tcpv6_prot. Previously we overwrote the sk->sk_prot field with tcp_prot
even in the AF_INET6 case. This patch ensures the correct tcp_prot and
tcpv6_prot are used.

Further, only allow ESTABLISHED connections to join the map, per this
note in the TLS ULP:

   /* The TLS ulp is currently supported only for TCP sockets
    * in ESTABLISHED state.
    * Supporting sockets in LISTEN state will require us
    * to modify the accept implementation to clone rather then
    * share the ulp context.
    */

Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.
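For reference, the shape of the userspace path that used to crash is
sketched below. This is illustrative only and not part of the patch:
map_fd is assumed to be a BPF_MAP_TYPE_SOCKMAP fd created elsewhere, and
libbpf's bpf_map_update_elem() performs the update that installs the
sockmap proto on the socket.

	/* Sketch only: map_fd refers to a BPF_MAP_TYPE_SOCKMAP created
	 * elsewhere; sock_fd is an ESTABLISHED AF_INET6 TCP socket.
	 */
	#include <bpf/bpf.h>	/* bpf_map_update_elem() */

	static int add_established_v6_sock(int map_fd, int sock_fd)
	{
		int key = 0;

		/* Before this fix, the update rewrote sk->sk_prot with
		 * tcp_prot even for AF_INET6 sockets; with the fix the
		 * proto is built from the saved tcpv6_prot instead.
		 */
		return bpf_map_update_elem(map_fd, &key, &sock_fd, BPF_ANY);
	}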
Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Signed-off-by: John Fastabend
Signed-off-by: Wei Wang
---
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 52a91d8..f6dd4cd 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -140,6 +140,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
 			    int offset, size_t size, int flags);
+static void bpf_tcp_close(struct sock *sk, long timeout);
 
 static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
 {
@@ -161,7 +162,42 @@ static bool bpf_tcp_stream_read(const struct sock *sk)
 	return !empty;
 }
 
-static struct proto tcp_bpf_proto;
+enum {
+	SOCKMAP_IPV4,
+	SOCKMAP_IPV6,
+	SOCKMAP_NUM_PROTS,
+};
+
+enum {
+	SOCKMAP_BASE,
+	SOCKMAP_TX,
+	SOCKMAP_NUM_CONFIGS,
+};
+
+static struct proto *saved_tcpv6_prot;
+static DEFINE_MUTEX(tcpv6_prot_mutex);
+static struct proto bpf_tcp_prots[SOCKMAP_NUM_PROTS][SOCKMAP_NUM_CONFIGS];
+static void build_protos(struct proto prot[SOCKMAP_NUM_CONFIGS],
+			 struct proto *base)
+{
+	prot[SOCKMAP_BASE] = *base;
+	prot[SOCKMAP_BASE].close = bpf_tcp_close;
+	prot[SOCKMAP_BASE].recvmsg = bpf_tcp_recvmsg;
+	prot[SOCKMAP_BASE].stream_memory_read = bpf_tcp_stream_read;
+
+	prot[SOCKMAP_TX] = prot[SOCKMAP_BASE];
+	prot[SOCKMAP_TX].sendmsg = bpf_tcp_sendmsg;
+	prot[SOCKMAP_TX].sendpage = bpf_tcp_sendpage;
+}
+
+static void update_sk_prot(struct sock *sk, struct smap_psock *psock)
+{
+	int family = sk->sk_family == AF_INET6 ? SOCKMAP_IPV6 : SOCKMAP_IPV4;
+	int conf = psock->bpf_tx_msg ? SOCKMAP_TX : SOCKMAP_BASE;
+
+	sk->sk_prot = &bpf_tcp_prots[family][conf];
+}
+
 static int bpf_tcp_init(struct sock *sk)
 {
 	struct smap_psock *psock;
@@ -181,14 +217,17 @@ static int bpf_tcp_init(struct sock *sk)
 	psock->save_close = sk->sk_prot->close;
 	psock->sk_proto = sk->sk_prot;
 
-	if (psock->bpf_tx_msg) {
-		tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
-		tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
-		tcp_bpf_proto.recvmsg = bpf_tcp_recvmsg;
-		tcp_bpf_proto.stream_memory_read = bpf_tcp_stream_read;
+	/* Build IPv6 sockmap whenever the address of tcpv6_prot changes */
+	if (sk->sk_family == AF_INET6 &&
+	    unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) {
+		mutex_lock(&tcpv6_prot_mutex);
+		if (likely(sk->sk_prot != saved_tcpv6_prot)) {
+			build_protos(bpf_tcp_prots[SOCKMAP_IPV6], sk->sk_prot);
+			smp_store_release(&saved_tcpv6_prot, sk->sk_prot);
+		}
+		mutex_unlock(&tcpv6_prot_mutex);
 	}
-
-	sk->sk_prot = &tcp_bpf_proto;
+	update_sk_prot(sk, psock);
 	rcu_read_unlock();
 	return 0;
 }
@@ -1111,8 +1150,7 @@ static void bpf_tcp_msg_add(struct smap_psock *psock,
 
 static int bpf_tcp_ulp_register(void)
 {
-	tcp_bpf_proto = tcp_prot;
-	tcp_bpf_proto.close = bpf_tcp_close;
+	build_protos(bpf_tcp_prots[SOCKMAP_IPV4], &tcp_prot);
 	/* Once BPF TX ULP is registered it is never unregistered. It
 	 * will be in the ULP list for the lifetime of the system. Doing
 	 * duplicate registers is not a problem.
From patchwork Thu Jun 14 16:44:52 2018
Subject: [bpf PATCH v2 2/6] bpf: sockmap only allow ESTABLISHED sock state
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:44:52 -0700
Message-ID: <20180614164451.24994.31096.stgit@john-Precision-Tower-5810>

Per the note in the TLS ULP (which is actually a generic statement
regarding ULPs):

   /* The TLS ulp is currently supported only for TCP sockets
    * in ESTABLISHED state.
    * Supporting sockets in LISTEN state will require us
    * to modify the accept implementation to clone rather then
    * share the ulp context.
    */

After this patch we only allow socks that are in ESTABLISHED state, or
are being added via a sock_ops event that is transitioning into an
ESTABLISHED state. By allowing sock_ops events we let users manage
sockmaps directly from sock_ops programs. The two supported sock_ops ops
are BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB and
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB; a sketch of such a program follows
the tags below.

From the userspace BPF update API the sock lock is now also taken, to
ensure we don't race with state changes after the ESTABLISHED check. The
BPF program sock_ops hook already runs with the sock lock held.

Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'.

Reported-by: Eric Dumazet
Signed-off-by: John Fastabend
Acked-by: Martin KaFai Lau
---
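As an illustration of the sock_ops path this opens up, a minimal program
in the style of the selftests is sketched below. It is not part of the
patch; the map definition and the bpf_helpers.h header are assumptions
borrowed from tools/testing/selftests/bpf conventions.

	#include <linux/bpf.h>
	#include "bpf_helpers.h"

	struct bpf_map_def SEC("maps") sock_map = {
		.type		= BPF_MAP_TYPE_SOCKMAP,
		.key_size	= sizeof(int),
		.value_size	= sizeof(int),
		.max_entries	= 1,
	};

	SEC("sockops")
	int add_established(struct bpf_sock_ops *skops)
	{
		__u32 key = 0;

		switch (skops->op) {
		case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
		case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
			/* Socket is transitioning into ESTABLISHED so it
			 * may join the map; any other op is now rejected
			 * with -EOPNOTSUPP.
			 */
			bpf_sock_map_update(skops, &sock_map, &key,
					    BPF_NOEXIST);
			break;
		default:
			break;
		}
		return 0;
	}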
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index f6dd4cd..f1ab52d 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1976,13 +1976,20 @@ static int sock_map_update_elem(struct bpf_map *map,
 		return -EINVAL;
 	}
 
+	lock_sock(skops.sk);
+	/* ULPs are currently supported only for TCP sockets in ESTABLISHED
+	 * state.
+	 */
 	if (skops.sk->sk_type != SOCK_STREAM ||
-	    skops.sk->sk_protocol != IPPROTO_TCP) {
-		fput(socket->file);
-		return -EOPNOTSUPP;
+	    skops.sk->sk_protocol != IPPROTO_TCP ||
+	    skops.sk->sk_state != TCP_ESTABLISHED) {
+		err = -EOPNOTSUPP;
+		goto out;
 	}
 
 	err = sock_map_ctx_update_elem(&skops, map, key, flags);
+out:
+	release_sock(skops.sk);
 	fput(socket->file);
 	return err;
 }
@@ -2247,10 +2254,6 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 
 	sock = skops->sk;
 
-	if (sock->sk_type != SOCK_STREAM ||
-	    sock->sk_protocol != IPPROTO_TCP)
-		return -EOPNOTSUPP;
-
 	if (unlikely(map_flags > BPF_EXIST))
 		return -EINVAL;
 
@@ -2338,7 +2341,20 @@ static int sock_hash_update_elem(struct bpf_map *map,
 		return -EINVAL;
 	}
 
+	lock_sock(skops.sk);
+	/* ULPs are currently supported only for TCP sockets in ESTABLISHED
+	 * state.
+	 */
+	if (skops.sk->sk_type != SOCK_STREAM ||
+	    skops.sk->sk_protocol != IPPROTO_TCP ||
+	    skops.sk->sk_state != TCP_ESTABLISHED) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
 	err = sock_hash_ctx_update_elem(&skops, map, key, flags);
+out:
+	release_sock(skops.sk);
 	fput(socket->file);
 	return err;
 }
@@ -2423,10 +2439,19 @@ struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
 	.map_delete_elem = sock_hash_delete_elem,
 };
 
+static bool bpf_is_valid_sock(struct bpf_sock_ops_kern *ops)
+{
+	return ops->op == BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB ||
+	       ops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB;
+}
+
 BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock,
 	   struct bpf_map *, map, void *, key, u64, flags)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (!bpf_is_valid_sock(bpf_sock))
+		return -EOPNOTSUPP;
 	return sock_map_ctx_update_elem(bpf_sock, map, key, flags);
 }
 
@@ -2445,6 +2470,9 @@ struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
 	   struct bpf_map *, map, void *, key, u64, flags)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	if (!bpf_is_valid_sock(bpf_sock))
+		return -EOPNOTSUPP;
 	return sock_hash_ctx_update_elem(bpf_sock, map, key, flags);
 }

From patchwork Thu Jun 14 16:44:57 2018
Subject: [bpf PATCH v2 3/6] bpf: sockhash fix omitted bucket lock in sock_close
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:44:57 -0700
Message-ID: <20180614164457.24994.33710.stgit@john-Precision-Tower-5810>

First, in tcp_close, reduce the scope of sk_callback_lock: the lock is
only needed to protect smap_release_sock(); the ingress and cork lists
are protected by the sock lock. Holding the lock over a wider scope is
harmless, but it may confuse a reader into inferring it is in fact
needed there.

Next, in sock_hash_delete_elem() the pattern is as follows:

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table, which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element, it
is deleted from the hashmap to ensure any other path doing a lookup will
fail. Finally, the refcnt is decremented and, if zero, the psock is
destroyed.

In parallel with the above (or with freeing the map), a TCP close event
may trigger tcp_close(), which at the moment omits the bucket lock
altogether (oops!). There the flow looks like this:

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and as demonstrated by syzbot, this is broken because we can
have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a bucket
lock, but that would create a lock inversion problem. In summary, to
follow the locking rules the maps list needs the sk_callback_lock, but
we need the bucket lock to do the hlist_del_rcu.

To resolve the lock inversion problem, note that when bpf_tcp_close is
called no updates can happen in parallel, due to the ESTABLISHED state
check in the update logic, so we can pop the head of the list repeatedly
and remove the reference until no more are left. If a delete happens in
parallel from the BPF API that is OK as well, because it will do a
similar action: look up the sock in the map/hash, delete it from the
map/hash, and decrement the refcnt. We check for this case before doing
a destroy on the psock to ensure we don't have two threads tearing down
the same psock. The new logic is as follows:

  bpf_tcp_close()
     e = psock_map_pop(psock->maps) // done with sk_callback_lock
     bucket_lock()                  // lock hash list bucket
     l = lookup_elem_raw(head, hash, key, key_size);
     if (l) {
        // only get here if element was not already removed
        hlist_del_rcu()
        ... destroy psock ...
     }
     bucket_unlock()

And finally, for all the above to work, add the missing sk_callback_lock
around smap_list_remove() in sock_hash_ctx_update_elem(); otherwise
delete and update may corrupt the maps list.

(As an aside, the sk_callback_lock serves two purposes. The first is to
update the sock callbacks sk_data_ready, sk_write_space, etc. The second
is to protect the psock 'maps' list, which is used (as shown above) to
delete all map/hash references to a sock when the sock is closed.)

(If we did not have the ESTABLISHED state guarantee from tcp_close, then
we could not ensure completion, because updates could happen forever and
pin the thread in the delete loop.)
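The pop-one-entry-at-a-time pattern is easy to model outside the kernel.
The sketch below is a userspace analogue with hypothetical types, not
kernel code: the list lock (standing in for sk_callback_lock) is held
only while detaching the head, never across the per-entry teardown, so a
per-bucket lock can be taken inside the loop without inverting lock
order.

	#include <pthread.h>
	#include <stdlib.h>

	struct map_entry {
		struct map_entry *next;
		/* ... reference to the map/bucket it lives in ... */
	};

	struct psock_like {
		pthread_mutex_t maps_lock;	/* ~ sk_callback_lock */
		struct map_entry *maps;
	};

	static struct map_entry *map_pop(struct psock_like *p)
	{
		struct map_entry *e;

		pthread_mutex_lock(&p->maps_lock);
		e = p->maps;
		if (e)
			p->maps = e->next;
		pthread_mutex_unlock(&p->maps_lock);
		return e;
	}

	static void drain(struct psock_like *p)
	{
		struct map_entry *e;

		while ((e = map_pop(p)) != NULL) {
			/* Take the entry's bucket lock here, re-lookup,
			 * and tear down only if still present; another
			 * deleter may have won the race.
			 */
			free(e);
		}
	}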
Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend
---
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index f1ab52d..04764f5 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -258,16 +258,54 @@ static void bpf_tcp_release(struct sock *sk)
 	rcu_read_unlock();
 }
 
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
+					 u32 hash, void *key, u32 key_size)
+{
+	struct htab_elem *l;
+
+	hlist_for_each_entry_rcu(l, head, hash_node) {
+		if (l->hash == hash && !memcmp(&l->key, key, key_size))
+			return l;
+	}
+
+	return NULL;
+}
+
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &__select_bucket(htab, hash)->head;
+}
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	atomic_dec(&htab->count);
 	kfree_rcu(l, rcu);
 }
 
+struct smap_psock_map_entry *psock_map_pop(struct sock *sk,
+					   struct smap_psock *psock)
+{
+	struct smap_psock_map_entry *e;
+
+	write_lock_bh(&sk->sk_callback_lock);
+	e = list_first_entry_or_null(&psock->maps,
+				     struct smap_psock_map_entry,
+				     list);
+	if (e)
+		list_del(&e->list);
+	write_unlock_bh(&sk->sk_callback_lock);
+	return e;
+}
+
 static void bpf_tcp_close(struct sock *sk, long timeout)
 {
 	void (*close_fun)(struct sock *sk, long timeout);
-	struct smap_psock_map_entry *e, *tmp;
+	struct smap_psock_map_entry *e;
 	struct sk_msg_buff *md, *mtmp;
 	struct smap_psock *psock;
 	struct sock *osk;
@@ -286,7 +324,6 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 	 */
 	close_fun = psock->save_close;
 
-	write_lock_bh(&sk->sk_callback_lock);
 	if (psock->cork) {
 		free_start_sg(psock->sock, psock->cork);
 		kfree(psock->cork);
@@ -299,20 +336,48 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 		kfree(md);
 	}
 
-	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
+	/* Sock is in TCP_CLOSE state so any concurrent adds or updates will be
+	 * blocked by ESTABLISHED check. However, tcp_close() + delete + free
+	 * can all run at the same time. If a tcp_close + delete happens each
+	 * code path will remove the entry for the map/hash before deleting it.
+	 * In the map case a xchg and then check to verify we have a sk protects
+	 * two paths from tearing down the same object. For hash map we lock the
+	 * bucket and remove the object from the hash map before destroying to
+	 * ensure that only one reference exists. By pulling object off the head
+	 * of the list (with sk_callback_lock) if multiple deleters are
+	 * running we avoid duplicate references.
+	 */
+	e = psock_map_pop(sk, psock);
+	while (e) {
 		if (e->entry) {
 			osk = cmpxchg(e->entry, sk, NULL);
 			if (osk == sk) {
-				list_del(&e->list);
 				smap_release_sock(psock, sk);
 			}
 		} else {
-			hlist_del_rcu(&e->hash_link->hash_node);
-			smap_release_sock(psock, e->hash_link->sk);
-			free_htab_elem(e->htab, e->hash_link);
+			struct htab_elem *link = e->hash_link;
+			struct hlist_head *head;
+			struct htab_elem *l;
+			struct bucket *b;
+
+			b = __select_bucket(e->htab, link->hash);
+			head = &b->head;
+			raw_spin_lock_bh(&b->lock);
+			l = lookup_elem_raw(head,
+					    link->hash, link->key,
+					    e->htab->elem_size);
+			/* If another thread deleted this object skip deletion.
+			 * The refcnt on psock may or may not be zero.
+			 */
+			if (l) {
+				hlist_del_rcu(&e->hash_link->hash_node);
+				smap_release_sock(psock, e->hash_link->sk);
+				free_htab_elem(e->htab, e->hash_link);
+			}
+			raw_spin_unlock_bh(&b->lock);
 		}
+		e = psock_map_pop(sk, psock);
 	}
-	write_unlock_bh(&sk->sk_callback_lock);
 	rcu_read_unlock();
 	close_fun(sk, timeout);
 }
@@ -2088,16 +2153,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
-{
-	return &htab->buckets[hash & (htab->n_buckets - 1)];
-}
-
-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
-{
-	return &__select_bucket(htab, hash)->head;
-}
-
 static void sock_hash_free(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
@@ -2114,10 +2169,13 @@ static void sock_hash_free(struct bpf_map *map)
 	 */
 	rcu_read_lock();
 	for (i = 0; i < htab->n_buckets; i++) {
-		struct hlist_head *head = select_bucket(htab, i);
+		struct bucket *b = __select_bucket(htab, i);
+		struct hlist_head *head;
 		struct hlist_node *n;
 		struct htab_elem *l;
 
+		raw_spin_lock_bh(&b->lock);
+		head = &b->head;
 		hlist_for_each_entry_safe(l, n, head, hash_node) {
 			struct sock *sock = l->sk;
 			struct smap_psock *psock;
@@ -2137,6 +2195,7 @@ static void sock_hash_free(struct bpf_map *map)
 			write_unlock_bh(&sock->sk_callback_lock);
 			kfree(l);
 		}
+		raw_spin_unlock_bh(&b->lock);
 	}
 	rcu_read_unlock();
 	bpf_map_area_free(htab->buckets);
@@ -2167,19 +2226,6 @@ static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
 	return l_new;
 }
 
-static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
-					 u32 hash, void *key, u32 key_size)
-{
-	struct htab_elem *l;
-
-	hlist_for_each_entry_rcu(l, head, hash_node) {
-		if (l->hash == hash && !memcmp(&l->key, key, key_size))
-			return l;
-	}
-
-	return NULL;
-}
-
 static inline u32 htab_map_hash(const void *key, u32 key_len)
 {
 	return jhash(key, key_len, 0);
@@ -2307,8 +2353,10 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		psock = smap_psock_sk(l_old->sk);
 
 		hlist_del_rcu(&l_old->hash_node);
+		write_lock_bh(&l_old->sk->sk_callback_lock);
 		smap_list_remove(psock, NULL, l_old);
 		smap_release_sock(psock, l_old->sk);
+		write_unlock_bh(&l_old->sk->sk_callback_lock);
 		free_htab_elem(htab, l_old);
 	}
 	raw_spin_unlock_bh(&b->lock);

From patchwork Thu Jun 14 16:45:02 2018
Subject: [bpf PATCH v2 4/6] bpf: sockmap, tcp_disconnect to listen transition
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:45:02 -0700
Message-ID: <20180614164502.24994.38682.stgit@john-Precision-Tower-5810>

After adding checks to ensure TCP is in ESTABLISHED state when a sock is
added, we also need to ensure that the user does not transition through
tcp_disconnect() and back into ESTABLISHED state without the sockmap
removing the sock. To do this, add an unhash hook and remove the sock
from the map there; a sketch of the triggering sequence follows the tags
below.

Reported-by: Eric Dumazet
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend
Acked-by: Martin KaFai Lau
---
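The transition in question can be driven entirely from userspace. The
sequence below is illustrative only; it assumes fd is an ESTABLISHED TCP
socket that has already been added to a sockmap.

	#include <string.h>
	#include <sys/socket.h>

	static void disconnect_then_reuse(int fd)
	{
		struct sockaddr sa;

		/* connect() with AF_UNSPEC runs tcp_disconnect(), moving
		 * the socket to TCP_CLOSE without closing the fd.
		 */
		memset(&sa, 0, sizeof(sa));
		sa.sa_family = AF_UNSPEC;
		connect(fd, &sa, sizeof(sa));

		/* The socket can now re-enter LISTEN, or reconnect back
		 * into ESTABLISHED, while the map would otherwise still
		 * hold a reference. The unhash hook added by this patch
		 * removes it from the map during the disconnect.
		 */
		listen(fd, 1);
	}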
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 04764f5..ffc5152 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -130,6 +130,7 @@ struct smap_psock {
 	struct proto *sk_proto;
 	void (*save_close)(struct sock *sk, long timeout);
+	void (*save_unhash)(struct sock *sk);
 	void (*save_data_ready)(struct sock *sk);
 	void (*save_write_space)(struct sock *sk);
 };
@@ -141,6 +142,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
 			    int offset, size_t size, int flags);
 static void bpf_tcp_close(struct sock *sk, long timeout);
+static void bpf_tcp_unhash(struct sock *sk);
 
 static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
 {
@@ -182,6 +184,7 @@ static void build_protos(struct proto prot[SOCKMAP_NUM_CONFIGS],
 {
 	prot[SOCKMAP_BASE] = *base;
 	prot[SOCKMAP_BASE].close = bpf_tcp_close;
+	prot[SOCKMAP_BASE].unhash = bpf_tcp_unhash;
 	prot[SOCKMAP_BASE].recvmsg = bpf_tcp_recvmsg;
 	prot[SOCKMAP_BASE].stream_memory_read = bpf_tcp_stream_read;
 
@@ -215,6 +218,7 @@ static int bpf_tcp_init(struct sock *sk)
 	}
 
 	psock->save_close = sk->sk_prot->close;
+	psock->save_unhash = sk->sk_prot->unhash;
 	psock->sk_proto = sk->sk_prot;
 
 	/* Build IPv6 sockmap whenever the address of tcpv6_prot changes */
@@ -302,28 +306,12 @@ struct smap_psock_map_entry *psock_map_pop(struct sock *sk,
 	return e;
 }
 
-static void bpf_tcp_close(struct sock *sk, long timeout)
+static void bpf_tcp_remove(struct sock *sk, struct smap_psock *psock)
 {
-	void (*close_fun)(struct sock *sk, long timeout);
 	struct smap_psock_map_entry *e;
 	struct sk_msg_buff *md, *mtmp;
-	struct smap_psock *psock;
 	struct sock *osk;
 
-	rcu_read_lock();
-	psock = smap_psock_sk(sk);
-	if (unlikely(!psock)) {
-		rcu_read_unlock();
-		return sk->sk_prot->close(sk, timeout);
-	}
-
-	/* The psock may be destroyed anytime after exiting the RCU critial
-	 * section so by the time we use close_fun the psock may no longer
-	 * be valid. However, bpf_tcp_close is called with the sock lock
-	 * held so the close hook and sk are still valid.
-	 */
-	close_fun = psock->save_close;
-
 	if (psock->cork) {
 		free_start_sg(psock->sock, psock->cork);
 		kfree(psock->cork);
@@ -378,6 +366,51 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 		}
 		e = psock_map_pop(sk, psock);
 	}
+}
+
+static void bpf_tcp_unhash(struct sock *sk)
+{
+	void (*unhash_fun)(struct sock *sk);
+	struct smap_psock *psock;
+
+	rcu_read_lock();
+	psock = smap_psock_sk(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		return sk->sk_prot->unhash(sk);
+	}
+
+	/* The psock may be destroyed anytime after exiting the RCU critical
+	 * section so by the time we use close_fun the psock may no longer
+	 * be valid. However, bpf_tcp_close is called with the sock lock
+	 * held so the close hook and sk are still valid.
+	 */
+	unhash_fun = psock->save_unhash;
+	bpf_tcp_remove(sk, psock);
+	rcu_read_unlock();
+	unhash_fun(sk);
+
+}
+
+static void bpf_tcp_close(struct sock *sk, long timeout)
+{
+	void (*close_fun)(struct sock *sk, long timeout);
+	struct smap_psock *psock;
+
+	rcu_read_lock();
+	psock = smap_psock_sk(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		return sk->sk_prot->close(sk, timeout);
+	}
+
+	/* The psock may be destroyed anytime after exiting the RCU critical
+	 * section so by the time we use close_fun the psock may no longer
+	 * be valid. However, bpf_tcp_close is called with the sock lock
+	 * held so the close hook and sk are still valid.
+	 */
+	close_fun = psock->save_close;
+	bpf_tcp_remove(sk, psock);
 	rcu_read_unlock();
 	close_fun(sk, timeout);
 }

From patchwork Thu Jun 14 16:45:07 2018
Subject: [bpf PATCH v2 5/6] bpf: sockhash, add release routine
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:45:07 -0700
Message-ID: <20180614164507.24994.83616.stgit@john-Precision-Tower-5810>

Add the map_release_uref pointer to the sockhash map ops. It was dropped
when the original sockhash code was ported into bpf-next before the
initial commit.
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend
Acked-by: Martin KaFai Lau
---
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index ffc5152..77fe204 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2518,6 +2518,7 @@ struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
 	.map_get_next_key = sock_hash_get_next_key,
 	.map_update_elem = sock_hash_update_elem,
 	.map_delete_elem = sock_hash_delete_elem,
+	.map_release_uref = sock_map_release,
 };
 
 static bool bpf_is_valid_sock(struct bpf_sock_ops_kern *ops)

From patchwork Thu Jun 14 16:45:12 2018
Subject: [bpf PATCH v2 6/6] bpf: selftest remove attempts to add LISTEN sockets to sockmap
From: John Fastabend
To: ast@kernel.org, daniel@iogearbox.net
Cc: netdev@vger.kernel.org
Date: Thu, 14 Jun 2018 09:45:12 -0700
Message-ID: <20180614164512.24994.56879.stgit@john-Precision-Tower-5810>

In the test_maps selftest, the sockmap test case attempts to add a
socket in listening state to the sockmap. This is no longer a valid
operation, so it fails as expected. However, the test now wrongly
reports that failure as an error. Fix the test to avoid adding sockets
in listening state.
Signed-off-by: John Fastabend
Acked-by: Martin KaFai Lau
---
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 6c25334..9fed5f0 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -564,7 +564,7 @@ static void test_sockmap(int tasks, void *data)
 	}
 
 	/* Test update without programs */
-	for (i = 0; i < 6; i++) {
+	for (i = 2; i < 6; i++) {
 		err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
 		if (err) {
 			printf("Failed noprog update sockmap '%i:%i'\n",
@@ -727,7 +727,7 @@ static void test_sockmap(int tasks, void *data)
 	}
 
 	/* Test map update elem afterwards fd lives in fd and map_fd */
-	for (i = 0; i < 6; i++) {
+	for (i = 2; i < 6; i++) {
 		err = bpf_map_update_elem(map_fd_rx, &i, &sfd[i], BPF_ANY);
 		if (err) {
 			printf("Failed map_fd_rx update sockmap %i '%i:%i'\n",