From patchwork Fri Jan 12 18:10:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 860137 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="GKAZeWQq"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zJ9ly6D1Nz9sNc for ; Sat, 13 Jan 2018 05:10:54 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965191AbeALSKx (ORCPT ); Fri, 12 Jan 2018 13:10:53 -0500 Received: from mail-pf0-f196.google.com ([209.85.192.196]:35319 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964962AbeALSKu (ORCPT ); Fri, 12 Jan 2018 13:10:50 -0500 Received: by mail-pf0-f196.google.com with SMTP id t12so4956146pfg.2 for ; Fri, 12 Jan 2018 10:10:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=2pIPDW3fM8zo8Zhxnu7TEX8KkpwONKjcCi3VLNYq4TI=; b=GKAZeWQq4o+toydXCIKCWCDz5GFjsW0b3dbccDhH3rGTAswkierL/oUekQ4Mpjncrl 5K6Jc7PxG445mZ3EsyJD8cY5yM+eLa96BpLclBzmE4U/HFjO+MioqRPNOGZmM3IuEU0A /nbZuXyuYoms81cXQO3UED04FCY66pFNOTb/WlBqoMQMIAW85rMCK4uAOX6C8eJ0RCzM fz7fnzsaYIzIX28P+3q1/iuz1QNSMvtnziNYtQjyiXT8EGzdF7ReNeqmibcsONBevZK8 UNmgvaRgks4UrnKdM+L+i87/1yrnp72VFas1KboC0Va58oT+7LgcCIptO7L3HeAd1QZr hLqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=2pIPDW3fM8zo8Zhxnu7TEX8KkpwONKjcCi3VLNYq4TI=; b=PsxSo1vqyYjNUnotliudVdinwPWW1gx21HepP/tKpamxybiYsvhIre+fQfKjWtY6s/ rctm3foFwA9uAzi9L3f3K+9316ZU0ZwivZ3Znzp2vyJTzCdmJKEziNd5nqdbkn/y5q7q 4Rcj/fK+/6JHH2+tXYDRV5/M2OWrYZ92DvKzwsP8RPLlMyXEFI/IeK4JHUcCOB4sckYa 93N2TwcOdbeKOvVtg7gePkVjYlMThRzssG/voy7lWDjY+vsjRRsfekDth/sQU4SeijT6 g0c1R2c8Xn/TxvDJa30mEkRwry/X26CfR3138jeETjApSUfQBOC729vebQ3kH1ZxDKIa oirg== X-Gm-Message-State: AKwxyteKrtfwB2LuqbtCUIB76dVyv4dpgSMDmnzAsm3SbxQXrMn6dkpx ZJU5yP9r6058w7gCNDNgxTljug== X-Google-Smtp-Source: ACJfBotBFy4jK0FNN8ta1bPlB5qZAp6oO4697dNtS65CnGqLpqFZ/zThNcDJ/VsXkOf56VOb0OFVFw== X-Received: by 10.98.63.214 with SMTP id z83mr11818678pfj.95.1515780649455; Fri, 12 Jan 2018 10:10:49 -0800 (PST) Received: from [127.0.1.1] ([75.106.27.153]) by smtp.gmail.com with ESMTPSA id u79sm24796393pfg.141.2018.01.12.10.10.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 12 Jan 2018 10:10:48 -0800 (PST) Subject: [bpf-next PATCH 3/7] sockmap: convert refcnt to an atomic refcnt From: John Fastabend To: borkmann@iogearbox.net, john.fastabend@gmail.com, ast@kernel.org Cc: netdev@vger.kernel.org, kafai@fb.com Date: Fri, 12 Jan 2018 10:10:38 -0800 Message-ID: <20180112181038.21531.8898.stgit@john-Precision-Tower-5810> In-Reply-To: <20180112175029.21531.54693.stgit@john-Precision-Tower-5810> References: <20180112175029.21531.54693.stgit@john-Precision-Tower-5810> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The sockmap refcnt up until now has been wrapped in the sk_callback_lock(). So its not actually needed any locking of its own. The counter itself tracks the lifetime of the psock object. Sockets in a sockmap have a lifetime that is independent of the map they are part of. This is possible because a single socket may be in multiple maps. When this happens we can only release the psock data associated with the socket when the refcnt reaches zero. There are three possible delete sock reference decrement paths first through the normal sockmap process, the user deletes the socket from the map. Second the map is removed and all sockets in the map are removed, delete path is similar to case 1. The third case is an asyncronous socket event such as a closing the socket. The last case handles removing sockets that are no longer available. For completeness, although inc does not pose any problems in this patch series, the inc case only happens when a psock is added to a map. Next we plan to add another socket prog type to handle policy and monitoring on the TX path. When we do this however we will need to keep a reference count open across the sendmsg/sendpage call and holding the sk_callback_lock() here (on every send) seems less than ideal, also it may sleep in cases where we hit memory pressure. Instead of dealing with these issues in some clever way simply make the reference counting a refcnt_t type and do proper atomic ops. Signed-off-by: John Fastabend --- kernel/bpf/sockmap.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 3f662ee..972608f 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -62,8 +62,7 @@ struct smap_psock_map_entry { struct smap_psock { struct rcu_head rcu; - /* refcnt is used inside sk_callback_lock */ - u32 refcnt; + refcount_t refcnt; /* datapath variables */ struct sk_buff_head rxqueue; @@ -346,14 +345,12 @@ static void smap_destroy_psock(struct rcu_head *rcu) static void smap_release_sock(struct smap_psock *psock, struct sock *sock) { - psock->refcnt--; - if (psock->refcnt) - return; - - smap_stop_sock(psock, sock); - clear_bit(SMAP_TX_RUNNING, &psock->state); - rcu_assign_sk_user_data(sock, NULL); - call_rcu_sched(&psock->rcu, smap_destroy_psock); + if (refcount_dec_and_test(&psock->refcnt)) { + smap_stop_sock(psock, sock); + clear_bit(SMAP_TX_RUNNING, &psock->state); + rcu_assign_sk_user_data(sock, NULL); + call_rcu_sched(&psock->rcu, smap_destroy_psock); + } } static int smap_parse_func_strparser(struct strparser *strp, @@ -485,7 +482,7 @@ static struct smap_psock *smap_init_psock(struct sock *sock, INIT_WORK(&psock->tx_work, smap_tx_work); INIT_WORK(&psock->gc_work, smap_gc_work); INIT_LIST_HEAD(&psock->maps); - psock->refcnt = 1; + refcount_set(&psock->refcnt, 1); rcu_assign_sk_user_data(sock, psock); sock_hold(sock); @@ -745,7 +742,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops, err = -EBUSY; goto out_progs; } - psock->refcnt++; + refcount_inc(&psock->refcnt); } else { psock = smap_init_psock(sock, stab); if (IS_ERR(psock)) {