From patchwork Wed Jun 17 22:23:39 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 486098 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 4F1371402A9; Thu, 18 Jun 2015 18:02:15 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1Z5Uly-0007Ud-Cm; Thu, 18 Jun 2015 08:02:06 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1Z5Lml-0007mc-IT for kernel-team@lists.ubuntu.com; Wed, 17 Jun 2015 22:26:19 +0000 Received: from [10.172.68.52] (helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1Z5Lml-0003HP-5T; Wed, 17 Jun 2015 22:26:19 +0000 Received: from kamal by fourier with local (Exim 4.82) (envelope-from ) id 1Z5Lmi-0000fA-Uj; Wed, 17 Jun 2015 15:26:16 -0700 From: Kamal Mostafa To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com Subject: [PATCH 3.19.y-ckt 114/146] net: sched: fix call_rcu() race on classifier module unloads Date: Wed, 17 Jun 2015 15:23:39 -0700 Message-Id: <1434579851-1719-115-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1434579851-1719-1-git-send-email-kamal@canonical.com> References: <1434579851-1719-1-git-send-email-kamal@canonical.com> X-Extended-Stable: 3.19 X-Mailman-Approved-At: Thu, 18 Jun 2015 08:02:03 +0000 Cc: Daniel Borkmann , Kamal Mostafa , Jamal Hadi Salim , John Fastabend , Eric Dumazet , Thomas Graf , "David S. Miller" , Alexei Starovoitov X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com 3.19.8-ckt2 -stable review patch. If anyone has any objections, please let me know. ------------------ From: Daniel Borkmann commit c78e1746d3ad7d548bdf3fe491898cc453911a49 upstream. Vijay reported that a loop as simple as ... while true; do tc qdisc add dev foo root handle 1: prio tc filter add dev foo parent 1: u32 match u32 0 0 flowid 1 tc qdisc del dev foo root rmmod cls_u32 done ... will panic the kernel. Moreover, he bisected the change apparently introducing it to 78fd1d0ab072 ("netlink: Re-add locking to netlink_lookup() and seq walker"). The removal of synchronize_net() from the netlink socket triggering the qdisc to be removed, seems to have uncovered an RCU resp. module reference count race from the tc API. Given that RCU conversion was done after e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table") which added the synchronize_net() originally, occasion of hitting the bug was less likely (not impossible though): When qdiscs that i) support attaching classifiers and, ii) have at least one of them attached, get deleted, they invoke tcf_destroy_chain(), and thus call into ->destroy() handler from a classifier module. After RCU conversion, all classifier that have an internal prio list, unlink them and initiate freeing via call_rcu() deferral. Meanhile, tcf_destroy() releases already reference to the tp->ops->owner module before the queued RCU callback handler has been invoked. Subsequent rmmod on the classifier module is then not prevented since all module references are already dropped. By the time, the kernel invokes the RCU callback handler from the module, that function address is then invalid. One way to fix it would be to add an rcu_barrier() to unregister_tcf_proto_ops() to wait for all pending call_rcu()s to complete. synchronize_rcu() is not appropriate as under heavy RCU callback load, registered call_rcu()s could be deferred longer than a grace period. In case we don't have any pending call_rcu()s, the barrier is allowed to return immediately. Since we came here via unregister_tcf_proto_ops(), there are no users of a given classifier anymore. Further nested call_rcu()s pointing into the module space are not being done anywhere. Only cls_bpf_delete_prog() may schedule a work item, to unlock pages eventually, but that is not in the range/context of cls_bpf anymore. Fixes: 25d8c0d55f24 ("net: rcu-ify tcf_proto") Fixes: 9888faefe132 ("net: sched: cls_basic use RCU") Reported-by: Vijay Subramanian Signed-off-by: Daniel Borkmann Cc: John Fastabend Cc: Eric Dumazet Cc: Thomas Graf Cc: Jamal Hadi Salim Cc: Alexei Starovoitov Tested-by: Vijay Subramanian Acked-by: Alexei Starovoitov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Kamal Mostafa --- net/sched/cls_api.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index baef987..d3328a1 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -81,6 +81,11 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops) struct tcf_proto_ops *t; int rc = -ENOENT; + /* Wait for outstanding call_rcu()s, if any, from a + * tcf_proto_ops's destroy() handler. + */ + rcu_barrier(); + write_lock(&cls_mod_lock); list_for_each_entry(t, &tcf_proto_base, head) { if (t == ops) {