From patchwork Fri Mar 10 03:31:31 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jon Maxwell X-Patchwork-Id: 737241 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3vfXr71SKwz9s7t for ; Fri, 10 Mar 2017 14:32:07 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EMh/hLhR"; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752874AbdCJDb5 (ORCPT ); Thu, 9 Mar 2017 22:31:57 -0500 Received: from mail-pf0-f193.google.com ([209.85.192.193]:33876 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751762AbdCJDbz (ORCPT ); Thu, 9 Mar 2017 22:31:55 -0500 Received: by mail-pf0-f193.google.com with SMTP id o126so9275775pfb.1; Thu, 09 Mar 2017 19:31:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=HeqASFOSVwRjuYIfWXbuBqcm4nXVjjxjMrtpTTYAwyo=; b=EMh/hLhRlhy5TrqWJUllWZTgWgiL0wx+irjNHIjGEsecQyrt7xifCIYhkb8n1ARk5T yJaKVFWkydJc8KM7BI+z+RqYjUWaAP8s3hHpJElUF2l+0KxMaECC12zKazvGSVN6PyhQ eFpOo6DhkwhWiEIKUAH0FHDD48UYZGmDf2tzCHWHiq9HJArJEFdudaBB5Kd8B3em87Si 0MNTJIZK+mhDXgTmM7g6TuHR1/1LGg97+CbJMD2rYzkgcdnhA0If2zsmgfR3jxE6EMbK AIRCvX7B1FvCAihMJ+qIFs/LUTvoycfz3OfN54Abi0ZJYEHyCz5dTf4q84Ya3g2l7OZZ wJzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=HeqASFOSVwRjuYIfWXbuBqcm4nXVjjxjMrtpTTYAwyo=; b=nv+8LjLrgQR27j4mbO7h9y/bWRImM2nUnIkZYpyQdi07il1MgCFPAs1pGPMBy+irMg GBqZglEUknaEOTEaKB2H9Pe1ajzUov+EAMjpHl2hgBj/0lI82Az/gS2R+1kra7wYFM5J CIGWD8fvn9MQZaXoHdeFLA+hh3Xr3ALnvG2Y2NfLOAyT1UqFGRrUSWq4AiTOuVsEkQi5 IghwvNw8V9svhTK1rbjLYuhkor58NQLB+FJWcAtM9jebZByyAoQP0RRWzUvwZ5HjEvuc 0wBXqF+of4SO0cKZbmA7xppy3Q197/QruAJHTND73mDzYxr/F6NEDoM7jIX8+r6/OWni QBFA== X-Gm-Message-State: AMke39kTG7GEohoGPYCh2Fpmuq1eoQSYTRy7lHIfH9mc5ucNARTjyITge3Veutw59+B4Qw== X-Received: by 10.84.224.3 with SMTP id r3mr21925883plj.6.1489116713462; Thu, 09 Mar 2017 19:31:53 -0800 (PST) Received: from dhcp-1-107.bne.redhat.com (110-175-8-199.static.tpgi.com.au. [110.175.8.199]) by smtp.gmail.com with ESMTPSA id u82sm15078035pfd.7.2017.03.09.19.31.47 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Mar 2017 19:31:52 -0800 (PST) From: Jon Maxwell To: gerrit@erg.abdn.ac.uk Cc: davem@davemloft.net, edumazet@google.com, andreyknvl@google.com, kuznet@ms2.inr.ac.ru, jmorris@namei.org, yoshfuji@linux-ipv6.org, kaber@trash.net, ncardwell@google.com, dccp@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, jmaxwell@redhat.com, Jon Maxwell , Eric Garver , Hannes Sowa Subject: [PATCH net, v1] dccp/tcp: fix routing redirect race Date: Fri, 10 Mar 2017 14:31:31 +1100 Message-Id: <1489116691-26264-1-git-send-email-jmaxwell37@gmail.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org As Eric Dumazet pointed out this also needs to be fixed in IPv6. v1: Contains the IPv6 patch as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver Cc: Hannes Sowa Signed-off-by: Jon Maxwell --- net/dccp/ipv4.c | 3 ++- net/ipv4/tcp_ipv4.c | 3 ++- net/ipv6/tcp_ipv6.c | 8 +++++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 409d0cf..b99168b 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -289,7 +289,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info) switch (type) { case ICMP_REDIRECT: - dccp_do_redirect(skb, sk); + if (!sock_owned_by_user(sk)) + dccp_do_redirect(skb, sk); goto out; case ICMP_SOURCE_QUENCH: /* Just silently ignore these. */ diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8f3ec13..575e19d 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -431,7 +431,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) switch (type) { case ICMP_REDIRECT: - do_redirect(icmp_skb, sk); + if (!sock_owned_by_user(sk)) + do_redirect(icmp_skb, sk); goto out; case ICMP_SOURCE_QUENCH: /* Just silently ignore these. */ diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 60a5295..49fa2e8 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -391,10 +391,12 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, np = inet6_sk(sk); if (type == NDISC_REDIRECT) { - struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); + if (!sock_owned_by_user(sk)) { + struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); - if (dst) - dst->ops->redirect(dst, sk, skb); + if (dst) + dst->ops->redirect(dst, sk, skb); + } goto out; }