From patchwork Thu May 30 14:15:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1107772 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 45F8kF5w3Mz9s9T; Fri, 31 May 2019 00:15:33 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1hWLq5-0000qA-VJ; Thu, 30 May 2019 14:15:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1hWLq2-0000oe-HV for kernel-team@lists.ubuntu.com; Thu, 30 May 2019 14:15:26 +0000 Received: from mail-qt1-f197.google.com ([209.85.160.197]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1hWLq2-0001QG-4J for kernel-team@lists.ubuntu.com; Thu, 30 May 2019 14:15:26 +0000 Received: by mail-qt1-f197.google.com with SMTP id r40so5055784qtk.0 for ; Thu, 30 May 2019 07:15:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=QQ6/9w4xJSK46nDboYOvHjmfX26vt3a4ZsSh+/ZbX8A=; b=pSUW82b1pFNgEUlGxUcXqC+EhjsUuC40S0aSZ3cCnfDk8ELh1ckA84qNWJhhFmn9dr KSiEWc75wucAKuGYyEO1iqANgvElIsrPRAGq3EPhkFKcr6OudPMTPLsngZMfeflmQtqL SAWUItGhZpncuJoitDaDnIcqomqHUDmUDC15xsH8OmdRT4DWzQoaUolKmyIx96utElKF YLVu8ns4zV3q1421B/5tjEbyYqYAeKRpgYF6wJkofVGwgye1DaI8nhQRN11Fudb6R+57 Z69H8gghMb+JIMXa0JgBmvCs4xcx6uKoYAKeMNypnktRtha1ETCNHustKw1evDVc47PJ ec+A== X-Gm-Message-State: APjAAAVt8LBbqyljqjpQIk+VVIMubM1G5jqBD8wII3M+HXvMJcHMajLd 0MggRy2LJmed/Si/NZ/MqqgbW4xGfxaudA57OCehUBGLDtlIGE3yEhLqVC99ZCVuvCkC5+EylyN Mylon2gIWLSlLSFI2+Vqsk6b7miykwP1VbbqmtBM5 X-Received: by 2002:a37:670e:: with SMTP id b14mr3306447qkc.216.1559225724815; Thu, 30 May 2019 07:15:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqzIJUnUd4GxO+b7HMHZ/V12ZXHBax9s05woCOfLfj5Fb9OYLzZ6LvCESP4xB9y02vhqR5AN+A== X-Received: by 2002:a37:670e:: with SMTP id b14mr3306421qkc.216.1559225724473; Thu, 30 May 2019 07:15:24 -0700 (PDT) Received: from gallifrey.lan ([2804:14c:4e3:4a76:e1a0:5724:a353:2a54]) by smtp.gmail.com with ESMTPSA id c4sm1449152qkd.24.2019.05.30.07.15.22 for (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 30 May 2019 07:15:23 -0700 (PDT) From: Marcelo Henrique Cerri To: kernel-team@lists.ubuntu.com Subject: [c/azure][PATCH] ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes Date: Thu, 30 May 2019 11:15:14 -0300 Message-Id: <20190530141514.9585-3-marcelo.cerri@canonical.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190530141514.9585-1-marcelo.cerri@canonical.com> References: <20190530141514.9585-1-marcelo.cerri@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jiri Wiesner BugLink: http://bugs.launchpad.net/bugs/1830266 The *_frag_reasm() functions are susceptible to miscalculating the byte count of packet fragments in case the truesize of a head buffer changes. The truesize member may be changed by the call to skb_unclone(), leaving the fragment memory limit counter unbalanced even if all fragments are processed. This miscalculation goes unnoticed as long as the network namespace which holds the counter is not destroyed. Should an attempt be made to destroy a network namespace that holds an unbalanced fragment memory limit counter the cleanup of the namespace never finishes. The thread handling the cleanup gets stuck in inet_frags_exit_net() waiting for the percpu counter to reach zero. The thread is usually in running state with a stacktrace similar to: PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 #10 [ffff880621563e38] process_one_work at ffffffff81096f14 It is not possible to create new network namespaces, and processes that call unshare() end up being stuck in uninterruptible sleep state waiting to acquire the net_mutex. The bug was observed in the IPv6 netfilter code by Per Sundstrom. I thank him for his analysis of the problem. The parts of this patch that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. Signed-off-by: Jiri Wiesner Reported-by: Per Sundstrom Acked-by: Peter Oskolkov Signed-off-by: David S. Miller (cherry picked from commit ebaf39e6032faf77218220707fc3fa22487784e0) Signed-off-by: Marcelo Henrique Cerri --- net/ipv4/ip_fragment.c | 7 +++++++ net/ipv6/netfilter/nf_conntrack_reasm.c | 8 +++++++- net/ipv6/reassembly.c | 8 +++++++- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index e05004e36a07..46dcbff7d5c9 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -479,6 +479,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev, struct sk_buff *fp, *head = qp->q.fragments; int len; int ihlen; + int delta; int err; u8 ecn; @@ -519,10 +520,16 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev, if (len > 65535) goto out_oversize; + delta = - head->truesize; + /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) goto out_nomem; + delta += head->truesize; + if (delta) + add_frag_mem_limit(qp->q.net, delta); + /* If the first fragment is fragmented itself, we split * it to two chunks: the first with data and paged part * and the second, holding only fragments. */ diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index a452d99c9f52..460f4da8d584 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -342,7 +342,7 @@ static bool nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_device *dev) { struct sk_buff *fp, *head = fq->q.fragments; - int payload_len; + int payload_len, delta; u8 ecn; inet_frag_kill(&fq->q); @@ -364,10 +364,16 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev, struct net_devic return false; } + delta = - head->truesize; + /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) return false; + delta += head->truesize; + if (delta) + add_frag_mem_limit(fq->q.net, delta); + /* If the first fragment is fragmented itself, we split * it to two chunks: the first with data and paged part * and the second, holding only fragments. */ diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index b939b94e7e91..1ae71e5a95f4 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -336,7 +336,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, { struct net *net = container_of(fq->q.net, struct net, ipv6.frags); struct sk_buff *fp, *head = fq->q.fragments; - int payload_len; + int payload_len, delta; unsigned int nhoff; int sum_truesize; u8 ecn; @@ -377,10 +377,16 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, if (payload_len > IPV6_MAXPLEN) goto out_oversize; + delta = - head->truesize; + /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) goto out_oom; + delta += head->truesize; + if (delta) + add_frag_mem_limit(fq->q.net, delta); + /* If the first fragment is fragmented itself, we split * it to two chunks: the first with data and paged part * and the second, holding only fragments. */