From patchwork Thu Jul 9 19:54:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo
X-Patchwork-Id: 1326198
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
 smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19;
 helo=huckleberry.canonical.com;
 envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none)
 header.from=canonical.com
Received: from huckleberry.canonical.com (huckleberry.canonical.com
 [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384
 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix)
 with ESMTPS id 4B2n2h5Cmsz9sSJ; Fri, 10 Jul 2020 05:55:08 +1000 (AEST)
Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com)
 by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from )
 id 1jtcdN-0006NJ-Jg; Thu, 09 Jul 2020 19:55:05 +0000
Received: from youngberry.canonical.com ([91.189.89.112])
 by huckleberry.canonical.com with esmtps
 (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from )
 id 1jtcdL-0006Mv-R6 for kernel-team@lists.ubuntu.com;
 Thu, 09 Jul 2020 19:55:03 +0000
Received: from 1.general.cascardo.us.vpn ([10.172.70.58]
 helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from )
 id 1jtcdL-0006HQ-6D; Thu, 09 Jul 2020 19:55:03 +0000
From: Thadeu Lima de Souza Cascardo
To: kernel-team@lists.ubuntu.com
Subject: [SRU G/U] UBUNTU: SAUCE: Revert "netprio_cgroup: Fix unlimited
 memory leak of v2 cgroups"
Date: Thu, 9 Jul 2020 16:54:02 -0300
Message-Id: <20200709195402.1835538-3-cascardo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200709195402.1835538-1-cascardo@canonical.com>
References:
 <20200709195402.1835538-1-cascardo@canonical.com>
MIME-Version: 1.0
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Thadeu Lima de Souza Cascardo
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: "kernel-team"

BugLink: https://bugs.launchpad.net/bugs/1886668

This reverts commit 5eebba2159d707ae9533a52839e1ba71754c4426, which is
commit 090e28b229af92dc5b40786ca673999d59e73056 upstream.

There is a crash related to a possible use-after-free of cgroups when
cgroup BPF is used with INET_INGRESS or INET_EGRESS.

[ 696.396993] RIP: 0010:__cgroup_bpf_run_filter_skb+0xbb/0x1e0
[ 696.397005] RSP: 0018:ffff893fdcb83a70 EFLAGS: 00010292
[ 696.397015] RAX: 6d69546e6f697469 RBX: 0000000000000000 RCX: 0000000000000014
[ 696.397028] RDX: 0000000000000000 RSI: ffff893fd0360000 RDI: ffff893fb5154800
[ 696.397041] RBP: ffff893fdcb83ad0 R08: 0000000000000001 R09: 0000000000000000
[ 696.397058] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000014
[ 696.397075] R13: ffff893fb5154800 R14: 0000000000000020 R15: ffff893fc6ba4d00
[ 696.397091] FS: 0000000000000000(0000) GS:ffff893fdcb80000(0000) knlGS:0000000000000000
[ 696.397107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 696.397119] CR2: 000000c0001b4000 CR3: 00000006dce0a004 CR4: 00000000003606e0
[ 696.397135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 696.397152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 696.397169] Call Trace:
[ 696.397175]
[ 696.397183] sk_filter_trim_cap+0xd0/0x1b0
[ 696.397191] tcp_v4_rcv+0x8b7/0xa80
[ 696.397199] ip_local_deliver_finish+0x66/0x210
[ 696.397208] ip_local_deliver+0x7e/0xe0
[ 696.397215] ? ip_rcv_finish+0x430/0x430
[ 696.397223] ip_rcv_finish+0x129/0x430
[ 696.397230] ip_rcv+0x296/0x360
[ 696.397238] ? inet_del_offload+0x40/0x40
[ 696.397249] __netif_receive_skb_core+0x432/0xb80
[ 696.397261] ? skb_send_sock+0x50/0x50
[ 696.397271] ? tcp4_gro_receive+0x137/0x1a0
[ 696.397280] __netif_receive_skb+0x18/0x60
[ 696.397290] ? __netif_receive_skb+0x18/0x60
[ 696.397300] netif_receive_skb_internal+0x45/0xe0
[ 696.397309] napi_gro_receive+0xc5/0xf0
[ 696.397317] xennet_poll+0x9ca/0xbc0
[ 696.397325] net_rx_action+0x140/0x3a0
[ 696.397334] __do_softirq+0xe4/0x2d4
[ 696.397344] irq_exit+0xc5/0xd0
[ 696.397352] xen_evtchn_do_upcall+0x30/0x50
[ 696.397361] xen_hvm_callback_vector+0x90/0xa0
[ 696.397371]
[ 696.397378] RIP: 0010:native_safe_halt+0x12/0x20
[ 696.397390] RSP: 0018:ffff94c4862cbe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[ 696.397405] RAX: ffffffff8efc1800 RBX: 0000000000000006 RCX: 0000000000000000
[ 696.397419] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 696.397435] RBP: ffff94c4862cbe80 R08: 0000000000000002 R09: 0000000000000001
[ 696.397449] R10: 0000000000100000 R11: 0000000000000397 R12: 0000000000000006
[ 696.397462] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 696.397479] ? __sched_text_end+0x1/0x1
[ 696.397489] default_idle+0x20/0x100
[ 696.397499] arch_cpu_idle+0x15/0x20
[ 696.397507] default_idle_call+0x23/0x30
[ 696.397515] do_idle+0x172/0x1f0
[ 696.397522] cpu_startup_entry+0x73/0x80
[ 696.397530] start_secondary+0x1ab/0x200
[ 696.397538] secondary_startup_64+0xa5/0xb0
[ 696.397545] Code: 89 5d b0 49 29 cc 45 01 a7 80 00 00 00 44 89 e1 48 29 c8 48 89 4d a8 49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 38 03 00 00 48 8b 00 <4c> 8b 70 10 4c 8d 68 10 4d 85 f6 0f 84 f6 00 00 00 49 8d 47 30
[ 696.397584] RIP: __cgroup_bpf_run_filter_skb+0xbb/0x1e0 RSP: ffff893fdcb83a70
[ 696.397607] ---[ end trace ec5c84424d511a6f ]---
[ 696.397616] Kernel panic - not syncing: Fatal exception in interrupt
[ 696.397876] Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This is caused by the net_cls and net_prio cgroups disabling cgroup BPF,
which makes it stop taking references when new sockets are allocated.
Releasing those sockets still drops a reference, so the refcount goes
negative, leading to the potential use-after-free.

This revert won't fully prevent the issue, as it could still
theoretically be triggered by setting net_cls.classid or
net_prio.ifpriomap, but it will prevent it from happening on default
system configurations.

A combination of systemd's use of cgroup BPF and extensive cgroup use
that includes net_prio will cause this. Reports usually involve lxd,
libvirt, docker or kubernetes together with some systemd service that
sets IPAddressDeny or IPAddressAllow.

Although the reverted patch was introduced to avoid some potential
memory leaks, the cure is worse than the disease. We will need to
revisit both issues later on and reapply the reverted patch once we have
a real fix for the crash.

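For illustration only, the refcount imbalance described above can be
modelled in userspace. This is a deliberately simplified sketch that
follows the description in this message rather than the actual kernel
code paths; every name in it (toy_cgroup, toy_sock, toy_sk_alloc,
toy_sk_free, toy_cycle) is hypothetical and is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel. */
struct toy_cgroup {
	int refcnt;			/* references held on the cgroup */
};

struct toy_sock {
	struct toy_cgroup *cgrp;	/* cgroup the socket points at */
};

/* Models the global switch flipped when accounting is disabled. */
static bool toy_alloc_disabled;

/* Allocation takes a reference only while accounting is enabled. */
static void toy_sk_alloc(struct toy_sock *sk, struct toy_cgroup *cg)
{
	assert(sk != NULL && cg != NULL);
	sk->cgrp = cg;
	if (!toy_alloc_disabled)
		cg->refcnt++;
}

/* Release always drops a reference, whether or not one was taken. */
static void toy_sk_free(struct toy_sock *sk)
{
	assert(sk != NULL && sk->cgrp != NULL);
	sk->cgrp->refcnt--;
	sk->cgrp = NULL;
}

/*
 * Allocate and free nsocks sockets against one cgroup that starts with
 * a single base reference, and return the resulting refcount.
 */
int toy_cycle(bool disabled, int nsocks)
{
	struct toy_cgroup cg = { .refcnt = 1 };
	struct toy_sock sk = { .cgrp = NULL };
	int i;

	toy_alloc_disabled = disabled;
	for (i = 0; i < nsocks; i++) {
		toy_sk_alloc(&sk, &cg);
		toy_sk_free(&sk);
	}
	return cg.refcnt;
}
```

With accounting enabled the count returns to its base value after each
socket. With it disabled every release drops the count by one: the first
removes the base reference, which is where a real object could be freed
while still in use, and later ones push the count below zero.
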
Signed-off-by: Thadeu Lima de Souza Cascardo
---
 net/core/netprio_cgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..8881dd943dd0 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -236,8 +236,6 @@ static void net_prio_attach(struct cgroup_taskset *tset)
 	struct task_struct *p;
 	struct cgroup_subsys_state *css;
 
-	cgroup_sk_alloc_disable();
-
 	cgroup_taskset_for_each(p, css, tset) {
 		void *v = (void *)(unsigned long)css->id;