From patchwork Fri Oct 16 22:57:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jann Horn X-Patchwork-Id: 1383556 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2001:8b0:10b:1231::1; helo=merlin.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=merlin.20170209 header.b=pyA63r8g; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=Z5cCOFqA; dkim-atps=neutral Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CChPZ29fCz9sVN for ; Sat, 17 Oct 2020 09:57:38 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-Id:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=bv6mrdznYfIF6q2n1B2U1rCvEQq3LhfGGNDDVwmKKTY=; b=pyA63r8gov2K7xkTHGPxAbgF3 xAU2Wbbhdhmb5cOKEj6cXDjUdBQc0g9DqOb1NYVNHBAid38jr4LHAGksSjfkoFCm/2IK5oS5LCvaA Qe+wutNuKN/e3/oKDTCDX5BZPzyT4HnSfXNsiFakIpie6pg4fNFEq8Z+ryB229QfLuUUJGS4Og4gw /RLjsVvDxDzuwbog0ep2M+N7uJszsLIY8ByIYjpBYGM4xxs5ecmL2keNs0dz4fNbTnUQ0z5DgoEmw RBeDgjRRh05F30lRnTsdigI8HJUfql308iZNUJx/9/BedKY6FAA9Y82bvZsB2kJfPlXSuQO4sYJz1 oG003g3qg==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kTYfG-0007Bn-3N; Fri, 16 Oct 2020 22:57:34 +0000 Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kTYfB-00078C-IS for linux-um@lists.infradead.org; Fri, 16 Oct 2020 22:57:31 +0000 Received: by mail-wr1-x443.google.com with SMTP id x7so4854089wrl.3 for ; Fri, 16 Oct 2020 15:57:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=NLFZo5d99qxe4AcvKWOWAtWTR/VkW0wmrCHS7CoEVqw=; b=Z5cCOFqA3FERPvwFKK+kH2wPg7MuCa4KLR5Paunp8gj62wG9LhaQqAGvfcYSGJ7wVA EI0lNOxuYx32n0nynXBDCu/44Xlt6TibXcdDN+UzSpXZ47TrWeZ8ADv1FQ9maTJDNizJ NfhXrgx0FM+2SGncn5l6KhlGSNymYoJQ8qHiDjcVu8zZxXI5m3RpHgoP6WOEgBzH5oik Zu0qvhJ1bfwAS3ZjCo/eTs6zBnrvVHnKK9wDDpO8g8XlcxQg60MxwRTuGVwIPBM+9NGO tcAT7RE+ND8Oo/nTXm60q7s4hUfCBG3jZSKrib4y5QsAaVfrmToH1gVMfRgLRZDEe1zS 7guA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=NLFZo5d99qxe4AcvKWOWAtWTR/VkW0wmrCHS7CoEVqw=; b=bJpNAaNEfe1DnWHBLSkA9aLGjYdN9VpGq6ZjL5Sa9IH2m/Dfzy7gs81lI9XqoNEeo2 a0lJPBqjQE2/VRLJYEh6mO4VzGVF1Jy/3RJv2S5+iQu8vZCQWCCEbBUJUG0nGY2OTro5 Cfa918aIiGm0SfKRZJ5/x8FmBuUHad98MPcNSYZkAgf4Ybmtz+Uk2St7auxSIK7IOz5q YvawTWTdLSfLOw46KdQu5qUVWR6RljzzTYBWj1LCVTkqyzRogfBUSVhdVd5jx0mxOPzz BQVLLH7p20j2hCa4I18vrD85vaznpqTsAaX+Wsc0SCDBsccoU6qWXy0L2J6fT+jcjnc2 LGOA== X-Gm-Message-State: AOAM531cfSHBc6Ocr5BjgZGMq0x9NKtjQEeKc2l2QCtz9Z8mBVChH0KY RPcgdtwnJV/PJQlcP3OyugcizA== X-Google-Smtp-Source: ABdhPJx/sIKSLz80ke6Adiby6XurrY1y9g3iFz2lPaUt/zqyi5uattU3OyLryeX5CaS0WeGvcM2BNw== X-Received: by 2002:adf:f246:: with SMTP id b6mr6540584wrp.111.1602889046371; Fri, 16 Oct 2020 15:57:26 -0700 (PDT) Received: from localhost ([2a02:168:96c5:1:55ed:514f:6ad7:5bcc]) by smtp.gmail.com with ESMTPSA id v6sm4752489wmj.6.2020.10.16.15.57.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Oct 2020 15:57:25 -0700 (PDT) From: Jann Horn To: Andrew Morton , linux-mm@kvack.org Subject: [PATCH resend v3 1/2] mmap locking API: Order lock of nascent mm outside lock of live mm Date: Sat, 17 Oct 2020 00:57:12 +0200 Message-Id: <20201016225713.1971256-2-jannh@google.com> X-Mailer: git-send-email 2.29.0.rc1.297.gfa9743e501-goog In-Reply-To: <20201016225713.1971256-1-jannh@google.com> References: <20201016225713.1971256-1-jannh@google.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201016_185729_645909_4E69A820 X-CRM114-Status: GOOD ( 23.47 ) X-Spam-Score: -15.7 (---------------) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-15.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:443 listed in] [list.dnswl.org] -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -7.5 USER_IN_DEF_SPF_WL From: address is in the default SPF white-list -0.0 SPF_PASS SPF: sender matches SPF record -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.5 ENV_AND_HDR_SPF_MATCH Env and Hdr From used in default SPF WL Match -0.0 DKIMWL_WL_MED DKIMwl.org - Medium trust sender X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Michel Lespinasse , Jason Gunthorpe , Richard Weinberger , Jeff Dike , linux-um@lists.infradead.org, linux-kernel@vger.kernel.org, "Eric W . Biederman" , Sakari Ailus , John Hubbard , Johannes Berg , Mauro Carvalho Chehab , Anton Ivanov Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org Until now, the mmap lock of the nascent mm was ordered inside the mmap lock of the old mm (in dup_mmap() and in UML's activate_mm()). A following patch will change the exec path to very broadly lock the nascent mm, but fine-grained locking should still work at the same time for the old mm. In particular, mmap locking calls are hidden behind the copy_from_user() calls and such that are reached through functions like copy_strings() - when a page fault occurs on a userspace memory access, the mmap lock will be taken. To do this in a way that lockdep is happy about, let's turn around the lock ordering in both places that currently nest the locks. Since SINGLE_DEPTH_NESTING is normally used for the inner nesting layer, make up our own lock subclass MMAP_LOCK_SUBCLASS_NASCENT and use that instead. The added locking calls in exec_mmap() are temporary; the following patch will move the locking out of exec_mmap(). As Johannes Berg pointed out[1][2], moving the mmap locking of arch/um/'s activate_mm() up into the execve code also fixes an issue that would've caused a scheduling-in-atomic bug due to mmap_write_lock_nested() while holding a spinlock if UM had support for voluntary preemption. (Even when a semaphore is uncontended, locking it can still trigger rescheduling via might_sleep().) [1] https://lore.kernel.org/linux-mm/115d17aa221b73a479e26ffee52899ddb18b1f53.camel@sipsolutions.net/ [2] https://lore.kernel.org/linux-mm/7b7d6954b74e109e653539d880173fa9cb5c5ddf.camel@sipsolutions.net/ Signed-off-by: Jann Horn --- arch/um/include/asm/mmu_context.h | 3 +-- fs/exec.c | 4 ++++ include/linux/mmap_lock.h | 23 +++++++++++++++++++++-- kernel/fork.c | 7 ++----- 4 files changed, 28 insertions(+), 9 deletions(-) diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h index 17ddd4edf875..c13bc5150607 100644 --- a/arch/um/include/asm/mmu_context.h +++ b/arch/um/include/asm/mmu_context.h @@ -48,9 +48,8 @@ static inline void activate_mm(struct mm_struct *old, struct mm_struct *new) * when the new ->mm is used for the first time. */ __switch_mm(&new->context.id); - mmap_write_lock_nested(new, SINGLE_DEPTH_NESTING); + mmap_assert_write_locked(new); uml_setup_stubs(new); - mmap_write_unlock(new); } static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, diff --git a/fs/exec.c b/fs/exec.c index a91003e28eaa..229dbc7aa61a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1114,6 +1114,8 @@ static int exec_mmap(struct mm_struct *mm) if (ret) return ret; + mmap_write_lock_nascent(mm); + if (old_mm) { /* * Make sure that if there is a core dump in progress @@ -1125,6 +1127,7 @@ static int exec_mmap(struct mm_struct *mm) if (unlikely(old_mm->core_state)) { mmap_read_unlock(old_mm); mutex_unlock(&tsk->signal->exec_update_mutex); + mmap_write_unlock(mm); return -EINTR; } } @@ -1138,6 +1141,7 @@ static int exec_mmap(struct mm_struct *mm) tsk->mm->vmacache_seqnum = 0; vmacache_flush(tsk); task_unlock(tsk); + mmap_write_unlock(mm); if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 0707671851a8..24de1fe99ee4 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -3,6 +3,18 @@ #include +/* + * Lock subclasses for the mmap_lock. + * + * MMAP_LOCK_SUBCLASS_NASCENT is for core kernel code that wants to lock an mm + * that is still being constructed and wants to be able to access the active mm + * normally at the same time. It nests outside MMAP_LOCK_SUBCLASS_NORMAL. + */ +enum { + MMAP_LOCK_SUBCLASS_NORMAL = 0, + MMAP_LOCK_SUBCLASS_NASCENT +}; + #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), @@ -16,9 +28,16 @@ static inline void mmap_write_lock(struct mm_struct *mm) down_write(&mm->mmap_lock); } -static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass) +/* + * Lock an mm_struct that is still being set up (during fork or exec). + * This nests outside the mmap locks of live mm_struct instances. + * No interruptible/killable versions exist because at the points where you're + * supposed to use this helper, the mm isn't visible to anything else, so we + * expect the mmap_lock to be uncontended. + */ +static inline void mmap_write_lock_nascent(struct mm_struct *mm) { - down_write_nested(&mm->mmap_lock, subclass); + down_write_nested(&mm->mmap_lock, MMAP_LOCK_SUBCLASS_NASCENT); } static inline int mmap_write_lock_killable(struct mm_struct *mm) diff --git a/kernel/fork.c b/kernel/fork.c index da8d360fb032..db67eb4ac7bd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -474,6 +474,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, unsigned long charge; LIST_HEAD(uf); + mmap_write_lock_nascent(mm); uprobe_start_dup_mmap(); if (mmap_write_lock_killable(oldmm)) { retval = -EINTR; @@ -481,10 +482,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, } flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); - /* - * Not linked in yet - no deadlock potential: - */ - mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING); /* No ordering required: file already has been exposed. */ RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); @@ -600,12 +597,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, /* a new mm has just been created */ retval = arch_dup_mmap(oldmm, mm); out: - mmap_write_unlock(mm); flush_tlb_mm(oldmm); mmap_write_unlock(oldmm); dup_userfaultfd_complete(&uf); fail_uprobe_end: uprobe_end_dup_mmap(); + mmap_write_unlock(mm); return retval; fail_nomem_anon_vma_fork: mpol_put(vma_policy(tmp));