From patchwork Tue Oct 6 22:54:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jann Horn X-Patchwork-Id: 1377698 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2001:8b0:10b:1231::1; helo=merlin.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=merlin.20170209 header.b=Z3shb/Ec; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=MtWR3Vyu; dkim-atps=neutral Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4C5XqJ5j2Rz9sV0 for ; Wed, 7 Oct 2020 09:55:08 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-Id:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=gwd4fF0hrMcvf4Mk3KCQM/muf8oQ17m6JNTa/tMJMTE=; b=Z3shb/EcBfJUfqwtzFQrPYxLl wIo+Pk9zgacC4VjcVFo8nRp0hNbEz582MaPo+wzOSk9lIyP0wnpMLnAwcjhfDLkeO2gY7LvzZAwTB saO6Kr6E7uzadQyXLEC8r2+Njkps/jyYMPZvvqw6+VpklsH1+10GlJmxXhQfM03nohsvL+zxdqTtL p1+Vd9OgQVpmisxQOPh53Z6LHiG8ZiDAPF47pn40n0xwVbGGyYN8+eIa0UQBuZDxxUuzVSRMGOlix KiWaQnAe6PVrYSE47nxiyYwfezK4LQ0qNotR0S6tbbkZ4rF04wqZWM9kfkfbtim9aM8yQX5GFLbpZ QYFTXR4ow==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kPvrN-0003lk-4B; Tue, 06 Oct 2020 22:55:05 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kPvrK-0003jZ-83 for linux-um@lists.infradead.org; Tue, 06 Oct 2020 22:55:03 +0000 Received: by mail-wm1-x342.google.com with SMTP id l11so430017wmh.2 for ; Tue, 06 Oct 2020 15:55:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=HGCVKyGQYZlN2d0OUVMXYRPm5j+ZPoa8A5kyMh5F4+U=; b=MtWR3VyubrBnus+o/vh7gAp/nETf7cvtPg3bsbRbOoaBmBVPhDQViSLg0Sm+d6t2EV 1sFIztChA/8yNE6rZ+JfUbQozKN15UNruTKS97DMZO4BZGRcBJvSmnDMHH+Buvegorea uSJQp4hPC6vU1v94lar4/E34Wr1vxdJdwFPmNIUzMzod7kaC4cf+wsboZLdjioe0CIyN cdsHpaBlZizKsh1dmmSC1BTO2hcuDNe5Wuit40HFnaqcaVXxzYTB2VEYAwFzXLzXnRjN psNmrtLFyhE7lIzGaniy1ENf+7jOFbQs33UO6gsayyuDFY4MkBJPoFlLDmSfRDWIwUJI zCDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HGCVKyGQYZlN2d0OUVMXYRPm5j+ZPoa8A5kyMh5F4+U=; b=aOHWDU4SpeNl2w5KFCXcID1pwXRFETF68YguDdZhVnshY0cgOW7IfZBFpKEIVRg4AH YiolxOIfGNDEoOwY5lUeSdTgH/52l7jJFnxDBylHS0RlIdmFcjHTRKNQ4vbR+YO2T3lS Np1axkl2B8EZTDtevvDZ9qO/ywvxFRLhEbPBAlg/DgY62G8UOdLMBAMTxlJ9H9TTPdP2 wHiC0sasO/qZyODxYnIGEMZc/KiVEtAFU3ZDyGhvFGafZplK8bBLxSlgWzvkOZTLVTSW aJy8y3YdkJchp4cFZJkjJjZjN3rJm0xYjqSi3fvGueafZ0rZnwQq/yUcnXECaP5qGE41 cDGA== X-Gm-Message-State: AOAM532tqXqdzel7aCLC+uJPsp/+k6tNyp5ZkjYplbv4A9DG0LRa20oi lF9GjKPkDQ1xqWVKsKhsUTr03w== X-Google-Smtp-Source: ABdhPJxtwY+EEZDUG+ByVJbphRcGc6RU0rLKiFkJaQUAspMbv4Mof8L7Z40rGPrOffzEGzMOeng9Ig== X-Received: by 2002:a05:600c:2902:: with SMTP id i2mr167748wmd.31.1602024900851; Tue, 06 Oct 2020 15:55:00 -0700 (PDT) Received: from localhost ([2a02:168:96c5:1:55ed:514f:6ad7:5bcc]) by smtp.gmail.com with ESMTPSA id i15sm243602wrb.91.2020.10.06.15.55.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 06 Oct 2020 15:55:00 -0700 (PDT) From: Jann Horn To: Andrew Morton , linux-mm@kvack.org Subject: [PATCH v2 1/2] mmap locking API: Order lock of nascent mm outside lock of live mm Date: Wed, 7 Oct 2020 00:54:49 +0200 Message-Id: <20201006225450.751742-2-jannh@google.com> X-Mailer: git-send-email 2.28.0.806.g8561365e88-goog In-Reply-To: <20201006225450.751742-1-jannh@google.com> References: <20201006225450.751742-1-jannh@google.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201006_185502_412554_B8733F27 X-CRM114-Status: GOOD ( 23.42 ) X-Spam-Score: -15.7 (---------------) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-15.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:342 listed in] [list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -7.5 USER_IN_DEF_SPF_WL From: address is in the default SPF white-list -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -0.5 ENV_AND_HDR_SPF_MATCH Env and Hdr From used in default SPF WL Match -0.0 DKIMWL_WL_MED DKIMwl.org - Medium trust sender X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Michel Lespinasse , Jason Gunthorpe , Richard Weinberger , Jeff Dike , linux-um@lists.infradead.org, linux-kernel@vger.kernel.org, "Eric W . Biederman" , Sakari Ailus , John Hubbard , Mauro Carvalho Chehab , Anton Ivanov Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org Until now, the mmap lock of the nascent mm was ordered inside the mmap lock of the old mm (in dup_mmap() and in UML's activate_mm()). A following patch will change the exec path to very broadly lock the nascent mm, but fine-grained locking should still work at the same time for the old mm. In particular, mmap locking calls are hidden behind the copy_from_user() calls and such that are reached through functions like copy_strings() - when a page fault occurs on a userspace memory access, the mmap lock will be taken. To do this in a way that lockdep is happy about, let's turn around the lock ordering in both places that currently nest the locks. Since SINGLE_DEPTH_NESTING is normally used for the inner nesting layer, make up our own lock subclass MMAP_LOCK_SUBCLASS_NASCENT and use that instead. The added locking calls in exec_mmap() are temporary; the following patch will move the locking out of exec_mmap(). Signed-off-by: Jann Horn --- arch/um/include/asm/mmu_context.h | 3 +-- fs/exec.c | 4 ++++ include/linux/mmap_lock.h | 23 +++++++++++++++++++++-- kernel/fork.c | 7 ++----- 4 files changed, 28 insertions(+), 9 deletions(-) diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h index 17ddd4edf875..c13bc5150607 100644 --- a/arch/um/include/asm/mmu_context.h +++ b/arch/um/include/asm/mmu_context.h @@ -48,9 +48,8 @@ static inline void activate_mm(struct mm_struct *old, struct mm_struct *new) * when the new ->mm is used for the first time. */ __switch_mm(&new->context.id); - mmap_write_lock_nested(new, SINGLE_DEPTH_NESTING); + mmap_assert_write_locked(new); uml_setup_stubs(new); - mmap_write_unlock(new); } static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, diff --git a/fs/exec.c b/fs/exec.c index a91003e28eaa..229dbc7aa61a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1114,6 +1114,8 @@ static int exec_mmap(struct mm_struct *mm) if (ret) return ret; + mmap_write_lock_nascent(mm); + if (old_mm) { /* * Make sure that if there is a core dump in progress @@ -1125,6 +1127,7 @@ static int exec_mmap(struct mm_struct *mm) if (unlikely(old_mm->core_state)) { mmap_read_unlock(old_mm); mutex_unlock(&tsk->signal->exec_update_mutex); + mmap_write_unlock(mm); return -EINTR; } } @@ -1138,6 +1141,7 @@ static int exec_mmap(struct mm_struct *mm) tsk->mm->vmacache_seqnum = 0; vmacache_flush(tsk); task_unlock(tsk); + mmap_write_unlock(mm); if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 0707671851a8..24de1fe99ee4 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -3,6 +3,18 @@ #include +/* + * Lock subclasses for the mmap_lock. + * + * MMAP_LOCK_SUBCLASS_NASCENT is for core kernel code that wants to lock an mm + * that is still being constructed and wants to be able to access the active mm + * normally at the same time. It nests outside MMAP_LOCK_SUBCLASS_NORMAL. + */ +enum { + MMAP_LOCK_SUBCLASS_NORMAL = 0, + MMAP_LOCK_SUBCLASS_NASCENT +}; + #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), @@ -16,9 +28,16 @@ static inline void mmap_write_lock(struct mm_struct *mm) down_write(&mm->mmap_lock); } -static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass) +/* + * Lock an mm_struct that is still being set up (during fork or exec). + * This nests outside the mmap locks of live mm_struct instances. + * No interruptible/killable versions exist because at the points where you're + * supposed to use this helper, the mm isn't visible to anything else, so we + * expect the mmap_lock to be uncontended. + */ +static inline void mmap_write_lock_nascent(struct mm_struct *mm) { - down_write_nested(&mm->mmap_lock, subclass); + down_write_nested(&mm->mmap_lock, MMAP_LOCK_SUBCLASS_NASCENT); } static inline int mmap_write_lock_killable(struct mm_struct *mm) diff --git a/kernel/fork.c b/kernel/fork.c index da8d360fb032..db67eb4ac7bd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -474,6 +474,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, unsigned long charge; LIST_HEAD(uf); + mmap_write_lock_nascent(mm); uprobe_start_dup_mmap(); if (mmap_write_lock_killable(oldmm)) { retval = -EINTR; @@ -481,10 +482,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, } flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); - /* - * Not linked in yet - no deadlock potential: - */ - mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING); /* No ordering required: file already has been exposed. */ RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); @@ -600,12 +597,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, /* a new mm has just been created */ retval = arch_dup_mmap(oldmm, mm); out: - mmap_write_unlock(mm); flush_tlb_mm(oldmm); mmap_write_unlock(oldmm); dup_userfaultfd_complete(&uf); fail_uprobe_end: uprobe_end_dup_mmap(); + mmap_write_unlock(mm); return retval; fail_nomem_anon_vma_fork: mpol_put(vma_policy(tmp));