From patchwork Fri Oct 2 01:24:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jann Horn X-Patchwork-Id: 1375515 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2001:8b0:10b:1231::1; helo=merlin.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=merlin.20170209 header.b=xsO1GDjq; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=MZ9iRIkR; dkim-atps=neutral Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4C2XNf1bsSz9sT6 for ; Fri, 2 Oct 2020 11:25:06 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:To:Subject:Message-ID:Date:MIME-Version:References: In-Reply-To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=ksyFAQzbyCItDJUNEhg8MPAXQmH/E98OsfOrKevrobU=; b=xsO1GDjqEEjZtrX3YR7U5a9St eLmIpZHb+ksmSo9nGfnj1jBLt3b9sxrxwFrQq6xB3oDw0Bk2VUOWG/v7jG4UQXrY8JzfFgnFxtjZN O8Vs0ZbFkC9Yoc+bSlQ1b0wxjeCgOoo4FPucGeLd4Oip5YtmomRamK75Y/1ej6AsndsafLhIZLU3G OTsJCpFHzK7heeuewYc7vBTZq98ZfiOSdtUwxoloZsPq1x/kzABt+9baraBOM7LpwxEvVxt3tY31L virFkBY4AD14QrQGPZJxJf/m4VCCzL9q8QHFklMDwu7UZGh0upgz8ngUr/DNaTQlYInb681QF+vFs ZOTGxLbmg==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kO9ol-0000qh-11; Fri, 02 Oct 2020 01:25:03 +0000 Received: from mail-ej1-x643.google.com ([2a00:1450:4864:20::643]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kO9oi-0000px-Gr for linux-um@lists.infradead.org; Fri, 02 Oct 2020 01:25:01 +0000 Received: by mail-ej1-x643.google.com with SMTP id ce10so559446ejc.5 for ; Thu, 01 Oct 2020 18:25:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:in-reply-to:references:mime-version:date:message-id:subject:to :cc; bh=JjWjRnD7VtTkrBKw9Z9yp/8Qca9PZwmKfr/gYPt5B7c=; b=MZ9iRIkR8sIxQPQeE2/FGCed+EjZ1SjnC22wlYVAy8q1L/mFYR0wDdMATSQjng4gdN Ak9UzylXFa9OzJmYIvjNwxLK21f2FkWTGbunq/qlXGBfgmZ3zYslxLSHRE/TxyOrWdMi Get48mIpPUmEMAJS7DUon9czQO2lMuPXJOVuKNRru4kgFQPb50E61ACGsPxORi1R6EFW RW2fxrM4gIexAPu+Ztou84pAchVRPu7s2IOXsZndSjqlVxoeoJJ5RmBWbDOT/GWVT194 883Gyvu7vcgmVVjzGe9ywt/Tb3RhLIey8XM8v6Uxttm3617h4wN5FLQICvWi+U04nXjy FJNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:in-reply-to:references:mime-version:date :message-id:subject:to:cc; bh=JjWjRnD7VtTkrBKw9Z9yp/8Qca9PZwmKfr/gYPt5B7c=; b=AisrESF59WgNNme/Fp39IcxkI+eJMeVKYTRzjxgIsMBrmfrnL2VDCeF0qvDqKKMqi3 mz53aFsQPg+ufZeUVZh8xdBSvBuclWkC1HxMiAvKFV7TE9JaGZjM7WBEIi454HPo2UWM 9JHfSPIyD7s4LtAKh9YAjp83fS4bZMaDijZzHcxrcx9ntDeGm1je1xvrG+dHrtO9YUUS oMXKWv8xkleH8RtzuD6xocFjTiP62Ku7XJkyRiFu7czJfsWfrCB5ybxhXwDIgFVAMEcT EixUuGipABDzIcJWIg4hZS3AtDkROgphlFKeuIeEsQnVZvMM1fmRwIAgIVhTUqBeOwwO 16RQ== X-Gm-Message-State: AOAM530FqK2/Hv4VmyAEMjChoqbniFEwOU3rCAPE1q3WWp48cNm5U0Gi MRNhB+v/LZ1fgelScnUOIHD+lvVry33tY0frHYUR3Q== X-Google-Smtp-Source: ABdhPJzNkr9JFa9Bj7OSKfyQ4dsgUA+6GfgW6WiqWaZ+qISi5o991nBt3a1CagiU1mNR40UmVdanrBbNQntb7NXy0JQ= X-Received: by 2002:a17:906:394:: with SMTP id b20mr9670379eja.513.1601601899361; Thu, 01 Oct 2020 18:24:59 -0700 (PDT) Received: from 913411032810 named unknown by gmailapi.google.com with HTTPREST; Thu, 1 Oct 2020 21:24:58 -0400 From: Jann Horn X-Mailer: git-send-email 2.28.0.806.g8561365e88-goog In-Reply-To: References: MIME-Version: 1.0 Date: Thu, 1 Oct 2020 21:24:58 -0400 Message-ID: Subject: [PATCH 1/2] mmap locking API: Order lock of nascent mm outside lock of live mm To: Andrew Morton , linux-mm@kvack.org X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201001_212500_586324_47CBC878 X-CRM114-Status: GOOD ( 20.37 ) X-Spam-Score: -15.7 (---------------) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-15.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:643 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list -7.5 USER_IN_DEF_SPF_WL From: address is in the default SPF white-list -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.5 ENV_AND_HDR_SPF_MATCH Env and Hdr From used in default SPF WL Match -0.0 DKIMWL_WL_MED DKIMwl.org - Medium trust sender X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Michel Lespinasse , Jason Gunthorpe , Richard Weinberger , Jeff Dike , linux-um@lists.infradead.org, linux-kernel@vger.kernel.org, "Eric W . Biederman" , Sakari Ailus , John Hubbard , Mauro Carvalho Chehab , Anton Ivanov Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org Until now, the mmap lock of the nascent mm was ordered inside the mmap lock of the old mm (in dup_mmap() and in UML's activate_mm()). A following patch will change the exec path to very broadly lock the nascent mm, but fine-grained locking should still work at the same time for the new mm. To do this in a way that lockdep is happy about, let's turn around the lock ordering in both places that currently nest the locks. Since SINGLE_DEPTH_NESTING is normally used for the inner nesting layer, make up our own lock subclass MMAP_LOCK_SUBCLASS_NASCENT and use that instead. The added locking calls in exec_mmap() are temporary; the following patch will move the locking out of exec_mmap(). Signed-off-by: Jann Horn --- arch/um/include/asm/mmu_context.h | 3 +-- fs/exec.c | 4 ++++ include/linux/mmap_lock.h | 23 +++++++++++++++++++++-- kernel/fork.c | 7 ++----- 4 files changed, 28 insertions(+), 9 deletions(-) diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h index 17ddd4edf875..c13bc5150607 100644 --- a/arch/um/include/asm/mmu_context.h +++ b/arch/um/include/asm/mmu_context.h @@ -48,9 +48,8 @@ static inline void activate_mm(struct mm_struct *old, struct mm_struct *new) * when the new ->mm is used for the first time. */ __switch_mm(&new->context.id); - mmap_write_lock_nested(new, SINGLE_DEPTH_NESTING); + mmap_assert_write_locked(new); uml_setup_stubs(new); - mmap_write_unlock(new); } static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, diff --git a/fs/exec.c b/fs/exec.c index a91003e28eaa..229dbc7aa61a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1114,6 +1114,8 @@ static int exec_mmap(struct mm_struct *mm) if (ret) return ret; + mmap_write_lock_nascent(mm); + if (old_mm) { /* * Make sure that if there is a core dump in progress @@ -1125,6 +1127,7 @@ static int exec_mmap(struct mm_struct *mm) if (unlikely(old_mm->core_state)) { mmap_read_unlock(old_mm); mutex_unlock(&tsk->signal->exec_update_mutex); + mmap_write_unlock(mm); return -EINTR; } } @@ -1138,6 +1141,7 @@ static int exec_mmap(struct mm_struct *mm) tsk->mm->vmacache_seqnum = 0; vmacache_flush(tsk); task_unlock(tsk); + mmap_write_unlock(mm); if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 0707671851a8..24de1fe99ee4 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -3,6 +3,18 @@ #include +/* + * Lock subclasses for the mmap_lock. + * + * MMAP_LOCK_SUBCLASS_NASCENT is for core kernel code that wants to lock an mm + * that is still being constructed and wants to be able to access the active mm + * normally at the same time. It nests outside MMAP_LOCK_SUBCLASS_NORMAL. + */ +enum { + MMAP_LOCK_SUBCLASS_NORMAL = 0, + MMAP_LOCK_SUBCLASS_NASCENT +}; + #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), @@ -16,9 +28,16 @@ static inline void mmap_write_lock(struct mm_struct *mm) down_write(&mm->mmap_lock); } -static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass) +/* + * Lock an mm_struct that is still being set up (during fork or exec). + * This nests outside the mmap locks of live mm_struct instances. + * No interruptible/killable versions exist because at the points where you're + * supposed to use this helper, the mm isn't visible to anything else, so we + * expect the mmap_lock to be uncontended. + */ +static inline void mmap_write_lock_nascent(struct mm_struct *mm) { - down_write_nested(&mm->mmap_lock, subclass); + down_write_nested(&mm->mmap_lock, MMAP_LOCK_SUBCLASS_NASCENT); } static inline int mmap_write_lock_killable(struct mm_struct *mm) diff --git a/kernel/fork.c b/kernel/fork.c index da8d360fb032..db67eb4ac7bd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -474,6 +474,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, unsigned long charge; LIST_HEAD(uf); + mmap_write_lock_nascent(mm); uprobe_start_dup_mmap(); if (mmap_write_lock_killable(oldmm)) { retval = -EINTR; @@ -481,10 +482,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, } flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); - /* - * Not linked in yet - no deadlock potential: - */ - mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING); /* No ordering required: file already has been exposed. */ RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); @@ -600,12 +597,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, /* a new mm has just been created */ retval = arch_dup_mmap(oldmm, mm); out: - mmap_write_unlock(mm); flush_tlb_mm(oldmm); mmap_write_unlock(oldmm); dup_userfaultfd_complete(&uf); fail_uprobe_end: uprobe_end_dup_mmap(); + mmap_write_unlock(mm); return retval; fail_nomem_anon_vma_fork: mpol_put(vma_policy(tmp)); From patchwork Fri Oct 2 01:25:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jann Horn X-Patchwork-Id: 1375516 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2001:8b0:10b:1231::1; helo=merlin.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=merlin.20170209 header.b=QxfDp9Tc; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=YrpQKZP4; dkim-atps=neutral Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4C2XNk5CWRz9sT6 for ; Fri, 2 Oct 2020 11:25:10 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:To:Subject:Message-ID:Date:MIME-Version:References: In-Reply-To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=HGbJk17vsMj5sT0HqUqmCOkB/xf8NTJmW+vAGX9cXls=; b=QxfDp9TcS+D0QQCvqV56oIXr6 RtbqdofSU9nHj1mpyKb8F1hMglI5khrpXe7mX7ZRhbE3X0DEiNJUrZP5pelIB7e5QjxxAcpllk5Zj 4vsmN0Au52N/DIYSnH8eUvBSGnKOoEHbQUafR7kvAesq0k4Lqd2EZj9N4WrsuU6zCfSJNTwt6Nan/ wdw9rEwl0HCkNr6TzbV8Up9cHmb3sHGdbx+dslUd9iRpUMyuurr7ceQpz3Ds3eYj5EZc6cXpXYge7 KvkFTcH1EHMOH4GbE+bJN0qy7nTeSyInDiiO/SYlcqeKfgITjhZnxJUWoy6a4YSve26BM6aSozlGH miRotvDLw==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kO9op-0000ry-Fk; Fri, 02 Oct 2020 01:25:07 +0000 Received: from mail-ej1-x643.google.com ([2a00:1450:4864:20::643]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kO9on-0000rP-Ov for linux-um@lists.infradead.org; Fri, 02 Oct 2020 01:25:06 +0000 Received: by mail-ej1-x643.google.com with SMTP id p9so553782ejf.6 for ; Thu, 01 Oct 2020 18:25:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:in-reply-to:references:mime-version:date:message-id:subject:to :cc; bh=Kqs8a9E5uPOTBp74swyiiF8zCX3R3DCtjj7F/Xz7M0M=; b=YrpQKZP4LvvC42rx/oChM9+PWWxB9yCHXQgbCQcYI7Nnbzt0fKSh60i+aejNgAmbXc SvsUMZkpxiy7JKwBIbiJwHpUDUdfddKsvGighKIy6yzGdTfWxkZkGNRXR5Re952f5Ga/ Hl4Qe+15bcmgSHq1Wc6O/o1IxAlgmEVoUKoYuVH5+XmiT8T7fxocJlF6WN41M4lpGwUn ong/8Gi5OPDs2/zQ6e0jGFJlvTbk+zDls+z+59WjWoHkqMFhS7Mo/mAyK9WM3rL9yVEn LB0hnLToPsKOXZxi5YGzacuMeLRJCt0RaIOPu9f8QVIOU9oaqXKzbXRKLQC45H/wHh+z 8emA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:in-reply-to:references:mime-version:date :message-id:subject:to:cc; bh=Kqs8a9E5uPOTBp74swyiiF8zCX3R3DCtjj7F/Xz7M0M=; b=qEAdodQUTwYHJO733q/nAlrjkaPE/lgMr193vwjXQotUMhDve3SEsrwGw3Vl1WMTbC EpGYN/hDwgzsK4u55JGPv1vrs1ajwgHvOFF1L5vpS5IuyrGxMedS4Vltje4IufaK5t8G oEskQ1yDSbQaq+Wt8uFh4LG/LfaSKVRuf19xttrcfZLZZjrf1mPLJck4V+7AxbbYj628 wqJW9/m7qWpP/D4XfK4Ke5N4/CpDZAl4RMWUKdFEG0VYqh3c8xOC1mWkXxexGOaFMt0W CpOr7J1dNvT5u14qn0PslPvDM+pCD6hEMb3QiTMj8QKS5Qa4+6JdNcC1bxPzA58HYN+L jOYg== X-Gm-Message-State: AOAM533BSwYn8bpT7FMX87epj7mJc6tdfxf0BQv/dPVf9sIkME6ClqAc vX31AmrR4pEogEC8H3jDK+7/YiKXtZ+FeHyILCa4uw== X-Google-Smtp-Source: ABdhPJzVu3SoEuRQ33jBZmzhi/XuwzQ/9eJHcY97azL852GtwzhzffTrBq6YxHopboXK4kUbi1LzATXGMeGFDdIdU9w= X-Received: by 2002:a17:906:fcae:: with SMTP id qw14mr4739966ejb.537.1601601904679; Thu, 01 Oct 2020 18:25:04 -0700 (PDT) Received: from 913411032810 named unknown by gmailapi.google.com with HTTPREST; Thu, 1 Oct 2020 21:25:04 -0400 From: Jann Horn X-Mailer: git-send-email 2.28.0.806.g8561365e88-goog In-Reply-To: References: MIME-Version: 1.0 Date: Thu, 1 Oct 2020 21:25:04 -0400 Message-ID: Subject: [PATCH 2/2] exec: Broadly lock nascent mm until setup_arg_pages() To: Andrew Morton , linux-mm@kvack.org X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201001_212505_842649_135B9892 X-CRM114-Status: GOOD ( 24.01 ) X-Spam-Score: -15.7 (---------------) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-15.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2a00:1450:4864:20:0:0:0:643 listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list -7.5 USER_IN_DEF_SPF_WL From: address is in the default SPF white-list -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.5 ENV_AND_HDR_SPF_MATCH Env and Hdr From used in default SPF WL Match -0.0 DKIMWL_WL_MED DKIMwl.org - Medium trust sender X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Michel Lespinasse , Jason Gunthorpe , Richard Weinberger , Jeff Dike , linux-um@lists.infradead.org, linux-kernel@vger.kernel.org, "Eric W . Biederman" , Sakari Ailus , John Hubbard , Mauro Carvalho Chehab , Anton Ivanov Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org While AFAIK there currently is nothing that can modify the VMA tree of a new mm until userspace has started running under the mm, we should properly lock the mm here anyway, both to keep lockdep happy when adding locking assertions and to be safe in the future in case someone e.g. decides to permit VMA-tree-mutating operations in process_madvise_behavior_valid(). The goal of this patch is to broadly lock the nascent mm in the exec path, from around the time it is created all the way to the end of setup_arg_pages() (because setup_arg_pages() accesses bprm->vma). As long as the mm is write-locked, keep it around in bprm->mm, even after it has been installed on the task (with an extra reference on the mm, to reduce complexity in free_bprm()). After setup_arg_pages(), we have to unlock the mm so that APIs such as copy_to_user() will work in the following binfmt-specific setup code. Suggested-by: Jason Gunthorpe Suggested-by: Michel Lespinasse Signed-off-by: Jann Horn --- fs/exec.c | 68 ++++++++++++++++++++--------------------- include/linux/binfmts.h | 2 +- 2 files changed, 35 insertions(+), 35 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 229dbc7aa61a..fe11d77e397a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm) return -ENOMEM; vma_set_anonymous(vma); - if (mmap_write_lock_killable(mm)) { - err = -EINTR; - goto err_free; - } - /* * Place the stack at the largest stack address the architecture * supports. Later, we'll move this to an appropriate place. We don't @@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm) goto err; mm->stack_vm = mm->total_vm = 1; - mmap_write_unlock(mm); bprm->p = vma->vm_end - sizeof(void *); return 0; err: - mmap_write_unlock(mm); -err_free: bprm->vma = NULL; vm_area_free(vma); return err; @@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm) struct mm_struct *mm = NULL; bprm->mm = mm = mm_alloc(); - err = -ENOMEM; if (!mm) - goto err; + return -ENOMEM; + mmap_write_lock_nascent(mm); /* Save current stack limit for all calculations made during exec. */ task_lock(current->group_leader); @@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm) task_unlock(current->group_leader); err = __bprm_mm_init(bprm); - if (err) - goto err; - - return 0; - -err: - if (mm) { - bprm->mm = NULL; - mmdrop(mm); - } + if (!err) + return 0; + bprm->mm = NULL; + mmap_write_unlock(mm); + mmdrop(mm); return err; } @@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift) /* * Finalizes the stack vm_area_struct. The flags and permissions are updated, * the stack is optionally relocated, and some extra space is added. + * At the end of this, the mm_struct will be unlocked on success. */ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, @@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm, bprm->loader -= stack_shift; bprm->exec -= stack_shift; - if (mmap_write_lock_killable(mm)) - return -EINTR; - vm_flags = VM_STACK_FLAGS; /* @@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm, ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags); if (ret) - goto out_unlock; + return ret; BUG_ON(prev != vma); if (unlikely(vm_flags & VM_EXEC)) { @@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm, if (stack_shift) { ret = shift_arg_pages(vma, stack_shift); if (ret) - goto out_unlock; + return ret; } /* mprotect_fixup is overkill to remove the temporary stack flags */ @@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm, current->mm->start_stack = bprm->p; ret = expand_stack(vma, stack_base); if (ret) - ret = -EFAULT; + return -EFAULT; -out_unlock: + /* + * From this point on, anything that wants to poke around in the + * mm_struct must lock it by itself. + */ + bprm->vma = NULL; mmap_write_unlock(mm); - return ret; + mmput(mm); + bprm->mm = NULL; + return 0; } EXPORT_SYMBOL(setup_arg_pages); @@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm) if (ret) return ret; - mmap_write_lock_nascent(mm); - if (old_mm) { /* * Make sure that if there is a core dump in progress @@ -1127,11 +1116,12 @@ static int exec_mmap(struct mm_struct *mm) if (unlikely(old_mm->core_state)) { mmap_read_unlock(old_mm); mutex_unlock(&tsk->signal->exec_update_mutex); - mmap_write_unlock(mm); return -EINTR; } } + /* bprm->mm stays refcounted, current->mm takes an extra reference */ + mmget(mm); task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm); @@ -1141,7 +1131,6 @@ static int exec_mmap(struct mm_struct *mm) tsk->mm->vmacache_seqnum = 0; vmacache_flush(tsk); task_unlock(tsk); - mmap_write_unlock(mm); if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); @@ -1397,8 +1386,6 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) goto out; - bprm->mm = NULL; - #ifdef CONFIG_POSIX_TIMERS exit_itimers(me->signal); flush_itimer_signals(); @@ -1545,6 +1532,18 @@ void setup_new_exec(struct linux_binprm * bprm) me->mm->task_size = TASK_SIZE; mutex_unlock(&me->signal->exec_update_mutex); mutex_unlock(&me->signal->cred_guard_mutex); + +#ifndef CONFIG_MMU + /* + * On MMU, setup_arg_pages() wants to access bprm->vma after this point, + * so we can't drop the mmap lock yet. + * On !MMU, we have neither setup_arg_pages() nor bprm->vma, so we + * should drop the lock here. + */ + mmap_write_unlock(bprm->mm); + mmput(bprm->mm); + bprm->mm = NULL; +#endif } EXPORT_SYMBOL(setup_new_exec); @@ -1581,6 +1580,7 @@ static void free_bprm(struct linux_binprm *bprm) { if (bprm->mm) { acct_arg_size(bprm, 0); + mmap_write_unlock(bprm->mm); mmput(bprm->mm); } free_arg_pages(bprm); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 0571701ab1c5..3bf06212fbae 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -22,7 +22,7 @@ struct linux_binprm { # define MAX_ARG_PAGES 32 struct page *page[MAX_ARG_PAGES]; #endif - struct mm_struct *mm; + struct mm_struct *mm; /* nascent mm, write-locked */ unsigned long p; /* current top of mem */ unsigned long argmin; /* rlimit marker for copy_strings() */ unsigned int