From patchwork Thu Sep 9 20:22:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1526292 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=r0rY1V+v; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4H59Rn18d1z9ssP; Fri, 10 Sep 2021 06:23:01 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1mOQZT-00022u-Mt; Thu, 09 Sep 2021 20:22:55 +0000 Received: from smtp-relay-internal-0.internal ([10.131.114.225] helo=smtp-relay-internal-0.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1mOQZJ-0001uM-BA for kernel-team@lists.ubuntu.com; Thu, 09 Sep 2021 20:22:45 +0000 Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-0.canonical.com (Postfix) with ESMTPS id B27EF4025C for ; Thu, 9 Sep 2021 20:22:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1631218964; bh=vOchlIZjPaVR8UdRz7LpqoAgW/tEoug24ksVgpywphw=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=r0rY1V+vsg5zrbMGX86t2cZ+rOD0TDx2PVk7rN/kpAAKmx8ShLMQ/TtVD6jY4WwQb N3cYsriADbnGhfX7NnV4e4VxTUN9xGjQr29D8269A25WFywrEqqiTcgdHnOWnV3BZh mCHzYYFgiEa8FPWL85eVh+LCFti5hXNu3MP0dmF2sY+HBwckqg4X3AiGfcK0YA6pg1 Cgo1LIJKxOv8095C0kmAHx8ShLObwaUz6ahBw5MiVT36AU9O1i0okm05AjZogFDiob KTb3W6VPCXkfydHlH9TkJRMIjYu3LUKrfTmdbEQinbAhw8mpYr2aFeJ+IEkLEyEVkL 9XWkJwCRuGcXw== Received: by mail-qv1-f69.google.com with SMTP id h9-20020a05621413a900b0037a2d3eaf8fso11553968qvz.8 for ; Thu, 09 Sep 2021 13:22:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vOchlIZjPaVR8UdRz7LpqoAgW/tEoug24ksVgpywphw=; b=vE68GsHZIGbqUWrE3QZn+v3lwhdlEPrJ6kNcAZaoIvpN37SO5qZlKJLe/OQRgiBgqT nEExDrRfAOlyAFhboikoYrijr850rfT2O0kVFsazh4lOpOiA4ybKpwhRPWjO42/aspz/ 6yvQBnJqz4Khx23tIHzvzw/8W73rA6jwZfuiqtRhM8sxrKgX9zjarTdJ3AIDrr7u3C1P SL7pGPmHfrcts6L+jIbvvtzQapLrl4Zt3J1ol3IF8miRbMJ/e78tyNIC8DwLsZ3UFIVe aPDQoJAezMwi5elrUaGZO8ZOG/ihcZCwWpemEDeMIpKIiG7D9t9gPdW4CLbXOdfq9YFF F9zw== X-Gm-Message-State: AOAM5330prylUWmnSbZiCt4yHWJ36+oK3jnlLuha/nqHNu5+wSPdpSCG 4/5yn4/QGiyE9661MEC5muG0Qaj8+8zoQgLULnovDoKvLG3+3rUYGgHM/mQMuKr24dBtH9jZeCI iPSn+rKd4Rbjz3VeCuzhlDpkIMfJgJLjWgcAQQNTuwg== X-Received: by 2002:ac8:4b64:: with SMTP id g4mr4629421qts.263.1631218963380; Thu, 09 Sep 2021 13:22:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxPo+0ZmnQAJj4pGPQDNulxLQfYx9IU0SVSK1HpduGVmXyHppWJB4hjBLokEtJWJiijQAxgUQ== X-Received: by 2002:ac8:4b64:: with SMTP id g4mr4629397qts.263.1631218963134; Thu, 09 Sep 2021 13:22:43 -0700 (PDT) Received: from mfo-t470.. ([2804:14c:4e1:8732:e256:1fca:b0d8:d6a8]) by smtp.gmail.com with ESMTPSA id t64sm2172210qkd.71.2021.09.09.13.22.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 13:22:42 -0700 (PDT) From: Mauricio Faria de Oliveira To: kernel-team@lists.ubuntu.com Subject: [B][PATCH 3/5] ext4: data=journal: fixes for ext4_page_mkwrite() Date: Thu, 9 Sep 2021 17:22:28 -0300 Message-Id: <20210909202230.886329-8-mfo@canonical.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210909202230.886329-1-mfo@canonical.com> References: <20210909202230.886329-1-mfo@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" BugLink: https://bugs.launchpad.net/bugs/1847340 These are two fixes for data journalling required by the next patch, discovered while testing it. First, the optimization to return early if all buffers are mapped is not appropriate for the next patch: The inode _must_ be added to the transaction's list in data=journal mode (so to write-protect pages on commit) thus we cannot return early there. Second, once that optimization to reduce transactions was disabled for data=journal mode, more transactions happened, and occasionally hit this warning message: 'JBD2: Spotted dirty metadata buffer'. Reason is, block_page_mkwrite() will set_buffer_dirty() before do_journal_get_write_access() that is there to prevent it. This issue was masked by the optimization. So, on data=journal use __block_write_begin() instead. This also requires page locking and len recalculation. (see block_page_mkwrite() for implementation details.) Finally, as Jan noted there is little sharing between data=journal and other modes in ext4_page_mkwrite(). However, a prototype of ext4_journalled_page_mkwrite() showed there still would be lots of duplicated lines (tens of) that didn't seem worth it. Thus this patch ends up with an ugly goto to skip all non-data journalling code (to avoid long indentations, but that can be changed..) in the beginning, and just a conditional in the transaction section. Well, we skip a common part to data journalling which is the page truncated check, but we do it again after ext4_journal_start() when we re-acquire the page lock (so not to acquire the page lock twice needlessly for data journalling.) Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reviewed-by: Jan Kara Reviewed-by: Andreas Dilger Link: https://lore.kernel.org/r/20201006004841.600488-4-mfo@canonical.com Signed-off-by: Theodore Ts'o (backported from commit 64a9f1449950c774743420cf374047043e32fde4) [mfo: backport: inode.c/hunks 1,3: s/err/ret/, due to lack of commit 401b25aa1a75 ("ext4: convert fault handler to use vm_fault_t type"), which cannot be a dependency, bcz Bionic has no vm_fault_t.] Signed-off-by: Mauricio Faria de Oliveira --- fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 4399ea128388..2b36a57dc05e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6228,9 +6228,17 @@ int ext4_page_mkwrite(struct vm_fault *vmf) if (ret) goto out_ret; + /* + * On data journalling we skip straight to the transaction handle: + * there's no delalloc; page truncated will be checked later; the + * early return w/ all buffers mapped (calculates size/len) can't + * be used; and there's no dioread_nolock, so only ext4_get_block. + */ + if (ext4_should_journal_data(inode)) + goto retry_alloc; + /* Delalloc case is easy... */ if (test_opt(inode->i_sb, DELALLOC) && - !ext4_should_journal_data(inode) && !ext4_nonda_switch(inode->i_sb)) { do { ret = block_page_mkwrite(vma, vmf, @@ -6256,6 +6264,9 @@ int ext4_page_mkwrite(struct vm_fault *vmf) /* * Return if we have all the buffers mapped. This avoids the need to do * journal_start/journal_stop which can block and take a long time + * + * This cannot be done for data journalling, as we have to add the + * inode to the transaction's list to writeprotect pages on commit. */ if (page_has_buffers(page)) { if (!ext4_walk_page_buffers(NULL, page_buffers(page), @@ -6280,16 +6291,42 @@ int ext4_page_mkwrite(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; goto out; } - ret = block_page_mkwrite(vma, vmf, get_block); - if (!ret && ext4_should_journal_data(inode)) { - if (ext4_walk_page_buffers(handle, page_buffers(page), 0, - PAGE_SIZE, NULL, do_journal_get_write_access)) { + /* + * Data journalling can't use block_page_mkwrite() because it + * will set_buffer_dirty() before do_journal_get_write_access() + * thus might hit warning messages for dirty metadata buffers. + */ + if (!ext4_should_journal_data(inode)) { + ret = block_page_mkwrite(vma, vmf, get_block); + } else { + lock_page(page); + size = i_size_read(inode); + /* Page got truncated from under us? */ + if (page->mapping != mapping || page_offset(page) > size) { unlock_page(page); - ret = VM_FAULT_SIGBUS; + ret = VM_FAULT_NOPAGE; ext4_journal_stop(handle); goto out; } - ext4_set_inode_state(inode, EXT4_STATE_JDATA); + + if (page->index == size >> PAGE_SHIFT) + len = size & ~PAGE_MASK; + else + len = PAGE_SIZE; + + ret = __block_write_begin(page, 0, len, ext4_get_block); + if (!ret) { + if (ext4_walk_page_buffers(handle, page_buffers(page), + 0, len, NULL, do_journal_get_write_access)) { + unlock_page(page); + ret = VM_FAULT_SIGBUS; + ext4_journal_stop(handle); + goto out; + } + ext4_set_inode_state(inode, EXT4_STATE_JDATA); + } else { + unlock_page(page); + } } ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))