From patchwork Mon Oct 28 10:50:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185293 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="idgzY9km"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s285XX0z9sPT for ; Mon, 28 Oct 2019 21:50:40 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388325AbfJ1Kuk (ORCPT ); Mon, 28 Oct 2019 06:50:40 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:38604 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388297AbfJ1Kuk (ORCPT ); Mon, 28 Oct 2019 06:50:40 -0400 Received: by mail-pg1-f193.google.com with SMTP id w3so6642417pgt.5 for ; Mon, 28 Oct 2019 03:50:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=2ies8OSR13TA7Jp7kWGv+5wsPA+gLPOwFV0PFLes+tE=; b=idgzY9km4ZymqCqJaxtjO+0q0W941CVEFuRLmx4zne73d4vqSH/0Qe6y+ItLL8ln+a jIcdv0tlpUdzxcxd6KUo6VYCudr2nAL/MF6oyUjzuflIU6Q3PCnjy4JlfCtlxV80LaWl O6OtzimuKGGpbEzOTtsMd3O2q4BCHk9hLvHzbRmTRAAQvhv7PwBZ3EHUXvzEt68CGVQB x62msDbHzDthkeeKKrOuxBXi+fKmm+azxk8e8cwyE67sFtvZCtxTMKKNWsO0GivbfQ6B 
Y3fGwWpxdgVFdUOEwVPwnyQGh+Gx2V6e40NFMjejXGo8TlDR8Aq1iv9Zgc1sFoIIGvqu mWsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=2ies8OSR13TA7Jp7kWGv+5wsPA+gLPOwFV0PFLes+tE=; b=hnC3jRGu7C9JpTTx6EUDTXERE6mJaH9tJJlZkYmFeChYc5uYg1HVyq/RZNZaOxDlJv Sqvly4io5JPpw+zaMKWMLhb+0sO4UaOUJ85/SF1A2vIUvmyh2Chp/3jDpbz+WiPlkDXX qLD66hyzuAemaHfXC2PWngANjmC5WHI2dIRnir8a0Rbi/Nj0RfFDKSb/Fnoa7CZ3cRxb rvkMpQM0oNmzwkS2xkNw97TeKbT4+0Ez7EL+Nhwsq5Yz3yKBuYLIziscDqSAxmS0DGZk nj9WNLHFOf35dbNgdXhVSrunCGvXvecK/6qWpqMx8OnDBPGd/LcdJ+3e2cXEKtjbDLY+ Q8uQ== X-Gm-Message-State: APjAAAWVmXMtR23N8ee42oS4W03cvrBLvKe/kzTGHs85exZQ03DaGBDT UuV7pauYRSWMIu7PdXU7sKwy X-Google-Smtp-Source: APXvYqy//cEjP+6Jvq/xP++tGHTGJ/km9mLsA5w7vy0g4d1acQF3C/Wjg30kTBo+WdBztVYsI+ELiw== X-Received: by 2002:a17:90a:5d0f:: with SMTP id s15mr21536034pji.126.1572259839283; Mon, 28 Oct 2019 03:50:39 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id q13sm11740176pjq.0.2019.10.28.03.50.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:50:38 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:50:32 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 01/11] ext4: reorder map.m_flags checks within ext4_iomap_begin()
Message-ID: <7fcb28c72d81123f7882b8420b76860c05f302f8.1572255424.git.mbobrowski@mbobrowski.org>
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

For the direct I/O changes that follow in this patch series, we need
to accommodate the case where the block mapping flags passed through
to ext4_map_blocks() result in m_flags having both EXT4_MAP_MAPPED and
EXT4_MAP_UNWRITTEN bits set. In order for any allocated unwritten
extents to be converted correctly in the ->end_io() handler, the
iomap->type must be set to IOMAP_UNWRITTEN whenever the
EXT4_MAP_UNWRITTEN bit has been set within m_flags. Hence, the
conditional statement needs to be reordered so that EXT4_MAP_UNWRITTEN
is checked first.

This change is a no-op for DAX, as the block mapping flags passed
through to ext4_map_blocks(), i.e. EXT4_GET_BLOCKS_CREATE_ZERO, never
result in both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN being set at
once.
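As an illustration only, the ordering argument above can be modelled in
user space. This is a minimal sketch, not kernel code: every SKETCH_*
name and flag value below is a hypothetical stand-in for the real
EXT4_MAP_* and IOMAP_* definitions. It only shows that testing the
unwritten bit first makes it win when both bits are set.

```c
#include <assert.h>

/* Hypothetical stand-ins for EXT4_MAP_MAPPED / EXT4_MAP_UNWRITTEN. */
#define SKETCH_MAP_MAPPED	(1u << 0)
#define SKETCH_MAP_UNWRITTEN	(1u << 1)

/* Hypothetical stand-ins for the iomap types named in the message. */
enum sketch_iomap_type {
	SKETCH_IOMAP_INVALID,
	SKETCH_IOMAP_MAPPED,
	SKETCH_IOMAP_UNWRITTEN,
};

/* Test the unwritten bit first so that it wins when both bits are set. */
static enum sketch_iomap_type sketch_classify(unsigned int m_flags)
{
	if (m_flags & SKETCH_MAP_UNWRITTEN)
		return SKETCH_IOMAP_UNWRITTEN;
	if (m_flags & SKETCH_MAP_MAPPED)
		return SKETCH_IOMAP_MAPPED;
	/* Neither bit set: the patched code warns and returns -EIO here. */
	return SKETCH_IOMAP_INVALID;
}

/* Small self-check covering the case the commit message is about. */
void sketch_classify_self_check(void)
{
	assert(sketch_classify(SKETCH_MAP_MAPPED | SKETCH_MAP_UNWRITTEN) ==
	       SKETCH_IOMAP_UNWRITTEN);
}
```

With the previous ordering, the combined case would have been reported
as mapped, which is the outcome this patch avoids.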
Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/inode.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index abaaf7d96ca4..ee116344c420 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3535,10 +3535,20 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE;
 		iomap->addr = IOMAP_NULL_ADDR;
 	} else {
-		if (map.m_flags & EXT4_MAP_MAPPED) {
-			iomap->type = IOMAP_MAPPED;
-		} else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
+		/*
+		 * Flags passed into ext4_map_blocks() for direct I/O writes
+		 * can result in m_flags having both EXT4_MAP_MAPPED and
+		 * EXT4_MAP_UNWRITTEN bits set. In order for any allocated
+		 * unwritten extents to be converted into written extents
+		 * correctly within the ->end_io() handler, we need to ensure
+		 * that the iomap->type is set appropriately. Hence the reason
+		 * why we need to check whether EXT4_MAP_UNWRITTEN is set
+		 * first.
+		 */
+		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
 			iomap->type = IOMAP_UNWRITTEN;
+		} else if (map.m_flags & EXT4_MAP_MAPPED) {
+			iomap->type = IOMAP_MAPPED;
 		} else {
 			WARN_ON_ONCE(1);
 			return -EIO;

From patchwork Mon Oct 28 10:50:54 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Bobrowski
X-Patchwork-Id: 1185294
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=mbobrowski.org
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
	header.d=mbobrowski-org.20150623.gappssmtp.com
	header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="G9Md5Kat";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 471s2b1XGRz9sPT
	for ; Mon, 28 Oct 2019 21:51:03 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388328AbfJ1KvC (ORCPT );
	Mon, 28 Oct 2019 06:51:02 -0400
Received: from mail-pf1-f194.google.com ([209.85.210.194]:41290 "EHLO
	mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2388321AbfJ1KvB (ORCPT );
	Mon, 28 Oct 2019 06:51:01 -0400
Received: by mail-pf1-f194.google.com with SMTP id p26so2507799pfq.8
	for ; Mon, 28 Oct 2019 03:51:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=mbobrowski-org.20150623.gappssmtp.com; s=20150623;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-disposition:in-reply-to:user-agent;
	bh=adn4gOH9wM1yK1i3uZc/QwdCwMT/Epp1BBGQLJPWPVM=;
	b=G9Md5Kat3ZszJ6WvVofzucKe1kR+H5u+O0+UtzEYZIgJOgKbina+CIMx7USP3hYliB
5f+BUq26w57DXOCHzwjIe9LuGD+6Bu2VqhuJI+K2dRDNX1FNYQ/sgBuk0wP0m1JwOaCZ vOko0PQBkDVdh5mekf6tVVelrUCXtjEhT1AHa5Z/tfHx5VlMEpRkPqD865pdlyCbxsI+ yVPyYnTzTnzlqh/2iwK7Ccm0dS2ijXW4ilvpsHClehDQw0vy5gkDM5Ha8KfdUbyEG2o4 TQG8wHL2YgziVGM83VU+gFjg7czeqAN0AhzeLaMDix+EB7Sjh85rPKeJqvNMY1zvK2EG Dduw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=adn4gOH9wM1yK1i3uZc/QwdCwMT/Epp1BBGQLJPWPVM=; b=VO9gWiIA8UFODFG1hbLD2b0FyLcJP7WcUuh4ILvaom/Q9+RYRfyb0Mfien43AZhgPY 2z0NOaecOiDQPI8j7lyXuH7DbFXtngu15GbXobNXuWSYT+sZ4GIHELnYXjeMn4HaIpNA rO2bkzUzMll1f2m7oacvc59/G8oUcg5/PsdhNI58x3Q5rpARoGCPBOJLvK6U07kSc5rU JY3hGM39QJVG7Qzivz27i+p9Bs0vXCBsBr4Map5cBOVKzTQGxslqh9CXNbruDl1Anx6Q cbuJuoacXwfSW8mktZZhURy4IWf7h8M8Z6h2N+2+liMXF5usmnGAfVcvLeztG4cw/IQ4 S2qg== X-Gm-Message-State: APjAAAWx2s8uZYS7Tel57lKLoJOP8kk7D/3H4ratLuDVWgGPe/5RbDPg 5D67kRbfaEspVYB0d9RDKvgP X-Google-Smtp-Source: APXvYqxxGA+x43l4hHBehKoNsCBRa2AermBDxa/8TQql01AraTK0RxXJ7Iu0d7PBH7mRoW5CoEHAYQ== X-Received: by 2002:a62:5ac3:: with SMTP id o186mr19860150pfb.20.1572259861132; Mon, 28 Oct 2019 03:51:01 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id 21sm2820419pfa.170.2019.10.28.03.50.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:51:00 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:50:54 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 02/11] ext4: update direct I/O read lock pattern for IOCB_NOWAIT
Message-ID: <17824b863511c87c3b4ea36531ca3c1430d30660.1572255425.git.mbobrowski@mbobrowski.org>
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This patch updates the lock pattern in ext4_direct_IO_read() so that
it does not block on the inode lock for IOCB_NOWAIT direct I/O reads.
If the shared lock cannot be taken immediately, -EAGAIN is returned
instead. The locking condition implemented here is similar to that of
942491c9e6d6 ("xfs: fix AIM7 regression").

Fixes: 16c54688592c ("ext4: Allow parallel DIO reads")
Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ee116344c420..0a9ea291cfab 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3837,7 +3837,13 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
 	 * writes & truncates and since we take care of writing back page cache,
 	 * we are protected against page writeback as well.
 	 */
-	inode_lock_shared(inode);
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!inode_trylock_shared(inode))
+			return -EAGAIN;
+	} else {
+		inode_lock_shared(inode);
+	}
+
 	ret = filemap_write_and_wait_range(mapping, iocb->ki_pos,
 					   iocb->ki_pos + count - 1);
 	if (ret)

From patchwork Mon Oct 28 10:51:16 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Bobrowski
X-Patchwork-Id: 1185296
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=mbobrowski.org
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
	header.d=mbobrowski-org.20150623.gappssmtp.com
	header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="J4jFXinL";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 471s313xV1z9sNw
	for ; Mon, 28 Oct 2019 21:51:25 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388349AbfJ1KvY (ORCPT );
	Mon, 28 Oct 2019 06:51:24 -0400
Received: from mail-pf1-f195.google.com ([209.85.210.195]:35705 "EHLO
	mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2388345AbfJ1KvY (ORCPT );
	Mon, 28 Oct 2019 06:51:24 -0400
Received: by mail-pf1-f195.google.com with SMTP id d13so5594189pfq.2
	for ; Mon, 28 Oct 2019 03:51:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=mbobrowski-org.20150623.gappssmtp.com; s=20150623;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-disposition:in-reply-to:user-agent;
	bh=G6CFq/h8k/c9oWEGqBl9JxHFVW55BNMnE9xk8vXWFJc=;
b=J4jFXinLsX5dmwoyvFzhhoocnfyONCPBLWxTe23x2/t4nHuRk8IBRma6i1y4Dg11rG 1y5wXczUJYFBHaTrmnR00M5mGOqtXLrLxyJfj9ox5qp5NiEiba9hADYVBYcbucKgpKrt 2I8ght4J88dXiNHUUtNhHTk0Ty9oYuLaTXNGpELwJf0NOA2mZ8Qk4CLGn6lNhXkwsqM4 o+QjUrX17iHQrfNJgIjEdHO+3LN6Ag7v/LiUZlxSQsIUK9+YOdzCzZkG1vDAGWHNbR0y 0Yd0zVxM0Ms7CAs4o1mlkzjqGhotCuYv2FYuSxNt2z+rtNXIsbK/WZHM+QYC2Hy82LTp DJ/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=G6CFq/h8k/c9oWEGqBl9JxHFVW55BNMnE9xk8vXWFJc=; b=rAvEnBJBMb/hwIfJjTI1z6TsqvMMqgVmHkA8EDz0DupOf4qjepqyBaq17XNUgcs4ux U9wfvbkpMSnGpPaDMeMyqrui4re/ho3XP0FHK+likBOxJ5YYxjJ0bVuJA3Jp+7KMKEWF 9fRseB3LY9sMpV6DqZU1v4a6eGx19TZHy3DTh5b94TNq0hU5n5PHbj0qQTL9C6JTpeTW nhVCzGLKWL/y/3raF34SZr6WEo7HI7jJS06IZeQoCxU4c5am/oPa5+YcwmCfY+QhCDeN itvCapIzlRC2sSMB+52hTD88KlacQgwbXT7mbIt6PRR5ckRj167Lb2hyuRs4PJkWwNer uJuw== X-Gm-Message-State: APjAAAXi/10JIY1HF51tXJWSJOKIE6SxUJiksO+MGSW+m08hQ8VvpyDK O+i/FDkq1gKnwLDiyJwXWZpV X-Google-Smtp-Source: APXvYqwS8C4L1744maTaOYKwBZ7s1w5VWttnJ10FMSz29wrbBk64/wb4Cwe8o07TuyVhG9UTfc0Q6A== X-Received: by 2002:a63:c40e:: with SMTP id h14mr19858935pgd.254.1572259883449; Mon, 28 Oct 2019 03:51:23 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id m4sm8948764pjs.8.2019.10.28.03.51.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:51:22 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:51:16 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 03/11] ext4: iomap that extends beyond EOF should be marked dirty
Message-ID: <2c5b5fc06d1ba70676f09dfc5430ea1c4bafc631.1572255425.git.mbobrowski@mbobrowski.org>
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This patch addresses the issue that Dave Chinner discovered and fixed
in commit 7684e2c4384d. This change does not have any user-visible
impact for ext4, as none of the current users of ext4_iomap_begin()
that extend files depend on IOMAP_F_DIRTY.

When doing a direct I/O that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.

However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there are I/O completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in I/O completion
when O_DSYNC writes are being used and the hardware supports FUA.

Hence, when setting IOMAP_F_DIRTY, we also need to take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that I/O completion will call generic_write_sync() to
flush the inode size update to stable storage correctly.
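The condition described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the kernel
implementation: SKETCH_F_DIRTY and sketch_iomap_flags() are hypothetical
names, the flag value is arbitrary, and the datasync_dirty argument
stands in for the result of ext4_inode_datasync_dirty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMAP_F_DIRTY; the value is arbitrary. */
#define SKETCH_F_DIRTY	0x1u

/*
 * Mark the mapping dirty if the inode already has datasync-relevant
 * metadata pending, or if the requested range ends beyond the current
 * file size (so I/O completion may have to extend the size).
 */
static unsigned int sketch_iomap_flags(bool datasync_dirty, int64_t offset,
				       int64_t length, int64_t i_size)
{
	unsigned int flags = 0;

	if (datasync_dirty || offset + length > i_size)
		flags |= SKETCH_F_DIRTY;
	return flags;
}

/* Small self-check: a write that ends past EOF must be marked dirty. */
void sketch_iomap_flags_self_check(void)
{
	assert(sketch_iomap_flags(false, 4096, 4096, 4096) == SKETCH_F_DIRTY);
}
```

Note that a range ending exactly at the current file size is not
treated as spanning EOF, since the comparison is strict.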
Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0a9ea291cfab..da2ca81e3d9c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3523,8 +3523,14 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			return ret;
 	}
 
+	/*
+	 * Writes that span EOF might trigger an I/O size update on completion,
+	 * so consider them to be dirty for the purposes of O_DSYNC, even if
+	 * there is no other metadata changes being made or are pending here.
+	 */
 	iomap->flags = 0;
-	if (ext4_inode_datasync_dirty(inode))
+	if (ext4_inode_datasync_dirty(inode) ||
+	    offset + length > i_size_read(inode))
 		iomap->flags |= IOMAP_F_DIRTY;
 	iomap->bdev = inode->i_sb->s_bdev;
 	iomap->dax_dev = sbi->s_daxdev;

From patchwork Mon Oct 28 10:51:31 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Bobrowski
X-Patchwork-Id: 1185297
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=mbobrowski.org
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
	header.d=mbobrowski-org.20150623.gappssmtp.com
	header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="1xjffGMZ";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 471s3H4M5Cz9sNw
	for ; Mon, 28 Oct 2019 21:51:39 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388351AbfJ1Kvj (ORCPT );
	Mon, 28 Oct 2019 06:51:39 -0400
Received: from
mail-pf1-f195.google.com ([209.85.210.195]:39229 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388328AbfJ1Kvj (ORCPT ); Mon, 28 Oct 2019 06:51:39 -0400 Received: by mail-pf1-f195.google.com with SMTP id v4so6645705pff.6 for ; Mon, 28 Oct 2019 03:51:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=r8YoWjyUgBX7hg6bA3g/5JaU64FOGb9lDuPU6JwSpsg=; b=1xjffGMZQbimtgpqqLmvKbopiqBHlfMzYLjHo/SSClMjFOATCvAgrtpmSxZ0ahHKHN DuzetyHRbEk1yFFdUl5WeuyVmz/LZVsNZdvbJ0GCqyFLEDLzmiOJt4zfkk9iEsqJ2nFc 1heL4zBCw9Z5mOS1+p9s1yQB+t44UbQT7xPkDcc6dyeV3lSW1Q731siVQ6AYSvBEtrY2 zviYrjCwhK4fLS1PnSyG0gyISXqKl8hLl/yEBbrTEU39bQMkzNllBikbeaX3iNvzxyvp y+eBB5aSWxd8uYxj9KtdW2+mp4BYZdz6iYVSByPU2iYFW/8eaTcMjHwY1PYzBi6Fmkj2 7V6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=r8YoWjyUgBX7hg6bA3g/5JaU64FOGb9lDuPU6JwSpsg=; b=Wrq+rogtyyIQkvFym/U9lHUeGgdhDwZJmvCC4jtPqf0vGp3NqoUrF5E58uno3g1HxG a6eTx1BGQaHZRqPzDMcfq81TbLo/3pilWiO/0C9KlwMf90FZsUdi4p++15Om0SYK2+H/ A+H0LqiUPZzRJpkjZIBgI21zpND/9C3EzF2WJT8rZHPlI3kFUTG1syULszeL9ob8pKiU LTKsU4lz84A9LuLZem0zPQGkxQ+/eQGT0SFk+bHTDSiL6B/Yn8zOoxbNxmmCAP86LKpx at8bvf+COwFwI42uLvbxbz6HuDNziymj+SUo6Irlh16YOwTTW8A8OospfBwntuKI6QAS rM2Q== X-Gm-Message-State: APjAAAXxndCk5Tj2KpMl4+sR9Ng3Jq5QUkCB3H0lZwgeGW2b8hlme5kY F4XwI+Sey7Z8RvEefDWvClUG X-Google-Smtp-Source: APXvYqxTvUaJSCF/Y1b/4DtjRqpZfhfhH29U0azdsoQkUmCqWhDGIU+P24PhRaiGFQMX1EjPjpWrfQ== X-Received: by 2002:a17:90a:c717:: with SMTP id o23mr21431592pjt.88.1572259898061; Mon, 28 Oct 2019 03:51:38 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id a24sm11430889pgd.93.2019.10.28.03.51.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:51:37 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:51:31 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 04/11] ext4: move set iomap routines into a separate helper ext4_set_iomap()
Message-ID: <36c0b0028215ed0a39697512054f3fa4799b0701.1572255425.git.mbobrowski@mbobrowski.org>
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Separate the iomap field population code that is currently within
ext4_iomap_begin() into a separate helper, ext4_set_iomap(). The intent
of this function is self-explanatory; the rationale behind taking this
step is to reduce the overall clutter that we currently have within
the ext4_iomap_begin() callback.
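For readers unfamiliar with the field population being moved, the
following user-space sketch models the kind of work such a helper does:
converting block-granular mapping results into byte-granular fields and
picking a mapping type. It is a simplified illustration, not the helper
from this patch: all sketch_* and SKETCH_* names, the struct layouts and
the flag values are hypothetical, and the dirty/new flag handling is
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ext4 mapping flags. */
#define SKETCH_MAP_MAPPED	(1u << 0)
#define SKETCH_MAP_UNWRITTEN	(1u << 1)
/* Hypothetical "no address" marker for holes. */
#define SKETCH_NULL_ADDR	UINT64_MAX

enum sketch_type { SKETCH_HOLE, SKETCH_MAPPED, SKETCH_UNWRITTEN };

/* Simplified model of a block mapping result (units: blocks). */
struct sketch_map {
	uint32_t m_lblk;	/* first logical block */
	uint32_t m_len;		/* length in blocks */
	uint64_t m_pblk;	/* first physical block */
	unsigned int m_flags;
};

/* Simplified model of the byte-granular result (units: bytes). */
struct sketch_iomap {
	uint64_t offset, length, addr;
	enum sketch_type type;
};

static void sketch_set_iomap(struct sketch_iomap *iomap,
			     const struct sketch_map *map,
			     unsigned int blkbits)
{
	/* Widen to 64 bits before shifting so large files do not overflow. */
	iomap->offset = (uint64_t)map->m_lblk << blkbits;
	iomap->length = (uint64_t)map->m_len << blkbits;

	if (map->m_flags & (SKETCH_MAP_MAPPED | SKETCH_MAP_UNWRITTEN)) {
		/* Unwritten takes precedence when both bits are set. */
		iomap->type = (map->m_flags & SKETCH_MAP_UNWRITTEN) ?
			      SKETCH_UNWRITTEN : SKETCH_MAPPED;
		iomap->addr = map->m_pblk << blkbits;
	} else {
		iomap->type = SKETCH_HOLE;
		iomap->addr = SKETCH_NULL_ADDR;
	}
}

/* Small self-check: an unmapped range is reported as a hole. */
void sketch_set_iomap_self_check(void)
{
	struct sketch_map map = { 0, 1, 0, 0 };
	struct sketch_iomap iomap;

	sketch_set_iomap(&iomap, &map, 12);
	assert(iomap.type == SKETCH_HOLE && iomap.addr == SKETCH_NULL_ADDR);
}
```

Keeping this logic in one helper means that the caller only has to
adjust the result for its own special cases, such as delayed
allocation.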
Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/inode.c | 92 +++++++++++++++++++++++++++----------------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index da2ca81e3d9c..073b7c873bb2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3406,10 +3406,56 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 	return inode->i_state & I_DIRTY_DATASYNC;
 }
 
+static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
+			   struct ext4_map_blocks *map, loff_t offset,
+			   loff_t length)
+{
+	u8 blkbits = inode->i_blkbits;
+
+	/*
+	 * Writes that span EOF might trigger an I/O size update on completion,
+	 * so consider them to be dirty for the purpose of O_DSYNC, even if
+	 * there is no other metadata changes being made or are pending.
+	 */
+	iomap->flags = 0;
+	if (ext4_inode_datasync_dirty(inode) ||
+	    offset + length > i_size_read(inode))
+		iomap->flags |= IOMAP_F_DIRTY;
+
+	if (map->m_flags & EXT4_MAP_NEW)
+		iomap->flags |= IOMAP_F_NEW;
+
+	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
+	iomap->offset = (u64) map->m_lblk << blkbits;
+	iomap->length = (u64) map->m_len << blkbits;
+
+	if (map->m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN)) {
+		/*
+		 * Flags passed to ext4_map_blocks() for direct I/O writes can
+		 * result in m_flags having both EXT4_MAP_MAPPED and
+		 * EXT4_MAP_UNWRITTEN bits set. In order for any allocated
+		 * unwritten extents to be converted into written extents
+		 * correctly within the ->end_io() handler, we need to ensure
+		 * that the iomap->type is set appropriately. Hence, the reason
+		 * why we need to check whether the EXT4_MAP_UNWRITTEN bit has
+		 * been set first.
+		 */
+		if (map->m_flags & EXT4_MAP_UNWRITTEN)
+			iomap->type = IOMAP_UNWRITTEN;
+		else if (map->m_flags & EXT4_MAP_MAPPED)
+			iomap->type = IOMAP_MAPPED;
+
+		iomap->addr = (u64) map->m_pblk << blkbits;
+	} else {
+		iomap->type = IOMAP_HOLE;
+		iomap->addr = IOMAP_NULL_ADDR;
+	}
+}
+
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block, last_block;
 	struct ext4_map_blocks map;
@@ -3523,47 +3569,9 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			return ret;
 	}
 
-	/*
-	 * Writes that span EOF might trigger an I/O size update on completion,
-	 * so consider them to be dirty for the purposes of O_DSYNC, even if
-	 * there is no other metadata changes being made or are pending here.
-	 */
-	iomap->flags = 0;
-	if (ext4_inode_datasync_dirty(inode) ||
-	    offset + length > i_size_read(inode))
-		iomap->flags |= IOMAP_F_DIRTY;
-	iomap->bdev = inode->i_sb->s_bdev;
-	iomap->dax_dev = sbi->s_daxdev;
-	iomap->offset = (u64)first_block << blkbits;
-	iomap->length = (u64)map.m_len << blkbits;
-
-	if (ret == 0) {
-		iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE;
-		iomap->addr = IOMAP_NULL_ADDR;
-	} else {
-		/*
-		 * Flags passed into ext4_map_blocks() for direct I/O writes
-		 * can result in m_flags having both EXT4_MAP_MAPPED and
-		 * EXT4_MAP_UNWRITTEN bits set. In order for any allocated
-		 * unwritten extents to be converted into written extents
-		 * correctly within the ->end_io() handler, we need to ensure
-		 * that the iomap->type is set appropriately. Hence the reason
-		 * why we need to check whether EXT4_MAP_UNWRITTEN is set
-		 * first.
-		 */
-		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
-			iomap->type = IOMAP_UNWRITTEN;
-		} else if (map.m_flags & EXT4_MAP_MAPPED) {
-			iomap->type = IOMAP_MAPPED;
-		} else {
-			WARN_ON_ONCE(1);
-			return -EIO;
-		}
-		iomap->addr = (u64)map.m_pblk << blkbits;
-	}
-
-	if (map.m_flags & EXT4_MAP_NEW)
-		iomap->flags |= IOMAP_F_NEW;
+	ext4_set_iomap(inode, iomap, &map, offset, length);
+	if (delalloc && iomap->type == IOMAP_HOLE)
+		iomap->type = IOMAP_DELALLOC;
 
 	return 0;
 }

From patchwork Mon Oct 28 10:51:45 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Bobrowski
X-Patchwork-Id: 1185298
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=mbobrowski.org
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
	header.d=mbobrowski-org.20150623.gappssmtp.com
	header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="Y78vlRmh";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 471s3Z0wsSz9sNw
	for ; Mon, 28 Oct 2019 21:51:54 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388356AbfJ1Kvx (ORCPT );
	Mon, 28 Oct 2019 06:51:53 -0400
Received: from mail-pg1-f196.google.com ([209.85.215.196]:34074 "EHLO
	mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S2388297AbfJ1Kvx (ORCPT );
	Mon, 28 Oct 2019 06:51:53 -0400
Received: by mail-pg1-f196.google.com with SMTP id e4so2348183pgs.1
	for ; Mon, 28 Oct 2019 03:51:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=mbobrowski-org.20150623.gappssmtp.com; s=20150623;
h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=41LZBznqzMjRIBqWsHF9KQj0bmycavXaY90fbO/UooQ=; b=Y78vlRmhk7TKVgXHuQdCHYjj73OV9GL3R0sGrQD8CelWfG2lwoOryQP4U+FMKSbDLt EXWvekZl63NZ6b9awkG7xKRf7fk+QSZ0Zt4aQWvL88qEPngSHhutHCa4G1N5c5n8CCQ+ xwig07Bxw87ca9PHyv3zivkE4hf46akK4Fqlh3KatDfk6DdgsNfcIRTSAHuIseJE2P5W 3Cz6BroYIvuyFSJbU+jAKuWxTgNTVmj7MIxKweZsy2pEMBcKEMpSYRhlFBRG6aUgrZNF v0X9nqRN6STc1/AXbIixDyC9KRIMwYdrZez5gO8bMOt7iIxlliFYYro+o06ElvCCSUs6 RH3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=41LZBznqzMjRIBqWsHF9KQj0bmycavXaY90fbO/UooQ=; b=KuxkAVYZg5JUG27qadq+dHQeJsoWMubZZm2vLLfOZMVxJZ0HE8QFnlaJdQMOfv5Bci Wh6sjIFFcJqqBjZcjSnJJhWljgOotNRaK6xmDT43dZdLwXZGtiK4nY2FbI7VbMKl+2s1 8t02Aq61pB3KOHIFpWPjyCEdwtwouICGsmULcNjJKMjY5jC2HqaTRn6ciY0K1mQ2lHhn vYE9lZ2SXVtXw/8QS9TDtW7R2joO8WwfRYjfL2kPZtVV/9hLTOSLGkWi3P9vzTMyxnPY 4QUsErBSbB3UdaxBRaQmMxa4NiYtTleh2g4r5B7AnIB1dzeTnImIj32/Rh68mKtdRtQM eEjA== X-Gm-Message-State: APjAAAUfJUd4xGHN5PS7Y/A1eid9UWX6ydvKWYn3Jh3siDv0ZBlYgdt/ rVx48HQZNJ2pk8PL44Y4FudK X-Google-Smtp-Source: APXvYqwbSBrw1To4jhjMs1F1ifrQEm5wpfYgLwjRvQeNq7wOaCbjN8ET+9C/CJZARRnL94UPMqphhg== X-Received: by 2002:a63:8148:: with SMTP id t69mr20574828pgd.160.1572259912335; Mon, 28 Oct 2019 03:51:52 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id w62sm3286848pfb.15.2019.10.28.03.51.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:51:51 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:51:45 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 05/11] ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper
Message-ID:
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

In preparation for porting the ext4 direct I/O path over to the iomap
infrastructure, split up the IOMAP_WRITE branch that's currently
within ext4_iomap_begin() into a separate helper, ext4_iomap_alloc().
This way, when we add in the necessary code for direct I/O, we don't
end up with ext4_iomap_begin() becoming a monstrous twisty maze.

Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/inode.c | 113 ++++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 52 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 073b7c873bb2..325abba6482c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3453,6 +3453,63 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	}
 }
 
+static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
+			    unsigned int flags)
+{
+	handle_t *handle;
+	u8 blkbits = inode->i_blkbits;
+	int ret, dio_credits, retries = 0;
+
+	/*
+	 * Trim the mapping request to the maximum value that we can map at
+	 * once for direct I/O.
+	 */
+	if (map->m_len > DIO_MAX_BLOCKS)
+		map->m_len = DIO_MAX_BLOCKS;
+	dio_credits = ext4_chunk_trans_blocks(inode, map->m_len);
+
+retry:
+	/*
+	 * Either we allocate blocks and then don't get an unwritten extent, so
+	 * in that case we have reserved enough credits. Or, the blocks are
+	 * already allocated and unwritten. In that case, the extent conversion
+	 * fits into the credits as well.
+	 */
+	handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, dio_credits);
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
+
+	ret = ext4_map_blocks(handle, inode, map, EXT4_GET_BLOCKS_CREATE_ZERO);
+	if (ret < 0)
+		goto journal_stop;
+
+	/*
+	 * If we've allocated blocks beyond EOF, we need to ensure that they're
+	 * truncated if we crash before updating the inode size metadata within
+	 * ext4_iomap_end(). For faults, we don't need to do that (and cannot
+	 * due to orphan list operations needing an inode_lock()). If we happen
+	 * to instantiate blocks beyond EOF, it is because we race with a
+	 * truncate operation, which already has added the inode onto the
+	 * orphan list.
+	 */
+	if (!(flags & IOMAP_FAULT) && map->m_lblk + map->m_len >
+	    (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) {
+		int err;
+
+		err = ext4_orphan_add(handle, inode);
+		if (err < 0)
+			ret = err;
+	}
+
+journal_stop:
+	ext4_journal_stop(handle);
+	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+		goto retry;
+
+	return ret;
+}
+
+
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
@@ -3513,62 +3570,14 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			}
 		}
 	} else if (flags & IOMAP_WRITE) {
-		int dio_credits;
-		handle_t *handle;
-		int retries = 0;
-
-		/* Trim mapping request to maximum we can map at once for DIO */
-		if (map.m_len > DIO_MAX_BLOCKS)
-			map.m_len = DIO_MAX_BLOCKS;
-		dio_credits = ext4_chunk_trans_blocks(inode, map.m_len);
-retry:
-		/*
-		 * Either we allocate blocks and then we don't get unwritten
-		 * extent so we have reserved enough credits, or the blocks
-		 * are already allocated and unwritten and in that case
-		 * extent conversion fits in the credits as well.
-		 */
-		handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
-					    dio_credits);
-		if (IS_ERR(handle))
-			return PTR_ERR(handle);
-
-		ret = ext4_map_blocks(handle, inode, &map,
-				      EXT4_GET_BLOCKS_CREATE_ZERO);
-		if (ret < 0) {
-			ext4_journal_stop(handle);
-			if (ret == -ENOSPC &&
-			    ext4_should_retry_alloc(inode->i_sb, &retries))
-				goto retry;
-			return ret;
-		}
-
-		/*
-		 * If we added blocks beyond i_size, we need to make sure they
-		 * will get truncated if we crash before updating i_size in
-		 * ext4_iomap_end(). For faults we don't need to do that (and
-		 * even cannot because for orphan list operations inode_lock is
-		 * required) - if we happen to instantiate block beyond i_size,
-		 * it is because we race with truncate which has already added
-		 * the inode to the orphan list.
-		 */
-		if (!(flags & IOMAP_FAULT) && first_block + map.m_len >
-		    (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) {
-			int err;
-
-			err = ext4_orphan_add(handle, inode);
-			if (err < 0) {
-				ext4_journal_stop(handle);
-				return err;
-			}
-		}
-		ext4_journal_stop(handle);
+		ret = ext4_iomap_alloc(inode, &map, flags);
 	} else {
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
-		if (ret < 0)
-			return ret;
 	}
 
+	if (ret < 0)
+		return ret;
+
 	ext4_set_iomap(inode, iomap, &map, offset, length);
 	if (delalloc && iomap->type == IOMAP_HOLE)
 		iomap->type = IOMAP_DELALLOC;

From patchwork Mon Oct 28 10:52:13 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matthew Bobrowski
X-Patchwork-Id: 1185299
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=mbobrowski.org
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
	header.d=mbobrowski-org.20150623.gappssmtp.com
	header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="pHpFwWBA";
	dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 471s4562Chz9sNw
	for ; Mon, 28 Oct 2019 21:52:21 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388356AbfJ1KwV (ORCPT );
	Mon, 28 Oct 2019 06:52:21 -0400
Received: from mail-pf1-f196.google.com ([209.85.210.196]:42429 "EHLO
	mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1732365AbfJ1KwU (ORCPT );
	Mon, 28 Oct 2019 06:52:20 -0400
Received: by mail-pf1-f196.google.com with SMTP id 21so6636965pfj.9
	for ; Mon, 28 Oct 2019 03:52:20 -0700 (PDT)
DKIM-Signature:
v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=O4XNryIx8YarFTM6JxSehZON+26PJYhWJHjyK+bvHnw=; b=pHpFwWBAVWX7Hh/xJENmt6rr76MjiohwtoLkEj6l++53QShJjEBSErJi+jS+V64Zli AmYGOMGMIoI3rG55iPoJxOXtrwPaLPmhdJXMqYQVtrniDiPYhPy2ootXRfy6wlhEoE5e ZxNIE2ABxHgFdP0umdfPzd3F5ItRniZh6vSXUQcap6fGDD1plhrXwfK9quc1kZymIYOi Zft9bPKjpOPfXZ3IaW0ApAsty1xfelff2IWj+9wz1oLnV+rQqe2HHu72w+3QU+/H+qIH mA60OnzZn64r77UsORMpzspDXCv79HXEWZCKdaKexw6sW47qVyY/Ni4N6LxkBYvMtv4T Q3Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=O4XNryIx8YarFTM6JxSehZON+26PJYhWJHjyK+bvHnw=; b=cYukTj7fksrw4+os5EB3nH6fvoPnRTb2TAGt9scV8PBUhluEmV4w5MW/qhsYC9g5tz jNrN17mh9/swbAdkgIJxuVNaTIbgaRkRnKaGjiiHRh39LDNABeJnG5ZkAALAbnL3bH4S mEV3B4rAdxDAlUzi4LYG4Pekw6aLgXMoOhipC9iGjYTVAsR3pJaJaLtxA0xWxcLOtE6f O7bMWgRydh7ZkdHzvNg59Qdq0cBovX7wxWMnv9VXLpiKBcsgjt7qhQrw8phEsb5f02PV YS0231wZ/oVOHCAIfnLCAT+CEY6BSrY2/OXT305qPTvq7VqJK571arzCKDWmWvrz6bJu t13w== X-Gm-Message-State: APjAAAVWnzPw5wqnkmT3q2CNuPqKOHZBChkLcQP8NqCbW+wYxyIkAGfB I02PcX8/hAEjgDToQpxVDuQ6 X-Google-Smtp-Source: APXvYqwvbWdHKLanX82SlOMLhrPo6UDi2scS/nQtXxZ79yL1nnboGnUBBqrIiv1Ce+NjLuuPRJqyVQ== X-Received: by 2002:a63:b5b:: with SMTP id a27mr20774244pgl.262.1572259939758; Mon, 28 Oct 2019 03:52:19 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id q2sm10889089pfh.34.2019.10.28.03.52.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:52:19 -0700 (PDT) Date: Mon, 28 Oct 2019 21:52:13 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com Subject: [PATCH v6 06/11] ext4: introduce new callback for IOMAP_REPORT Message-ID: <1c115e4dfc0d0b10d195c85dd48271cfc5167580.1572255425.git.mbobrowski@mbobrowski.org> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin() callback named ext4_iomap_begin_report(). Again, the rationale for this change is to reduce the overall clutter within ext4_iomap_begin().
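For anyone wanting to reason about the delalloc trimming in isolation, the following is a minimal userspace sketch of the decision logic that ext4_iomap_is_delalloc() applies to the result of the extent status lookup. It is not kernel code: struct model_extent, struct model_map and model_is_delalloc() are hypothetical stand-ins introduced here for illustration, and the lookup itself (ext4_es_find_extent_range()) is assumed to have already produced the extent passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct extent_status. */
struct model_extent {
	uint32_t lblk;	/* first logical block of the delayed extent */
	uint32_t len;	/* length in blocks; 0 means "nothing found" */
};

/* Hypothetical stand-in for the kernel's struct ext4_map_blocks. */
struct model_map {
	uint32_t lblk;	/* first logical block of the queried range */
	uint32_t len;	/* number of blocks in the queried range */
};

/*
 * Userspace model of the trimming done once a block lookup has found no
 * mapping: trim map to either the leading hole or the delayed extent,
 * and report whether map now describes delayed-allocation blocks.
 */
static bool model_is_delalloc(struct model_map *map,
			      const struct model_extent *es)
{
	uint32_t end;

	assert(map->len > 0);
	end = map->lblk + map->len - 1;

	/* No delayed extent overlaps the range: the whole range is a hole. */
	if (es->len == 0 || es->lblk > end)
		return false;

	/* The range starts with a hole: report only the hole. */
	if (es->lblk > map->lblk) {
		map->len = es->lblk - map->lblk;
		return false;
	}

	/* The range starts inside the delayed extent: report up to its end. */
	map->len = es->len - (map->lblk - es->lblk);
	return true;
}
```

As in the patch, the third case does not clamp the length to the original request; the model simply reports the remainder of the delayed extent.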
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/ext4.h | 1 + fs/ext4/file.c | 6 ++- fs/ext4/inode.c | 134 +++++++++++++++++++++++++++++------------------- 3 files changed, 85 insertions(+), 56 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 03db3e71676c..d0d88f411a44 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3379,6 +3379,7 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end) } extern const struct iomap_ops ext4_iomap_ops; +extern const struct iomap_ops ext4_iomap_report_ops; static inline int ext4_buffer_uptodate(struct buffer_head *bh) { diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 8d2bbcc2d813..ab75aee3e687 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -494,12 +494,14 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence) maxbytes, i_size_read(inode)); case SEEK_HOLE: inode_lock_shared(inode); - offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops); + offset = iomap_seek_hole(inode, offset, + &ext4_iomap_report_ops); inode_unlock_shared(inode); break; case SEEK_DATA: inode_lock_shared(inode); - offset = iomap_seek_data(inode, offset, &ext4_iomap_ops); + offset = iomap_seek_data(inode, offset, + &ext4_iomap_report_ops); inode_unlock_shared(inode); break; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 325abba6482c..50b4835cd927 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3513,74 +3513,32 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned flags, struct iomap *iomap, struct iomap *srcmap) { - unsigned int blkbits = inode->i_blkbits; - unsigned long first_block, last_block; - struct ext4_map_blocks map; - bool delalloc = false; int ret; + struct ext4_map_blocks map; + u8 blkbits = inode->i_blkbits; if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) return -EINVAL; - first_block = offset >> blkbits; - last_block = 
min_t(loff_t, (offset + length - 1) >> blkbits, - EXT4_MAX_LOGICAL_BLOCK); - - if (flags & IOMAP_REPORT) { - if (ext4_has_inline_data(inode)) { - ret = ext4_inline_data_iomap(inode, iomap); - if (ret != -EAGAIN) { - if (ret == 0 && offset >= iomap->length) - ret = -ENOENT; - return ret; - } - } - } else { - if (WARN_ON_ONCE(ext4_has_inline_data(inode))) - return -ERANGE; - } - map.m_lblk = first_block; - map.m_len = last_block - first_block + 1; - - if (flags & IOMAP_REPORT) { - ret = ext4_map_blocks(NULL, inode, &map, 0); - if (ret < 0) - return ret; - - if (ret == 0) { - ext4_lblk_t end = map.m_lblk + map.m_len - 1; - struct extent_status es; - - ext4_es_find_extent_range(inode, &ext4_es_is_delayed, - map.m_lblk, end, &es); + if (WARN_ON_ONCE(ext4_has_inline_data(inode))) + return -ERANGE; - if (!es.es_len || es.es_lblk > end) { - /* entire range is a hole */ - } else if (es.es_lblk > map.m_lblk) { - /* range starts with a hole */ - map.m_len = es.es_lblk - map.m_lblk; - } else { - ext4_lblk_t offs = 0; + /* + * Calculate the first and last logical blocks respectively. 
+ */ + map.m_lblk = offset >> blkbits; + map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; - if (es.es_lblk < map.m_lblk) - offs = map.m_lblk - es.es_lblk; - map.m_lblk = es.es_lblk + offs; - map.m_len = es.es_len - offs; - delalloc = true; - } - } - } else if (flags & IOMAP_WRITE) { + if (flags & IOMAP_WRITE) ret = ext4_iomap_alloc(inode, &map, flags); - } else { + else ret = ext4_map_blocks(NULL, inode, &map, 0); - } if (ret < 0) return ret; ext4_set_iomap(inode, iomap, &map, offset, length); - if (delalloc && iomap->type == IOMAP_HOLE) - iomap->type = IOMAP_DELALLOC; return 0; } @@ -3642,6 +3600,74 @@ const struct iomap_ops ext4_iomap_ops = { .iomap_end = ext4_iomap_end, }; +static bool ext4_iomap_is_delalloc(struct inode *inode, + struct ext4_map_blocks *map) +{ + struct extent_status es; + ext4_lblk_t offset = 0, end = map->m_lblk + map->m_len - 1; + + ext4_es_find_extent_range(inode, &ext4_es_is_delayed, + map->m_lblk, end, &es); + + if (!es.es_len || es.es_lblk > end) + return false; + + if (es.es_lblk > map->m_lblk) { + map->m_len = es.es_lblk - map->m_lblk; + return false; + } + + offset = map->m_lblk - es.es_lblk; + map->m_len = es.es_len - offset; + + return true; +} + +static int ext4_iomap_begin_report(struct inode *inode, loff_t offset, + loff_t length, unsigned int flags, + struct iomap *iomap, struct iomap *srcmap) +{ + int ret; + bool delalloc = false; + struct ext4_map_blocks map; + u8 blkbits = inode->i_blkbits; + + if ((offset >> blkbits) > EXT4_MAX_LOGICAL_BLOCK) + return -EINVAL; + + if (ext4_has_inline_data(inode)) { + ret = ext4_inline_data_iomap(inode, iomap); + if (ret != -EAGAIN) { + if (ret == 0 && offset >= iomap->length) + ret = -ENOENT; + return ret; + } + } + + /* + * Calculate the first and last logical block respectively. 
+ */ + map.m_lblk = offset >> blkbits; + map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits, + EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1; + + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (ret < 0) + return ret; + if (ret == 0) + delalloc = ext4_iomap_is_delalloc(inode, &map); + + ext4_set_iomap(inode, iomap, &map, offset, length); + if (delalloc && iomap->type == IOMAP_HOLE) + iomap->type = IOMAP_DELALLOC; + + return 0; +} + +const struct iomap_ops ext4_iomap_report_ops = { + .iomap_begin = ext4_iomap_begin_report, +}; + static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset, ssize_t size, void *private) { From patchwork Mon Oct 28 10:52:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185300 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="KGl2mnsq"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s4Y5znFz9sNw for ; Mon, 28 Oct 2019 21:52:45 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388356AbfJ1Kwp (ORCPT ); Mon, 28 Oct 2019 06:52:45 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:43990 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729177AbfJ1Kwo (ORCPT ); Mon, 28 Oct 2019 06:52:44 -0400 Received: by mail-pl1-f194.google.com with SMTP id 
v5so5400889ply.10 for ; Mon, 28 Oct 2019 03:52:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=0i4Q8lN5RfQHFAjF7mAEg7OwHhBhoJgtUkCfBexPhbI=; b=KGl2mnsqc3Dbgubqw/0YGsRP4gLcH1PBGTTzD1Ep1NHWIsrZR9xDqQIvYnr66fmESG fl8xcT2P2mRn/65SpHD4wCXoX9Zep6zqZwVizVOzsGoT7ViVo8qjvV3b1V619DftOX3v utd97ogd9IWCvcjjDJS2VyY7Ar4DmABnGVCoDIJ4oM+g7sy1GnFylmwpA9EZUToyjEV4 p7Fn0VBsooii8KMOkTrSy32GzMEY9QctwKN+CTKVNM0mXkyWAlyTJIQsOnW3HvThXbck B3Sj/d9W9MlmniWYOeSCcqSJtqKkcGIRrzoaWhVNtUcYVeOIgcCDufGrD+fCunhMfr0x 5DdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=0i4Q8lN5RfQHFAjF7mAEg7OwHhBhoJgtUkCfBexPhbI=; b=TVTw5HY2QCzY3CPUK0vPzvOC2WLeJd9hhxAVB10lGheKuuzVf8w8XdRBkcgfH5ffBQ kAedr0CgpiBRlGBqf4hWMbzCdMvJKTBpQ6NR1swinE1budOOpSni0t1aWQZKiSyDs4p1 ZwDCymjiK1ewnB1o7oe5evf/uBhVzwKcUv8GS6/+7Z/vkPm15Zmv5b2YlqwWnFemjg3/ JOpqm9hcwXEXclPwNpp/573CfHB4vYb9e2+Uhq4LTl1VuQ1GLBFeiS/HvCAheVLkhYD3 2yzIGVh2bfwsswxjww4kRow7saGHxjDjSUzx4491TPayx83cvV8/VdCJx69VUYtzI0hB +yDg== X-Gm-Message-State: APjAAAXwNBKJpy7BezWgs4JBPgLMJRLT7OdjQwCB6odHd1llO4Zq2MCY CibA9wm0ksYm7w3LxLDVDF8t X-Google-Smtp-Source: APXvYqz7TEe3ZK9d2rB1EifxOHSMlNyvoo/BxtoRGPtgQP54h6byaD36eca6DwXuFEwc0n/MJadlZg== X-Received: by 2002:a17:902:848e:: with SMTP id c14mr17050134plo.77.1572259963862; Mon, 28 Oct 2019 03:52:43 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id x190sm11771535pfc.89.2019.10.28.03.52.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:52:43 -0700 (PDT) Date: Mon, 28 Oct 2019 21:52:37 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com Subject: [PATCH v6 07/11] ext4: introduce direct I/O read using iomap infrastructure Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

This patch introduces a new direct I/O read path which makes use of the iomap infrastructure. The new function ext4_dio_read_iter() is responsible for calling into the iomap infrastructure via iomap_dio_rw(). If the read operation performed on the inode is not supported, which is checked via ext4_dio_supported(), then we simply fall back and complete the I/O using buffered I/O. The existing direct I/O read code path has been removed, as it is now redundant.
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-- fs/ext4/inode.c | 38 +--------------------------------- 2 files changed, 54 insertions(+), 39 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index ab75aee3e687..440f4c6ba4ee 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -34,6 +34,52 @@ #include "xattr.h" #include "acl.h" +static bool ext4_dio_supported(struct inode *inode) +{ + if (IS_ENABLED(CONFIG_FS_ENCRYPTION) && IS_ENCRYPTED(inode)) + return false; + if (fsverity_active(inode)) + return false; + if (ext4_should_journal_data(inode)) + return false; + if (ext4_has_inline_data(inode)) + return false; + return true; +} + +static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + ssize_t ret; + struct inode *inode = file_inode(iocb->ki_filp); + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + } else { + inode_lock_shared(inode); + } + + if (!ext4_dio_supported(inode)) { + inode_unlock_shared(inode); + /* + * Fallback to buffered I/O if the operation being performed on + * the inode is not supported by direct I/O. The IOCB_DIRECT + * flag needs to be cleared here in order to ensure that the + * direct I/O path within generic_file_read_iter() is not + * taken. 
+ */ + iocb->ki_flags &= ~IOCB_DIRECT; + return generic_file_read_iter(iocb, to); + } + + ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, + is_sync_kiocb(iocb)); + inode_unlock_shared(inode); + + file_accessed(iocb->ki_filp); + return ret; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { @@ -64,16 +110,21 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { - if (unlikely(ext4_forced_shutdown(EXT4_SB(file_inode(iocb->ki_filp)->i_sb)))) + struct inode *inode = file_inode(iocb->ki_filp); + + if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; if (!iov_iter_count(to)) return 0; /* skip atime */ #ifdef CONFIG_FS_DAX - if (IS_DAX(file_inode(iocb->ki_filp))) + if (IS_DAX(inode)) return ext4_dax_read_iter(iocb, to); #endif + if (iocb->ki_flags & IOCB_DIRECT) + return ext4_dio_read_iter(iocb, to); + return generic_file_read_iter(iocb, to); } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 50b4835cd927..e44b3b1dbbc4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -863,9 +863,6 @@ int ext4_dio_get_block(struct inode *inode, sector_t iblock, { /* We don't expect handle for direct IO */ WARN_ON_ONCE(ext4_journal_current_handle()); - - if (!create) - return _ext4_get_block(inode, iblock, bh, 0); return ext4_get_block_trans(inode, iblock, bh, EXT4_GET_BLOCKS_CREATE); } @@ -3874,36 +3871,6 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) return ret; } -static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) -{ - struct address_space *mapping = iocb->ki_filp->f_mapping; - struct inode *inode = mapping->host; - size_t count = iov_iter_count(iter); - ssize_t ret; - - /* - * Shared inode_lock is enough for us - it protects against concurrent - * writes & truncates and since we take care of writing back page cache, - * we are protected against 
page writeback as well. - */ - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!inode_trylock_shared(inode)) - return -EAGAIN; - } else { - inode_lock_shared(inode); - } - - ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, - iocb->ki_pos + count - 1); - if (ret) - goto out_unlock; - ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, - iter, ext4_dio_get_block, NULL, NULL, 0); -out_unlock: - inode_unlock_shared(inode); - return ret; -} - static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; @@ -3930,10 +3897,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0; trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); - if (iov_iter_rw(iter) == READ) - ret = ext4_direct_IO_read(iocb, iter); - else - ret = ext4_direct_IO_write(iocb, iter); + ret = ext4_direct_IO_write(iocb, iter); trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret); return ret; } From patchwork Mon Oct 28 10:52:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185301 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="wFXLXDkX"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s4q6g4Lz9sNw for ; Mon, 28 Oct 2019 21:52:59 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S2388359AbfJ1Kw7 (ORCPT ); Mon, 28 Oct 2019 06:52:59 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:41434 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729177AbfJ1Kw7 (ORCPT ); Mon, 28 Oct 2019 06:52:59 -0400 Received: by mail-pf1-f193.google.com with SMTP id p26so2510121pfq.8 for ; Mon, 28 Oct 2019 03:52:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=JkX2KfEo2naL2qEFuKSRLVvDmaAU8/3Bc6817L+E8Fo=; b=wFXLXDkXFkiuS932MoMWKKCgq/2NDq8WyHUBVjvYvBmh/I8nZAV+vL/zYrQ60XeJW0 g6LLNOYUH5MrblB0/PPiFMDfsyFhp7VQylcyrSFP2ivsg5qchkNaB8+6AyFgJgimMJ+r YoUeRyTtSD8b82d5Diu44nctHZMf0fBGfrNyRgCw+Ckbm5GkFO9NKYsttxoQjUCLYpT5 lpkx31CVZRBU96KDAnIaer6+wHvVi9cHsCKZkzcLkQZHge1MLFvRUh4fj+y+oIY256kT EnViTUewHYs2fsxZVdTq81tYM/B7my0u9mdcr+UF9NkhYdx9Kx8eYJ5BccBs8nuOvowg N8iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=JkX2KfEo2naL2qEFuKSRLVvDmaAU8/3Bc6817L+E8Fo=; b=UcVpNr5zid7FRbUQXg4UZfYkq0ZHR4q3KyLJGFeA6f6UeydIrBn/whqU3+uCWCqE3s 5R5ulOJmKUhl/hQEDmHR3zoTGfqZtdKff6kyNebVLYz4bYvfD1tu0PVanOb3+A3D+mPP 4XKItQrFBuP/OHi1EnRZGeaWACTUWFSDee28C/ckIOVhYt2Bogr2OHPZVHLfsayCkwSD jZ8njyz6R724nb8GM4ucdyfPYOSRll4oDVlmSGAAbnC7d/TXyAiFW+4CH7JZ9nswpL3K UirkZ91LhIrZA0tsGATihVRUUFran5WxrdgIhPRjRIerGPTSmASJdTgkUtFlKrLClp5i dYgw== X-Gm-Message-State: APjAAAV/bHWrwEtEN4Pae3bUob9S6nSe8JzLYVsAUOCiwHm1vDqtWhTc 0wrCZNHjaVvgJTfTKdwfh37/ X-Google-Smtp-Source: APXvYqx3pcz9KXKPtdjyS7gWSqmEwZrdpCykub2i5VQPovbMRLjVCCn2vHT2Q+iigaGhr2g3zmqS5A== X-Received: by 2002:a17:90a:77c7:: with SMTP id e7mr21576092pjs.133.1572259978377; Mon, 28 Oct 2019 03:52:58 -0700 (PDT) Received: from 
poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. [114.78.127.22]) by smtp.gmail.com with ESMTPSA id q34sm4113953pjb.15.2019.10.28.03.52.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:52:57 -0700 (PDT) Date: Mon, 28 Oct 2019 21:52:51 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com Subject: [PATCH v6 08/11] ext4: move inode extension/truncate code out from ->iomap_end() callback Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

In preparation for implementing the iomap direct I/O modifications, the inode extension/truncate code needs to be moved out from the ext4_iomap_end() callback. For direct I/O, if the current code remained, it would behave incorrectly. Updating the inode size prior to converting unwritten extents would potentially allow a racing direct I/O read to find unwritten extents before they have been converted. The inode extension/truncate code now resides within a new helper, ext4_handle_inode_extension(). This function has been designed so that it can accommodate both DAX and direct I/O extension/truncate operations.
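To make the control flow of the new helper easier to follow, here is a minimal userspace sketch of the decisions ext4_handle_inode_extension() makes from its arguments. It is only a model under stated assumptions: enum model_action, model_align_up() and model_inode_extension() are hypothetical names introduced here, disksize stands in for EXT4_I(inode)->i_disksize, and the journal handle failures, the ext4_can_truncate() check and the orphan list manipulation are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a, where a is a power of two. */
static uint64_t model_align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical labels for the outcomes of the helper. */
enum model_action {
	MODEL_NO_EXTENSION,	/* write ended within the on-disk size */
	MODEL_EXTEND,		/* update the size, nothing to trim */
	MODEL_EXTEND_TRUNCATE,	/* update the size, then trim the tail */
	MODEL_TRUNCATE_ONLY,	/* the write failed: only trim */
};

/*
 * offset:   file position at which the write started
 * written:  bytes actually written, or a negative error code
 * count:    bytes the caller intended to write
 * disksize: on-disk inode size before the write (assumed input)
 * blkbits:  log2 of the filesystem block size
 */
static enum model_action model_inode_extension(uint64_t offset,
					       int64_t written,
					       uint64_t count,
					       uint64_t disksize,
					       unsigned int blkbits)
{
	uint64_t blksize = (uint64_t)1 << blkbits;

	/* The write never went past the on-disk size. */
	if (offset + count <= disksize)
		return MODEL_NO_EXTENSION;

	/* The I/O failed: blocks allocated beyond EOF must be trimmed. */
	if (written < 0)
		return MODEL_TRUNCATE_ONLY;

	/* A short write may leave allocated but unwritten blocks behind. */
	if (model_align_up(offset + (uint64_t)written, blksize) <
	    model_align_up(offset + count, blksize))
		return MODEL_EXTEND_TRUNCATE;

	return MODEL_EXTEND;
}
```

Note that a short write which still ends in the same block as the intended end of the write does not require a truncate, since both positions round up to the same block boundary.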
Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/inode.c | 48 +------------------------- 2 files changed, 89 insertions(+), 48 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 440f4c6ba4ee..ec54fec96a81 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -33,6 +33,7 @@ #include "ext4_jbd2.h" #include "xattr.h" #include "acl.h" +#include "truncate.h" static bool ext4_dio_supported(struct inode *inode) { @@ -234,12 +235,95 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return iov_iter_count(from); } +static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, + ssize_t written, size_t count) +{ + handle_t *handle; + bool truncate = false; + u8 blkbits = inode->i_blkbits; + ext4_lblk_t written_blk, end_blk; + + /* + * Note that EXT4_I(inode)->i_disksize can get extended up to + * inode->i_size while the I/O was running due to writeback of delalloc + * blocks. But, the code in ext4_iomap_alloc() is careful to use + * zeroed/unwritten extents if this is possible; thus we won't leave + * uninitialized blocks in a file even if we didn't succeed in writing + * as much as we intended. + */ + WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize); + if (offset + count <= EXT4_I(inode)->i_disksize) { + /* + * We need to ensure that the inode is removed from the orphan + * list if it has been added prematurely, due to writeback of + * delalloc blocks. 
+ */ + if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + + if (IS_ERR(handle)) { + ext4_orphan_del(NULL, inode); + return PTR_ERR(handle); + } + + ext4_orphan_del(handle, inode); + ext4_journal_stop(handle); + } + + return written; + } + + if (written < 0) + goto truncate; + + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + written = PTR_ERR(handle); + goto truncate; + } + + if (ext4_update_inode_size(inode, offset + written)) + ext4_mark_inode_dirty(handle, inode); + + /* + * We may need to truncate allocated but not written blocks beyond EOF. + */ + written_blk = ALIGN(offset + written, 1 << blkbits); + end_blk = ALIGN(offset + count, 1 << blkbits); + if (written_blk < end_blk && ext4_can_truncate(inode)) + truncate = true; + + /* + * Remove the inode from the orphan list if it has been extended and + * everything went OK. + */ + if (!truncate && inode->i_nlink) + ext4_orphan_del(handle, inode); + ext4_journal_stop(handle); + + if (truncate) { +truncate: + ext4_truncate_failed_write(inode); + /* + * If the truncate operation failed early, then the inode may + * still be on the orphan list. In that case, we need to try + * remove the inode from the in-memory linked list. 
+ */ + if (inode->i_nlink) + ext4_orphan_del(NULL, inode); + } + + return written; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { - struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + size_t count; + loff_t offset; + struct inode *inode = file_inode(iocb->ki_filp); if (!inode_trylock(inode)) { if (iocb->ki_flags & IOCB_NOWAIT) @@ -256,7 +340,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) if (ret) goto out; + offset = iocb->ki_pos; + count = iov_iter_count(from); ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); + ret = ext4_handle_inode_extension(inode, offset, ret, count); out: inode_unlock(inode); if (ret > 0) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e44b3b1dbbc4..7c21028760ee 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3543,53 +3543,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, ssize_t written, unsigned flags, struct iomap *iomap) { - int ret = 0; - handle_t *handle; - int blkbits = inode->i_blkbits; - bool truncate = false; - - if (!(flags & IOMAP_WRITE) || (flags & IOMAP_FAULT)) - return 0; - - handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto orphan_del; - } - if (ext4_update_inode_size(inode, offset + written)) - ext4_mark_inode_dirty(handle, inode); - /* - * We may need to truncate allocated but not written blocks beyond EOF. - */ - if (iomap->offset + iomap->length > - ALIGN(inode->i_size, 1 << blkbits)) { - ext4_lblk_t written_blk, end_blk; - - written_blk = (offset + written) >> blkbits; - end_blk = (offset + length) >> blkbits; - if (written_blk < end_blk && ext4_can_truncate(inode)) - truncate = true; - } - /* - * Remove inode from orphan list if we were extending a inode and - * everything went fine. 
- */ - if (!truncate && inode->i_nlink && - !list_empty(&EXT4_I(inode)->i_orphan)) - ext4_orphan_del(handle, inode); - ext4_journal_stop(handle); - if (truncate) { - ext4_truncate_failed_write(inode); -orphan_del: - /* - * If truncate failed early the inode might still be on the - * orphan list; we need to make sure the inode is removed from - * the orphan list in that case. - */ - if (inode->i_nlink) - ext4_orphan_del(NULL, inode); - } - return ret; + return 0; } const struct iomap_ops ext4_iomap_ops = { From patchwork Mon Oct 28 10:53:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185302 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="k+V7j4Dt"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s563bXdz9sNw for ; Mon, 28 Oct 2019 21:53:14 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388373AbfJ1KxN (ORCPT ); Mon, 28 Oct 2019 06:53:13 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:44298 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729177AbfJ1KxN (ORCPT ); Mon, 28 Oct 2019 06:53:13 -0400 Received: by mail-pg1-f195.google.com with SMTP id e10so6625223pgd.11 for ; Mon, 28 Oct 2019 03:53:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=85uK5SPDrOSgd4/UZbXAWGK0mzidi8Z7PgS+mbnmLLg=; b=k+V7j4Dt1th3d0RcyGq6mbTtakcXti0F639aJaOZNiyvf5bhnkxiuv/lA3kS7KVJDc s1HTJAtb91zYGfsYHaVSk7jHL9NEsMclBo989U3p+vmGMy6wNdAs5WdtOCNJT2VJ/7Wy Vp/su5RO3XTGD4nDTlP2dXgVpFCt2bqmNmh96WX39id8d+uSnfFAGTjvUWyliDxadc9o eWm9l/kmv8AlvkV/cx/NEGpd0/OqkK6GbrlwOOcwPqUaqrya34fIcm13DYIrtiNR6FNE rGl31z99+1Ze0dmMk7UC9znKuAqoniBUHrgVvqCcIBEZXblJRnGPQR2rWuNGPExMcTDF v2cA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=85uK5SPDrOSgd4/UZbXAWGK0mzidi8Z7PgS+mbnmLLg=; b=VhezS4rGrKL36Qs8MBeD7QCqMrkBs6zgwZMUdI0dddejqX4D/qpF9awJk/elsQZB9m wWyCmdtypxbElz+wxm9jbc8JJegC6/0YYTYCdn5UJaz+gbAFdKo+zxjgFOROa0YkDU4R wcjCo13IFNmdTwtNkMQhKkIMoU5elLvCQOaAN1NcJK2+zf03LcZid5Kr/A8ivokxq4Uc DYfK2HaH48Up/M9OgA9R3TZNCTiXQqxN/Iib/EMRi+LJq9aTDKcQRz+TGH3NC0gCWMtb LYj79NuSx7ZzjXbpsfKCd+E/BTsdQdwqF70GIW4tU2jjtMXBfzJK7Yv11a8R3T+EV7CL G1EQ== X-Gm-Message-State: APjAAAWN9j08YMc/5j9IXZNziBnLcQ4tqJHgmMcoz0hF/XjBQpyOVtgU tYoeveO5OTvYTmjmMleVqTb+ X-Google-Smtp-Source: APXvYqzHDAKvvAZQN6/3ujQbMxPqjcELczXwIgD9GjZj/TPpo6JcMrbqPbq+ZU9nAm0PMXbH8G8Ekw== X-Received: by 2002:a63:f050:: with SMTP id s16mr19972049pgj.261.1572259992721; Mon, 28 Oct 2019 03:53:12 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id v68sm10648961pfv.47.2019.10.28.03.53.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:53:12 -0700 (PDT) Date: Mon, 28 Oct 2019 21:53:06 +1100 From: Matthew Bobrowski To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com Subject: [PATCH v6 09/11] ext4: move inode extension check out from ext4_iomap_alloc() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Lift the inode extension/orphan list handling code out from ext4_iomap_alloc() and apply it within the ext4_dax_write_iter(). Signed-off-by: Matthew Bobrowski Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani --- fs/ext4/file.c | 24 +++++++++++++++++++++++- fs/ext4/inode.c | 22 ---------------------- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index ec54fec96a81..83ef9c9ed208 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -323,6 +323,8 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t ret; size_t count; loff_t offset; + handle_t *handle; + bool extend = false; struct inode *inode = file_inode(iocb->ki_filp); if (!inode_trylock(inode)) { @@ -342,8 +344,28 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) offset = iocb->ki_pos; count = iov_iter_count(from); + + if (offset + count > EXT4_I(inode)->i_disksize) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + goto out; + } + + ret = ext4_orphan_add(handle, inode); + if (ret) { + ext4_journal_stop(handle); + goto out; + } + + extend = true; + ext4_journal_stop(handle); + } + ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); - 
ret = ext4_handle_inode_extension(inode, offset, ret, count); + + if (extend) + ret = ext4_handle_inode_extension(inode, offset, ret, count); out: inode_unlock(inode); if (ret > 0) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 7c21028760ee..2ca2e6e69344 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3454,7 +3454,6 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, unsigned int flags) { handle_t *handle; - u8 blkbits = inode->i_blkbits; int ret, dio_credits, retries = 0; /* @@ -3477,28 +3476,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, return PTR_ERR(handle); ret = ext4_map_blocks(handle, inode, map, EXT4_GET_BLOCKS_CREATE_ZERO); - if (ret < 0) - goto journal_stop; - - /* - * If we've allocated blocks beyond EOF, we need to ensure that they're - * truncated if we crash before updating the inode size metadata within - * ext4_iomap_end(). For faults, we don't need to do that (and cannot - * due to orphan list operations needing an inode_lock()). If we happen - * to instantiate blocks beyond EOF, it is because we race with a - * truncate operation, which already has added the inode onto the - * orphan list. 
- */ - if (!(flags & IOMAP_FAULT) && map->m_lblk + map->m_len > - (i_size_read(inode) + (1 << blkbits) - 1) >> blkbits) { - int err; - - err = ext4_orphan_add(handle, inode); - if (err < 0) - ret = err; - } -journal_stop: ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry; From patchwork Mon Oct 28 10:53:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185304 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="Angb3Tj3"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s614Z6mz9sNw for ; Mon, 28 Oct 2019 21:54:01 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388377AbfJ1KyA (ORCPT ); Mon, 28 Oct 2019 06:54:00 -0400 Received: from mail-pf1-f194.google.com ([209.85.210.194]:33236 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728554AbfJ1Kx7 (ORCPT ); Mon, 28 Oct 2019 06:53:59 -0400 Received: by mail-pf1-f194.google.com with SMTP id c184so6673706pfb.0 for ; Mon, 28 Oct 2019 03:53:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; 
bh=hzUA/3MWmz4zE5gI8tQIld6lm5+sdsFzA4VSheS6JfQ=; b=Angb3Tj3jgXlr520cPCtCWxo5mOy2rZV91q12F7goKGs2ZCJtV141AqbuLKEmpjftL gcMn2T2IK/RnVdDnLSa5s5FXW0OLUEppbQ3Qd/atgnoP7uSXpOp0Lzi5NhQ82CH7D8mK s/zwnbQ9/y0TYmQGLZdtSH8iVTIaeY2dmv1PhlOhrqXN2cUof1Xy/96ZL+S/5pFLNOtB Om17hpjg8PAyXG0/y6WeuVXW3kHiLPMpc921OCKIDSpD/kMATV4nLyRGuAKe3VPE/1lD RtYVZNdIsjCvm7Ev4EiDsfOeOhhFp4KBkseXLXeBMfGjBZACWr2yFa0eqL9ksomB3oG9 f0IA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=hzUA/3MWmz4zE5gI8tQIld6lm5+sdsFzA4VSheS6JfQ=; b=CFGmnhoWobUwOICaDBYHdirb2HxNrg3yU9fdIHZa710nj8bQL3Pq0t4RFN/CrXrmOT GSoujiRtrDWpASueR1NZ3ktSoJFcFQqPHGv9ewzTguAry7wd+iMmg4/HACbUbtSOv9Qq JPbSW1gMPyDYCK9jcEv3l6Ni2oL0sqhbM2qhM7fRZB5RcxzNcJw99bwR0pExa4AwlDFw yhRkSGB+sK4ynzD96PAqZhltigIOu/5SYmzEtff2C9SdL4aM8W+CPG1XIrpA7HMuYrMh hvYVwFTjvutUlQXYtUWl+yjYwV5PecBJ35g5bmZlu+tIHC1xVhJskTdDmRi8oRIBK7jC ls5A== X-Gm-Message-State: APjAAAXPXgsoBnDscuYsrXxF/MZVZ5f55SWvrcEOuQK1ptg9MiCYG3q0 CWnYvP6s4eBmtX3Bkgso7Kv8 X-Google-Smtp-Source: APXvYqwhhhnLLvV6plWnhv+DzE/nx0VoWa4YIiEht4CJyGSLT+8Y2GmQKsJVAjM0Yl9LpgfuSsvEsw== X-Received: by 2002:a65:64da:: with SMTP id t26mr19834283pgv.180.1572260038670; Mon, 28 Oct 2019 03:53:58 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id v1sm11626385pjd.22.2019.10.28.03.53.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:53:58 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:53:52 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 10/11] ext4: update ext4_sync_file() to not use __generic_file_fsync()
Message-ID:
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

When the filesystem is created without a journal, we eventually call
into __generic_file_fsync() in order to write out all the modified
in-core data to the permanent storage device. This function tries to
obtain an inode_lock() while synchronizing the file's buffers and its
associated metadata.

Generally this is fine; however, it becomes a problem when higher-level
code has already obtained an inode_lock(), as this leads to a recursive
lock situation. This is especially the case when porting direct I/O
across to the iomap infrastructure, as we obtain an inode_lock() early
on in the I/O within ext4_dio_write_iter() and hold it until the I/O
has been completed. Consequently, to avoid this specific issue, we move
away from calling into __generic_file_fsync() and perform the necessary
synchronization tasks within ext4_sync_file().

Signed-off-by: Matthew Bobrowski
Reviewed-by: Ritesh Harjani
Reviewed-by: Jan Kara
---
Thanks Jan and Christoph for the suggestion on this one, highly
appreciated.
fs/ext4/fsync.c | 72 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 47 insertions(+), 25 deletions(-) diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 5508baa11bb6..e10206e7f4bb 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -80,6 +80,43 @@ static int ext4_sync_parent(struct inode *inode) return ret; } +static int ext4_fsync_nojournal(struct inode *inode, bool datasync, + bool *needs_barrier) +{ + int ret, err; + + ret = sync_mapping_buffers(inode->i_mapping); + if (!(inode->i_state & I_DIRTY_ALL)) + return ret; + if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) + return ret; + + err = sync_inode_metadata(inode, 1); + if (!ret) + ret = err; + + if (!ret) + ret = ext4_sync_parent(inode); + if (test_opt(inode->i_sb, BARRIER)) + *needs_barrier = true; + + return ret; +} + +static int ext4_fsync_journal(struct inode *inode, bool datasync, + bool *needs_barrier) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; + tid_t commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; + + if (journal->j_flags & JBD2_BARRIER && + !jbd2_trans_will_send_data_barrier(journal, commit_tid)) + *needs_barrier = true; + + return jbd2_complete_transaction(journal, commit_tid); +} + /* * akpm: A new design for ext4_sync_file(). * @@ -91,17 +128,14 @@ static int ext4_sync_parent(struct inode *inode) * What we do is just kick off a commit and wait on it. This will snapshot the * inode to disk. 
*/ - int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) { - struct inode *inode = file->f_mapping->host; - struct ext4_inode_info *ei = EXT4_I(inode); - journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; int ret = 0, err; - tid_t commit_tid; bool needs_barrier = false; + struct inode *inode = file->f_mapping->host; + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) + if (unlikely(ext4_forced_shutdown(sbi))) return -EIO; J_ASSERT(ext4_journal_current_handle() == NULL); @@ -111,23 +145,15 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) if (sb_rdonly(inode->i_sb)) { /* Make sure that we read updated s_mount_flags value */ smp_rmb(); - if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED) + if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) ret = -EROFS; goto out; } - if (!journal) { - ret = __generic_file_fsync(file, start, end, datasync); - if (!ret) - ret = ext4_sync_parent(inode); - if (test_opt(inode->i_sb, BARRIER)) - goto issue_flush; - goto out; - } - ret = file_write_and_wait_range(file, start, end); if (ret) return ret; + /* * data=writeback,ordered: * The caller's filemap_fdatawrite()/wait will sync the data. @@ -142,18 +168,14 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) * (they were dirtied by commit). But that's OK - the blocks are * safe in-journal, which is all fsync() needs to ensure. */ - if (ext4_should_journal_data(inode)) { + if (!sbi->s_journal) + ret = ext4_fsync_nojournal(inode, datasync, &needs_barrier); + else if (ext4_should_journal_data(inode)) ret = ext4_force_commit(inode->i_sb); - goto out; - } + else + ret = ext4_fsync_journal(inode, datasync, &needs_barrier); - commit_tid = datasync ? 
ei->i_datasync_tid : ei->i_sync_tid; - if (journal->j_flags & JBD2_BARRIER && - !jbd2_trans_will_send_data_barrier(journal, commit_tid)) - needs_barrier = true; - ret = jbd2_complete_transaction(journal, commit_tid); if (needs_barrier) { - issue_flush: err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); if (!ret) ret = err; From patchwork Mon Oct 28 10:53:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Bobrowski X-Patchwork-Id: 1185303 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mbobrowski.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=mbobrowski-org.20150623.gappssmtp.com header.i=@mbobrowski-org.20150623.gappssmtp.com header.b="PUmN8g2h"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 471s5j5Wd7z9sPL for ; Mon, 28 Oct 2019 21:53:45 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388376AbfJ1Kxo (ORCPT ); Mon, 28 Oct 2019 06:53:44 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:39379 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728554AbfJ1Kxo (ORCPT ); Mon, 28 Oct 2019 06:53:44 -0400 Received: by mail-pf1-f196.google.com with SMTP id v4so6648182pff.6 for ; Mon, 28 Oct 2019 03:53:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mbobrowski-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; 
bh=vhL2UatCbUUZyHDBHpvj0gxdVoTw6jvPYu0wRnJYWUU=; b=PUmN8g2hGSrdDZCEXyGp9ktyB7cwg58Z6etDfGfX50EUVEMoDemwZJSekz0YDigsLC VEpGzkjAWpXBWFdHBVpjrcgh4RFqUbFWfy96D1FzCmyfmCevg+MZIMjNre4KdTzXiQ5I 4xkvF8jytbuPGnnNV9BPWqCG+NY44TgRzcjirpT7Vorfn6B66pt3wsOXEtHMQSfeTnA+ OTcnMNy39L54pRf3xDbzi/SnHx4mhbm9HuNjxKrBRyK5Kq/Q937I0it2HWhz87ZP7NdI 92vLF+ZV2/zcG8qEtzZUn+/4jEG+S/Namixs+d+hp88tlcSDBoygARiVgaCHaczJ5vYj 8qcA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=vhL2UatCbUUZyHDBHpvj0gxdVoTw6jvPYu0wRnJYWUU=; b=Q4nnQuHQwL8JP9H8X8vmeFGEdAiCiws8C4LTEWxaFV3wsUlou+iOdsPz2B6bWIGkkX 0nLmDMcc+Miey5ijITH46LjjqUcjU5AukzKl1vYHAtBQVG0bQdFxhzV7qQ6F/BYMYrAX gc2TT7qmTFL/r9SuWG3S1xbcC9lXtuft/CiyhUcCNMbm3+XSo3kyXhFfsJUmrbsiW1MG x2WcFyOAARPvs2uiMRFzgq5XlB/ueOqQOmPL13HmqrEQZWG9BmruVXsK1uFbBtI15U3V ga8EA4/jxkxGym+LGY1HwfAtCavTn3coFrVVKNwvcOYINn8rQZn8yCKebOEORKr1frXN uSpA== X-Gm-Message-State: APjAAAWvcYat+5eToffoUGo4a9fP/EcW3NZr0RcqWCZuV7Acq1UVvqEG fbioRwlBiY5ghKrXsEBOqtBeywtZmw== X-Google-Smtp-Source: APXvYqwH/yw7MzBno/gkQNK3tRwvF8jbbqEKH/SRpcG0ckTLGFw0JzH1dz8kd7IPMx73piOJ2SG96w== X-Received: by 2002:a17:90a:600f:: with SMTP id y15mr20887531pji.82.1572260022411; Mon, 28 Oct 2019 03:53:42 -0700 (PDT) Received: from poseidon.bobrowski.net (d114-78-127-22.bla803.nsw.optusnet.com.au. 
[114.78.127.22]) by smtp.gmail.com with ESMTPSA id s191sm10569320pgc.94.2019.10.28.03.53.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Oct 2019 03:53:41 -0700 (PDT)
Date: Mon, 28 Oct 2019 21:53:35 +1100
From: Matthew Bobrowski
To: tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: [PATCH v6 11/11] ext4: introduce direct I/O write using iomap infrastructure
Message-ID: <630afd6dbe0b9d4578a2a626cdc3fdd36fe4b756.1572255426.git.mbobrowski@mbobrowski.org>
References:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This patch introduces a new direct I/O write path which makes use of
the iomap infrastructure.

All direct I/O writes are now passed from the ->write_iter() callback
through to the new direct I/O handler ext4_dio_write_iter(). This
function is responsible for calling into the iomap infrastructure via
iomap_dio_rw().

Code snippets from the existing direct I/O write code within
ext4_file_write_iter(), such as checking whether the I/O request is
unaligned asynchronous I/O or whether the write will result in an
overwrite, have effectively been moved out and into the new direct I/O
->write_iter() handler.

The block mapping flags that are eventually passed down to
ext4_map_blocks() from the *_get_block_*() suite of routines have been
taken out and introduced within ext4_iomap_alloc().

For inode extension cases, ext4_handle_inode_extension() is
effectively the function responsible for performing such metadata
updates. This is called after iomap_dio_rw() has returned so that we
can safely determine whether we need to potentially truncate any
allocated blocks that may have been prepared for this direct I/O write.
We don't perform the inode extension or truncate operations from the
->end_io() handler, as we don't have the original I/O 'length'
available there. The ->end_io() handler is, however, responsible for
converting allocated unwritten extents to written extents.

In the instance of a short write, we fall back and complete the
remainder of the I/O using buffered I/O via
ext4_buffered_write_iter().

The existing buffer_head direct I/O implementation has been removed as
it's now redundant.

Signed-off-by: Matthew Bobrowski
Reviewed-by: Jan Kara
Reviewed-by: Ritesh Harjani
---
 fs/ext4/ext4.h    |   3 -
 fs/ext4/extents.c |  11 +-
 fs/ext4/file.c    | 221 +++++++++++++++++--------
 fs/ext4/inode.c   | 411 +++++-----------------------------------------
 4 files changed, 193 insertions(+), 453 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index d0d88f411a44..fdab3420539d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1579,7 +1579,6 @@ enum {
 	EXT4_STATE_NO_EXPAND,		/* No space for expansion */
 	EXT4_STATE_DA_ALLOC_CLOSE,	/* Alloc DA blks on close */
 	EXT4_STATE_EXT_MIGRATE,		/* Inode is migrating */
-	EXT4_STATE_DIO_UNWRITTEN,	/* need convert on dio done*/
 	EXT4_STATE_NEWENTRY,		/* File just added to dir */
 	EXT4_STATE_MAY_INLINE_DATA,	/* may have in-inode data */
 	EXT4_STATE_EXT_PRECACHED,	/* extents have been precached */
@@ -2560,8 +2559,6 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
 			     struct buffer_head *bh_result, int create);
 int ext4_get_block(struct inode *inode, sector_t iblock,
 		   struct buffer_head *bh_result, int create);
-int ext4_dio_get_block(struct inode *inode, sector_t iblock,
-		       struct buffer_head *bh_result, int create);
 int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 			   struct buffer_head *bh, int create);
 int ext4_walk_page_buffers(handle_t *handle,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index fb0f99dc8c22..df0629de3667 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1753,16 +1753,9 @@ ext4_can_extents_be_merged(struct
inode *inode, struct ext4_extent *ex1, */ if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN) return 0; - /* - * The check for IO to unwritten extent is somewhat racy as we - * increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after - * dropping i_data_sem. But reserved blocks should save us in that - * case. - */ + if (ext4_ext_is_unwritten(ex1) && - (ext4_test_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN) || - atomic_read(&EXT4_I(inode)->i_unwritten) || - (ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN))) + ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN) return 0; #ifdef AGGRESSIVE_TEST if (ext1_ee_len >= 4) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 83ef9c9ed208..0df9d5191ed0 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "ext4.h" #include "ext4_jbd2.h" #include "xattr.h" @@ -155,13 +156,6 @@ static int ext4_release_file(struct inode *inode, struct file *filp) return 0; } -static void ext4_unwritten_wait(struct inode *inode) -{ - wait_queue_head_t *wq = ext4_ioend_wq(inode); - - wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_unwritten) == 0)); -} - /* * This tests whether the IO in question is block-aligned or not. * Ext4 utilizes unwritten extents when hole-filling during direct IO, and they @@ -214,13 +208,13 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + if (unlikely(IS_IMMUTABLE(inode))) + return -EPERM; + ret = generic_write_checks(iocb, from); if (ret <= 0) return ret; - if (unlikely(IS_IMMUTABLE(inode))) - return -EPERM; - /* * If we have encountered a bitmap-format file, the size limit * is smaller than s_maxbytes, which is for extent-mapped files. 
@@ -232,9 +226,42 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return -EFBIG; iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos); } + + ret = file_modified(iocb->ki_filp); + if (ret) + return ret; + return iov_iter_count(from); } +static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, + struct iov_iter *from) +{ + ssize_t ret; + struct inode *inode = file_inode(iocb->ki_filp); + + if (iocb->ki_flags & IOCB_NOWAIT) + return -EOPNOTSUPP; + + inode_lock(inode); + ret = ext4_write_checks(iocb, from); + if (ret <= 0) + goto out; + + current->backing_dev_info = inode_to_bdi(inode); + ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos); + current->backing_dev_info = NULL; + +out: + inode_unlock(inode); + if (likely(ret > 0)) { + iocb->ki_pos += ret; + ret = generic_write_sync(iocb, ret); + } + + return ret; +} + static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, ssize_t written, size_t count) { @@ -316,6 +343,114 @@ static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, return written; } +static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size, + int error, unsigned int flags) +{ + loff_t offset = iocb->ki_pos; + struct inode *inode = file_inode(iocb->ki_filp); + + if (error) + return error; + + if (size && flags & IOMAP_DIO_UNWRITTEN) + return ext4_convert_unwritten_extents(NULL, inode, + offset, size); + + return 0; +} + +static const struct iomap_dio_ops ext4_dio_write_ops = { + .end_io = ext4_dio_write_end_io, +}; + +static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + ssize_t ret; + size_t count; + loff_t offset; + handle_t *handle; + struct inode *inode = file_inode(iocb->ki_filp); + bool extend = false, overwrite = false, unaligned_aio = false; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock(inode)) + return -EAGAIN; + } else { + inode_lock(inode); + } + + if (!ext4_dio_supported(inode)) { + 
inode_unlock(inode); + /* + * Fallback to buffered I/O if the inode does not support + * direct I/O. + */ + return ext4_buffered_write_iter(iocb, from); + } + + ret = ext4_write_checks(iocb, from); + if (ret <= 0) { + inode_unlock(inode); + return ret; + } + + /* + * Unaligned direct asynchronous I/O must be serialized among each + * other as the zeroing of partial blocks of two competing unaligned + * asynchronous I/O writes can result in data corruption. + */ + offset = iocb->ki_pos; + count = iov_iter_count(from); + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) && + !is_sync_kiocb(iocb) && ext4_unaligned_aio(inode, from, offset)) { + unaligned_aio = true; + inode_dio_wait(inode); + } + + /* + * Determine whether the I/O will overwrite allocated and initialized + * blocks. If so, check to see whether it is possible to take the + * dioread_nolock path. + */ + if (!unaligned_aio && ext4_overwrite_io(inode, offset, count) && + ext4_should_dioread_nolock(inode)) { + overwrite = true; + downgrade_write(&inode->i_rwsem); + } + + if (offset + count > EXT4_I(inode)->i_disksize) { + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + goto out; + } + + ret = ext4_orphan_add(handle, inode); + if (ret) { + ext4_journal_stop(handle); + goto out; + } + + extend = true; + ext4_journal_stop(handle); + } + + ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops, + is_sync_kiocb(iocb) || unaligned_aio || extend); + + if (extend) + ret = ext4_handle_inode_extension(inode, offset, ret, count); +out: + if (overwrite) + inode_unlock_shared(inode); + else + inode_unlock(inode); + + if (ret >= 0 && iov_iter_count(from)) + return ext4_buffered_write_iter(iocb, from); + return ret; +} + #ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) @@ -332,15 +467,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) return -EAGAIN; inode_lock(inode); } + ret = 
ext4_write_checks(iocb, from); if (ret <= 0) goto out; - ret = file_remove_privs(iocb->ki_filp); - if (ret) - goto out; - ret = file_update_time(iocb->ki_filp); - if (ret) - goto out; offset = iocb->ki_pos; count = iov_iter_count(from); @@ -378,10 +508,6 @@ static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); - int o_direct = iocb->ki_flags & IOCB_DIRECT; - int unaligned_aio = 0; - int overwrite = 0; - ssize_t ret; if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; @@ -390,59 +516,10 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (IS_DAX(inode)) return ext4_dax_write_iter(iocb, from); #endif + if (iocb->ki_flags & IOCB_DIRECT) + return ext4_dio_write_iter(iocb, from); - if (!inode_trylock(inode)) { - if (iocb->ki_flags & IOCB_NOWAIT) - return -EAGAIN; - inode_lock(inode); - } - - ret = ext4_write_checks(iocb, from); - if (ret <= 0) - goto out; - - /* - * Unaligned direct AIO must be serialized among each other as zeroing - * of partial blocks of two competing unaligned AIOs can result in data - * corruption. - */ - if (o_direct && ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) && - !is_sync_kiocb(iocb) && - ext4_unaligned_aio(inode, from, iocb->ki_pos)) { - unaligned_aio = 1; - ext4_unwritten_wait(inode); - } - - iocb->private = &overwrite; - /* Check whether we do a DIO overwrite or not */ - if (o_direct && !unaligned_aio) { - if (ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from))) { - if (ext4_should_dioread_nolock(inode)) - overwrite = 1; - } else if (iocb->ki_flags & IOCB_NOWAIT) { - ret = -EAGAIN; - goto out; - } - } - - ret = __generic_file_write_iter(iocb, from); - /* - * Unaligned direct AIO must be the only IO in flight. Otherwise - * overlapping aligned IO after unaligned might result in data - * corruption. 
- */ - if (ret == -EIOCBQUEUED && unaligned_aio) - ext4_unwritten_wait(inode); - inode_unlock(inode); - - if (ret > 0) - ret = generic_write_sync(iocb, ret); - - return ret; - -out: - inode_unlock(inode); - return ret; + return ext4_buffered_write_iter(iocb, from); } #ifdef CONFIG_FS_DAX diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 2ca2e6e69344..ff683d918112 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -826,133 +826,6 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock, /* Maximum number of blocks we map for direct IO at once. */ #define DIO_MAX_BLOCKS 4096 -/* - * Get blocks function for the cases that need to start a transaction - - * generally difference cases of direct IO and DAX IO. It also handles retries - * in case of ENOSPC. - */ -static int ext4_get_block_trans(struct inode *inode, sector_t iblock, - struct buffer_head *bh_result, int flags) -{ - int dio_credits; - handle_t *handle; - int retries = 0; - int ret; - - /* Trim mapping request to maximum we can map at once for DIO */ - if (bh_result->b_size >> inode->i_blkbits > DIO_MAX_BLOCKS) - bh_result->b_size = DIO_MAX_BLOCKS << inode->i_blkbits; - dio_credits = ext4_chunk_trans_blocks(inode, - bh_result->b_size >> inode->i_blkbits); -retry: - handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, dio_credits); - if (IS_ERR(handle)) - return PTR_ERR(handle); - - ret = _ext4_get_block(inode, iblock, bh_result, flags); - ext4_journal_stop(handle); - - if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) - goto retry; - return ret; -} - -/* Get block function for DIO reads and writes to inodes without extents */ -int ext4_dio_get_block(struct inode *inode, sector_t iblock, - struct buffer_head *bh, int create) -{ - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - return ext4_get_block_trans(inode, iblock, bh, EXT4_GET_BLOCKS_CREATE); -} - -/* - * Get block function for AIO DIO writes when we create unwritten extent 
if - * blocks are not allocated yet. The extent will be converted to written - * after IO is complete. - */ -static int ext4_dio_get_block_unwritten_async(struct inode *inode, - sector_t iblock, struct buffer_head *bh_result, int create) -{ - int ret; - - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - - ret = ext4_get_block_trans(inode, iblock, bh_result, - EXT4_GET_BLOCKS_IO_CREATE_EXT); - - /* - * When doing DIO using unwritten extents, we need io_end to convert - * unwritten extents to written on IO completion. We allocate io_end - * once we spot unwritten extent and store it in b_private. Generic - * DIO code keeps b_private set and furthermore passes the value to - * our completion callback in 'private' argument. - */ - if (!ret && buffer_unwritten(bh_result)) { - if (!bh_result->b_private) { - ext4_io_end_t *io_end; - - io_end = ext4_init_io_end(inode, GFP_KERNEL); - if (!io_end) - return -ENOMEM; - bh_result->b_private = io_end; - ext4_set_io_unwritten_flag(inode, io_end); - } - set_buffer_defer_completion(bh_result); - } - - return ret; -} - -/* - * Get block function for non-AIO DIO writes when we create unwritten extent if - * blocks are not allocated yet. The extent will be converted to written - * after IO is complete by ext4_direct_IO_write(). - */ -static int ext4_dio_get_block_unwritten_sync(struct inode *inode, - sector_t iblock, struct buffer_head *bh_result, int create) -{ - int ret; - - /* We don't expect handle for direct IO */ - WARN_ON_ONCE(ext4_journal_current_handle()); - - ret = ext4_get_block_trans(inode, iblock, bh_result, - EXT4_GET_BLOCKS_IO_CREATE_EXT); - - /* - * Mark inode as having pending DIO writes to unwritten extents. - * ext4_direct_IO_write() checks this flag and converts extents to - * written. 
-	 */
-	if (!ret && buffer_unwritten(bh_result))
-		ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
-
-	return ret;
-}
-
-static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock,
-		   struct buffer_head *bh_result, int create)
-{
-	int ret;
-
-	ext4_debug("ext4_dio_get_block_overwrite: inode %lu, create flag %d\n",
-		   inode->i_ino, create);
-	/* We don't expect handle for direct IO */
-	WARN_ON_ONCE(ext4_journal_current_handle());
-
-	ret = _ext4_get_block(inode, iblock, bh_result, 0);
-	/*
-	 * Blocks should have been preallocated! ext4_file_write_iter() checks
-	 * that.
-	 */
-	WARN_ON_ONCE(!buffer_mapped(bh_result) || buffer_unwritten(bh_result));
-
-	return ret;
-}
-
-
 /*
  * `handle' can be NULL if create is zero
  */
@@ -3454,7 +3327,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
 			    unsigned int flags)
 {
 	handle_t *handle;
-	int ret, dio_credits, retries = 0;
+	u8 blkbits = inode->i_blkbits;
+	int ret, dio_credits, m_flags = 0, retries = 0;
 
 	/*
 	 * Trim the mapping request to the maximum value that we can map at
@@ -3475,7 +3349,33 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
-	ret = ext4_map_blocks(handle, inode, map, EXT4_GET_BLOCKS_CREATE_ZERO);
+	/*
+	 * DAX and direct I/O are the only two operations that are currently
+	 * supported with IOMAP_WRITE.
+	 */
+	WARN_ON(!IS_DAX(inode) && !(flags & IOMAP_DIRECT));
+	if (IS_DAX(inode))
+		m_flags = EXT4_GET_BLOCKS_CREATE_ZERO;
+	/*
+	 * We use i_size instead of i_disksize here because delalloc writeback
+	 * can complete at any point and subsequently push the i_disksize out
+	 * to i_size. This could be beyond where the direct I/O is happening
+	 * and thus expose allocated blocks to direct I/O reads.
+	 */
+	else if ((map->m_lblk * (1 << blkbits)) >= i_size_read(inode))
+		m_flags = EXT4_GET_BLOCKS_CREATE;
+	else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
+
+	ret = ext4_map_blocks(handle, inode, map, m_flags);
+
+	/*
+	 * We cannot fill holes in indirect tree based inodes as that could
+	 * expose stale data in the case of a crash. Use the magic error code
+	 * to fallback to buffered I/O.
+	 */
+	if (!m_flags && !ret)
+		ret = -ENOTBLK;
 
 	ext4_journal_stop(handle);
 	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
@@ -3521,6 +3421,16 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 			  ssize_t written, unsigned flags, struct iomap *iomap)
 {
+	/*
+	 * Check to see whether an error occurred while writing out the data to
+	 * the allocated blocks. If so, return the magic error code so that we
+	 * fallback to buffered I/O and attempt to complete the remainder of
+	 * the I/O. Any blocks that may have been allocated in preparation for
+	 * the direct I/O write will be reused during the buffered I/O.
+	 */
+	if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written == 0)
+		return -ENOTBLK;
+
 	return 0;
 }
 
@@ -3597,243 +3507,6 @@ const struct iomap_ops ext4_iomap_report_ops = {
 	.iomap_begin = ext4_iomap_begin_report,
 };
 
-static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-			    ssize_t size, void *private)
-{
-	ext4_io_end_t *io_end = private;
-
-	/* if not async direct IO just return */
-	if (!io_end)
-		return 0;
-
-	ext_debug("ext4_end_io_dio(): io_end 0x%p "
-		  "for inode %lu, iocb 0x%p, offset %llu, size %zd\n",
-		  io_end, io_end->inode->i_ino, iocb, offset, size);
-
-	/*
-	 * Error during AIO DIO. We cannot convert unwritten extents as the
-	 * data was not written. Just clear the unwritten flag and drop io_end.
-	 */
-	if (size <= 0) {
-		ext4_clear_io_unwritten_flag(io_end);
-		size = 0;
-	}
-	io_end->offset = offset;
-	io_end->size = size;
-	ext4_put_io_end(io_end);
-
-	return 0;
-}
-
-/*
- * Handling of direct IO writes.
- *
- * For ext4 extent files, ext4 will do direct-io write even to holes,
- * preallocated extents, and those write extend the file, no need to
- * fall back to buffered IO.
- *
- * For holes, we fallocate those blocks, mark them as unwritten
- * If those blocks were preallocated, we mark sure they are split, but
- * still keep the range to write as unwritten.
- *
- * The unwritten extents will be converted to written when DIO is completed.
- * For async direct IO, since the IO may still pending when return, we
- * set up an end_io call back function, which will do the conversion
- * when async direct IO completed.
- *
- * If the O_DIRECT write will extend the file then add this inode to the
- * orphan list. So recovery will truncate it back to the original size
- * if the machine crashes during the write.
- *
- */
-static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	ssize_t ret;
-	loff_t offset = iocb->ki_pos;
-	size_t count = iov_iter_count(iter);
-	int overwrite = 0;
-	get_block_t *get_block_func = NULL;
-	int dio_flags = 0;
-	loff_t final_size = offset + count;
-	int orphan = 0;
-	handle_t *handle;
-
-	if (final_size > inode->i_size || final_size > ei->i_disksize) {
-		/* Credits for sb + inode write */
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
-		if (IS_ERR(handle)) {
-			ret = PTR_ERR(handle);
-			goto out;
-		}
-		ret = ext4_orphan_add(handle, inode);
-		if (ret) {
-			ext4_journal_stop(handle);
-			goto out;
-		}
-		orphan = 1;
-		ext4_update_i_disksize(inode, inode->i_size);
-		ext4_journal_stop(handle);
-	}
-
-	BUG_ON(iocb->private == NULL);
-
-	/*
-	 * Make all waiters for direct IO properly wait also for extent
-	 * conversion. This also disallows race between truncate() and
-	 * overwrite DIO as i_dio_count needs to be incremented under i_mutex.
-	 */
-	inode_dio_begin(inode);
-
-	/* If we do a overwrite dio, i_mutex locking can be released */
-	overwrite = *((int *)iocb->private);
-
-	if (overwrite)
-		inode_unlock(inode);
-
-	/*
-	 * For extent mapped files we could direct write to holes and fallocate.
-	 *
-	 * Allocated blocks to fill the hole are marked as unwritten to prevent
-	 * parallel buffered read to expose the stale data before DIO complete
-	 * the data IO.
-	 *
-	 * As to previously fallocated extents, ext4 get_block will just simply
-	 * mark the buffer mapped but still keep the extents unwritten.
-	 *
-	 * For non AIO case, we will convert those unwritten extents to written
-	 * after return back from blockdev_direct_IO. That way we save us from
-	 * allocating io_end structure and also the overhead of offloading
-	 * the extent convertion to a workqueue.
-	 *
-	 * For async DIO, the conversion needs to be deferred when the
-	 * IO is completed. The ext4 end_io callback function will be
-	 * called to take care of the conversion work. Here for async
-	 * case, we allocate an io_end structure to hook to the iocb.
-	 */
-	iocb->private = NULL;
-	if (overwrite)
-		get_block_func = ext4_dio_get_block_overwrite;
-	else if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) ||
-		   round_down(offset, i_blocksize(inode)) >= inode->i_size) {
-		get_block_func = ext4_dio_get_block;
-		dio_flags = DIO_LOCKING | DIO_SKIP_HOLES;
-	} else if (is_sync_kiocb(iocb)) {
-		get_block_func = ext4_dio_get_block_unwritten_sync;
-		dio_flags = DIO_LOCKING;
-	} else {
-		get_block_func = ext4_dio_get_block_unwritten_async;
-		dio_flags = DIO_LOCKING;
-	}
-	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
-				   get_block_func, ext4_end_io_dio, NULL,
-				   dio_flags);
-
-	if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
-						EXT4_STATE_DIO_UNWRITTEN)) {
-		int err;
-		/*
-		 * for non AIO case, since the IO is already
-		 * completed, we could do the conversion right here
-		 */
-		err = ext4_convert_unwritten_extents(NULL, inode,
-						     offset, ret);
-		if (err < 0)
-			ret = err;
-		ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
-	}
-
-	inode_dio_end(inode);
-	/* take i_mutex locking again if we do a ovewrite dio */
-	if (overwrite)
-		inode_lock(inode);
-
-	if (ret < 0 && final_size > inode->i_size)
-		ext4_truncate_failed_write(inode);
-
-	/* Handle extending of i_size after direct IO write */
-	if (orphan) {
-		int err;
-
-		/* Credits for sb + inode write */
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
-		if (IS_ERR(handle)) {
-			/*
-			 * We wrote the data but cannot extend
-			 * i_size. Bail out. In async io case, we do
-			 * not return error here because we have
-			 * already submmitted the corresponding
-			 * bio. Returning error here makes the caller
-			 * think that this IO is done and failed
-			 * resulting in race with bio's completion
-			 * handler.
-			 */
-			if (!ret)
-				ret = PTR_ERR(handle);
-			if (inode->i_nlink)
-				ext4_orphan_del(NULL, inode);
-
-			goto out;
-		}
-		if (inode->i_nlink)
-			ext4_orphan_del(handle, inode);
-		if (ret > 0) {
-			loff_t end = offset + ret;
-			if (end > inode->i_size || end > ei->i_disksize) {
-				ext4_update_i_disksize(inode, end);
-				if (end > inode->i_size)
-					i_size_write(inode, end);
-				/*
-				 * We're going to return a positive `ret'
-				 * here due to non-zero-length I/O, so there's
-				 * no way of reporting error returns from
-				 * ext4_mark_inode_dirty() to userspace. So
-				 * ignore it.
-				 */
-				ext4_mark_inode_dirty(handle, inode);
-			}
-		}
-		err = ext4_journal_stop(handle);
-		if (ret == 0)
-			ret = err;
-	}
-out:
-	return ret;
-}
-
-static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	size_t count = iov_iter_count(iter);
-	loff_t offset = iocb->ki_pos;
-	ssize_t ret;
-
-#ifdef CONFIG_FS_ENCRYPTION
-	if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode))
-		return 0;
-#endif
-	if (fsverity_active(inode))
-		return 0;
-
-	/*
-	 * If we are doing data journalling we don't support O_DIRECT
-	 */
-	if (ext4_should_journal_data(inode))
-		return 0;
-
-	/* Let buffer I/O handle the inline data case. */
-	if (ext4_has_inline_data(inode))
-		return 0;
-
-	trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
-	ret = ext4_direct_IO_write(iocb, iter);
-	trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret);
-	return ret;
-}
-
 /*
  * Pages can be marked dirty completely asynchronously from ext4's journalling
  * activity. By filemap_sync_pte(), try_to_unmap_one(), etc. We cannot do
@@ -3871,7 +3544,7 @@ static const struct address_space_operations ext4_aops = {
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_invalidatepage,
 	.releasepage		= ext4_releasepage,
-	.direct_IO		= ext4_direct_IO,
+	.direct_IO		= noop_direct_IO,
 	.migratepage		= buffer_migrate_page,
 	.is_partially_uptodate	= block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
@@ -3888,7 +3561,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_journalled_invalidatepage,
 	.releasepage		= ext4_releasepage,
-	.direct_IO		= ext4_direct_IO,
+	.direct_IO		= noop_direct_IO,
 	.is_partially_uptodate	= block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 };
@@ -3904,7 +3577,7 @@ static const struct address_space_operations ext4_da_aops = {
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_invalidatepage,
 	.releasepage		= ext4_releasepage,
-	.direct_IO		= ext4_direct_IO,
+	.direct_IO		= noop_direct_IO,
 	.migratepage		= buffer_migrate_page,
 	.is_partially_uptodate	= block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,