From patchwork Tue Apr 28 00:21:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278101 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2RB5Syyz9sSy for ; Tue, 28 Apr 2020 10:22:46 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726418AbgD1AVs (ORCPT ); Mon, 27 Apr 2020 20:21:48 -0400 Received: from mga05.intel.com ([192.55.52.43]:40690 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726233AbgD1AVr (ORCPT ); Mon, 27 Apr 2020 20:21:47 -0400 IronPort-SDR: 5I+h9wpec3M5t/LMY/OXixjbjJtBTqUjRccg5xbFg7dHXqXgiyN8yWc/foGvKhJ4wQnMV7GCYQ WCHchg5bIKgQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:47 -0700 IronPort-SDR: 8HXR028P2Jqh6f3cFIkVcptTXE6rTaUHplYnb/zxCMhtfdmToUdOJ4YjVd7Jj2eEaswyGCq75m azAYa/ZGxVqQ== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="246331329" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:46 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem Date: Mon, 27 Apr 2020 17:21:32 -0700 Message-Id: <20200428002142.404144-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already handles initialization of i_rwsem. Doing so again is unneeded. Reviewed-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Ira Weiny --- Changes from V4: Update commit message to make it clear the xfs_iget_cache_hit() is actually doing the initialization via xfs_reinit_inode() New for V4: NOTE: This was found while ensuring the new i_aops_sem was properly handled. It seems like this is a layering violation so I think it is worth cleaning up so as to not confuse others. --- fs/xfs/xfs_icache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 8bf1d15be3f6..17a0b86fe701 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -423,6 +423,7 @@ xfs_iget_cache_hit( spin_unlock(&ip->i_flags_lock); rcu_read_unlock(); + ASSERT(!rwsem_is_locked(&inode->i_rwsem)); error = xfs_reinit_inode(mp, inode); if (error) { bool wake; @@ -456,9 +457,6 @@ xfs_iget_cache_hit( ip->i_sick = 0; ip->i_checked = 0; - ASSERT(!rwsem_is_locked(&inode->i_rwsem)); - init_rwsem(&inode->i_rwsem); - spin_unlock(&ip->i_flags_lock); spin_unlock(&pag->pag_ici_lock); } else { From patchwork Tue Apr 28 00:21:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278100 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2R65VTFz9sT3 for ; Tue, 28 Apr 2020 10:22:42 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726661AbgD1AWj (ORCPT ); Mon, 27 Apr 2020 20:22:39 -0400 Received: from mga18.intel.com ([134.134.136.126]:16476 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726424AbgD1AVt (ORCPT ); Mon, 27 Apr 2020 20:21:49 -0400 IronPort-SDR: otWHwsYP7gJC3Nedz6AbXqXhkUeVc5uPDXCHZ+L0kQVVteh2pfSR7lMCPyRP+Bveh/tYzgz0i9 BLJquHBI3eig== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 IronPort-SDR: CzElres2me3QpcebS4hq61xayp3+MnEtKbHg/B4tNmZEW4TkWo5aykndJ3RyIAZQsROOEfosir v8KDoWQIoJgg== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="275659926" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:47 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Jan Kara , Christoph Hellwig , Al Viro , Dan Williams , Dave Chinner , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() Date: Mon, 27 Apr 2020 17:21:33 -0700 Message-Id: <20200428002142.404144-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny Remove the check because DAX now has it's own read/write methods and file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on their own. Therefore, it does not matter if the file state is DAX when the iocb flags are created. Also remove io_is_direct() as it is just a simple flag check. Reviewed-by: Dave Chinner Reviewed-by: Jan Kara Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Ira Weiny --- Changes from v8: Rebase to latest Linus tree Changes from v6: remove io_is_direct() as well. Remove Reviews since this is quite a bit different. Changes from v3: Reword commit message. Reordered to be a 'pre-cleanup' patch --- drivers/block/loop.c | 6 +++--- include/linux/fs.h | 7 +------ 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index da693e6a834e..14372df0f354 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -634,8 +634,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq) static inline void loop_update_dio(struct loop_device *lo) { - __loop_update_dio(lo, io_is_direct(lo->lo_backing_file) | - lo->use_dio); + __loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) | + lo->use_dio); } static void loop_reread_partitions(struct loop_device *lo, @@ -1028,7 +1028,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync) blk_queue_write_cache(lo->lo_queue, true, false); - if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) { + if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) { /* In case of direct I/O, match underlying block size */ unsigned short bsize = bdev_logical_block_size( inode->i_sb->s_bdev); diff --git a/include/linux/fs.h b/include/linux/fs.h index 4f6f59b4f22a..a87cc5845a02 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3394,11 +3394,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr); extern int file_update_time(struct file *file); -static inline bool io_is_direct(struct file *filp) -{ - return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); -} - static inline bool vma_is_dax(const struct vm_area_struct *vma) { return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host); @@ -3423,7 +3418,7 @@ static inline int iocb_flags(struct file *file) int res = 0; if (file->f_flags & O_APPEND) res |= IOCB_APPEND; - if (io_is_direct(file)) + if (file->f_flags & O_DIRECT) res |= IOCB_DIRECT; if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) res |= IOCB_DSYNC; From patchwork Tue Apr 28 00:21:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278098 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2Qw1nD4z9sT4 for ; Tue, 28 Apr 2020 10:22:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726233AbgD1AVu (ORCPT ); Mon, 27 Apr 2020 20:21:50 -0400 Received: from mga06.intel.com ([134.134.136.31]:25937 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726437AbgD1AVt (ORCPT ); Mon, 27 Apr 2020 20:21:49 -0400 IronPort-SDR: lxeA3nddhOyj3TpVgPZrZADMcByzzPZANcMjVYlpKaCQEbtIpnfXQYfFjfF5NKuRxMoeokIFWb T33oMnu9BRmg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 IronPort-SDR: BbR0PGu6WVS4hJkrnhrV8J087o4SM0ZVukT2QA2I6ye/QNUrtFChOb0DDwKEL4jVba1zBGtV9D TtZ9PYNWwsFQ== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="247559446" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Jan Kara , Al Viro , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 03/11] fs/stat: Define DAX statx attribute Date: Mon, 27 Apr 2020 17:21:34 -0700 Message-Id: <20200428002142.404144-4-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny In order for users to determine if a file is currently operating in DAX state (effective DAX). Define a statx attribute value and set that attribute if the effective DAX flag is set. To go along with this we propose the following addition to the statx man page: STATX_ATTR_DAX The file is in the DAX (cpu direct access) state. DAX state attempts to minimize software cache effects for both I/O and memory mappings of this file. It requires a file system which has been configured to support DAX. DAX generally assumes all accesses are via cpu load / store instructions which can minimize overhead for small accesses, but may adversely affect cpu utilization for large transfers. File I/O is done directly to/from user-space buffers and memory mapped I/O may be performed with direct memory mappings that bypass kernel page cache. While the DAX property tends to result in data being transferred synchronously, it does not give the same guarantees of O_SYNC where data and the necessary metadata are transferred together. A DAX file may support being mapped with the MAP_SYNC flag, which enables a program to use CPU cache flush instructions to persist CPU store operations without an explicit fsync(2). See mmap(2) for more information. Reviewed-by: Dave Chinner Reviewed-by: Jan Kara Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from V2: Update man page text with comments from Darrick, Jan, Dan, and Dave. --- fs/stat.c | 3 +++ include/uapi/linux/stat.h | 1 + 2 files changed, 4 insertions(+) diff --git a/fs/stat.c b/fs/stat.c index 030008796479..894699c74dde 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat, if (IS_AUTOMOUNT(inode)) stat->attributes |= STATX_ATTR_AUTOMOUNT; + if (IS_DAX(inode)) + stat->attributes |= STATX_ATTR_DAX; + if (inode->i_op->getattr) return inode->i_op->getattr(path, stat, request_mask, query_flags); diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index ad80a5c885d5..e5f9d5517f6b 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -169,6 +169,7 @@ struct statx { #define STATX_ATTR_ENCRYPTED 0x00000800 /* [I] File requires key to decrypt in fs */ #define STATX_ATTR_AUTOMOUNT 0x00001000 /* Dir: Automount trigger */ #define STATX_ATTR_VERITY 0x00100000 /* [I] Verity protected file */ +#define STATX_ATTR_DAX 0x00002000 /* [I] File is DAX */ #endif /* _UAPI_LINUX_STAT_H */ From patchwork Tue Apr 28 00:21:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278097 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2Qt0v9sz9sSk for ; Tue, 28 Apr 2020 10:22:30 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726470AbgD1AVv (ORCPT ); Mon, 27 Apr 2020 20:21:51 -0400 Received: from mga04.intel.com ([192.55.52.120]:41734 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726432AbgD1AVt (ORCPT ); Mon, 27 Apr 2020 20:21:49 -0400 IronPort-SDR: cPBqkSUpxh4JHp4Bh27IhlbLK4UHHhVFU8nwfrKCybIOtSPAFobvE209Uwd+sQpIamaGh93qAO uKb4rv5mN8gw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 IronPort-SDR: 4BEeceNdCu0K8eAuBW6ElquobprMPPqCzX2266MyfI0E6spEC5ajtYUwv4lxbVQj7oQ4Vdo70n 0Xf2k60Tngiw== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="292681913" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 04/11] Documentation/dax: Update Usage section Date: Mon, 27 Apr 2020 17:21:35 -0700 Message-Id: <20200428002142.404144-5-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny Update the Usage section to reflect the new individual dax selection functionality. Signed-off-by: Ira Weiny --- Changes from V10: Clarifications from Dave Add '-c' to xfs_io examples Changes from V9: Fix missing ')' Fix trialing '"' Changes from V8: Updates from Darrick Changes from V7: Cleanups/clarifications from Darrick and Dan Changes from V6: Update to allow setting FS_XFLAG_DAX any time. Update with list of behaviors from Darrick https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/ Changes from V5: Update to reflect the agreed upon semantics https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/ --- Documentation/filesystems/dax.txt | 139 +++++++++++++++++++++++++++++- 1 file changed, 136 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 679729442fd2..409e4e83e46a 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -17,11 +17,144 @@ For file mappings, the storage device is mapped directly into userspace. Usage ----- -If you have a block device which supports DAX, you can make a filesystem +If you have a block device which supports DAX, you can make a file system on it as usual. The DAX code currently only supports files with a block size equal to your kernel's PAGE_SIZE, so you may need to specify a block -size when creating the filesystem. When mounting it, use the "-o dax" -option on the command line or add 'dax' to the options in /etc/fstab. +size when creating the file system. + +Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them +is different. + +Enabling DAX on ext4 and ext2 +----------------------------- + +When mounting the filesystem, use the "-o dax" option on the command line or +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files +within the filesystem. It is equivalent to the '-o dax=always' behavior below. + + +Enabling DAX on xfs +------------------- + +Summary +------- + + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details + about this access mode. + + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular + files and directories. This advisory flag can be set or cleared at any + time, but doing so does not immediately affect the S_DAX state. + + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will + be inherited by all regular files and sub directories that are subsequently + created in this directory. Files and subdirectories that exist at the time + this flag is set or cleared on the parent directory are not modified by + this modification of the parent directory. + + 4. There exists dax mount options which can override FS_XFLAG_DAX in the + setting of the S_DAX flag. Given underlying storage which supports DAX the + following hold. + + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default. + + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX." + + "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX." + + "-o dax" is a legacy option which is an alias for "dax=always". + This may be removed in the future so "-o dax=always" is + the preferred method for specifying this behavior. + + NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when + the file system is mounted with a dax option. However, in-core inode state + (S_DAX) will be overridden until the file system is remounted with + dax=inode and the inode is evicted from kernel memory. + + 5. The DAX policy can be changed via: + + a) Set the parent directory FS_XFLAG_DAX as needed before files are created + + b) Set the appropriate dax="foo" mount option + + c) Change the FS_XFLAG_DAX on existing regular files and directories. This + has runtime constraints and limitations that are described in 6) below. + + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag, + the change in behaviour for existing regular files may not occur + immediately. If the change must take effect immediately, the administrator + needs to: + + a) stop the application so there are no active references to the data set + the policy change will affect + + b) evict the data set from kernel caches so it will be re-instantiated when + the application is restarted. This can be acheived by: + + i. drop-caches + ii. a filesystem unmount and mount cycle + iii. a system reboot + + +Details +------- + +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX) +and the other is a volatile flag indicating the active state of the feature +(S_DAX). + +FS_XFLAG_DAX is preserved within the file system. This persistent config +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'. + +New files and directories automatically inherit FS_XFLAG_DAX from +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at +directory creation time can be used to set a default behavior for an entire +sub-tree. + +To clarify inheritance here are 3 examples: + +Example A: + +mkdir -p a/b/c +xfs_io -c 'chattr +x' a +mkdir a/b/c/d +mkdir a/e + + dax: a,e + no dax: b,c,d + +Example B: + +mkdir a +xfs_io -c 'chattr +x' a +mkdir -p a/b/c/d + + dax: a,b,c,d + no dax: + +Example C: + +mkdir -p a/b/c +xfs_io -c 'chattr +x' c +mkdir a/b/c/d + + dax: c,d + no dax: a,b + + +The current enabled state (S_DAX) is set when a file inode is instantiated in +memory by the kernel. It is set based on the underlying media support, the +value of FS_XFLAG_DAX and the file systems dax mount option setting. + +statx can be used to query S_DAX. NOTE that a directory will never have S_DAX +set and therefore statx will never indicate that S_DAX is set on directories. + +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if +the underlying media does not support dax and/or the file system is overridden +with a mount option. + Implementation Tips for Block Driver Writers From patchwork Tue Apr 28 00:21:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278099 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2R1102Kz9sT9 for ; Tue, 28 Apr 2020 10:22:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726635AbgD1AWd (ORCPT ); Mon, 27 Apr 2020 20:22:33 -0400 Received: from mga07.intel.com ([134.134.136.100]:14273 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726445AbgD1AVu (ORCPT ); Mon, 27 Apr 2020 20:21:50 -0400 IronPort-SDR: WFcsoeW+/SRl9aJuWp/wa8wyClPdyq7ssM83Ga7So71n80Q0fDVtgtrKJjMLfXE6bl11Pcxg5a NlrQXm0ivCbQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:49 -0700 IronPort-SDR: ySxclc5LPntrD/UYO3RA9c4+mRIp+UX0a6VZj1NJ0MB+dkx//xiUT0e7aIjHanFL+9RNzzk6JH c39CmUDTasiw== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="293703521" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:48 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS Date: Mon, 27 Apr 2020 17:21:36 -0700 Message-Id: <20200428002142.404144-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny In prep for the new tri-state mount option which then introduces XFS_MOUNT_DAX_NEVER. Reviewed-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from v8 Move bit to 26 --- fs/xfs/xfs_iops.c | 2 +- fs/xfs/xfs_mount.h | 3 +-- fs/xfs/xfs_super.c | 8 ++++---- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index f7a99b3bbcf7..462f89af479a 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1248,7 +1248,7 @@ xfs_inode_supports_dax( return false; /* DAX mount option or DAX iflag must be set. */ - if (!(mp->m_flags & XFS_MOUNT_DAX) && + if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) && !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) return false; diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index b2e4598fdf7d..f6123fb0113c 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -237,8 +237,7 @@ typedef struct xfs_mount { #define XFS_MOUNT_FILESTREAMS (1ULL << 24) /* enable the filestreams allocator */ #define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */ - -#define XFS_MOUNT_DAX (1ULL << 62) /* TEST ONLY! */ +#define XFS_MOUNT_DAX_ALWAYS (1ULL << 26) /* * Max and min values for mount-option defined I/O diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 424bb9a2d532..ce169d1c7474 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -129,7 +129,7 @@ xfs_fs_show_options( { XFS_MOUNT_GRPID, ",grpid" }, { XFS_MOUNT_DISCARD, ",discard" }, { XFS_MOUNT_LARGEIO, ",largeio" }, - { XFS_MOUNT_DAX, ",dax" }, + { XFS_MOUNT_DAX_ALWAYS, ",dax" }, { 0, NULL } }; struct xfs_mount *mp = XFS_M(root->d_sb); @@ -1261,7 +1261,7 @@ xfs_fc_parse_param( return 0; #ifdef CONFIG_FS_DAX case Opt_dax: - mp->m_flags |= XFS_MOUNT_DAX; + mp->m_flags |= XFS_MOUNT_DAX_ALWAYS; return 0; #endif default: @@ -1454,7 +1454,7 @@ xfs_fc_fill_super( if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5) sb->s_flags |= SB_I_VERSION; - if (mp->m_flags & XFS_MOUNT_DAX) { + if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) { bool rtdev_is_dax = false, datadev_is_dax; xfs_warn(mp, @@ -1468,7 +1468,7 @@ xfs_fc_fill_super( if (!rtdev_is_dax && !datadev_is_dax) { xfs_alert(mp, "DAX unsupported by block device. Turning off DAX."); - mp->m_flags &= ~XFS_MOUNT_DAX; + mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS; } if (xfs_sb_version_hasreflink(&mp->m_sb)) { xfs_alert(mp, From patchwork Tue Apr 28 00:21:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278095 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2Qn0HLLz9sSk for ; Tue, 28 Apr 2020 10:22:25 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726380AbgD1AWS (ORCPT ); Mon, 27 Apr 2020 20:22:18 -0400 Received: from mga07.intel.com ([134.134.136.100]:14273 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726457AbgD1AVv (ORCPT ); Mon, 27 Apr 2020 20:21:51 -0400 IronPort-SDR: t383m2xnpsHaNhoyax9NzRg6iSgsnJz/f/I8mlb5NZCMW7AA3bg4MyarVRS5Ap2jmX1zzKwCt9 bqNYv7JqwIsw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:50 -0700 IronPort-SDR: eWA3GBJHSbjKXcVCTyGr8gP4FBNWa5wYY863p6NtsFFZGtzPHlcBQ5pIEKCQMS7b1cgFx4bhbo xyrVWY6+cbKg== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="336468366" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:49 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state Date: Mon, 27 Apr 2020 17:21:37 -0700 Message-Id: <20200428002142.404144-7-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny As agreed upon[1]. We make the dax mount option a tri-state. '-o dax' continues to operate the same. We add 'always', 'never', and 'inode' (default). [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/ Signed-off-by: Ira Weiny Reviewed-by: Darrick J. Wong --- Changes from V10: Move show options to xfs_info_set array Changes from V9: Fix indentation in xfs_mount_set_dax_mode() Do not report dax=inode Changes from v8: Move NEVER bit to 27 Use xfs signature style use xfs_dax_mode enum Changes from v7: Change to XFS_MOUNT_DAX_NEVER Changes from v6: Use 2 flag bits rather than a field. change iflag to inode Changes from v5: New Patch --- fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_super.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index f6123fb0113c..37bfb50db809 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -238,6 +238,7 @@ typedef struct xfs_mount { allocator */ #define XFS_MOUNT_NOATTR2 (1ULL << 25) /* disable use of attr2 format */ #define XFS_MOUNT_DAX_ALWAYS (1ULL << 26) +#define XFS_MOUNT_DAX_NEVER (1ULL << 27) /* * Max and min values for mount-option defined I/O diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index ce169d1c7474..e80bd2c4c279 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -47,6 +47,39 @@ static struct kset *xfs_kset; /* top-level xfs sysfs dir */ static struct xfs_kobj xfs_dbg_kobj; /* global debug sysfs attrs */ #endif +enum xfs_dax_mode { + XFS_DAX_INODE = 0, + XFS_DAX_ALWAYS = 1, + XFS_DAX_NEVER = 2, +}; + +static void +xfs_mount_set_dax_mode( + struct xfs_mount *mp, + enum xfs_dax_mode mode) +{ + switch (mode) { + case XFS_DAX_INODE: + mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER); + break; + case XFS_DAX_ALWAYS: + mp->m_flags |= XFS_MOUNT_DAX_ALWAYS; + mp->m_flags &= ~XFS_MOUNT_DAX_NEVER; + break; + case XFS_DAX_NEVER: + mp->m_flags |= XFS_MOUNT_DAX_NEVER; + mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS; + break; + } +} + +static const struct constant_table dax_param_enums[] = { + {"inode", XFS_DAX_INODE }, + {"always", XFS_DAX_ALWAYS }, + {"never", XFS_DAX_NEVER }, + {} +}; + /* * Table driven mount option parser. */ @@ -59,7 +92,7 @@ enum { Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota, Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce, - Opt_discard, Opt_nodiscard, Opt_dax, + Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum, }; static const struct fs_parameter_spec xfs_fs_parameters[] = { @@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = { fsparam_flag("discard", Opt_discard), fsparam_flag("nodiscard", Opt_nodiscard), fsparam_flag("dax", Opt_dax), + fsparam_enum("dax", Opt_dax_enum, dax_param_enums), {} }; @@ -129,7 +163,8 @@ xfs_fs_show_options( { XFS_MOUNT_GRPID, ",grpid" }, { XFS_MOUNT_DISCARD, ",discard" }, { XFS_MOUNT_LARGEIO, ",largeio" }, - { XFS_MOUNT_DAX_ALWAYS, ",dax" }, + { XFS_MOUNT_DAX_ALWAYS, ",dax=always" }, + { XFS_MOUNT_DAX_NEVER, ",dax=never" }, { 0, NULL } }; struct xfs_mount *mp = XFS_M(root->d_sb); @@ -1261,7 +1296,10 @@ xfs_fc_parse_param( return 0; #ifdef CONFIG_FS_DAX case Opt_dax: - mp->m_flags |= XFS_MOUNT_DAX_ALWAYS; + xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS); + return 0; + case Opt_dax_enum: + xfs_mount_set_dax_mode(mp, result.uint_32); return 0; #endif default: @@ -1468,7 +1506,7 @@ xfs_fc_fill_super( if (!rtdev_is_dax && !datadev_is_dax) { xfs_alert(mp, "DAX unsupported by block device. Turning off DAX."); - mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS; + xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER); } if (xfs_sb_version_hasreflink(&mp->m_sb)) { xfs_alert(mp, From patchwork Tue Apr 28 00:21:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278093 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2QZ00Q1z9sT0 for ; Tue, 28 Apr 2020 10:22:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726515AbgD1AV5 (ORCPT ); Mon, 27 Apr 2020 20:21:57 -0400 Received: from mga09.intel.com ([134.134.136.24]:39933 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726484AbgD1AVx (ORCPT ); Mon, 27 Apr 2020 20:21:53 -0400 IronPort-SDR: tVkxbhlnDTUlF26DCKF9C0/zudZ/HUjBM4RLsgJ6+0aAwqLVbfE9VLiM4vSvIlXWkXXpl5FVf0 iy88y/Xf2umA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:52 -0700 IronPort-SDR: TsZuNc5ClJ37I+d4gkRSjfbs3siVOfhOd8uW7S14VRn0daJY3aMIY3ZHoGDbaP69yC29DgmC6m zwvqTkWFBORw== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="367329120" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:50 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() Date: Mon, 27 Apr 2020 17:21:38 -0700 Message-Id: <20200428002142.404144-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny xfs_inode_supports_dax() should reflect if the inode can support DAX not that it is enabled for DAX. Change the use of xfs_inode_supports_dax() to reflect only if the inode and underlying storage support dax. Add a new function xfs_inode_should_enable_dax() which reflects if the inode should be enabled for DAX. Reviewed-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from V8 Change to 'should enable' (feedback given by Jan in ext4 series) Darrick I've preserved your Reviewed-by for now LMK if that is an issue. Changes from v7: Move S_ISREG check first use IS_ENABLED(CONFIG_FS_DAX) rather than duplicated function Changes from v6: Change enable checks to be sequential logic. Update for 2 bit tri-state option. Make 'static' consistent. Don't set S_DAX if !CONFIG_FS_DAX Changes from v5: Update to reflect the new tri-state mount option Changes from v3: Update functions and names to be more clear Update commit message Merge with 'fs/xfs: Clean up DAX support check' don't allow IS_DAX() on a directory use STATIC macro for static make xfs_inode_supports_dax() static --- fs/xfs/xfs_iops.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 462f89af479a..1814f10e43d3 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1243,13 +1243,12 @@ xfs_inode_supports_dax( { struct xfs_mount *mp = ip->i_mount; - /* Only supported on non-reflinked files. */ - if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip)) + /* Only supported on regular files. */ + if (!S_ISREG(VFS_I(ip)->i_mode)) return false; - /* DAX mount option or DAX iflag must be set. */ - if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) && - !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) + /* Only supported on non-reflinked files. */ + if (xfs_is_reflink_inode(ip)) return false; /* Block size must match page size */ @@ -1260,6 +1259,23 @@ xfs_inode_supports_dax( return xfs_inode_buftarg(ip)->bt_daxdev != NULL; } +static bool +xfs_inode_should_enable_dax( + struct xfs_inode *ip) +{ + if (!IS_ENABLED(CONFIG_FS_DAX)) + return false; + if (ip->i_mount->m_flags & XFS_MOUNT_DAX_NEVER) + return false; + if (!xfs_inode_supports_dax(ip)) + return false; + if (ip->i_mount->m_flags & XFS_MOUNT_DAX_ALWAYS) + return true; + if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) + return true; + return false; +} + STATIC void xfs_diflags_to_iflags( struct inode *inode, @@ -1278,7 +1294,7 @@ xfs_diflags_to_iflags( inode->i_flags |= S_SYNC; if (flags & XFS_DIFLAG_NOATIME) inode->i_flags |= S_NOATIME; - if (xfs_inode_supports_dax(ip)) + if (xfs_inode_should_enable_dax(ip)) inode->i_flags |= S_DAX; } From patchwork Tue Apr 28 00:21:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278096 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2Qq3c0cz9sSk for ; Tue, 28 Apr 2020 10:22:27 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726591AbgD1AWS (ORCPT ); Mon, 27 Apr 2020 20:22:18 -0400 Received: from mga04.intel.com ([192.55.52.120]:41734 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726481AbgD1AVw (ORCPT ); Mon, 27 Apr 2020 20:21:52 -0400 IronPort-SDR: UfDoP9kmNaMvZJrXua9JkO58sgEpFDdHKMeJED/e6Q4RcqrUVwJEPjEgw9Y6RFLpHSg9CT6Rqc fghZYC+20opQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:52 -0700 IronPort-SDR: UnUUQOA6vhCCCVb2xLKmCAo+wPSCQjjy9hUhM1EjCr+hQfEVewE9ChqsbvndLdUWQAWc6YTmYN WjvT1gG31o6w== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="249027331" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:52 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Dave Chinner , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() Date: Mon, 27 Apr 2020 17:21:39 -0700 Message-Id: <20200428002142.404144-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are nearly identical. The only difference is that *_to_linux() is called after inode setup and disallows changing the DAX flag. Combining them can be done with a flag which indicates if this is the initial setup to allow the DAX flag to be properly set only at init time. So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags() directly. While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and use xfs_ip2xflags() to ensure future diflags are included correctly. Reviewed-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Ira Weiny --- Changes from V7: Clarify with a comment the reason for leaving S_DAX out of the mask Changes from V6: Move unrelated hunk to previous patch. Change logic for better code generation. Changes from V5: The functions are no longer identical so we can only combine them rather than deleting one completely. This is reflected in the new init parameter. --- fs/xfs/xfs_inode.h | 1 + fs/xfs/xfs_ioctl.c | 33 +-------------------------------- fs/xfs/xfs_iops.c | 46 +++++++++++++++++++++++++++------------------- 3 files changed, 29 insertions(+), 51 deletions(-) diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index c6a63f6764a6..83073c883fbf 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -467,6 +467,7 @@ int xfs_break_layouts(struct inode *inode, uint *iolock, /* from xfs_iops.c */ extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); +extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init); /* * When setting up a newly allocated inode, we need to call diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 309958186d33..104495ac187c 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1201,37 +1201,6 @@ xfs_flags2diflags2( return di_flags2; } -STATIC void -xfs_diflags_to_linux( - struct xfs_inode *ip) -{ - struct inode *inode = VFS_I(ip); - unsigned int xflags = xfs_ip2xflags(ip); - - if (xflags & FS_XFLAG_IMMUTABLE) - inode->i_flags |= S_IMMUTABLE; - else - inode->i_flags &= ~S_IMMUTABLE; - if (xflags & FS_XFLAG_APPEND) - inode->i_flags |= S_APPEND; - else - inode->i_flags &= ~S_APPEND; - if (xflags & FS_XFLAG_SYNC) - inode->i_flags |= S_SYNC; - else - inode->i_flags &= ~S_SYNC; - if (xflags & FS_XFLAG_NOATIME) - inode->i_flags |= S_NOATIME; - else - inode->i_flags &= ~S_NOATIME; -#if 0 /* disabled until the flag switching races are sorted out */ - if (xflags & FS_XFLAG_DAX) - inode->i_flags |= S_DAX; - else - inode->i_flags &= ~S_DAX; -#endif -} - static int xfs_ioctl_setattr_xflags( struct xfs_trans *tp, @@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags( ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags); ip->i_d.di_flags2 = di_flags2; - xfs_diflags_to_linux(ip); + xfs_diflags_to_iflags(ip, false); xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 1814f10e43d3..b70b735fe4a4 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax( return false; } -STATIC void +void xfs_diflags_to_iflags( - struct inode *inode, - struct xfs_inode *ip) + struct xfs_inode *ip, + bool init) { - uint16_t flags = ip->i_d.di_flags; - - inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | - S_NOATIME | S_DAX); - - if (flags & XFS_DIFLAG_IMMUTABLE) - inode->i_flags |= S_IMMUTABLE; - if (flags & XFS_DIFLAG_APPEND) - inode->i_flags |= S_APPEND; - if (flags & XFS_DIFLAG_SYNC) - inode->i_flags |= S_SYNC; - if (flags & XFS_DIFLAG_NOATIME) - inode->i_flags |= S_NOATIME; - if (xfs_inode_should_enable_dax(ip)) - inode->i_flags |= S_DAX; + struct inode *inode = VFS_I(ip); + unsigned int xflags = xfs_ip2xflags(ip); + unsigned int flags = 0; + + ASSERT(!(IS_DAX(inode) && init)); + + if (xflags & FS_XFLAG_IMMUTABLE) + flags |= S_IMMUTABLE; + if (xflags & FS_XFLAG_APPEND) + flags |= S_APPEND; + if (xflags & FS_XFLAG_SYNC) + flags |= S_SYNC; + if (xflags & FS_XFLAG_NOATIME) + flags |= S_NOATIME; + if (init && xfs_inode_should_enable_dax(ip)) + flags |= S_DAX; + + /* + * S_DAX can only be set during inode initialization and is never set by + * the VFS, so we cannot mask off S_DAX in i_flags. + */ + inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME); + inode->i_flags |= flags; } /* @@ -1321,7 +1329,7 @@ xfs_setup_inode( inode_fake_hash(inode); i_size_write(inode, ip->i_d.di_size); - xfs_diflags_to_iflags(inode, ip); + xfs_diflags_to_iflags(ip, true); if (S_ISDIR(inode->i_mode)) { /* From patchwork Tue Apr 28 00:21:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278092 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2QY1lXwz9sT3 for ; Tue, 28 Apr 2020 10:22:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726526AbgD1AV6 (ORCPT ); Mon, 27 Apr 2020 20:21:58 -0400 Received: from mga03.intel.com ([134.134.136.65]:52928 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726488AbgD1AVx (ORCPT ); Mon, 27 Apr 2020 20:21:53 -0400 IronPort-SDR: Z++leLlhYFsJN4LsVSFL+TVo1OPjEHtZl0+F7qVJewe3Gi6HxJbQ4SI6+SnuDWx0N2/kumXh4F cJchzrm/0scg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:52 -0700 IronPort-SDR: FazWKlPef8llcIyHpMdLw8WvjPnwI2ghe9liwaqjxP0xrAUv2n4anikCOZBLyz02titprEG1On +AuJ5mBqB3ng== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="431980382" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:52 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Al Viro , Dave Chinner , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer Date: Mon, 27 Apr 2020 17:21:40 -0700 Message-Id: <20200428002142.404144-10-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny DAX effective mode (S_DAX) changes requires inode eviction. XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the inode if no other additional references are taken. We lift this flag to the VFS layer and change the behavior slightly by allowing the flag to remain even if multiple references are taken. This will expedite the eviction of inodes to change S_DAX. Cc: Al Viro Reviewed-by: Dave Chinner Reviewed-by: Jan Kara Signed-off-by: Ira Weiny Reviewed-by: Darrick J. Wong --- Changes from V9: Fix misspelling in commit subject move XFS_IEOFBLOCKS to '9' Changes from V8: Remove XFS_IDONTCACHE --- fs/xfs/xfs_icache.c | 4 ++-- fs/xfs/xfs_inode.h | 3 +-- fs/xfs/xfs_super.c | 2 +- include/linux/fs.h | 6 +++++- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 17a0b86fe701..de76f7f60695 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -477,7 +477,7 @@ xfs_iget_cache_hit( xfs_ilock(ip, lock_flags); if (!(flags & XFS_IGET_INCORE)) - xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE); + xfs_iflags_clear(ip, XFS_ISTALE); XFS_STATS_INC(mp, xs_ig_found); return 0; @@ -559,7 +559,7 @@ xfs_iget_cache_miss( */ iflags = XFS_INEW; if (flags & XFS_IGET_DONTCACHE) - iflags |= XFS_IDONTCACHE; + VFS_I(ip)->i_state |= I_DONTCACHE; ip->i_udquot = NULL; ip->i_gdquot = NULL; ip->i_pdquot = NULL; diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 83073c883fbf..d8ce3eaa246e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip) #define XFS_IFLOCK (1 << __XFS_IFLOCK_BIT) #define __XFS_IPINNED_BIT 8 /* wakeup key for zero pin count */ #define XFS_IPINNED (1 << __XFS_IPINNED_BIT) -#define XFS_IDONTCACHE (1 << 9) /* don't cache the inode long term */ -#define XFS_IEOFBLOCKS (1 << 10)/* has the preallocblocks tag set */ +#define XFS_IEOFBLOCKS (1 << 9) /* has the preallocblocks tag set */ /* * If this unlinked inode is in the middle of recovery, don't let drop_inode * truncate and free the inode. This can happen if we iget the inode during diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e80bd2c4c279..6f91c13fb6ea 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -737,7 +737,7 @@ xfs_fs_drop_inode( return 0; } - return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE); + return generic_drop_inode(inode); } static void diff --git a/include/linux/fs.h b/include/linux/fs.h index a87cc5845a02..44bd45af760f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, * * I_CREATING New object's inode in the middle of setting up. * + * I_DONTCACHE Evict inode as soon as it is not used anymore. + * * Q: What is the difference between I_WILL_FREE and I_FREEING? */ #define I_DIRTY_SYNC (1 << 0) @@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, #define I_WB_SWITCH (1 << 13) #define I_OVL_INUSE (1 << 14) #define I_CREATING (1 << 15) +#define I_DONTCACHE (1 << 16) #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC) #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES) @@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode); extern int generic_delete_inode(struct inode *inode); static inline int generic_drop_inode(struct inode *inode) { - return !inode->i_nlink || inode_unhashed(inode); + return !inode->i_nlink || inode_unhashed(inode) || + (inode->i_state & I_DONTCACHE); } extern struct inode *ilookup5_nowait(struct super_block *sb, From patchwork Tue Apr 28 00:21:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278094 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2Qc4kmZz9sT1 for ; Tue, 28 Apr 2020 10:22:16 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726503AbgD1AV5 (ORCPT ); Mon, 27 Apr 2020 20:21:57 -0400 Received: from mga18.intel.com ([134.134.136.126]:16476 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726490AbgD1AVy (ORCPT ); Mon, 27 Apr 2020 20:21:54 -0400 IronPort-SDR: 3vWwRIdM05wiWKlVS/B0qckbhlC/5rwEy4mFLHktWLQRK8J1u56XXPf+W5LcYPc5ftDIU5oZ1W b071gugKGrlQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:53 -0700 IronPort-SDR: EBMto9saUZg0SHzhh9Esk6dkcfxfoqkZdSB7xasDw+wNWUVpbPtxfPMfee3gcUJeu7/B3YYy35 8vgFtfxw9ckg== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="404501302" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:53 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE Date: Mon, 27 Apr 2020 17:21:41 -0700 Message-Id: <20200428002142.404144-11-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny DCACHE_DONTCACHE indicates a dentry should not be cached on final dput(). Also add a helper function to mark DCACHE_DONTCACHE on all dentries pointing to a specific inode when that inode is being set I_DONTCACHE. This facilitates dropping dentry references to inodes sooner which require eviction to swap S_DAX mode. Cc: Al Viro Signed-off-by: Ira Weiny Reviewed-by: Jan Kara Reviewed-by: Darrick J. Wong --- Changes from V10: rename to d_mark_dontcache() Move function to fs/dcache.c Changes from V9: modify i_state under i_lock Update comment "Purge from memory on final dput()" Changes from V8: Update commit message Use mark_inode_dontcache in XFS Fix locking... can't use rcu here. Change name to mark_inode_dontcache --- fs/dcache.c | 19 +++++++++++++++++++ fs/xfs/xfs_icache.c | 2 +- include/linux/dcache.h | 2 ++ include/linux/fs.h | 1 + 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/fs/dcache.c b/fs/dcache.c index b280e07e162b..0d07fb335b78 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry) if (dentry->d_op->d_delete(dentry)) return false; } + + if (unlikely(dentry->d_flags & DCACHE_DONTCACHE)) + return false; + /* retain; LRU fodder */ dentry->d_lockref.count--; if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) @@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry) return true; } +void d_mark_dontcache(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&inode->i_lock); + hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) { + spin_lock(&de->d_lock); + de->d_flags |= DCACHE_DONTCACHE; + spin_unlock(&de->d_lock); + } + inode->i_state |= I_DONTCACHE; + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(d_mark_dontcache); + /* * Finish off a dentry we've decided to kill. * dentry->d_lock must be held, returns with it unlocked. diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index de76f7f60695..888646d74d7d 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -559,7 +559,7 @@ xfs_iget_cache_miss( */ iflags = XFS_INEW; if (flags & XFS_IGET_DONTCACHE) - VFS_I(ip)->i_state |= I_DONTCACHE; + d_mark_dontcache(VFS_I(ip)); ip->i_udquot = NULL; ip->i_gdquot = NULL; ip->i_pdquot = NULL; diff --git a/include/linux/dcache.h b/include/linux/dcache.h index c1488cc84fd9..a81f0c3cf352 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -177,6 +177,8 @@ struct dentry_operations { #define DCACHE_REFERENCED 0x00000040 /* Recently used, don't discard. */ +#define DCACHE_DONTCACHE 0x00000080 /* Purge from memory on final dput() */ + #define DCACHE_CANT_MOUNT 0x00000100 #define DCACHE_GENOCIDE 0x00000200 #define DCACHE_SHRINK_LIST 0x00000400 diff --git a/include/linux/fs.h b/include/linux/fs.h index 44bd45af760f..7c3e8c0306e0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode) return !inode->i_nlink || inode_unhashed(inode) || (inode->i_state & I_DONTCACHE); } +extern void d_mark_dontcache(struct inode *inode); extern struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), From patchwork Tue Apr 28 00:21:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 1278091 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 49B2QT0H2pz9sSv for ; Tue, 28 Apr 2020 10:22:09 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726564AbgD1AWE (ORCPT ); Mon, 27 Apr 2020 20:22:04 -0400 Received: from mga06.intel.com ([134.134.136.31]:25937 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726493AbgD1AWC (ORCPT ); Mon, 27 Apr 2020 20:22:02 -0400 IronPort-SDR: SF3QyLq5YT64BxRjhMkIBus8Oc1yRnHbHpcjUqk64jOWqoDnLPX9B49DCwLlEzEyMygy/3Rqpm gHAbsIxeJ9ZA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:54 -0700 IronPort-SDR: Hq63tgAjB3f5Hpl2MxK6vDBrnRddUjyobkO3euXTviZ77LYTmNl1wB/azwRVOmuKmOwUr7j0pb FaNKIImWdCQQ== X-IronPort-AV: E=Sophos;i="5.73,325,1583222400"; d="scan'208";a="458590536" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Apr 2020 17:21:53 -0700 From: ira.weiny@intel.com To: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, "Darrick J. Wong" Cc: Ira Weiny , Al Viro , Jan Kara , Dan Williams , Dave Chinner , Christoph Hellwig , "Theodore Y. Ts'o" , Jeff Moyer , linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Subject: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() Date: Mon, 27 Apr 2020 17:21:42 -0700 Message-Id: <20200428002142.404144-12-ira.weiny@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200428002142.404144-1-ira.weiny@intel.com> References: <20200428002142.404144-1-ira.weiny@intel.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Ira Weiny Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed setting of S_DAX, data invalidation no longer needs to happen when FS_XFLAG_DAX is changed. Change xfs_ioctl_setattr_dax_invalidate() to be xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new functionality. Furthermore, we no longer need the locking so we remove the join_flags logic. Signed-off-by: Ira Weiny Reviewed-by: Darrick J. Wong --- Changes from V10: adjust for renamed d_mark_dontcache() function Changes from V9: Change name of function to xfs_ioctl_setattr_prepare_dax() Changes from V8: Change name of function to xfs_ioctl_dax_check_set_cache() Update commit message Fix bit manipulations Changes from V7: Use new flag_inode_dontcache() Skip don't cache if mount over ride is active. Changes from v6: Fix completely broken implementation and update commit message. Use the new VFS layer I_DONTCACHE to facilitate inode eviction and S_DAX changing on drop_caches Changes from v5: New patch --- fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------ 1 file changed, 20 insertions(+), 88 deletions(-) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 104495ac187c..ff474f2c9acf 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags( return 0; } -/* - * If we are changing DAX flags, we have to ensure the file is clean and any - * cached objects in the address space are invalidated and removed. This - * requires us to lock out other IO and page faults similar to a truncate - * operation. The locks need to be held until the transaction has been committed - * so that the cache invalidation is atomic with respect to the DAX flag - * manipulation. - */ -static int -xfs_ioctl_setattr_dax_invalidate( +static void +xfs_ioctl_setattr_prepare_dax( struct xfs_inode *ip, - struct fsxattr *fa, - int *join_flags) + struct fsxattr *fa) { - struct inode *inode = VFS_I(ip); - struct super_block *sb = inode->i_sb; - int error; - - *join_flags = 0; - - /* - * It is only valid to set the DAX flag on regular files and - * directories on filesystems where the block size is equal to the page - * size. On directories it serves as an inherited hint so we don't - * have to check the device for dax support or flush pagecache. - */ - if (fa->fsx_xflags & FS_XFLAG_DAX) { - struct xfs_buftarg *target = xfs_inode_buftarg(ip); - - if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize)) - return -EINVAL; - } - - /* If the DAX state is not changing, we have nothing to do here. */ - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode)) - return 0; - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode)) - return 0; + struct xfs_mount *mp = ip->i_mount; + struct inode *inode = VFS_I(ip); if (S_ISDIR(inode->i_mode)) - return 0; - - /* lock, flush and invalidate mapping in preparation for flag change */ - xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); - error = filemap_write_and_wait(inode->i_mapping); - if (error) - goto out_unlock; - error = invalidate_inode_pages2(inode->i_mapping); - if (error) - goto out_unlock; - - *join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL; - return 0; + return; -out_unlock: - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL); - return error; + if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) || + (mp->m_flags & XFS_MOUNT_DAX_NEVER)) + return; + if (((fa->fsx_xflags & FS_XFLAG_DAX) && + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) || + (!(fa->fsx_xflags & FS_XFLAG_DAX) && + (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))) + d_mark_dontcache(inode); } /* @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate( * have permission to do so. On success, return a clean transaction and the * inode locked exclusively ready for further operation specific checks. On * failure, return an error without modifying or locking the inode. - * - * The inode might already be IO locked on call. If this is the case, it is - * indicated in @join_flags and we take full responsibility for ensuring they - * are unlocked from now on. Hence if we have an error here, we still have to - * unlock them. Otherwise, once they are joined to the transaction, they will - * be unlocked on commit/cancel. */ static struct xfs_trans * xfs_ioctl_setattr_get_trans( - struct xfs_inode *ip, - int join_flags) + struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans( goto out_unlock; xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags); - join_flags = 0; + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); /* * CAP_FOWNER overrides the following restrictions: @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans( out_cancel: xfs_trans_cancel(tp); out_unlock: - if (join_flags) - xfs_iunlock(ip, join_flags); return ERR_PTR(error); } @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr( struct xfs_dquot *pdqp = NULL; struct xfs_dquot *olddquot = NULL; int code; - int join_flags = 0; trace_xfs_ioctl_setattr(ip); @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr( return code; } - /* - * Changing DAX config may require inode locking for mapping - * invalidation. These need to be held all the way to transaction commit - * or cancel time, so need to be passed through to - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call - * appropriately. - */ - code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags); - if (code) - goto error_free_dquots; + xfs_ioctl_setattr_prepare_dax(ip, fa); - tp = xfs_ioctl_setattr_get_trans(ip, join_flags); + tp = xfs_ioctl_setattr_get_trans(ip); if (IS_ERR(tp)) { code = PTR_ERR(tp); goto error_free_dquots; @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags( struct fsxattr fa; struct fsxattr old_fa; unsigned int flags; - int join_flags = 0; int error; if (copy_from_user(&flags, arg, sizeof(flags))) @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags( if (error) return error; - /* - * Changing DAX config may require inode locking for mapping - * invalidation. These need to be held all the way to transaction commit - * or cancel time, so need to be passed through to - * xfs_ioctl_setattr_get_trans() so it can apply them to the join call - * appropriately. - */ - error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags); - if (error) - goto out_drop_write; + xfs_ioctl_setattr_prepare_dax(ip, &fa); - tp = xfs_ioctl_setattr_get_trans(ip, join_flags); + tp = xfs_ioctl_setattr_get_trans(ip); if (IS_ERR(tp)) { error = PTR_ERR(tp); goto out_drop_write;