From patchwork Wed Nov 1 15:36:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 832993 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3yRsmT311wz9sRn for ; Thu, 2 Nov 2017 02:37:45 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754837AbdKAPhG (ORCPT ); Wed, 1 Nov 2017 11:37:06 -0400 Received: from mx2.suse.de ([195.135.220.15]:44195 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754809AbdKAPhD (ORCPT ); Wed, 1 Nov 2017 11:37:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 02942ADD8; Wed, 1 Nov 2017 15:36:59 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 33FF21E11F6; Wed, 1 Nov 2017 16:36:56 +0100 (CET) From: Jan Kara To: Dan Williams Cc: Ross Zwisler , Christoph Hellwig , , linux-nvdimm@lists.01.org, , linux-api@vger.kernel.org, , , "Darrick J . Wong" , Jan Kara Subject: [PATCH 13/18] dax, iomap: Add support for synchronous faults Date: Wed, 1 Nov 2017 16:36:42 +0100 Message-Id: <20171101153648.30166-14-jack@suse.cz> X-Mailer: git-send-email 2.12.3 In-Reply-To: <20171101153648.30166-1-jack@suse.cz> References: <20171101153648.30166-1-jack@suse.cz> Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Add a flag to iomap interface informing the caller that inode needs fdstasync(2) for returned extent to become persistent and use it in DAX fault code so that we don't map such extents into page tables immediately. Instead we propagate the information that fdatasync(2) is necessary from dax_iomap_fault() with a new VM_FAULT_NEEDDSYNC flag. Filesystem fault handler is then responsible for calling fdatasync(2) and inserting pfn into page tables. Reviewed-by: Ross Zwisler Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara --- fs/dax.c | 39 +++++++++++++++++++++++++++++++++++++-- include/linux/iomap.h | 5 +++++ include/linux/mm.h | 6 +++++- 3 files changed, 47 insertions(+), 3 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index efc210ff6665..bb9ff907738c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1091,6 +1091,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, unsigned flags = IOMAP_FAULT; int error, major = 0; bool write = vmf->flags & FAULT_FLAG_WRITE; + bool sync; int vmf_ret = 0; void *entry; pfn_t pfn; @@ -1169,6 +1170,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, goto finish_iomap; } + sync = (vma->vm_flags & VM_SYNC) && (iomap.flags & IOMAP_F_DIRTY); + switch (iomap.type) { case IOMAP_MAPPED: if (iomap.flags & IOMAP_F_NEW) { @@ -1182,12 +1185,27 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, entry = dax_insert_mapping_entry(mapping, vmf, entry, dax_iomap_sector(&iomap, pos), - 0, write); + 0, write && !sync); if (IS_ERR(entry)) { error = PTR_ERR(entry); goto error_finish_iomap; } + /* + * If we are doing synchronous page fault and inode needs fsync, + * we can insert PTE into page tables only after that happens. + * Skip insertion for now and return the pfn so that caller can + * insert it after fsync is done. + */ + if (sync) { + if (WARN_ON_ONCE(!pfnp)) { + error = -EIO; + goto error_finish_iomap; + } + *pfnp = pfn; + vmf_ret = VM_FAULT_NEEDDSYNC | major; + goto finish_iomap; + } trace_dax_insert_mapping(inode, vmf, entry); if (write) error = vm_insert_mixed_mkwrite(vma, vaddr, pfn); @@ -1287,6 +1305,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, struct address_space *mapping = vma->vm_file->f_mapping; unsigned long pmd_addr = vmf->address & PMD_MASK; bool write = vmf->flags & FAULT_FLAG_WRITE; + bool sync; unsigned int iomap_flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT; struct inode *inode = mapping->host; int result = VM_FAULT_FALLBACK; @@ -1371,6 +1390,8 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, if (iomap.offset + iomap.length < pos + PMD_SIZE) goto finish_iomap; + sync = (vma->vm_flags & VM_SYNC) && (iomap.flags & IOMAP_F_DIRTY); + switch (iomap.type) { case IOMAP_MAPPED: error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn); @@ -1379,10 +1400,24 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, entry = dax_insert_mapping_entry(mapping, vmf, entry, dax_iomap_sector(&iomap, pos), - RADIX_DAX_PMD, write); + RADIX_DAX_PMD, write && !sync); if (IS_ERR(entry)) goto finish_iomap; + /* + * If we are doing synchronous page fault and inode needs fsync, + * we can insert PMD into page tables only after that happens. + * Skip insertion for now and return the pfn so that caller can + * insert it after fsync is done. + */ + if (sync) { + if (WARN_ON_ONCE(!pfnp)) + goto finish_iomap; + *pfnp = pfn; + result = VM_FAULT_NEEDDSYNC; + goto finish_iomap; + } + trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry); result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn, write); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index f64dc6ce5161..73e3b7085dbe 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -22,6 +22,11 @@ struct vm_fault; * Flags for all iomap mappings: */ #define IOMAP_F_NEW 0x01 /* blocks have been newly allocated */ +/* + * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access + * written data and requires fdatasync to commit them to persistent storage. + */ +#define IOMAP_F_DIRTY 0x02 /* * Flags that only need to be reported for IOMAP_REPORT requests: diff --git a/include/linux/mm.h b/include/linux/mm.h index 5411cb7442de..f57e55782d7d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1182,6 +1182,9 @@ static inline void clear_page_pfmemalloc(struct page *page) #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */ #define VM_FAULT_FALLBACK 0x0800 /* huge page fault failed, fall back to small */ #define VM_FAULT_DONE_COW 0x1000 /* ->fault has fully handled COW */ +#define VM_FAULT_NEEDDSYNC 0x2000 /* ->fault did not modify page tables + * and needs fsync() to complete (for + * synchronous page faults in DAX) */ #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \ VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \ @@ -1199,7 +1202,8 @@ static inline void clear_page_pfmemalloc(struct page *page) { VM_FAULT_LOCKED, "LOCKED" }, \ { VM_FAULT_RETRY, "RETRY" }, \ { VM_FAULT_FALLBACK, "FALLBACK" }, \ - { VM_FAULT_DONE_COW, "DONE_COW" } + { VM_FAULT_DONE_COW, "DONE_COW" }, \ + { VM_FAULT_NEEDDSYNC, "NEEDDSYNC" } /* Encode hstate index for a hwpoisoned large page */ #define VM_FAULT_SET_HINDEX(x) ((x) << 12)