[v2,1/2] ext4: Close race between direct IO and ext4_break_layouts()

Message ID 153374910694.40645.17166196534680658204.stgit@djiang5-desk3.ch.intel.com
State New
Headers show
Series
  • [v2,1/2] ext4: Close race between direct IO and ext4_break_layouts()
Related show

Commit Message

Dave Jiang Aug. 8, 2018, 5:25 p.m.
From: Ross Zwisler <zwisler@kernel.org>

If the refcount of a page is lowered between the time that it is returned
by dax_busy_page() and when the refcount is again checked in
ext4_break_layouts() => ___wait_var_event(), the waiting function
ext4_wait_dax_page() will never be called.  This means that
ext4_break_layouts() will still have 'retry' set to false, so we'll stop
looping and never check the refcount of other pages in this inode.

Instead, always continue looping as long as dax_layout_busy_page() gives us
a page which it found with an elevated refcount.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---

v2:
- remove verbiage in comment header (Jan)

 fs/ext4/inode.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Eric Sandeen Sept. 10, 2018, 4:23 p.m. | #1
On 8/8/18 12:25 PM, Dave Jiang wrote:
> From: Ross Zwisler <zwisler@kernel.org>
> 
> If the refcount of a page is lowered between the time that it is returned
> by dax_busy_page() and when the refcount is again checked in
> ext4_break_layouts() => ___wait_var_event(), the waiting function
> ext4_wait_dax_page() will never be called.  This means that
> ext4_break_layouts() will still have 'retry' set to false, so we'll stop
> looping and never check the refcount of other pages in this inode.
> 
> Instead, always continue looping as long as dax_layout_busy_page() gives us
> a page which it found with an elevated refcount.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Ted - wondering if this one is still on your radar?

Thanks,
-Eric

> ---
> 
> v2:
> - remove verbiage in comment header (Jan)
> 
>  fs/ext4/inode.c |    9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8f6ad7667974..d2663a1e3ec2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4191,9 +4191,8 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  	return 0;
>  }
>  
> -static void ext4_wait_dax_page(struct ext4_inode_info *ei, bool *did_unlock)
> +static void ext4_wait_dax_page(struct ext4_inode_info *ei)
>  {
> -	*did_unlock = true;
>  	up_write(&ei->i_mmap_sem);
>  	schedule();
>  	down_write(&ei->i_mmap_sem);
> @@ -4203,14 +4202,12 @@ int ext4_break_layouts(struct inode *inode)
>  {
>  	struct ext4_inode_info *ei = EXT4_I(inode);
>  	struct page *page;
> -	bool retry;
>  	int error;
>  
>  	if (WARN_ON_ONCE(!rwsem_is_locked(&ei->i_mmap_sem)))
>  		return -EINVAL;
>  
>  	do {
> -		retry = false;
>  		page = dax_layout_busy_page(inode->i_mapping);
>  		if (!page)
>  			return 0;
> @@ -4218,8 +4215,8 @@ int ext4_break_layouts(struct inode *inode)
>  		error = ___wait_var_event(&page->_refcount,
>  				atomic_read(&page->_refcount) == 1,
>  				TASK_INTERRUPTIBLE, 0, 0,
> -				ext4_wait_dax_page(ei, &retry));
> -	} while (error == 0 && retry);
> +				ext4_wait_dax_page(ei));
> +	} while (error == 0);
>  
>  	return error;
>  }
>
Jan Kara Sept. 11, 2018, 3:26 p.m. | #2
On Mon 10-09-18 11:23:56, Eric Sandeen wrote:
> On 8/8/18 12:25 PM, Dave Jiang wrote:
> > From: Ross Zwisler <zwisler@kernel.org>
> > 
> > If the refcount of a page is lowered between the time that it is returned
> > by dax_busy_page() and when the refcount is again checked in
> > ext4_break_layouts() => ___wait_var_event(), the waiting function
> > ext4_wait_dax_page() will never be called.  This means that
> > ext4_break_layouts() will still have 'retry' set to false, so we'll stop
> > looping and never check the refcount of other pages in this inode.
> > 
> > Instead, always continue looping as long as dax_layout_busy_page() gives us
> > a page which it found with an elevated refcount.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> Ted - wondering if this one is still on your radar?

Resent the patch to Ted to catch more attention.

								Honza

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8f6ad7667974..d2663a1e3ec2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4191,9 +4191,8 @@  int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 	return 0;
 }
 
-static void ext4_wait_dax_page(struct ext4_inode_info *ei, bool *did_unlock)
+static void ext4_wait_dax_page(struct ext4_inode_info *ei)
 {
-	*did_unlock = true;
 	up_write(&ei->i_mmap_sem);
 	schedule();
 	down_write(&ei->i_mmap_sem);
@@ -4203,14 +4202,12 @@  int ext4_break_layouts(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct page *page;
-	bool retry;
 	int error;
 
 	if (WARN_ON_ONCE(!rwsem_is_locked(&ei->i_mmap_sem)))
 		return -EINVAL;
 
 	do {
-		retry = false;
 		page = dax_layout_busy_page(inode->i_mapping);
 		if (!page)
 			return 0;
@@ -4218,8 +4215,8 @@  int ext4_break_layouts(struct inode *inode)
 		error = ___wait_var_event(&page->_refcount,
 				atomic_read(&page->_refcount) == 1,
 				TASK_INTERRUPTIBLE, 0, 0,
-				ext4_wait_dax_page(ei, &retry));
-	} while (error == 0 && retry);
+				ext4_wait_dax_page(ei));
+	} while (error == 0);
 
 	return error;
 }