diff mbox

[6/7] ext4: endless truncate due to nonlocked dio readers V2

Message ID 1347211634-11509-7-git-send-email-dmonakhov@openvz.org
State Superseded, archived
Headers show

Commit Message

Dmitry Monakhov Sept. 9, 2012, 5:27 p.m. UTC
If we have enough aggressive DIO readers, truncate and other dio
waiters will wait forever inside inode_dio_wait(). It is reasonable
to disable nonlock DIO read optimization during truncate.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |    2 +-
 fs/ext4/fsync.c   |    2 +-
 fs/ext4/inode.c   |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Comments

Zheng Liu Sept. 13, 2012, 10:41 a.m. UTC | #1
Hi Dmitry,

Could you please provide more detailed workload to convince me?  I
am thinking about whether we really need to disable dioread_nolock
feature in here.  In our benchmarks, we don't see this problem.

Regards,
Zheng

On Sun, Sep 09, 2012 at 09:27:13PM +0400, Dmitry Monakhov wrote:
> If we have enough aggressive DIO readers, truncate and other dio
> waiters will wait forever inside inode_dio_wait(). It is reasonable
> to disable nonlock DIO read optimization during truncate.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |    2 +-
>  fs/ext4/fsync.c   |    2 +-
>  fs/ext4/inode.c   |    2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 8252651..b5b801f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4853,7 +4853,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	}
>  
>  	/* finish any pending end_io work */
> -	inode_dio_wait(inode);
> +	ext4_inode_dio_wait(inode, 1);
>  	ext4_flush_completed_IO(inode);
>  
>  	credits = ext4_writepage_trans_blocks(inode);
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 290c5cf..bdf6bfd 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -204,7 +204,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (inode->i_sb->s_flags & MS_RDONLY)
>  		goto out;
>  
> -	inode_dio_wait(inode);
> +	ext4_inode_dio_wait(inode, 1);
>  	ret = ext4_flush_completed_IO(inode);
>  	if (ret < 0)
>  		goto out;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 93e6b09..a850026 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4335,7 +4335,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>  			truncate_setsize(inode, attr->ia_size);
>  			/* Inode size will be reduced, wait for dio in flight */
>  			if (orphan)
> -				inode_dio_wait(inode);
> +				ext4_inode_dio_wait(inode, 1);
>  		}
>  		ext4_truncate(inode);
>  	}
> -- 
> 1.7.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Sept. 13, 2012, 12:07 p.m. UTC | #2
Hello,

On Thu 13-09-12 18:41:36, Zheng Liu wrote:
> Could you please provide more detailed workload to convince me?  I
> am thinking about whether we really need to disable dioread_nolock
> feature in here.  In our benchmarks, we don't see this problem.
  I just did:

# Create file
dd if=/dev/zero of=/mnt/file bs=1M count=30
sync
# Start 10 DIO dio readers in parallel reading the file in a loop
for (( i = 0; i < 10; i++ )); do
  while true; do
    dd if=/mnt/file bs=4k iflag=direct of=/dev/null
  done &
done
sleep 1

# Try to truncate the file - never finishes.
truncate -s 16 /mnt/file

It is pretty easy to hit this. Besides being a DOS attack vector (but I
won't be too concerned about this - there are plenty of ways how local
process can screw you) I can easily imagine some application to get bitten
by this. 

								Honza
> 
> Regards,
> Zheng
> 
> On Sun, Sep 09, 2012 at 09:27:13PM +0400, Dmitry Monakhov wrote:
> > If we have enough aggressive DIO readers, truncate and other dio
> > waiters will wait forever inside inode_dio_wait(). It is reasonable
> > to disable nonlock DIO read optimization during truncate.
> > 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > ---
> >  fs/ext4/extents.c |    2 +-
> >  fs/ext4/fsync.c   |    2 +-
> >  fs/ext4/inode.c   |    2 +-
> >  3 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 8252651..b5b801f 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4853,7 +4853,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> >  	}
> >  
> >  	/* finish any pending end_io work */
> > -	inode_dio_wait(inode);
> > +	ext4_inode_dio_wait(inode, 1);
> >  	ext4_flush_completed_IO(inode);
> >  
> >  	credits = ext4_writepage_trans_blocks(inode);
> > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> > index 290c5cf..bdf6bfd 100644
> > --- a/fs/ext4/fsync.c
> > +++ b/fs/ext4/fsync.c
> > @@ -204,7 +204,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >  	if (inode->i_sb->s_flags & MS_RDONLY)
> >  		goto out;
> >  
> > -	inode_dio_wait(inode);
> > +	ext4_inode_dio_wait(inode, 1);
> >  	ret = ext4_flush_completed_IO(inode);
> >  	if (ret < 0)
> >  		goto out;
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 93e6b09..a850026 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4335,7 +4335,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> >  			truncate_setsize(inode, attr->ia_size);
> >  			/* Inode size will be reduced, wait for dio in flight */
> >  			if (orphan)
> > -				inode_dio_wait(inode);
> > +				ext4_inode_dio_wait(inode, 1);
> >  		}
> >  		ext4_truncate(inode);
> >  	}
> > -- 
> > 1.7.7.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Sept. 13, 2012, 12:57 p.m. UTC | #3
On Thu, Sep 13, 2012 at 02:07:36PM +0200, Jan Kara wrote:
>   Hello,
> 
> On Thu 13-09-12 18:41:36, Zheng Liu wrote:
> > Could you please provide more detailed workload to convince me?  I
> > am thinking about whether we really need to disable dioread_nolock
> > feature in here.  In our benchmarks, we don't see this problem.
>   I just did:
> 
> # Create file
> dd if=/dev/zero of=/mnt/file bs=1M count=30
> sync
> # Start 10 DIO dio readers in parallel reading the file in a loop
> for (( i = 0; i < 10; i++ )); do
>   while true; do
>     dd if=/mnt/file bs=4k iflag=direct of=/dev/null
>   done &
> done
> sleep 1
> 
> # Try to truncate the file - never finishes.
> truncate -s 16 /mnt/file
> 
> It is pretty easy to hit this. Besides being a DOS attack vector (but I
> won't be too concerned about this - there are plenty of ways how local
> process can screw you) I can easily imagine some application to get bitten
> by this. 

Hi Jan,

Thanks for your explanation, but in my desktop I cannot reproduce this
problem.  The size of `file' is 16.  Am I missing something?

Regards,
Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Sept. 13, 2012, 2:34 p.m. UTC | #4
On Thu 13-09-12 20:57:26, Zheng Liu wrote:
> On Thu, Sep 13, 2012 at 02:07:36PM +0200, Jan Kara wrote:
> >   Hello,
> > 
> > On Thu 13-09-12 18:41:36, Zheng Liu wrote:
> > > Could you please provide more detailed workload to convince me?  I
> > > am thinking about whether we really need to disable dioread_nolock
> > > feature in here.  In our benchmarks, we don't see this problem.
> >   I just did:
> > 
> > # Create file
> > dd if=/dev/zero of=/mnt/file bs=1M count=30
> > sync
> > # Start 10 DIO dio readers in parallel reading the file in a loop
> > for (( i = 0; i < 10; i++ )); do
> >   while true; do
> >     dd if=/mnt/file bs=4k iflag=direct of=/dev/null
> >   done &
> > done
> > sleep 1
> > 
> > # Try to truncate the file - never finishes.
> > truncate -s 16 /mnt/file
> > 
> > It is pretty easy to hit this. Besides being a DOS attack vector (but I
> > won't be too concerned about this - there are plenty of ways how local
> > process can screw you) I can easily imagine some application to get bitten
> > by this. 
> 
> Hi Jan,
> 
> Thanks for your explanation, but in my desktop I cannot reproduce this
> problem.  The size of `file' is 16.  Am I missing something?
  Hum, on my test machine with 3.6-rc1 it does not... Maybe for your
desktop you need a larger sleep before running truncate so that readers
have time to start up? Also I suppose you have ext4 mounted with
dioread_nolock mount option?

								Honza
Zheng Liu Sept. 13, 2012, 11:31 p.m. UTC | #5
On Thu, Sep 13, 2012 at 04:34:55PM +0200, Jan Kara wrote:
> On Thu 13-09-12 20:57:26, Zheng Liu wrote:
> > On Thu, Sep 13, 2012 at 02:07:36PM +0200, Jan Kara wrote:
> > >   Hello,
> > > 
> > > On Thu 13-09-12 18:41:36, Zheng Liu wrote:
> > > > Could you please provide more detailed workload to convince me?  I
> > > > am thinking about whether we really need to disable dioread_nolock
> > > > feature in here.  In our benchmarks, we don't see this problem.
> > >   I just did:
> > > 
> > > # Create file
> > > dd if=/dev/zero of=/mnt/file bs=1M count=30
> > > sync
> > > # Start 10 DIO dio readers in parallel reading the file in a loop
> > > for (( i = 0; i < 10; i++ )); do
> > >   while true; do
> > >     dd if=/mnt/file bs=4k iflag=direct of=/dev/null
> > >   done &
> > > done
> > > sleep 1
> > > 
> > > # Try to truncate the file - never finishes.
> > > truncate -s 16 /mnt/file
> > > 
> > > It is pretty easy to hit this. Besides being a DOS attack vector (but I
> > > won't be too concerned about this - there are plenty of ways how local
> > > process can screw you) I can easily imagine some application to get bitten
> > > by this. 
> > 
> > Hi Jan,
> > 
> > Thanks for your explanation, but in my desktop I cannot reproduce this
> > problem.  The size of `file' is 16.  Am I missing something?
>   Hum, on my test machine with 3.6-rc1 it does not... Maybe for your
> desktop you need a larger sleep before running truncate so that readers
> have time to start up? Also I suppose you have ext4 mounted with
> dioread_nolock mount option?

Yes, it can be reproduced after increasing sleep time.  Thanks.

Regards,
Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8252651..b5b801f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4853,7 +4853,7 @@  int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	}
 
 	/* finish any pending end_io work */
-	inode_dio_wait(inode);
+	ext4_inode_dio_wait(inode, 1);
 	ext4_flush_completed_IO(inode);
 
 	credits = ext4_writepage_trans_blocks(inode);
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 290c5cf..bdf6bfd 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -204,7 +204,7 @@  int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	if (inode->i_sb->s_flags & MS_RDONLY)
 		goto out;
 
-	inode_dio_wait(inode);
+	ext4_inode_dio_wait(inode, 1);
 	ret = ext4_flush_completed_IO(inode);
 	if (ret < 0)
 		goto out;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 93e6b09..a850026 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4335,7 +4335,7 @@  int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 			truncate_setsize(inode, attr->ia_size);
 			/* Inode size will be reduced, wait for dio in flight */
 			if (orphan)
-				inode_dio_wait(inode);
+				ext4_inode_dio_wait(inode, 1);
 		}
 		ext4_truncate(inode);
 	}