diff mbox

ext4: Fix discard of inode prealloc space with delayed allocation.

Message ID 1235578922-7790-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
State Accepted, archived
Headers show

Commit Message

Aneesh Kumar K.V Feb. 25, 2009, 4:22 p.m. UTC
With delayed allocation we should not/cannot discard inode prealloc space
during file close. We would still have dirty pages for which we haven't allocated
blocks yet. With this fix after each get_blocks request we check whether we have
zero reserved blocks and if yes and we don't have any writers on the file we
discard inode prealloc space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

---
 fs/ext4/file.c  |    9 ++++++++-
 fs/ext4/inode.c |    6 ++++++
 2 files changed, 14 insertions(+), 1 deletions(-)

Comments

Dmitry Monakhov Feb. 25, 2009, 4:57 p.m. UTC | #1
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> With delayed allocation we should not/cannot discard inode prealloc space
> during file close. We would still have dirty pages for which we haven't allocated
> blocks yet. With this fix after each get_blocks request we check whether we have
> zero reserved blocks and if yes and we don't have any writers on the file we
> discard inode prealloc space.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> ---
>  fs/ext4/file.c  |    9 ++++++++-
>  fs/ext4/inode.c |    6 ++++++
>  2 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index f731cb5..4e468e2 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -33,9 +33,16 @@
>   */
>  static int ext4_release_file(struct inode *inode, struct file *filp)
>  {
> +	int rsv_data_blocks;
> +
> +	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> +	rsv_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
> +	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +
Seems we have race condition here because at this point someone may:
1)open file
2)then perform some write activity => (i_reserved_data_blocks != 0)
3)close file => (inode->i_writecount == 1)

>  	/* if we are the last writer on the inode, drop the block reservation */
>  	if ((filp->f_mode & FMODE_WRITE) &&
> -			(atomic_read(&inode->i_writecount) == 1))
> +			(atomic_read(&inode->i_writecount) == 1) &&
> +		        !rsv_data_blocks)
>  	{
>  		down_write(&EXT4_I(inode)->i_data_sem);
After we have grabbed i_data_sem we are protected from get_block activity,
and may safely recheck i_reserved_data_blocks again here.
>  		ext4_discard_preallocations(inode);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 51cdd13..26dcec4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1038,6 +1038,12 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>  	EXT4_I(inode)->i_reserved_data_blocks -= used;
>  
>  	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +	/*
> +	 * If have done all the pending block allocation and if the
> +	 * we don't have any writer on the inode
> +	 */
No problem here because we are hold i_data_sem.
> +	if (!total && (atomic_read(&inode->i_writecount) == 0))
> +		ext4_discard_preallocations(inode);
>  }
>  
>  /*
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Feb. 25, 2009, 5:37 p.m. UTC | #2
On Wed, Feb 25, 2009 at 07:57:52PM +0300, Dmitri Monakhov wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> > With delayed allocation we should not/cannot discard inode prealloc space
> > during file close. We would still have dirty pages for which we haven't allocated
> > blocks yet. With this fix after each get_blocks request we check whether we have
> > zero reserved blocks and if yes and we don't have any writers on the file we
> > discard inode prealloc space.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >
> > ---
> >  fs/ext4/file.c  |    9 ++++++++-
> >  fs/ext4/inode.c |    6 ++++++
> >  2 files changed, 14 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index f731cb5..4e468e2 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -33,9 +33,16 @@
> >   */
> >  static int ext4_release_file(struct inode *inode, struct file *filp)
> >  {
> > +	int rsv_data_blocks;
> > +
> > +	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> > +	rsv_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
> > +	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> > +
> Seems we have race condition here because at this point someone may:
> 1)open file
> 2)then perform some write activity => (i_reserved_data_blocks != 0)
> 3)close file => (inode->i_writecount == 1)
> 

i_data_sem doesn't actually ensure i_reserved_data_blocks won't change.
We update i_reserved_data_blocks during block reservation
(ext4_da_write_begin). But yes what you mentioned is possible.
But I am not sure we need to be worried about that. It is true that
throwing away prealloc space is not good. But we still do block allocation
based on goal blocks. So we may end up allocating blocks within the
discarded prealloc space again.

What is bad is that if we don't discard the prealloc space even after
all the needed block allocation. That would mean nothing can be
allocated out of that prealloc space. With the current code We would
be discarding the prealloc space only when we hit ENOSPC 
. But then that would be bad.

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Feb. 26, 2009, 5:59 a.m. UTC | #3
On Wed, Feb 25, 2009 at 09:52:02PM +0530, Aneesh Kumar K.V wrote:
> With delayed allocation we should not/cannot discard inode prealloc
> space during file close. We would still have dirty pages for which
> we haven't allocated blocks yet. With this fix after each get_blocks
> request we check whether we have zero reserved blocks and if yes and
> we don't have any writers on the file we discard inode prealloc
> space.

Thanks, applied in the ext4 patch queue.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index f731cb5..4e468e2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -33,9 +33,16 @@ 
  */
 static int ext4_release_file(struct inode *inode, struct file *filp)
 {
+	int rsv_data_blocks;
+
+	spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
+	rsv_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
+	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+
 	/* if we are the last writer on the inode, drop the block reservation */
 	if ((filp->f_mode & FMODE_WRITE) &&
-			(atomic_read(&inode->i_writecount) == 1))
+			(atomic_read(&inode->i_writecount) == 1) &&
+		        !rsv_data_blocks)
 	{
 		down_write(&EXT4_I(inode)->i_data_sem);
 		ext4_discard_preallocations(inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 51cdd13..26dcec4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1038,6 +1038,12 @@  static void ext4_da_update_reserve_space(struct inode *inode, int used)
 	EXT4_I(inode)->i_reserved_data_blocks -= used;
 
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+	/*
+	 * If have done all the pending block allocation and if the
+	 * we don't have any writer on the inode
+	 */
+	if (!total && (atomic_read(&inode->i_writecount) == 0))
+		ext4_discard_preallocations(inode);
 }
 
 /*