
ext4: fix reservation overflow in ext4_da_write_begin

Message ID 542D6F2A.70006@redhat.com
State Accepted, archived

Commit Message

Eric Sandeen Oct. 2, 2014, 3:28 p.m. UTC
Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary.  However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
	conv=notrunc
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
	conv=notrunc

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
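A deliberately simplified userspace model of why the reservation overruns. The names (`model_handle`, `model_dirty_metadata`, `model_extend_past_2g`) are hypothetical and are not the jbd2 API: the handle starts with the reserved credits, each dirtied metadata block consumes one, and an exhausted handle returns -ENOSPC. ENOSPC is errno 28 on Linux, matching the "error 28" in the log. The model ignores the finer points of how jbd2 really accounts buffers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a journal handle: only tracks credits. */
struct model_handle {
	int credits;	/* metadata blocks this handle may still dirty */
};

/* Charge one credit for dirtying a metadata block; fail when none remain. */
static int model_dirty_metadata(struct model_handle *h)
{
	if (h->credits <= 0)
		return -ENOSPC;	/* reservation overrun ("error 28") */
	h->credits--;
	return 0;
}

/*
 * Model of a write that grows a file: always dirty the inode, and also
 * dirty the superblock when LARGE_FILE has to be set.
 */
static int model_extend_past_2g(int reserved, int need_sb_update)
{
	struct model_handle h = { .credits = reserved };
	int err = model_dirty_metadata(&h);	/* inode update */

	if (!err && need_sb_update)
		err = model_dirty_metadata(&h);	/* superblock: set LARGE_FILE */
	return err;
}
```

With one credit, the inode update succeeds but the superblock update fails; with two credits, both succeed.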

Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.
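The credit rule can be sketched as a standalone C function. This is a model, not the kernel code: the hypothetical `model_da_write_credits` takes the LARGE_FILE feature state as a bool instead of reading it from `inode->i_sb`, and `MODEL_LARGE_FILE_LIMIT` is an illustrative name for the 2^31 - 1 byte limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest file size (2^31 - 1 bytes) that does not require LARGE_FILE. */
#define MODEL_LARGE_FILE_LIMIT 0x7fffffffULL

/*
 * Userspace model of the credit rule: one credit for the inode update,
 * plus one for the superblock if this write could be the one that first
 * pushes a file past the limit on a filesystem without LARGE_FILE.
 */
static int model_da_write_credits(bool has_large_file, int64_t pos, unsigned len)
{
	if (has_large_file)
		return 1;	/* feature already set: inode update only */

	/* pos is non-negative for a write, so the unsigned comparison is safe */
	if ((uint64_t)pos + len <= MODEL_LARGE_FILE_LIMIT)
		return 1;	/* write ends within the first 2^31 - 1 bytes */

	return 2;	/* inode plus superblock (to set LARGE_FILE) */
}
```

Plugging in the two dd writes from the reproducer: the first (pos 2147483646, len 1) brings the file to 2147483647 = 0x7fffffff bytes and needs 1 credit; the second (pos 2147483647, len 1) brings it to 2^31 bytes and needs 2 on a filesystem without LARGE_FILE.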

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
--- 

Ok, how's this ... I do like this a lot better than the set-flag-on-
mount-or-remount, which started to get a bit icky.



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Dilger Oct. 2, 2014, 9 p.m. UTC | #1
On Oct 2, 2014, at 9:28 AM, Eric Sandeen <sandeen@redhat.com> wrote:
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3aa26e9..8d362c2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2515,6 +2515,20 @@ static int ext4_nonda_switch(struct super_block *sb)
> 	return 0;
> }
> 
> +/* We always reserve for an inode update; the superblock could be there too */
> +static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
> +{
> +	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,

This could be marked "likely()" I suspect, but not critical.

> +                                EXT4_FEATURE_RO_COMPAT_LARGE_FILE))
> +		return 1;
> +
> +	if (pos + len <= 0x7fffffffULL)
> +		return 1;
> +
> +	/* We might need to update the superblock to set LARGE_FILE */
> +	return 2;
> +}
> +
> static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
> 			       loff_t pos, unsigned len, unsigned flags,
> 			       struct page **pagep, void **fsdata)
> @@ -2565,7 +2579,8 @@ retry_grab:
> 	 * of file which has an already mapped buffer.
> 	 */
> retry_journal:
> -	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
> +	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> +			ext4_da_write_credits(inode, pos, len));
> 	if (IS_ERR(handle)) {
> 		page_cache_release(page);
> 		return PTR_ERR(handle);
> 


Cheers, Andreas
Theodore Ts'o Oct. 11, 2014, 11:52 p.m. UTC | #2
On Thu, Oct 02, 2014 at 03:00:23PM -0600, Andreas Dilger wrote:
> On Oct 2, 2014, at 9:28 AM, Eric Sandeen <sandeen@redhat.com> wrote:
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>

Applied, thanks.  I added the likely() qualifier per Andreas'
suggestion.

						- Ted

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3aa26e9..8d362c2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2515,6 +2515,20 @@  static int ext4_nonda_switch(struct super_block *sb)
 	return 0;
 }
 
+/* We always reserve for an inode update; the superblock could be there too */
+static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
+{
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+                                EXT4_FEATURE_RO_COMPAT_LARGE_FILE))
+		return 1;
+
+	if (pos + len <= 0x7fffffffULL)
+		return 1;
+
+	/* We might need to update the superblock to set LARGE_FILE */
+	return 2;
+}
+
 static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
@@ -2565,7 +2579,8 @@  retry_grab:
 	 * of file which has an already mapped buffer.
 	 */
 retry_journal:
-	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
+	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+			ext4_da_write_credits(inode, pos, len));
 	if (IS_ERR(handle)) {
 		page_cache_release(page);
 		return PTR_ERR(handle);
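Ted reports applying this with a likely() hint on the feature check. Because LARGE_FILE only ever needs to be set once in a filesystem's lifetime, the first branch is the common case. A standalone sketch of the helper with that hint, assuming GCC or Clang: the local `likely()` is built on `__builtin_expect`, which is what the kernel macro reduces to in common configurations, and the hypothetical `model_da_write_credits_likely` takes the feature state as a bool in place of the superblock test. This is not the applied kernel code verbatim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch-prediction hint: tells the compiler x is expected to be true. */
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
#endif

static int model_da_write_credits_likely(bool has_large_file, int64_t pos,
					 unsigned len)
{
	/* Common case: LARGE_FILE already set, only the inode is updated. */
	if (likely(has_large_file))
		return 1;

	/* Write stays within 2^31 - 1 bytes: LARGE_FILE not needed yet. */
	if ((uint64_t)pos + len <= 0x7fffffffULL)
		return 1;

	/* Inode plus superblock, which may need LARGE_FILE set. */
	return 2;
}
```

The hint only influences code layout and branch prediction; the returned credit counts are the same as without it.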