ext4: Add support for data=alloc_on_commit mode

Message ID 1237259998-12656-1-git-send-email-tytso@mit.edu
State Deferred

Commit Message

Theodore Ts'o March 17, 2009, 3:19 a.m. UTC
Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
this mode, we force all delayed allocation blocks involved with the
to-be-committed transaction to be allocated, and then flushed out to
disk before the transaction is committed.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h       |    6 +++-
 fs/ext4/ext4_jbd2.h  |    3 +-
 fs/ext4/inode.c      |   12 +++++++++++
 fs/ext4/super.c      |   51 ++++++++++++++++++++++++++++++++++++-------------
 fs/jbd2/commit.c     |    3 ++
 include/linux/jbd2.h |    2 +
 6 files changed, 60 insertions(+), 17 deletions(-)

Comments

Eric Sandeen March 17, 2009, 4:07 a.m. UTC | #1
Theodore Ts'o wrote:
> Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
> this mode, we force all delayed allocation blocks involved with the
> to-be-commited transaction to be allocated, and then flushed out to
> disk before the transaction is commited.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Just out of curiosity, what is intended to be the default mode?  How do
you envision users choosing between data=ordered vs. data=alloc-on-commit?

Maybe I'm asking whether there's a Documentation/ patch to go with this? :)

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V March 17, 2009, 9:28 a.m. UTC | #2
On Mon, Mar 16, 2009 at 11:19:58PM -0400, Theodore Ts'o wrote:
> Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
> this mode, we force all delayed allocation blocks involved with the
> to-be-commited transaction to be allocated, and then flushed out to
> disk before the transaction is commited.
> 

Wouldn't this cause a deadlock ? We want to commit a transaction because
we don't have enough journal space (via journal_start) and now that would cause block
allocation which would do another journal_start()

-aneesh
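
A toy model of the concern raised above, for illustration only. It is not
jbd2 code: struct toy_journal, toy_journal_start() and
toy_commit_with_alloc() are hypothetical names, and the model covers only
the low-journal-space case described in this message. A normal caller
that cannot get credits waits for a commit; if the committer itself asks
for credits, it ends up waiting on its own commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy journal: all names here are hypothetical, not jbd2 structures. */
struct toy_journal {
	int free_credits;	/* journal blocks not yet claimed by any handle */
};

enum toy_start_result {
	TOY_START_OK,		/* handle granted immediately */
	TOY_START_WAIT_COMMIT,	/* caller sleeps until a commit frees space */
	TOY_START_DEADLOCK	/* the committer would be waiting on itself */
};

/*
 * Simplified stand-in for journal_start(): grant the credits if they
 * fit, otherwise the caller has to wait for a commit to free space.
 */
static enum toy_start_result
toy_journal_start(struct toy_journal *j, int credits, bool in_commit)
{
	if (credits <= j->free_credits) {
		j->free_credits -= credits;
		return TOY_START_OK;
	}
	/* Nobody else can finish the commit while the committer sleeps. */
	return in_commit ? TOY_START_DEADLOCK : TOY_START_WAIT_COMMIT;
}

/* Commit that first allocates delayed blocks, as the patch proposes. */
static enum toy_start_result
toy_commit_with_alloc(struct toy_journal *j, int alloc_credits)
{
	return toy_journal_start(j, alloc_credits, true);
}
```

With only 2 free credits, toy_journal_start(&j, 8, false) reports that the
caller must wait for a commit, while toy_commit_with_alloc(&j, 8) reports
the self-wait. With enough free credits the commit-time allocation goes
through, which is presumably why the problem would not show up on a
mostly idle journal.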
Eric Sandeen March 18, 2009, 4:18 a.m. UTC | #3
Theodore Ts'o wrote:
> Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
> this mode, we force all delayed allocation blocks involved with the
> to-be-commited transaction to be allocated, and then flushed out to
> disk before the transaction is commited.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Haven't really looked into the cause yet but I was playing with this,
and ran:

# rm -f bigfile smallfile; dd if=/dev/zero of=bigfile bs=1M count=8192;
echo boo > smallfile; time /root/fsync smallfile; time /root/fsync bigfile

which resulted in:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
IP: [<ffffffffa02a6de1>] alloc_on_commit_callback+0x27/0x7f [ext4]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0c.0/0000:02:00.0/irq
CPU 1
Modules linked in: ext4 jbd2 crc16 ipt_MASQUERADE iptable_nat nf_nat
bridge stp llc autofs4 sunrpc ipv6 cpufreq_ondemand powernow_k8
freq_table xfs exportfs video output sbs sbshc parport_pc lp parport tg3
serio_raw pata_amd k8temp hwmon pata_acpi ata_generic i2c_nforce2
i2c_core pcspkr qla2xxx scsi_transport_fc scsi_tgt shpchp mptspi
mptscsih mptbase scsi_transport_spi [last unloaded: ext4]
Pid: 5011, comm: kjournald2 Not tainted 2.6.29-rc8 #3 ProLiant DL145 G2
RIP: 0010:[<ffffffffa02a6de1>]  [<ffffffffa02a6de1>]
alloc_on_commit_callback+0x27/0x7f [ext4]
RSP: 0018:ffff88013c847d20  EFLAGS: 00010246
RAX: 000000000000025f RBX: 0000000000000000 RCX: 0000000000000001
RDX: 000000000000025f RSI: ffff88013ec91d00 RDI: ffff880137cedb8c
RBP: ffff88013c847d50 R08: ffff880000011580 R09: ffffffff81029cc2
R10: ffff880131744448 R11: ffff88010fd28070 R12: ffff880137ced824
R13: ffff880137ced800 R14: ffff880137cedb8c R15: 0000000000000000
FS:  00007fd21065b6e0(0000) GS:ffff88013fc01f80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kjournald2 (pid: 5011, threadinfo ffff88013c846000, task
ffff880131744410)
Stack:
 0000000000006449 0000000000000000 ffff880137ced824 ffff8800bf426230
 ffff880137ced800 ffff88013ec91d00 ffff88013c847ea0 ffffffffa00e2024
 ffff88013c847dc0 ffff8801199001c0 ffffffff815013a0 000017594c782cbd
Call Trace:
 [<ffffffffa00e2024>] jbd2_journal_commit_transaction+0xff7/0x105b [jbd2]
 [<ffffffffa00e597e>] kjournald2+0xe6/0x235 [jbd2]
 [<ffffffff8105b4f7>] ? autoremove_wake_function+0x0/0x38
 [<ffffffffa00e5898>] ? kjournald2+0x0/0x235 [jbd2]
 [<ffffffff8105b389>] kthread+0x49/0x78
 [<ffffffff8101251a>] child_rip+0xa/0x20
 [<ffffffff81029cc2>] ? native_load_tls+0xf/0x29
 [<ffffffff8105b340>] ? kthread+0x0/0x78
 [<ffffffff81012510>] ? child_rip+0x0/0x20
Code: 41 5d c9 c3 55 48 89 e5 41 57 41 56 4c 8d b7 8c 03 00 00 41 55 49
89 fd 41 54 53 48 83 ec 08 4c 8b 7f 48 4c 89 f7 e8 84 ee 0a e1 <49> 8b
5f 60 48 83 eb 10 4c 8b 63 10 eb 1f e8 39 36 d8 e0 90 48
RIP  [<ffffffffa02a6de1>] alloc_on_commit_callback+0x27/0x7f [ext4]
 RSP <ffff88013c847d20>
CR2: 0000000000000060
---[ end trace 7c7cc83cb0a81eef ]---
Jan Kara March 18, 2009, 1:12 p.m. UTC | #4
> On Mon, Mar 16, 2009 at 11:19:58PM -0400, Theodore Ts'o wrote:
> > Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
> > this mode, we force all delayed allocation blocks involved with the
> > to-be-commited transaction to be allocated, and then flushed out to
> > disk before the transaction is commited.
> > 
> 
> Wouldn't this cause a deadlock ? We want to commit a transaction because
> we don't have enough journal space (via journal_start) and now that would cause block
> allocation which would do another journal_start()
  Yes, that's exactly what I think. We cannot start a transaction while
committing another transaction. Also you must put the block allocation
into the transaction you're going to commit because of data consistency
guarantees.
  So if you want to do "alloc on commit" you have to reserve enough
credits to the running transaction at the "block reservation" time and
then use them for allocation at commit time. But this gets complex
because the number of needed credits is hard to estimate (we don't know how
many bitmaps / group descriptors we're going to modify). I'm not yet
sure how to solve this problem...

									Honza
Theodore Ts'o March 18, 2009, 6:19 p.m. UTC | #5
On Wed, Mar 18, 2009 at 02:12:15PM +0100, Jan Kara wrote:
> > Wouldn't this cause a deadlock ? We want to commit a transaction because
> > we don't have enough journal space (via journal_start) and now that would cause block
> > allocation which would do another journal_start()
>   Yes, that's exactly what I think. We cannot start a transaction while
> committing another transaction. Also you must put the block allocation
> into the transaction you're going to commit because of data consistency
> guarantees.
>   So if you want to do "alloc on commit" you have to reserve enough
> credits to the running transaction at the "block reservation" time and
> then use them for allocation at commit time. But this gets complex
> because the number of needed credits is hard to estimate (we don't know how
> many bitmaps / group descriptors we're going to modify). I'm not yet
> sure how to solve this problem...

Yeah, agreed, this is going to get tricky.  What we would have to do
is estimate a worst case, and include that in the running tally, and
then subtract it off when we start allocating the data blocks.

But the problem then is what happens to new file system operations?
If we stall them, it will be a major performance hit.  We can't let
them start a new transaction, because we can't have two open
transactions at the same time.  If we let them continue to run against
the current transaction, then #1, we could run out of space (although
we give ourselves 25% of the journal as "slop" space which is
extremely generous), and #2, there is a race where the new file system
operations that do delayed allocation won't get allocated on the
commit.

So this is not going to be an easy problem to solve, not without
massively complicating the jbd2 layer...

							- Ted
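
A minimal sketch of the bookkeeping discussed above: add a worst-case
estimate to the running tally when delayed blocks are reserved, then
subtract it and charge the real cost once the blocks are allocated at
commit time. Everything here is an assumption made for illustration:
struct toy_txn, the toy_* helpers, and especially the "two credits per
block" estimate, which is a placeholder for a number the thread says is
hard to work out (bitmaps and group descriptors touched).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transaction accounting; not a jbd2 structure. */
struct toy_txn {
	int max_credits;	/* limit for this transaction */
	int used_credits;	/* credits taken by ordinary handles */
	int reserved_credits;	/* worst-case credits held for commit time */
};

/* Placeholder estimate: assume each delayed block costs two credits. */
static int toy_worst_case_credits(int delalloc_blocks)
{
	return 2 * delalloc_blocks;
}

/* At block reservation time: add the worst case to the running tally. */
static bool toy_reserve(struct toy_txn *t, int delalloc_blocks)
{
	int need = toy_worst_case_credits(delalloc_blocks);

	if (t->used_credits + t->reserved_credits + need > t->max_credits)
		return false;	/* caller would have to force a commit first */
	t->reserved_credits += need;
	return true;
}

/* At commit time: drop the estimate and charge what was really used. */
static void toy_commit_alloc(struct toy_txn *t, int delalloc_blocks,
			     int actual_credits)
{
	int reserved = toy_worst_case_credits(delalloc_blocks);

	assert(actual_credits <= reserved);
	assert(reserved <= t->reserved_credits);
	t->reserved_credits -= reserved;
	t->used_credits += actual_credits;
}
```

The sketch does not address the two open issues named above: what new
file system operations do while the commit-time allocation runs, and
delayed allocations that arrive after the commit has started walking its
inode list.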
Eric Sandeen March 19, 2009, 4:42 p.m. UTC | #6
Eric Sandeen wrote:
> Theodore Ts'o wrote:
>> Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
>> this mode, we force all delayed allocation blocks involved with the
>> to-be-commited transaction to be allocated, and then flushed out to
>> disk before the transaction is commited.
>>
>> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> 
> Haven't really looked into the cause yet but I was playing with this,
> and ran:
> 
> # rm -f bigfile smallfile; dd if=/dev/zero of=bigfile bs=1M count=8192;
> echo boo > smallfile; time /root/fsync smallfile; time /root/fsync bigfile
> 
> which resulted in:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> IP: [<ffffffffa02a6de1>] alloc_on_commit_callback+0x27/0x7f [ext4]

This may have been user error ... ignore for now, I'll follow up if I
decide it's real.  :)

-Eric
Eric Sandeen March 19, 2009, 9:50 p.m. UTC | #7
Theodore Ts'o wrote:
> Add an ext3 bug-for-bug compatible analogue for data=ordered mode.  In
> this mode, we force all delayed allocation blocks involved with the
> to-be-commited transaction to be allocated, and then flushed out to
> disk before the transaction is commited.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

For fun I ran compilebench -i 10 -r 30 on this on a sata drive:

ext3 run complete:
==========================================================================
intial create total runs 10 avg 43.74 MB/s (user 0.58s sys 3.51s)
create total runs 5 avg 34.94 MB/s (user 0.58s sys 3.34s)
patch total runs 4 avg 21.07 MB/s (user 0.34s sys 3.42s)
compile total runs 7 avg 59.41 MB/s (user 0.14s sys 3.54s)
clean total runs 4 avg 427.96 MB/s (user 0.03s sys 0.52s)
read tree total runs 2 avg 23.94 MB/s (user 0.86s sys 5.29s)
read compiled tree total runs 1 avg 52.59 MB/s (user 0.91s sys 7.30s)
delete tree total runs 2 avg 5.25 seconds (user 0.48s sys 3.25s)
no runs for delete compiled tree
stat tree total runs 4 avg 4.11 seconds (user 0.50s sys 2.39s)
stat compiled tree total runs 1 avg 4.26 seconds (user 0.47s sys 2.58s)

ext4 default run complete:
==========================================================================
intial create total runs 10 avg 51.51 MB/s (user 0.60s sys 3.26s)
create total runs 5 avg 38.20 MB/s (user 0.57s sys 3.34s)
patch total runs 4 avg 22.80 MB/s (user 0.35s sys 3.48s)
compile total runs 7 avg 67.25 MB/s (user 0.13s sys 2.52s)
clean total runs 4 avg 687.29 MB/s (user 0.02s sys 0.39s)
read tree total runs 2 avg 24.05 MB/s (user 0.85s sys 5.27s)
read compiled tree total runs 1 avg 56.56 MB/s (user 0.99s sys 7.08s)
delete tree total runs 2 avg 3.85 seconds (user 0.35s sys 2.69s)
no runs for delete compiled tree
stat tree total runs 4 avg 2.78 seconds (user 0.36s sys 1.74s)
stat compiled tree total runs 1 avg 3.00 seconds (user 0.37s sys 1.88s)

ext4 alloc_on_commit run complete:
==========================================================================
intial create total runs 10 avg 46.96 MB/s (user 0.59s sys 3.34s)
create total runs 5 avg 37.80 MB/s (user 0.59s sys 3.26s)
patch total runs 4 avg 22.27 MB/s (user 0.33s sys 3.58s)
compile total runs 7 avg 65.96 MB/s (user 0.13s sys 2.60s)
clean total runs 4 avg 589.56 MB/s (user 0.03s sys 0.37s)
read tree total runs 2 avg 22.33 MB/s (user 0.86s sys 5.31s)
read compiled tree total runs 1 avg 55.71 MB/s (user 1.04s sys 7.07s)
delete tree total runs 2 avg 3.99 seconds (user 0.34s sys 2.69s)
no runs for delete compiled tree
stat tree total runs 4 avg 2.95 seconds (user 0.35s sys 1.74s)
stat compiled tree total runs 1 avg 3.18 seconds (user 0.39s sys 1.86s)

A bit slower than default, to be expected I guess.  Still faster than
ext3, which is... nice.  Didn't hang or oops :)

-Eric
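
To put a number on "a bit slower", the throughput figures above can be
compared as percent changes. The helper below is a hypothetical
convenience and not part of compilebench. The three constants are the
"intial create" averages copied from the runs in this message.

```c
#include <assert.h>

/* "intial create" averages in MB/s, copied from the three runs above. */
static const double EXT3_CREATE_MBS            = 43.74;
static const double EXT4_DEFAULT_CREATE_MBS    = 51.51;
static const double EXT4_ALLOC_COMMIT_CREATE_MBS = 46.96;

/* Percent change of value relative to baseline (positive means higher). */
static double pct_change(double value, double baseline)
{
	assert(baseline != 0.0);
	return (value - baseline) / baseline * 100.0;
}
```

For that row, pct_change(EXT4_ALLOC_COMMIT_CREATE_MBS,
EXT4_DEFAULT_CREATE_MBS) comes out at roughly 9% below ext4 default, and
the same call against EXT3_CREATE_MBS at roughly 7% above ext3. These are
single-machine averages, so small differences should be read with care.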
Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ebd1a50..b15b03e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -541,8 +541,9 @@  do {									       \
 #define EXT4_MOUNT_NOLOAD		0x00100	/* Don't use existing journal*/
 #define EXT4_MOUNT_ABORT		0x00200	/* Fatal error detected */
 #define EXT4_MOUNT_DATA_FLAGS		0x00C00	/* Mode for data writes: */
+#define EXT4_MOUNT_ORDERED_DATA		0x00000	/* Flush data before commit */
 #define EXT4_MOUNT_JOURNAL_DATA		0x00400	/* Write data to journal */
-#define EXT4_MOUNT_ORDERED_DATA		0x00800	/* Flush data before commit */
+#define EXT4_MOUNT_ALLOC_COMMIT_DATA	0x00800	/* Alloc data on commit */
 #define EXT4_MOUNT_WRITEBACK_DATA	0x00C00	/* No data ordering */
 #define EXT4_MOUNT_UPDATE_JOURNAL	0x01000	/* Update the journal format */
 #define EXT4_MOUNT_NO_UID32		0x02000  /* Disable 32-bit UIDs */
@@ -820,10 +821,11 @@  static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 #define EXT4_DEFM_XATTR_USER	0x0004
 #define EXT4_DEFM_ACL		0x0008
 #define EXT4_DEFM_UID16		0x0010
-#define EXT4_DEFM_JMODE		0x0060
+#define EXT4_DEFM_JMODE		0x00E0
 #define EXT4_DEFM_JMODE_DATA	0x0020
 #define EXT4_DEFM_JMODE_ORDERED	0x0040
 #define EXT4_DEFM_JMODE_WBACK	0x0060
+#define EXT4_DEFM_JMODE_ALLOC_COMMIT	0x00C0
 
 /*
  * Default journal batch times
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index be2f426..0453671 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -274,7 +274,8 @@  static inline int ext4_should_order_data(struct inode *inode)
 		return 0;
 	if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
 		return 0;
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA)
+	if ((test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA) ||
+	    (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_ALLOC_COMMIT_DATA))
 		return 1;
 	return 0;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b58e7e2..ba0112b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2754,6 +2754,17 @@  static int ext4_da_write_end(struct file *file,
 		   "dev %s ino %lu pos %llu len %u copied %u",
 		   inode->i_sb->s_id, inode->i_ino,
 		   (unsigned long long) pos, len, copied);
+
+	if (test_opt(inode->i_sb, DATA_FLAGS) ==
+	    EXT4_MOUNT_ALLOC_COMMIT_DATA) {
+		ret = ext4_jbd2_file_inode(handle, inode);
+		if (ret)
+			goto errout;
+		ret = ext4_mark_inode_dirty(handle, inode);
+		if (ret)
+			goto errout;
+	}
+
 	start = pos & (PAGE_CACHE_SIZE - 1);
 	end = start + copied - 1;
 
@@ -2791,6 +2802,7 @@  static int ext4_da_write_end(struct file *file,
 	copied = ret2;
 	if (ret2 < 0)
 		ret = ret2;
+errout:
 	ret2 = ext4_journal_stop(handle);
 	if (!ret)
 		ret = ret2;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3f32fb2..93e1bf9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -67,7 +67,7 @@  static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int ext4_unfreeze(struct super_block *sb);
 static void ext4_write_super(struct super_block *sb);
 static int ext4_freeze(struct super_block *sb);
-
+static void alloc_on_commit_callback(journal_t *journal);
 
 ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
 			       struct ext4_group_desc *bg)
@@ -849,6 +849,8 @@  static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",data=ordered");
 	else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_WRITEBACK_DATA)
 		seq_puts(seq, ",data=writeback");
+	else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ALLOC_COMMIT_DATA)
+		seq_puts(seq, ",data=alloc_on_commit");
 
 	if (sbi->s_inode_readahead_blks != EXT4_DEF_INODE_READAHEAD_BLKS)
 		seq_printf(seq, ",inode_readahead_blks=%u",
@@ -1012,7 +1014,7 @@  enum {
 	Opt_journal_update, Opt_journal_dev,
 	Opt_journal_checksum, Opt_journal_async_commit,
 	Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
-	Opt_data_err_abort, Opt_data_err_ignore,
+	Opt_data_alloc_on_commit, Opt_data_err_abort, Opt_data_err_ignore,
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
@@ -1056,6 +1058,7 @@  static const match_table_t tokens = {
 	{Opt_data_journal, "data=journal"},
 	{Opt_data_ordered, "data=ordered"},
 	{Opt_data_writeback, "data=writeback"},
+	{Opt_data_alloc_on_commit, "data=alloc_on_commit"},
 	{Opt_data_err_abort, "data_err=abort"},
 	{Opt_data_err_ignore, "data_err=ignore"},
 	{Opt_offusrjquota, "usrjquota="},
@@ -1273,6 +1276,9 @@  static int parse_options(char *options, struct super_block *sb,
 		case Opt_data_ordered:
 			data_opt = EXT4_MOUNT_ORDERED_DATA;
 			goto datacheck;
+		case Opt_data_alloc_on_commit:
+			data_opt = EXT4_MOUNT_ALLOC_COMMIT_DATA;
+			goto datacheck;
 		case Opt_data_writeback:
 			data_opt = EXT4_MOUNT_WRITEBACK_DATA;
 		datacheck:
@@ -1852,6 +1858,26 @@  static void ext4_orphan_cleanup(struct super_block *sb,
 #endif
 	sb->s_flags = s_flags; /* Restore MS_RDONLY status */
 }
+
+/*
+ * This callback is called before each commit when we are using
+ * alloc-on-commit mode.
+ */
+static void alloc_on_commit_callback(journal_t *journal)
+{
+	struct jbd2_inode *jinode, *next_i;
+	transaction_t *transaction = journal->j_running_transaction;
+
+	spin_lock(&journal->j_list_lock);
+	list_for_each_entry_safe(jinode, next_i,
+				 &transaction->t_inode_list, i_list) {
+		spin_unlock(&journal->j_list_lock);
+		ext4_alloc_da_blocks(jinode->i_vfs_inode);
+		spin_lock(&journal->j_list_lock);
+	}
+	spin_unlock(&journal->j_list_lock);
+}
+
 /*
  * Maximal extent format file size.
  * Resulting logical blkno at s_maxbytes must fit in our on-disk
@@ -2283,6 +2309,9 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_mount_opt |= EXT4_MOUNT_ORDERED_DATA;
 	else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_WBACK)
 		sbi->s_mount_opt |= EXT4_MOUNT_WRITEBACK_DATA;
+	else if ((def_mount_opts & EXT4_DEFM_JMODE) ==
+		 EXT4_DEFM_JMODE_ALLOC_COMMIT)
+		sbi->s_mount_opt |= EXT4_MOUNT_ALLOC_COMMIT_DATA;
 
 	if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
 		set_opt(sbi->s_mount_opt, ERRORS_PANIC);
@@ -2654,18 +2683,9 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	/* We have now updated the journal if required, so we can
 	 * validate the data journaling mode. */
 	switch (test_opt(sb, DATA_FLAGS)) {
-	case 0:
-		/* No mode set, assume a default based on the journal
-		 * capabilities: ORDERED_DATA if the journal can
-		 * cope, else JOURNAL_DATA
-		 */
-		if (jbd2_journal_check_available_features
-		    (sbi->s_journal, 0, 0, JBD2_FEATURE_INCOMPAT_REVOKE))
-			set_opt(sbi->s_mount_opt, ORDERED_DATA);
-		else
-			set_opt(sbi->s_mount_opt, JOURNAL_DATA);
-		break;
-
+	case EXT4_MOUNT_ALLOC_COMMIT_DATA:
+		sbi->s_journal->j_pre_commit_callback =
+			alloc_on_commit_callback;
 	case EXT4_MOUNT_ORDERED_DATA:
 	case EXT4_MOUNT_WRITEBACK_DATA:
 		if (!jbd2_journal_check_available_features
@@ -2784,6 +2804,9 @@  no_journal:
 			descr = " journalled data mode";
 		else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA)
 			descr = " ordered data mode";
+		else if (test_opt(sb, DATA_FLAGS) ==
+			 EXT4_MOUNT_ALLOC_COMMIT_DATA)
+			descr = " alloc on commit data mode";
 		else
 			descr = " writeback data mode";
 	} else
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 62804e5..e8a96e7 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -379,6 +379,9 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 	spin_unlock(&journal->j_list_lock);
 #endif
 
+	if (journal->j_pre_commit_callback)
+		journal->j_pre_commit_callback(journal);
+
 	/* Do we need to erase the effects of a prior jbd2_journal_flush? */
 	if (journal->j_flags & JBD2_FLUSHED) {
 		jbd_debug(3, "super block updated\n");
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 4d248b3..43b1689 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -975,6 +975,8 @@  struct journal_s
 	u32			j_min_batch_time;
 	u32			j_max_batch_time;
 
+	/* This function is called before a transaction is closed */
+	void			(*j_pre_commit_callback)(journal_t *);
 	/* This function is called when a transaction is closed */
 	void			(*j_commit_callback)(journal_t *,
 						     transaction_t *);
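
As a reading aid for the fs/ext4/ext4.h hunks, here is a small
stand-alone sketch of how the two-bit data mode field decodes after this
patch. The constant values are copied from the diff. toy_data_mode_name()
and toy_defm_jmode() are hypothetical helpers written for this note and
are not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Values as they stand after the fs/ext4/ext4.h hunk in this patch. */
#define EXT4_MOUNT_DATA_FLAGS		0x00C00
#define EXT4_MOUNT_ORDERED_DATA		0x00000
#define EXT4_MOUNT_JOURNAL_DATA		0x00400
#define EXT4_MOUNT_ALLOC_COMMIT_DATA	0x00800
#define EXT4_MOUNT_WRITEBACK_DATA	0x00C00

/* Superblock default-mount-option values from the same patch. */
#define EXT4_DEFM_JMODE_OLD_MASK	0x0060	/* mask before the patch */
#define EXT4_DEFM_JMODE_NEW_MASK	0x00E0	/* mask after the patch */
#define EXT4_DEFM_JMODE_ORDERED		0x0040
#define EXT4_DEFM_JMODE_ALLOC_COMMIT	0x00C0

/* Hypothetical helper: name the mode selected by the masked field. */
static const char *toy_data_mode_name(unsigned int mount_opt)
{
	switch (mount_opt & EXT4_MOUNT_DATA_FLAGS) {
	case EXT4_MOUNT_ORDERED_DATA:
		return "ordered";
	case EXT4_MOUNT_JOURNAL_DATA:
		return "journal";
	case EXT4_MOUNT_ALLOC_COMMIT_DATA:
		return "alloc_on_commit";
	case EXT4_MOUNT_WRITEBACK_DATA:
		return "writeback";
	}
	return "unknown";
}

/* Hypothetical helper: extract the default journal mode under a mask. */
static unsigned int toy_defm_jmode(unsigned int def_mount_opts,
				   unsigned int mask)
{
	return def_mount_opts & mask;
}
```

Two properties follow from the values alone. Since ordered mode is now
encoded as zero, OR-ing EXT4_MOUNT_ORDERED_DATA into s_mount_opt leaves
the field unchanged, so ordered is in effect only when neither data bit
is set. And toy_defm_jmode(EXT4_DEFM_JMODE_ALLOC_COMMIT,
EXT4_DEFM_JMODE_OLD_MASK) yields EXT4_DEFM_JMODE_ORDERED, so code using
the old mask would read an alloc_on_commit default as ordered. That looks
like a deliberate fallback, though the thread does not say so.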