[RFC] jbd2: don't write non-commit blocks synchronously

Message ID 20140306135642.GA22136@thunk.org
State New, archived

Commit Message

Theodore Ts'o March 6, 2014, 1:56 p.m. UTC
On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.

Hi Lucas,

Thanks for this report.  Are you willing to try a kernel patch?  If
so, could you try and see if this fixes your issue.  From looking at
your block trace, I saw a large number of suspicious 4k writes from
the jbd2 layer.

					- Ted

commit 137d7cea675fd7d8ff98b7e035fb6516dc4ab220
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Mar 6 08:56:11 2014 -0500

    jbd2: don't write non-commit blocks synchronously
    
    We don't need to write the revoke blocks and descriptor blocks using
    WRITE_SYNC, since when we issue the commit block, those blocks will get
    pushed out via REQ_FLUSH.  This will allow the journal blocks to be
    written in fewer i/o operations (otherwise we end up issuing a whole
    series of 4k writes unnecessarily).
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Lucas Nussbaum March 6, 2014, 5:28 p.m. UTC | #1
On 06/03/14 at 08:56 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> 
> Hi Lucas,
> 
> Thanks for this report.  Are you willing to try a kernel patch?  If
> so, could you try and see if this fixes your issue.  From looking at
> your block trace, I saw a large number of suspicious 4k writes from
> the jbd2 layer.

Hi Ted,

This patch doesn't solve the problem. It seems that on average, it makes
the situation worse, but I'm not sure if this is just anecdotal
evidence or statistically valid.

Anything else I could try to gather more information?

Lucas
Theodore Ts'o March 6, 2014, 6:27 p.m. UTC | #2
Hmm... OK, let me make sure I understand what is going on.  So you
have a single file system which is mounted read/write, and you are
doing a huge number of copies into the file system, which is keeping
it busy.  You are then running a huge number of "mount -o remount" on
that same file system, which should effectively be no-ops, since the
remount isn't actually changing the read/only or read/write or any other
mount options.  Is that right?

Why were you doing the remount in your actual production workload,
anyway?

						- Ted
Lucas Nussbaum March 6, 2014, 6:37 p.m. UTC | #3
On 06/03/14 at 18:28 +0100, Lucas Nussbaum wrote:
> On 06/03/14 at 08:56 -0500, Theodore Ts'o wrote:
> > On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > > remount at the same time as other I/O on an ext4 filesystem.
> > 
> > Hi Lucas,
> > 
> > Thanks for this report.  Are you willing to try a kernel patch?  If
> > so, could you try and see if this fixes your issue.  From looking at
> > your block trace, I saw a large number of suspicious 4k writes from
> > the jbd2 layer.
> 
> Hi Ted,
> 
> This patch doesn't solve the problem. It seems that on average, it makes
> the situation worse, but I'm not sure if this is just anecdotal
> evidence or statistically valid.
> 
> Anything else I could try to gather more information?

Two other data points:
- same problem with a 2.6.29-2-amd64 kernel from Debian
- with that 2.6.29-2-amd64, disabling the journal on the ext4 filesystem
  using 'tune2fs -O ^has_journal /dev/sda5' makes the problem go away
Lucas Nussbaum March 6, 2014, 6:45 p.m. UTC | #4
On 06/03/14 at 13:27 -0500, Theodore Ts'o wrote:
> Hmm... OK, let me make sure I understand what is going on.  So you
> have a single file system which is mounted read/write,

Yes

> and you are
> doing a huge number of copies into the file system, which is keeping
> it busy.

On that minimal system, I'm just copying ~300 MB of data. I wouldn't
qualify it as huge.

> You are then running a huge number of "mount -o remount" on
> that same file system,

at the same time as the data copy, yes.

> which should effectively be no-ops, since the
> remount isn't actually changing the read/only or read/write or any other
> mount options.  Is that right?

Yes

> Why were you doing the remount in your actual production workload,
> anyway?

We are booting a large number (hundreds) of LXC containers in order to
set up an experimental environment. Those LXC containers simply use
subdirectories on the ext4 filesystem as root directory.
What we saw is that the boot of LXC containers "deadlocked".

We later discovered that:
- this was caused by Debian's /etc/init.d/checkroot.sh that calls
  mount -o remount,defaults,rw 
- it was not a deadlock, but rather something looking like severe lock
  contention. After a seemingly random amount of time (2 to 15 mins),
  the boot of LXC containers finishes.
- it was possible to reproduce the problem outside of LXC, using the
  "write data and do lots of remounts at the same time" setup I
  described earlier.
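
The "write data and do lots of remounts at the same time" setup can be
sketched as a small shell script. This is only an illustration, not the
script used for the report: the mount point, the data source, the number
of remounts and the MOUNT override are all assumptions chosen for the
example. Only the "mount -o remount,defaults,rw" invocation is taken from
the checkroot.sh call mentioned above.

```sh
#!/bin/sh
# Sketch of the "copy data while remounting" reproduction.
# MNT, SRC, COUNT and MOUNT are illustrative assumptions, not values
# taken from the report.
MNT=${MNT:-/mnt/test}    # mount point of the ext4 file system under test
SRC=${SRC:-/usr/share}   # any directory holding a few hundred MB of data
COUNT=${COUNT:-100}      # how many remounts to start in parallel
MOUNT=${MOUNT:-mount}    # set MOUNT=echo for a dry run without root

reproduce() {
    # Background writer: generates file system I/O while the remounts run.
    cp -a "$SRC" "$MNT/copy.$$" &

    # Start the same remount that checkroot.sh performs, COUNT times.
    i=0
    while [ "$i" -lt "$COUNT" ]; do
        $MOUNT -o remount,defaults,rw "$MNT" &
        i=$((i + 1))
    done

    # Return once the copy and every remount have finished.
    wait
}

# Run only when asked to, so the file can be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    start=$(date +%s)
    reproduce
    end=$(date +%s)
    echo "copy and $COUNT remounts took $((end - start)) seconds"
fi
```

Invoked as root with the argument "run", the script prints how long the
copy and the remounts took together, which makes it easy to compare a
journaled and a non-journaled file system.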

Patch

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index cf2fc05..fb64629 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -554,7 +554,7 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 
 	blk_start_plug(&plug);
 	jbd2_journal_write_revoke_records(journal, commit_transaction,
-					  &log_bufs, WRITE_SYNC);
+					  &log_bufs, WRITE);
 	blk_finish_plug(&plug);
 
 	jbd_debug(3, "JBD2: commit phase 2b\n");
@@ -739,7 +739,7 @@  start_journal_io:
 				clear_buffer_dirty(bh);
 				set_buffer_uptodate(bh);
 				bh->b_end_io = journal_end_buffer_io_sync;
-				submit_bh(WRITE_SYNC, bh);
+				submit_bh(WRITE, bh);
 			}
 			cond_resched();
 			stats.run.rs_blocks_logged += bufs;