ubifs: sync() causes writes even if nothing is changed

Message ID 1295200104.2470.5.camel@koala
State New, archived

Commit Message

Artem Bityutskiy Jan. 16, 2011, 5:48 p.m. UTC
On Wed, 2010-10-13 at 18:30 +0200, Hans J. Koch wrote:
> Running this command:
> 
> # while true ; do sync; sleep 1; done
> 
> causes two eraseblocks being erased every second, although there
> are no writes to the ubifs filesystem. I hacked some printks into
> my NAND driver that print page_address and column for each erase.
> With that, I get this output every second:
> 
> ...
> [   63.701765] erase p=0x0000ae40 c=0xffffffff
> [   63.706534] erase p=0xffffffff c=0xffffffff
> [   63.725492] erase p=0x0000ae80 c=0xffffffff
> [   63.730260] erase p=0xffffffff c=0xffffffff
> ...
> 
> From a quick glance at the ubifs code, this might come out of the
> garbage collector that is triggered on every sync() and writes
> something even if nothing has changed.

With nandsim I can see only one erase, but that is still suboptimal.
The patch below should fix the issue; please test it if you can. I've
also pushed it to ubifs-2.6.git.


From dca0fe61489805e0eb4ada7c6922856ca91eae52 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Sun, 16 Jan 2011 19:22:02 +0200
Subject: [PATCH] UBIFS: do not start the commit if there is nothing to commit

This patch fixes the suboptimal UBIFS 'sync_fs()' implementation, which
causes flash I/O even if the file-system is already synchronized. E.g., a
'printk()' in the MTD erasure function (e.g., 'nand_erase_nand()') shows
that UBIFS erases at least one eraseblock for every 'sync' shell command.

So '$ while true; do sync; done' will cause a huge amount of flash I/O.

The reason for this is that UBIFS commits in 'sync_fs()', and starts the
commit even if there is nothing to commit, i.e., it changes the log
anyway. This patch adds a check to the 'do_commit()' UBIFS function which
skips the commit if there are no dirty znodes (hence, nothing to
commit).

Reported-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/commit.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

Comments

Adrian Hunter Jan. 17, 2011, 8:19 a.m. UTC | #1
On 16/01/11 19:48, ext Artem Bityutskiy wrote:
> On Wed, 2010-10-13 at 18:30 +0200, Hans J. Koch wrote:
>> Running this command:
>>
>> # while true ; do sync; sleep 1; done
>>
>> causes two eraseblocks being erased every second, although there
>> are no writes to the ubifs filesystem. I hacked some printks into
>> my NAND driver that print page_address and column for each erase.
>> With that, I get this output every second:
>>
>> ...
>> [   63.701765] erase p=0x0000ae40 c=0xffffffff
>> [   63.706534] erase p=0xffffffff c=0xffffffff
>> [   63.725492] erase p=0x0000ae80 c=0xffffffff
>> [   63.730260] erase p=0xffffffff c=0xffffffff
>> ...
>>
>> From a quick glance at the ubifs code, this might come out of the
>> garbage collector that is triggered on every sync() and writes
>> something even if nothing has changed.
>
> With nandsim I only can see one erase, but this is anyway suboptimal.
> The below patch should fix the issue, please, test if you can. I've also
> pushed it to ubifs-2.6.git.
>
>
> From dca0fe61489805e0eb4ada7c6922856ca91eae52 Mon Sep 17 00:00:00 2001
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Date: Sun, 16 Jan 2011 19:22:02 +0200
> Subject: [PATCH] UBIFS: do not start the commit if there is nothing to commit
>
> This patch fixes suboptimal UBIFS 'sync_fs()' implementation which causes
> flash I/O even if the file-system is synchronized. E.g., a 'printk()'
> in the MTD erasure function (e.g., 'nand_erase_nand()') can show that
> for every 'sync' shell command UBIFS erases at least one eraseblock.
>
> So '$ while true; do sync; done' will cause huge amount of flash I/O.
>
> The reason for this is that UBIFS commits in 'sync_fs()', and starts the
> commit even if there is nothing to commit, e.g., it anyway changes the
> log. This patch adds a check in the 'do_commit()' UBIFS functions which
> prevents the commit if there are not dirty znodes (hence, nothing to
> commit).

Possibly the LPT should be checked also.  Perhaps it can be dirty due
to trivial garbage collection.

Also, have you checked there are no degenerate cases where the commit
is required for some other reason such as consolidating the log or the
recovery commit?

>
> Reported-by: Hans J. Koch <hjk@linutronix.de>
> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> ---
>   fs/ubifs/commit.c |   17 ++++++++++++++++-
>   1 files changed, 16 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
> index 02429d8..a963d96 100644
> --- a/fs/ubifs/commit.c
> +++ b/fs/ubifs/commit.c
> @@ -70,6 +70,21 @@ static int do_commit(struct ubifs_info *c)
>   		goto out_up;
>   	}
>
> +	/*
> +	 * Every file-system change changes the TNC, and makes the root znode
> +	 * dirty. So if the root znode is clean we can just return immediately
> +	 * because there must be nothing to commit. Note, we do not have to
> +	 * lock @c->tnc_mutex because we have @c->commit_sem in write mode,
> +	 * which guarantees that no one else can access TNC functions
> +	 * concurrently.
> +	 */
> +	if (!c->zroot.znode || !test_bit(DIRTY_ZNODE, &c->zroot.znode->flags)) {
> +		ubifs_assert(atomic_long_read(&c->dirty_zn_cnt) == 0);
> +		err = 0;
> +		up_write(&c->commit_sem);
> +		goto out_cancel;
> +	}
> +
>   	/* Sync all write buffers (necessary for recovery) */
>   	for (i = 0; i < c->jhead_cnt; i++) {
>   		err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
> @@ -162,12 +177,12 @@ static int do_commit(struct ubifs_info *c)
>   	if (err)
>   		goto out;
>
> +out_cancel:
>   	spin_lock(&c->cs_lock);
>   	c->cmt_state = COMMIT_RESTING;
>   	wake_up(&c->cmt_wq);
>   	dbg_cmt("commit end");
>   	spin_unlock(&c->cs_lock);
> -
>   	return 0;
>
>   out_up:
Artem Bityutskiy Jan. 17, 2011, 9:04 a.m. UTC | #2
On Mon, 2011-01-17 at 10:19 +0200, Adrian Hunter wrote:
> Possibly the LPT should be checked also.  Perhaps it can be dirty due
> to trivial garbage collection.

I'll check, thanks.

> Also, have you checked there are no degenerate cases where the commit
> is required for some other reason such as consolidating the log or the
> recovery  commit?

Right, I missed this, will check as well.

Thanks!
Artem Bityutskiy Jan. 17, 2011, 9:52 p.m. UTC | #3
[Removed Hans J. Koch from CC as his mailbox is unreachable]

On Mon, 2011-01-17 at 10:19 +0200, Adrian Hunter wrote:
> Possibly the LPT should be checked also.  Perhaps it can be dirty due
> to trivial garbage collection.

Yes, AFAIU we can GC from budgeting and end up with trivial GC, which
will make dirt in the LPT but not in the TNC. So you are right.

There is another "trivial GC" inside the lprops subsystem, but I think
we should not worry about it.

> Also, have you checked there are no degenerate cases where the commit
> is required for some other reason such as consolidating the log or the
> recovery  commit?

I do not really see how recovery could be needed if nothing is made
dirty in the TNC, but given the complexity of that code it is safer to
do the commit if we are mounting or remounting rw, because there may be
such situations.

I'll send new patches shortly, thank you!
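
The conditions raised in this thread can be collected into one predicate.
What follows is only a user-space sketch, not the follow-up patch: the
'struct fs_model' type, its field names and 'model_nothing_to_commit()'
are hypothetical stand-ins for the UBIFS state discussed above (dirty TNC
root znode, dirty LPT, mount or remount-rw in progress).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant UBIFS state. */
struct fs_model {
	bool mounting;        /* mount in progress */
	bool remounting_rw;   /* remount read-write in progress */
	bool tnc_root_dirty;  /* root znode of the TNC is dirty */
	bool lpt_dirty;       /* LPT has dirty nodes, e.g. after trivial GC */
	long dirty_zn_cnt;    /* number of dirty znodes */
};

/* Return true only when skipping the commit is believed to be safe. */
static bool model_nothing_to_commit(const struct fs_model *m)
{
	/* Be conservative while mounting: recovery may need a commit. */
	if (m->mounting || m->remounting_rw)
		return false;
	/* Any file-system change dirties the TNC root. */
	if (m->tnc_root_dirty)
		return false;
	/* Trivial GC can dirty the LPT without touching the TNC. */
	if (m->lpt_dirty)
		return false;
	/* A clean TNC root implies there are no dirty znodes below it. */
	assert(m->dirty_zn_cnt == 0);
	return true;
}
```

The checks are ordered from the most conservative condition to the most
specific one, so any doubt results in a normal commit.
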

Patch

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index 02429d8..a963d96 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -70,6 +70,21 @@  static int do_commit(struct ubifs_info *c)
 		goto out_up;
 	}
 
+	/*
+	 * Every file-system change changes the TNC, and makes the root znode
+	 * dirty. So if the root znode is clean we can just return immediately
+	 * because there must be nothing to commit. Note, we do not have to
+	 * lock @c->tnc_mutex because we have @c->commit_sem in write mode,
+	 * which guarantees that no one else can access TNC functions
+	 * concurrently.
+	 */
+	if (!c->zroot.znode || !test_bit(DIRTY_ZNODE, &c->zroot.znode->flags)) {
+		ubifs_assert(atomic_long_read(&c->dirty_zn_cnt) == 0);
+		err = 0;
+		up_write(&c->commit_sem);
+		goto out_cancel;
+	}
+
 	/* Sync all write buffers (necessary for recovery) */
 	for (i = 0; i < c->jhead_cnt; i++) {
 		err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
@@ -162,12 +177,12 @@  static int do_commit(struct ubifs_info *c)
 	if (err)
 		goto out;
 
+out_cancel:
 	spin_lock(&c->cs_lock);
 	c->cmt_state = COMMIT_RESTING;
 	wake_up(&c->cmt_wq);
 	dbg_cmt("commit end");
 	spin_unlock(&c->cs_lock);
-
 	return 0;
 
 out_up:
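
The control flow that the patch adds to 'do_commit()' can be modelled in
user space. This is a rough sketch under stated assumptions, not kernel
code: 'struct commit_model', 'model_do_commit()' and the 'log_writes'
counter are hypothetical, and locking ('commit_sem', 'cs_lock') is left
out entirely. It only illustrates that the early exit shares the state
reset with the normal path while skipping the flash I/O.

```c
#include <stdbool.h>

enum cmt_state { COMMIT_RESTING, COMMIT_RUNNING };

/* Hypothetical model of the state touched by the commit path. */
struct commit_model {
	enum cmt_state state;
	bool root_dirty;   /* stands in for the DIRTY_ZNODE bit on the root */
	int log_writes;    /* counts simulated flash writes to the log */
};

/* Mirrors the shape of the patched do_commit(): one shared exit label. */
static int model_do_commit(struct commit_model *m)
{
	m->state = COMMIT_RUNNING;

	if (!m->root_dirty)
		goto out_cancel;   /* nothing to commit: skip all flash I/O */

	m->log_writes++;           /* a real commit would update the log here */
	m->root_dirty = false;

out_cancel:
	m->state = COMMIT_RESTING; /* waiters must see a finished commit */
	return 0;
}
```

Calling 'model_do_commit()' repeatedly on a clean model leaves
'log_writes' unchanged, which corresponds to the 'while true; do sync;
done' loop no longer causing erases.
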