ubifs: sync() causes writes even if nothing is changed

Message ID

1295200104.2470.5.camel@koala

State

New, archived

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=subject:from:reply-to:to:cc:in-reply-to:references:content-type
	:date:message-id:mime-version:x-mailer:content-transfer-encoding;
	b=K0CAzklpHSROLbq880n1LiOvTA0BhTwFSQIJRLbuRrlsSklh5mV0zX/QdDKDa+VxbC
	ldV9XsDdoQ5EW6fV0a23gUJwNdlL7ESr0urMpU++bAdFqP2qk8DvsEQ6AMwNsJHlbc15
	3tka1zz9EgPO7wrwYn85rp9XeiJ9e59FA5dEc=
Subject: Re: ubifs: sync() causes writes even if nothing is changed
From: Artem Bityutskiy <dedekind1@gmail.com>
To: "Hans J. Koch" <hjk@linutronix.de>, "Adrian.Hunter"
	<Adrian.Hunter@nokia.com>
In-Reply-To: <20101013163005.GB1889@silverbox.local>
References: <20101013163005.GB1889@silverbox.local>
Date: Sun, 16 Jan 2011 19:48:24 +0200
Message-ID: <1295200104.2470.5.camel@koala>
Mime-Version: 1.0
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-mtd@lists.infradead.org, Adrian Hunter <adrian.hunter@nokia.com>
Precedence: list
Reply-To: dedekind1@gmail.com
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Sender: linux-mtd-bounces@lists.infradead.org
Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

Commit Message

Artem Bityutskiy Jan. 16, 2011, 5:48 p.m. UTC

On Wed, 2010-10-13 at 18:30 +0200, Hans J. Koch wrote:
> Running this command:
> 
> # while true ; do sync; sleep 1; done
> 
> causes two eraseblocks to be erased every second, although there
> are no writes to the ubifs filesystem. I hacked some printks into
> my NAND driver that print page_address and column for each erase.
> With that, I get this output every second:
> 
> ...
> [   63.701765] erase p=0x0000ae40 c=0xffffffff
> [   63.706534] erase p=0xffffffff c=0xffffffff
> [   63.725492] erase p=0x0000ae80 c=0xffffffff
> [   63.730260] erase p=0xffffffff c=0xffffffff
> ...
> 
> From a quick glance at the ubifs code, this might come out of the
> garbage collector that is triggered on every sync() and writes
> something even if nothing has changed.

With nandsim I can see only one erase, but this is suboptimal anyway.
The patch below should fix the issue; please test it if you can. I've also
pushed it to ubifs-2.6.git.

From dca0fe61489805e0eb4ada7c6922856ca91eae52 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Sun, 16 Jan 2011 19:22:02 +0200
Subject: [PATCH] UBIFS: do not start the commit if there is nothing to commit

This patch fixes the suboptimal UBIFS 'sync_fs()' implementation, which
causes flash I/O even if the file-system is already synchronized. E.g., a
'printk()' in the MTD erasure function (e.g., 'nand_erase_nand()') shows
that UBIFS erases at least one eraseblock for every 'sync' shell command.

So '$ while true; do sync; done' will cause a huge amount of flash I/O.

The reason for this is that UBIFS commits in 'sync_fs()', and starts the
commit even if there is nothing to commit, e.g., it changes the log
anyway. This patch adds a check to the UBIFS 'do_commit()' function which
skips the commit if there are no dirty znodes (hence, nothing to
commit).

Reported-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/commit.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)
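
To make the idea in the commit message concrete, here is a minimal user-space sketch of the early exit. It is only a model, not kernel code: 'struct model_fs', 'struct model_znode', 'root_is_clean()' and 'model_do_commit()' are hypothetical stand-ins for 'struct ubifs_info', the root znode, and 'do_commit()', and the 'flash_writes' counter is an assumption used to make the skipped I/O visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the UBIFS structures. */
enum commit_state { COMMIT_RESTING, COMMIT_RUNNING };

struct model_znode {
	bool dirty;                  /* models the DIRTY_ZNODE flag bit */
};

struct model_fs {
	struct model_znode *zroot;   /* models c->zroot.znode, NULL if not loaded */
	long dirty_zn_cnt;           /* models c->dirty_zn_cnt */
	enum commit_state cmt_state; /* models c->cmt_state */
	unsigned long flash_writes;  /* counts simulated flash I/O (assumption) */
};

/* True if the root znode is absent or clean, i.e. the TNC has no changes. */
static bool root_is_clean(const struct model_fs *fs)
{
	return fs->zroot == NULL || !fs->zroot->dirty;
}

/* Models do_commit(): skip all flash I/O when the root znode is clean. */
static int model_do_commit(struct model_fs *fs)
{
	fs->cmt_state = COMMIT_RUNNING;

	if (root_is_clean(fs)) {
		/* A clean root implies that no znode at all is dirty. */
		assert(fs->dirty_zn_cnt == 0);
		goto out_cancel;
	}

	/* The "real" commit: this is the only path that touches the flash. */
	fs->flash_writes++;
	fs->zroot->dirty = false;
	fs->dirty_zn_cnt = 0;

out_cancel:
	/* Both paths must put the commit state machine back to rest. */
	fs->cmt_state = COMMIT_RESTING;
	return 0;
}
```

Calling 'model_do_commit()' repeatedly on a clean model leaves 'flash_writes' at zero, which corresponds to the 'while true; do sync; done' loop no longer erasing anything. The shared 'out_cancel' label mirrors why the patch jumps to the state reset instead of returning directly.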

Comments

Adrian Hunter Jan. 17, 2011, 8:19 a.m. UTC | #1

On 16/01/11 19:48, ext Artem Bityutskiy wrote:
> On Wed, 2010-10-13 at 18:30 +0200, Hans J. Koch wrote:
>> Running this command:
>>
>> # while true ; do sync; sleep 1; done
>>
>> causes two eraseblocks to be erased every second, although there
>> are no writes to the ubifs filesystem. I hacked some printks into
>> my NAND driver that print page_address and column for each erase.
>> With that, I get this output every second:
>>
>> ...
>> [   63.701765] erase p=0x0000ae40 c=0xffffffff
>> [   63.706534] erase p=0xffffffff c=0xffffffff
>> [   63.725492] erase p=0x0000ae80 c=0xffffffff
>> [   63.730260] erase p=0xffffffff c=0xffffffff
>> ...
>>
>> From a quick glance at the ubifs code, this might come out of the
>> garbage collector that is triggered on every sync() and writes
>> something even if nothing has changed.
>
> With nandsim I can see only one erase, but this is suboptimal anyway.
> The patch below should fix the issue; please test it if you can. I've also
> pushed it to ubifs-2.6.git.
>
>
> From dca0fe61489805e0eb4ada7c6922856ca91eae52 Mon Sep 17 00:00:00 2001
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Date: Sun, 16 Jan 2011 19:22:02 +0200
> Subject: [PATCH] UBIFS: do not start the commit if there is nothing to commit
>
> This patch fixes the suboptimal UBIFS 'sync_fs()' implementation, which
> causes flash I/O even if the file-system is already synchronized. E.g., a
> 'printk()' in the MTD erasure function (e.g., 'nand_erase_nand()') shows
> that UBIFS erases at least one eraseblock for every 'sync' shell command.
>
> So '$ while true; do sync; done' will cause a huge amount of flash I/O.
>
> The reason for this is that UBIFS commits in 'sync_fs()', and starts the
> commit even if there is nothing to commit, e.g., it changes the log
> anyway. This patch adds a check to the UBIFS 'do_commit()' function which
> skips the commit if there are no dirty znodes (hence, nothing to
> commit).

Possibly the LPT should be checked also.  Perhaps it can be dirty due
to trivial garbage collection.

Also, have you checked that there are no degenerate cases where the commit
is required for some other reason, such as consolidating the log or the
recovery commit?

>
> Reported-by: Hans J. Koch <hjk@linutronix.de>
> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> ---
>   fs/ubifs/commit.c |   17 ++++++++++++++++-
>   1 files changed, 16 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
> index 02429d8..a963d96 100644
> --- a/fs/ubifs/commit.c
> +++ b/fs/ubifs/commit.c
> @@ -70,6 +70,21 @@ static int do_commit(struct ubifs_info *c)
>   		goto out_up;
>   	}
>
> +	/*
> +	 * Every file-system change changes the TNC, and makes the root znode
> +	 * dirty. So if the root znode is clean we can just return immediately
> +	 * because there must be nothing to commit. Note, we do not have to
> +	 * lock @c->tnc_mutex because we have @c->commit_sem in write mode,
> +	 * which guarantees that no one else can access TNC functions
> +	 * concurrently.
> +	 */
> +	if (!c->zroot.znode || !test_bit(DIRTY_ZNODE, &c->zroot.znode->flags)) {
> +		ubifs_assert(atomic_long_read(&c->dirty_zn_cnt) == 0);
> +		err = 0;
> +		up_write(&c->commit_sem);
> +		goto out_cancel;
> +	}
> +
>   	/* Sync all write buffers (necessary for recovery) */
>   	for (i = 0; i < c->jhead_cnt; i++) {
>   		err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
> @@ -162,12 +177,12 @@ static int do_commit(struct ubifs_info *c)
>   	if (err)
>   		goto out;
>
> +out_cancel:
>   	spin_lock(&c->cs_lock);
>   	c->cmt_state = COMMIT_RESTING;
>   	wake_up(&c->cmt_wq);
>   	dbg_cmt("commit end");
>   	spin_unlock(&c->cs_lock);
> -
>   	return 0;
>
>   out_up:

Artem Bityutskiy Jan. 17, 2011, 9:04 a.m. UTC | #2

On Mon, 2011-01-17 at 10:19 +0200, Adrian Hunter wrote:
> Possibly the LPT should be checked also.  Perhaps it can be dirty due
> to trivial garbage collection.

I'll check, thanks.

> Also, have you checked that there are no degenerate cases where the commit
> is required for some other reason, such as consolidating the log or the
> recovery commit?

Right, I missed this, will check as well.

Thanks!

Artem Bityutskiy Jan. 17, 2011, 9:52 p.m. UTC | #3

[Removed Hans J. Koch from CC as his mailbox is unreachable]

On Mon, 2011-01-17 at 10:19 +0200, Adrian Hunter wrote:
> Possibly the LPT should be checked also.  Perhaps it can be dirty due
> to trivial garbage collection.

Yes, AFAIU we can start GC from budgeting and end up with a trivial GC,
which will make the LPT dirty but not the TNC. So you are right.

There is also another "trivial GC" inside the lprops subsystem, but I think
we should not worry about it.

> Also, have you checked that there are no degenerate cases where the commit
> is required for some other reason, such as consolidating the log or the
> recovery commit?

I do not really see how recovery could be needed if nothing is dirty in
the TNC, but given the complexity of that code it is safer to always do the
commit when we are mounting or remounting R/W, because such situations may
exist.

I'll send new patches shortly, thank you!
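
For illustration only, the conditions discussed in this reply could be combined into a single predicate along the following lines. This is a user-space sketch under stated assumptions, not the follow-up patch: 'struct commit_inputs', its field names, and 'model_nothing_to_commit()' are hypothetical, and each boolean stands for a check that the real code would have to perform on its own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that decides whether to commit. */
struct commit_inputs {
	bool mounting;        /* the file-system is being mounted */
	bool remounting_rw;   /* the file-system is being remounted R/W */
	bool tnc_root_dirty;  /* the root znode of the TNC is dirty */
	bool lpt_root_dirty;  /* the LPT is dirty, e.g. after a trivial GC */
};

/*
 * Returns true only if the commit can be skipped safely:
 *  - never skip while mounting or remounting R/W (recovery cases);
 *  - never skip if the TNC has changes;
 *  - never skip if the LPT has changes, even when the TNC is clean.
 */
static bool model_nothing_to_commit(const struct commit_inputs *in)
{
	assert(in != NULL);

	if (in->mounting || in->remounting_rw)
		return false;
	if (in->tnc_root_dirty)
		return false;
	if (in->lpt_root_dirty)
		return false;
	return true;
}
```

The point of the sketch is the case raised in the review: with a clean TNC but a dirty LPT the predicate returns false, so the commit still runs, whereas the original patch would have skipped it.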

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index 02429d8..a963d96 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -70,6 +70,21 @@  static int do_commit(struct ubifs_info *c)
 		goto out_up;
 	}
 
+	/*
+	 * Every file-system change changes the TNC, and makes the root znode
+	 * dirty. So if the root znode is clean we can just return immediately
+	 * because there must be nothing to commit. Note, we do not have to
+	 * lock @c->tnc_mutex because we have @c->commit_sem in write mode,
+	 * which guarantees that no one else can access TNC functions
+	 * concurrently.
+	 */
+	if (!c->zroot.znode || !test_bit(DIRTY_ZNODE, &c->zroot.znode->flags)) {
+		ubifs_assert(atomic_long_read(&c->dirty_zn_cnt) == 0);
+		err = 0;
+		up_write(&c->commit_sem);
+		goto out_cancel;
+	}
+
 	/* Sync all write buffers (necessary for recovery) */
 	for (i = 0; i < c->jhead_cnt; i++) {
 		err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
@@ -162,12 +177,12 @@  static int do_commit(struct ubifs_info *c)
 	if (err)
 		goto out;
 
+out_cancel:
 	spin_lock(&c->cs_lock);
 	c->cmt_state = COMMIT_RESTING;
 	wake_up(&c->cmt_wq);
 	dbg_cmt("commit end");
 	spin_unlock(&c->cs_lock);
-
 	return 0;
 
 out_up:
