[4/9] jffs2: force the jffs2 GC daemon to behave a bit better

Message ID 200902112127.n1BLR2HX031153@imap1.linux-foundation.org
State Accepted

Commit Message

Andrew Morton Feb. 11, 2009, 9:27 p.m. UTC
From: Andres Salomon <dilinger@queued.net>

I've noticed some pretty poor behavior on OLPC machines after bootup, when
gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
means it gets to do more nand i/o), which results in processes taking much,
much longer than they should to start.

As an example, on an OLPC machine going from OFW to a usable X (via
auto-login gdm) takes 2m 30s.  The majority of this time is consumed by
the switch into graphical mode.  With this patch, we cut a full 60s off of
bootup time.  After bootup, things are much snappier as well.

Note that we have seen a CRC node error with this patch that causes the machine
to fail to boot, but we've also seen that problem without this patch.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/jffs2/background.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
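
The patch replaces the GC thread's yield() with a fixed 50 ms interruptible sleep on every loop iteration. A rough userspace analogue of that throttling pattern is sketched below. It is an illustration only, not the kernel code: pause_interruptible_ms(), throttled_worker(), count_pass() and passes_run are hypothetical names, and nanosleep() stands in for schedule_timeout_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical stand-in for schedule_timeout_interruptible(): sleep for up
 * to `ms` milliseconds.  Returns 0 after a full sleep, or -1 if a signal
 * cut the sleep short (or the argument was invalid). */
static int pause_interruptible_ms(long ms)
{
	struct timespec req = {
		.tv_sec  = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	return nanosleep(&req, NULL);
}

/* Example unit of background work: it only counts how often it ran. */
static int passes_run;

static void count_pass(void)
{
	passes_run++;
}

/* Run `passes` units of background work, pausing `pause_ms` after each one
 * so that foreground tasks can use the CPU and the storage device in
 * between.  Returns the number of passes completed. */
static int throttled_worker(int passes, long pause_ms, void (*do_pass)(void))
{
	int done = 0;

	while (done < passes) {
		do_pass();
		done++;
		if (pause_interruptible_ms(pause_ms) != 0)
			break;	/* woken early: hand control back to the caller */
	}
	return done;
}
```

Calling throttled_worker(3, 50, count_pass) runs three passes and spends at least 150 ms sleeping between them, provided no signal arrives. Unlike yield(), the sleep takes the worker off the run queue for a fixed time no matter what else is runnable.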

Comments

Ricard Wanderlof Feb. 12, 2009, 9:17 a.m. UTC | #1
On Wed, 11 Feb 2009, akpm@linux-foundation.org wrote:

> I've noticed some pretty poor behavior on OLPC machines after bootup, when
> gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
> means it gets to do more nand i/o), which results in processes taking much
> much longer than they should to start.

Can't really comment on how well the patch works, but we've also noticed a 
similar slowdown on our systems during startup, so the idea as such is 
welcome at any rate.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30

Aras Vaichas March 16, 2009, 11:28 p.m. UTC | #2
Ricard Wanderlof wrote:
> On Wed, 11 Feb 2009, akpm@linux-foundation.org wrote:
>
>   
>> I've noticed some pretty poor behavior on OLPC machines after bootup, when
>> gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
>> means it gets to do more nand i/o), which results in processes taking much
>> much longer than they should to start.
>>     
>
> Can't really comment on how well the patch works, but we've also noticed a 
> similar slowdown on our systems during startup, so the idea as such is 
> welcome at any rate.
>   
I just posted a message to linux-arm-kernel regarding a similar problem.
I have an at91rm9200-based machine with NAND and JFFS2.

I suspected that garbage collection was causing delays of 3 seconds or
more, during which my watchdog-patting daemon was not being
scheduled ...

This was causing the watchdog timer to reset my machine, but *only* when
it was under significant NAND load, i.e. at boot time, while loading the
main application with the GCD running.

My solution was to double the watchdog timeout.

Aras

Reuben Dowle March 17, 2009, 1:23 a.m. UTC | #3
I have also noticed the performance degradation caused by the jffs2
garbage collection. The solution we used for this problem on our product
was to do 'pkill -STOP jffs2_gcd' very early in the boot process, then
right at the end of my boot script put 'sleep 90 && pkill -CONT
jffs2_gcd &'.

This sped the boot time up a lot.

Reuben

Patch

diff -puN fs/jffs2/background.c~jffs2-force-the-jffs2-gc-daemon-to-behave-a-bit-better fs/jffs2/background.c
--- a/fs/jffs2/background.c~jffs2-force-the-jffs2-gc-daemon-to-behave-a-bit-better
+++ a/fs/jffs2/background.c
@@ -95,13 +95,17 @@  static int jffs2_garbage_collect_thread(
 			spin_unlock(&c->erase_completion_lock);
 			
 
-		/* This thread is purely an optimisation. But if it runs when
-		   other things could be running, it actually makes things a
-		   lot worse. Use yield() and put it at the back of the runqueue
-		   every time. Especially during boot, pulling an inode in
-		   with read_inode() is much preferable to having the GC thread
-		   get there first. */
-		yield();
+		/* Problem - immediately after bootup, the GCD spends a lot
+		 * of time in places like jffs2_kill_fragtree(); so much so
+		 * that userspace processes (like gdm and X) are starved
+		 * despite plenty of cond_resched()s and renicing.  Yield()
+		 * doesn't help, either (presumably because userspace and GCD
+		 * are generally competing for a higher latency resource -
+		 * disk).
+		 * This forces the GCD to slow the hell down.   Pulling an
+		 * inode in with read_inode() is much preferable to having
+		 * the GC thread get there first. */
+		schedule_timeout_interruptible(msecs_to_jiffies(50));
 
 		/* Put_super will send a SIGKILL and then wait on the sem.
 		 */
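
The sleep in the hunk above is specified as 50 ms and converted with msecs_to_jiffies(), so the number of timer ticks it covers depends on the kernel's HZ setting. The sketch below is a simplified userspace model of that conversion, assuming it rounds up to a whole tick. ms_to_ticks() is a hypothetical name, and the kernel's overflow handling is left out.

```c
#include <assert.h>

/* Hypothetical model of msecs_to_jiffies(): convert `ms` milliseconds to
 * timer ticks at `hz` ticks per second, rounding any fraction of a tick
 * up to the next whole tick.  No overflow handling. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}
```

Under this model, ms_to_ticks(50, 100) is 5, ms_to_ticks(50, 250) is 13 (12.5 rounded up), and ms_to_ticks(50, 1000) is 50. The thread does not say which HZ value the OLPC kernel uses.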