Patchwork [4/9] jffs2: force the jffs2 GC daemon to behave a bit better

Submitter Andrew Morton
Date Feb. 11, 2009, 9:27 p.m.
Message ID <200902112127.n1BLR2HX031153@imap1.linux-foundation.org>
Permalink /patch/22943/
State Accepted

Comments

Andrew Morton - Feb. 11, 2009, 9:27 p.m.
From: Andres Salomon <dilinger@queued.net>

I've noticed some pretty poor behavior on OLPC machines after bootup, when
gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
means it gets to do more NAND I/O), which results in processes taking much,
much longer than they should to start.

As an example, on an OLPC machine, going from Open Firmware (OFW) to a usable
X (via auto-login gdm) takes 2m 30s.  The majority of this time is consumed by
the switch into graphical mode.  With this patch, we cut a full 60s off the
boot time.  After bootup, things are much snappier as well.
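A minimal userspace analogue of the change, for illustration only: this is not kernel code, and run_paced(), countdown_unit() and pause_ms are hypothetical names. It shows a background worker that sleeps for a fixed interval after each unit of work, the way the patched GCD calls schedule_timeout_interruptible(msecs_to_jiffies(50)) where it used to call yield(); nanosleep() stands in for the kernel's timed sleep. A plausible reading of the patch comment is that a yield only hands the CPU to tasks that are runnable at that instant, and tasks waiting on NAND I/O are not, so the GCD presumably resumes almost at once and keeps the device busy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* One unit of background work; returns nonzero while more work remains.
 * Hypothetical stand-in for one garbage-collection pass. */
typedef int (*work_fn)(void *ctx);

/* Sleep for at least ms milliseconds, resuming after signal interruptions. */
static void sleep_ms(unsigned ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (long)(ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}

/* Run up to max_units units of work, sleeping pause_ms after every unit that
 * reports more work pending. Unlike sched_yield(), which returns immediately
 * when no other task is runnable (e.g. when the foreground tasks are all
 * blocked on I/O), the sleep keeps the worker idle, issuing no new work, for
 * the whole interval. Returns the number of units run. */
int run_paced(work_fn work, void *ctx, int max_units, unsigned pause_ms)
{
    assert(work != NULL && max_units >= 0);

    int units = 0;
    int more = 1;
    while (more && units < max_units) {
        more = work(ctx);
        units++;
        if (more)
            sleep_ms(pause_ms);
    }
    return units;
}

/* Example unit of work: decrements an int budget; more remains while > 0. */
int countdown_unit(void *ctx)
{
    int *budget = ctx;

    if (*budget > 0)
        (*budget)--;
    return *budget > 0;
}
```

With pause_ms set to 50, a worker with 5 units pending spends at least 200 ms asleep between them (four pauses), time that is left to the foreground tasks.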

Note that we have seen a CRC node error with this patch that causes the machine
to fail to boot, but we've also seen that problem without this patch.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/jffs2/background.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Ricard Wanderlof - Feb. 12, 2009, 9:17 a.m.
On Wed, 11 Feb 2009, akpm@linux-foundation.org wrote:

> I've noticed some pretty poor behavior on OLPC machines after bootup, when
> gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
> means it gets to do more NAND I/O), which results in processes taking much,
> much longer than they should to start.

Can't really comment on how well the patch works, but we've also noticed a 
similar slowdown on our systems during startup, so the idea as such is 
welcome at any rate.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30

Aras Vaichas - March 16, 2009, 11:28 p.m.
Ricard Wanderlof wrote:
> On Wed, 11 Feb 2009, akpm@linux-foundation.org wrote:
>
>   
>> I've noticed some pretty poor behavior on OLPC machines after bootup, when
>> gdm/X are starting.  The GCD monopolizes the scheduler (which in turn
>> means it gets to do more NAND I/O), which results in processes taking much,
>> much longer than they should to start.
>>     
>
> Can't really comment on how well the patch works, but we've also noticed a 
> similar slowdown on our systems during startup, so the idea as such is 
> welcome at any rate.
>   
I just posted a message to linux-arm-kernel regarding a similar problem.
I have an AT91RM9200-based machine with NAND and JFFS2.

I suspected that garbage collection was causing delays of 3 seconds or
more, during which my watchdog-patting daemon was not being scheduled.

This was causing the watchdog timer to reset my machine, but *only* when
it was under significant NAND load, i.e. at boot time, while the main
application was loading and the GCD was running.

My solution was to double the watchdog timeout.
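Why doubling the timeout helps can be sketched with a simple worst case, assuming a daemon that pats the watchdog every pat_interval_s seconds: if a stall begins just before a pat is due, the longest gap between pats is pat_interval_s + stall_s, and the machine survives only if that gap stays below the timeout. With hypothetical numbers, 1 s pats and a 3 s stall trip a 4 s watchdog but not an 8 s one. The helper names are hypothetical; WDIOC_GETTIMEOUT and WDIOC_SETTIMEOUT are the standard Linux watchdog ioctls from <linux/watchdog.h>. Because a driver may round or clamp the requested value, the sketch returns the timeout the driver writes back.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Worst case: the stall begins just before a pat is due, so the longest gap
 * between two pats is pat_interval_s + stall_s seconds. Returns 1 if that gap
 * stays below the watchdog timeout. (Hypothetical helper.) */
int watchdog_survives_stall(int timeout_s, int pat_interval_s, int stall_s)
{
    assert(timeout_s > 0 && pat_interval_s > 0 && stall_s >= 0);
    return pat_interval_s + stall_s < timeout_s;
}

/* Double the timeout (in seconds) of an already-open watchdog device such as
 * /dev/watchdog. Returns the timeout the driver actually accepted, or -1 with
 * errno set by the failing ioctl(). (Hypothetical helper.) */
int double_watchdog_timeout(int fd)
{
    int timeout;

    if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == -1)
        return -1;
    timeout *= 2;
    /* The driver writes back the timeout it actually set. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == -1)
        return -1;
    return timeout;
}
```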

Aras

Reuben Dowle - March 17, 2009, 1:23 a.m.
I have also noticed the performance degradation caused by JFFS2
garbage collection. The workaround we used on our product was to run
'pkill -STOP jffs2_gcd' very early in the boot process, and then, right
at the end of my boot script, 'sleep 90 && pkill -CONT jffs2_gcd &'.

This sped the boot time up a lot.
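The same workaround can be sketched in C for a boot program that does not want to shell out: scan /proc for tasks whose name starts with jffs2_gcd and signal each one. This is a simplification of pkill, which matches its pattern anywhere in the name rather than as a prefix. The threads are assumed to be named jffs2_gcd_mtd<N>, one per mounted JFFS2 device, and the approach relies on the GC thread accepting SIGSTOP and SIGCONT, which the workaround above indicates it does. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Name prefix that `pkill jffs2_gcd` relies on; the GC threads are assumed
 * to be named jffs2_gcd_mtd<N>. */
#define GCD_PREFIX "jffs2_gcd"

/* Returns 1 if a /proc/<pid>/comm line names a JFFS2 GC thread. */
int is_jffs2_gcd_comm(const char *comm)
{
    assert(comm != NULL);
    return strncmp(comm, GCD_PREFIX, strlen(GCD_PREFIX)) == 0;
}

/* Send sig to every JFFS2 GC thread, like `pkill -SIG jffs2_gcd` but with a
 * prefix match. Returns the number of threads signalled, or -1 if /proc
 * cannot be opened. Signalling kernel threads requires root. */
int signal_jffs2_gcd(int sig)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int signalled = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* not a PID directory */

        char path[64], comm[32];
        snprintf(path, sizeof path, "/proc/%s/comm", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                       /* task exited while scanning */
        int have_name = fgets(comm, sizeof comm, f) != NULL;
        fclose(f);

        if (have_name && is_jffs2_gcd_comm(comm) &&
            kill((pid_t)atoi(de->d_name), sig) == 0)
            signalled++;
    }
    closedir(proc);
    return signalled;
}
```

Calling signal_jffs2_gcd(SIGSTOP) early in boot and signal_jffs2_gcd(SIGCONT) once startup has settled mirrors the two pkill commands.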

Reuben

Patch

diff -puN fs/jffs2/background.c~jffs2-force-the-jffs2-gc-daemon-to-behave-a-bit-better fs/jffs2/background.c
--- a/fs/jffs2/background.c~jffs2-force-the-jffs2-gc-daemon-to-behave-a-bit-better
+++ a/fs/jffs2/background.c
@@ -95,13 +95,17 @@  static int jffs2_garbage_collect_thread(
 			spin_unlock(&c->erase_completion_lock);
 			
 
-		/* This thread is purely an optimisation. But if it runs when
-		   other things could be running, it actually makes things a
-		   lot worse. Use yield() and put it at the back of the runqueue
-		   every time. Especially during boot, pulling an inode in
-		   with read_inode() is much preferable to having the GC thread
-		   get there first. */
-		yield();
+		/* Problem - immediately after bootup, the GCD spends a lot
+		 * of time in places like jffs2_kill_fragtree(); so much so
+		 * that userspace processes (like gdm and X) are starved
+		 * despite plenty of cond_resched()s and renicing.  Yield()
+		 * doesn't help, either (presumably because userspace and GCD
+		 * are generally competing for a higher latency resource -
+		 * disk).
+		 * This forces the GCD to slow the hell down.   Pulling an
+		 * inode in with read_inode() is much preferable to having
+		 * the GC thread get there first. */
+		schedule_timeout_interruptible(msecs_to_jiffies(50));
 
 		/* Put_super will send a SIGKILL and then wait on the sem.
 		 */
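The patch requests a sleep of msecs_to_jiffies(50) timer ticks, so the length of the pause depends on the kernel's HZ setting; msecs_to_jiffies() rounds up to whole jiffies. The sketch below reproduces that rounding for a few common HZ values. ms_to_jiffies_ceil() and jiffies_to_ms() are hypothetical helpers, not the kernel's implementation, which also guards against overflow and uses different code paths depending on whether HZ divides 1000. Note also that schedule_timeout_interruptible() can return early if a signal is delivered to the thread.

```c
#include <assert.h>

/* Round a millisecond interval up to whole timer ticks at the given HZ,
 * mirroring the round-up behavior of msecs_to_jiffies().
 * Hypothetical helper: no overflow handling. */
unsigned long ms_to_jiffies_ceil(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}

/* Length in milliseconds of n ticks at the given HZ. */
double jiffies_to_ms(unsigned long n, unsigned long hz)
{
    assert(hz > 0);
    return 1000.0 * (double)n / (double)hz;
}
```

At HZ=250, for example, the requested 50 ms becomes 13 ticks, or 52 ms; at HZ=100 it is 5 ticks, and at HZ=1000 it is 50.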