ubifs : corruption after power cut test

Message ID 1282463086.16502.38.camel@brekeke
State New, archived

Commit Message

Artem Bityutskiy Aug. 22, 2010, 7:44 a.m. UTC
On Wed, 2010-07-28 at 09:40 +0200, Matthieu CASTET wrote:
> I manage to reproduce it with the backtrace [1].

Matthieu, your work-around patch or something very close should
certainly be applied to the UBIFS tree, but I still would like to find
out what exactly happened in your setup.

I see 2 possibilities:

1. An error happened and 'ubifs_garbage_collect()' returned while
c->gc_lnum was -1. But in this case we should have switched to R/O mode,
and the master node would not have been written. Maybe for some reason we
did not switch to R/O mode, I do not know.

2. More likely scenario: in 'ubifs_rcvry_gc_commit()' we call
'ubifs_garbage_collect_leb()' directly, which can return while
c->gc_lnum is -1. And we do not handle this.
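
To make the two cases concrete, here is a toy user-space model. It is only
a sketch and not UBIFS code: 'struct toy_info', the 'toy_*' functions and
the TOY_* error values are all made up for illustration, and the real
functions do much more. It shows that the first case is harmless as long
as the switch to R/O mode really happens, while in the second case nothing
prevents gc_lnum == -1 from reaching the master node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the two fields of interest. */
struct toy_info {
	int gc_lnum; /* LEB reserved for GC; -1 means none is reserved */
	int ro_mode; /* non-zero once the toy file system is read-only */
};

/* Made-up error values, used only inside this model. */
enum { TOY_EIO = 5, TOY_EROFS = 30 };

/*
 * Possibility 1: the full GC path fails after it has consumed the
 * reserved LEB. A correct error path also switches to read-only mode;
 * 'switch_to_ro' lets the caller simulate a path that forgets to.
 */
int toy_gc(struct toy_info *c, int switch_to_ro)
{
	assert(c != NULL);
	c->gc_lnum = -1;
	if (switch_to_ro)
		c->ro_mode = 1;
	return -TOY_EIO;
}

/*
 * Possibility 2: the per-LEB helper is called directly and returns
 * success while no LEB is reserved for GC.
 */
int toy_gc_leb(struct toy_info *c)
{
	assert(c != NULL);
	c->gc_lnum = -1;
	return 0;
}

/*
 * Toy master write: refused in read-only mode, otherwise it stores the
 * gc_lnum value that would end up in the master node.
 */
int toy_write_master(const struct toy_info *c, int *written_gc_lnum)
{
	assert(c != NULL && written_gc_lnum != NULL);
	if (c->ro_mode)
		return -TOY_EROFS;
	*written_gc_lnum = c->gc_lnum;
	return 0;
}
```

In this model, calling toy_gc(c, 1) and then toy_write_master() writes
nothing, because R/O mode blocks the write. Calling toy_gc_leb(c), or
toy_gc(c, 0), and then toy_write_master() stores -1, which is the state
the debugging checks in the patch are meant to catch.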

Would you please be patient enough to reproduce the issue once again with
the following patch? It was created against the latest ubifs-2.6.git, but
you should be able to apply it to your tree easily.

Artem.

Comments

Artem Bityutskiy Sept. 6, 2010, 8:55 a.m. UTC | #1
Hi, any news?
Matthieu CASTET Sept. 9, 2010, 9:22 a.m. UTC | #2
Artem Bityutskiy wrote:
> Hi, any news?
> 
Not much, I was busy with another subject, but I will try ASAP.


Matthieu

PS: any ideas or comments on how UBI/UBIFS handles an interrupted page
write?
Artem Bityutskiy Sept. 9, 2010, 9:51 a.m. UTC | #3
> PS: any ideas or comments on how UBI/UBIFS handles an interrupted page
> write?

Err, I think these are handled perfectly well. I read your e-mails; they
were a little messy, but I did not find anything UBIFS does not handle. I
sent you a fix for your oops.

Would you please re-formulate your questions nicely in a separate
e-mail, if you still have them?
Matthieu CASTET Sept. 24, 2010, 3:31 p.m. UTC | #4
Artem Bityutskiy wrote:
> Would you please be patient enough to reproduce the issue once again with
> the following patch, which was created against the latest ubifs-2.6.git, but
> you should be easily able to apply it to your tree.
None of these checks triggered.

Only the dump in ubifs_write_master shows up.


Matthieu
Artem Bityutskiy Sept. 24, 2010, 4:50 p.m. UTC | #5
On Fri, 2010-09-24 at 17:31 +0200, Matthieu CASTET wrote:
> None of these checks triggered.
> 
> Only the dump in ubifs_write_master shows up.

Hmm... This is weird. I think I need your UBIFS code. Is it possible to
share it? You can take vanilla 2.6.27 and put all your UBIFS stuff there,
or send patches against ubifs-v2.6.27.git.
Patch

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
index c8ff0d1..aa433cd 100644
--- a/fs/ubifs/budget.c
+++ b/fs/ubifs/budget.c
@@ -83,6 +83,10 @@  static int run_gc(struct ubifs_info *c)
 	down_read(&c->commit_sem);
 	lnum = ubifs_garbage_collect(c, 1);
 	up_read(&c->commit_sem);
+	if (c->gc_lnum == -1) {
+		ubifs_err("gc_lnum is -1! ubifs_garbage_collect() returned %d", lnum);
+		dump_stack();
+	}
 	if (lnum < 0)
 		return lnum;
 
diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
index 396f24a..0e78832 100644
--- a/fs/ubifs/gc.c
+++ b/fs/ubifs/gc.c
@@ -807,12 +807,20 @@  int ubifs_garbage_collect(struct ubifs_info *c, int anyway)
 		goto out;
 	}
 out_unlock:
+	if (c->gc_lnum == -1) {
+		ubifs_err("gc_lnum is -1! ubifs_garbage_collect() is returning %d", ret);
+		dump_stack();
+	}
 	mutex_unlock(&wbuf->io_mutex);
 	return ret;
 
 out:
 	ubifs_assert(ret < 0);
 	ubifs_assert(ret != -ENOSPC && ret != -EAGAIN);
+	if (c->gc_lnum == -1) {
+		ubifs_err("gc_lnum is -1! ubifs_garbage_collect() is returning %d", ret);
+		dump_stack();
+	}
 	ubifs_wbuf_sync_nolock(wbuf);
 	ubifs_ro_mode(c, ret);
 	mutex_unlock(&wbuf->io_mutex);
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index d321bae..44df514 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -162,6 +162,10 @@  again:
 	mutex_unlock(&wbuf->io_mutex);
 
 	lnum = ubifs_garbage_collect(c, 0);
+	if (c->gc_lnum == -1) {
+		ubifs_err("gc_lnum is -1! ubifs_garbage_collect() returned %d", lnum);
+		dump_stack();
+	}
 	if (lnum < 0) {
 		err = lnum;
 		if (err != -ENOSPC)
diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index daae9e1..3058256 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -1126,6 +1126,10 @@  int ubifs_rcvry_gc_commit(struct ubifs_info *c)
 	dbg_rcvry("GC'ing LEB %d", lnum);
 	mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
 	err = ubifs_garbage_collect_leb(c, &lp);
+	if (c->gc_lnum == -1) {
+		ubifs_err("gc_lnum is -1! ubifs_garbage_collect_leb() returned %d", err);
+		dump_stack();
+	}
 	if (err >= 0) {
 		int err2 = ubifs_wbuf_sync_nolock(wbuf);