UBIFS failure & stable page writes

Message ID 85D877DD6EE67B4A9FCA9B9C3A4865670C3F1A3C5D@SI-MBX14.de.bosch.com
State New, archived

Commit Message

Prins Anton (ST-CO/ENG1.1) June 13, 2013, 1:31 p.m. UTC
We decided not to patch for this weekend, but only to add some extra logging to UBIFS:


This is to determine whether UBIFS itself writes inode number '0' or '1' into the orphan node, or whether it is caused by UBI, NAND, the peripheral or the NAND device.
If the log messages correlate with the failures after reboot, that would make sense. It means a lot of analysis, but we have to find it!

Next step is to apply patches and to test again:
http://git.infradead.org/ubifs-2.6.git/commit/8afd500cb52a5d00bab4525dd5a560d199f979b9
http://git.infradead.org/ubifs-2.6.git/commit/2928f0d0c5ebd6c9605c0d98207a44376387c298

And hopefully we get rid of some unexpected orphan nodes.

How realistic is it that the double orphan free causes our problem?
Mats, are you sure the patches mentioned above are not in your UBIFS either?

Comments

Adrian Hunter June 13, 2013, 1:41 p.m. UTC | #1
On 13/06/13 16:31, Prins Anton (ST-CO/ENG1.1) wrote:
> We decided not to patch for this weekend, but only to add some extra logging to UBIFS:
>
> diff -purN a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
> --- a/fs/ubifs/orphan.c	2013-06-13 12:19:58.490931170 +0200
> +++ b/fs/ubifs/orphan.c	2013-06-13 12:17:13.014931462 +0200
> @@ -300,6 +300,9 @@ static int write_orph_node(struct ubifs_
>  	for (i = 0; i < cnt; i++) {
>  		orphan = cnext;
>  		orph->inos[i] = cpu_to_le64(orphan->inum);
> +		if (orph->inos[i] < UBIFS_FIRST_INO) {
> +			printk(KERN_ERR "ERROR: Wrong ino in orphan list[%lu]: %lu\n", (unsigned long)i, (unsigned long)orph->inos[i]);
> +		}
>  		cnext = orphan->cnext;
>  		orphan->cnext = NULL;
>  	}
>
> This is to determine whether UBIFS itself writes inode number '0' or '1' into the orphan node, or whether it is caused by UBI, NAND, the peripheral or the NAND device.
> If the log messages correlate with the failures after reboot, that would make sense. It means a lot of analysis, but we have to find it!

We know UBIFS writes the '0' or '1' because the CRC is correct.  UBIFS would
complain loudly if it were not.
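
The reasoning can be illustrated with a small user-space sketch. This is not the UBIFS code: the node is reduced to a toy byte buffer, and crc32_le_sketch(), toy_node_seal() and toy_node_crc_ok() are hypothetical names. It assumes, as in the UBIFS common header, that the CRC-32 is stored little-endian in bytes 4..7 of the node, covers everything from byte 8 onward, and is seeded with 0xFFFFFFFF without a final XOR.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CRC_OFFSET 4           /* assumed position of the stored CRC */
#define TOY_CRC_START  8           /* assumed first byte covered by the CRC */
#define TOY_CRC_INIT   0xFFFFFFFFu /* assumed CRC seed */

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), no final XOR. */
static uint32_t crc32_le_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Compute the CRC over the node body and store it in the header. */
static void toy_node_seal(uint8_t *node, size_t len)
{
	uint32_t crc = crc32_le_sketch(TOY_CRC_INIT, node + TOY_CRC_START,
				       len - TOY_CRC_START);

	for (int i = 0; i < 4; i++)
		node[TOY_CRC_OFFSET + i] = (uint8_t)(crc >> (8 * i));
}

/* Return 1 if the stored CRC matches the node body, 0 otherwise. */
static int toy_node_crc_ok(const uint8_t *node, size_t len)
{
	uint32_t stored = 0;

	for (int i = 0; i < 4; i++)
		stored |= (uint32_t)node[TOY_CRC_OFFSET + i] << (8 * i);

	return stored == crc32_le_sketch(TOY_CRC_INIT, node + TOY_CRC_START,
					 len - TOY_CRC_START);
}
```

A wrong value that is already in the buffer when toy_node_seal() runs still passes toy_node_crc_ok(), whereas a bit flipped afterwards (by a lower layer, in this analogy) makes the check fail. That is why a correct CRC points at the writer rather than at UBI or the NAND.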

>
> Next step is to apply patches and to test again:
> http://git.infradead.org/ubifs-2.6.git/commit/8afd500cb52a5d00bab4525dd5a560d199f979b9
> http://git.infradead.org/ubifs-2.6.git/commit/2928f0d0c5ebd6c9605c0d98207a44376387c298
>
> And hopefully we get rid of some unexpected orphan nodes.
>
> How realistic is it that the double orphan free causes our problem?
> Mats, are you sure the patches mentioned above are also not in your UBIFS?
>
>
>
Prins Anton (ST-CO/ENG1.1) June 13, 2013, 2:02 p.m. UTC | #2
Ah, right! Understood.

Then we will see "inode < UBIFS_FIRST_INO" log messages on Monday, pointing to the devices that are going to fail after reboot!

This makes sense, because once the patches are applied we can check for the log message, which makes everything more deterministic (and allows backtracing at the moment of failure instead of looking 'after reboot').

Thanks!


Patch

diff -purN a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
--- a/fs/ubifs/orphan.c	2013-06-13 12:19:58.490931170 +0200
+++ b/fs/ubifs/orphan.c	2013-06-13 12:17:13.014931462 +0200
@@ -300,6 +300,9 @@  static int write_orph_node(struct ubifs_
 	for (i = 0; i < cnt; i++) {
 		orphan = cnext;
 		orph->inos[i] = cpu_to_le64(orphan->inum);
+		if (orph->inos[i] < UBIFS_FIRST_INO) {
+			printk(KERN_ERR "ERROR: Wrong ino in orphan list[%lu]: %lu\n", (unsigned long)i, (unsigned long)orph->inos[i]);
+		}
 		cnext = orphan->cnext;
 		orphan->cnext = NULL;
 	}
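
One detail of the check above: orph->inos[i] has already been converted with cpu_to_le64(), so on a big-endian CPU the comparison against UBIFS_FIRST_INO would operate on byte-swapped values. On a little-endian CPU the two values are identical and the check behaves as intended. A minimal user-space sketch of the difference, assuming UBIFS_FIRST_INO is 64; toy_cpu_to_le64_on_be(), check_on_disk_value() and check_cpu_value() are hypothetical helpers, not kernel functions:

```c
#include <stdint.h>

#define TOY_FIRST_INO 64u /* assumed value of UBIFS_FIRST_INO */

/* What cpu_to_le64() amounts to on a big-endian host: reverse the bytes. */
static uint64_t toy_cpu_to_le64_on_be(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xFF);
		v >>= 8;
	}
	return r;
}

/* Check as in the logging patch: compare the already converted value. */
static int check_on_disk_value(uint64_t le_value)
{
	return le_value < TOY_FIRST_INO;
}

/* Check on the CPU-order inode number instead. */
static int check_cpu_value(uint64_t inum)
{
	return inum < TOY_FIRST_INO;
}
```

With inode number 1, check_cpu_value(1) fires, while check_on_disk_value(toy_cpu_to_le64_on_be(1)) does not, because the swapped value is 0x0100000000000000. In the kernel loop the CPU-order value is orphan->inum, so testing that field instead of orph->inos[i] would keep the logging correct on either byte order.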