diff mbox

[1/2] UBI: Fix error path in scan_pool()

Message ID 1380141125-11063-1-git-send-email-richard@nod.at
State Accepted
Commit f240dca8f2fdf11d52a7bacd867e274c3b21a7d3
Headers show

Commit Message

Richard Weinberger Sept. 25, 2013, 8:32 p.m. UTC
We have to set "ret", not "err" in case of an error.

Reported-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/fastmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Genoud Sept. 26, 2013, 7:20 a.m. UTC | #1
On 25/09/2013 22:32, Richard Weinberger wrote:
> We have to set "ret", not "err" in case of an error.
> 
> Reported-by: Richard Genoud <richard.genoud@gmail.com>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  drivers/mtd/ubi/fastmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index f5aa4b0..9b42add 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> @@ -428,7 +428,7 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
>  		if (be32_to_cpu(ech->image_seq) != ubi->image_seq) {
>  			ubi_err("bad image seq: 0x%x, expected: 0x%x",
>  				be32_to_cpu(ech->image_seq), ubi->image_seq);
> -			err = UBI_BAD_FASTMAP;
> +			ret = UBI_BAD_FASTMAP;
>  			goto out;
>  		}
>  
> 

Tested-by: Richard Genoud <richard.genoud@gmail.com>

It reveals another bug though.

[    0.812500] UBI: default fastmap pool size: 95
[    0.820312] UBI: default fastmap WL pool size: 25
[    0.820312] UBI: attaching mtd2 to ubi0
[    0.914062] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728
[    0.921875] UBI error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan!
[    0.929687] kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still has objects
[    0.937500] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.1 #15
[    0.945312] [<c0012e20>] (unwind_backtrace+0x0/0xe0) from [<c0010bb8>] (show_stack+0x10/0x14)
[    0.953125] [<c0010bb8>] (show_stack+0x10/0x14) from [<c01a15e4>] (destroy_ai+0x230/0x244)
[    0.960937] [<c01a15e4>] (destroy_ai+0x230/0x244) from [<c01a2c5c>] (ubi_attach+0x1f8/0x34c)
[    0.968750] [<c01a2c5c>] (ubi_attach+0x1f8/0x34c) from [<c0197678>] (ubi_attach_mtd_dev+0x2b8/0x86c)
[    0.976562] [<c0197678>] (ubi_attach_mtd_dev+0x2b8/0x86c) from [<c0346630>] (ubi_init+0x1d8/0x29c)
[    0.984375] [<c0346630>] (ubi_init+0x1d8/0x29c) from [<c0008810>] (do_one_initcall+0x94/0x144)
[    0.992187] [<c0008810>] (do_one_initcall+0x94/0x144) from [<c0335a9c>] (kernel_init_freeable+0xe8/0x1ac)
[    1.000000] [<c0335a9c>] (kernel_init_freeable+0xe8/0x1ac) from [<c0251500>] (kernel_init+0x8/0xe4)
[    1.007812] [<c0251500>] (kernel_init+0x8/0xe4) from [<c000e590>] (ret_from_fork+0x14/0x24)
[    1.742187] UBI: scanning is finished

Thanks !

Richard.
Richard Weinberger Sept. 26, 2013, 7:24 a.m. UTC | #2
Am 26.09.2013 09:20, schrieb Richard Genoud:
> On 25/09/2013 22:32, Richard Weinberger wrote:
>> We have to set "ret", not "err" in case of an error.
>>
>> Reported-by: Richard Genoud <richard.genoud@gmail.com>
>> Signed-off-by: Richard Weinberger <richard@nod.at>
>> ---
>>  drivers/mtd/ubi/fastmap.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
>> index f5aa4b0..9b42add 100644
>> --- a/drivers/mtd/ubi/fastmap.c
>> +++ b/drivers/mtd/ubi/fastmap.c
>> @@ -428,7 +428,7 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
>>  		if (be32_to_cpu(ech->image_seq) != ubi->image_seq) {
>>  			ubi_err("bad image seq: 0x%x, expected: 0x%x",
>>  				be32_to_cpu(ech->image_seq), ubi->image_seq);
>> -			err = UBI_BAD_FASTMAP;
>> +			ret = UBI_BAD_FASTMAP;
>>  			goto out;
>>  		}
>>  
>>
> 
> Tested-by: Richard Genoud <richard.genoud@gmail.com>
> 
> It reveals another bug though.
> 
> [    0.812500] UBI: default fastmap pool size: 95
> [    0.820312] UBI: default fastmap WL pool size: 25
> [    0.820312] UBI: attaching mtd2 to ubi0
> [    0.914062] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728

This worries me more...
The fastmap is corrupted, now we have to find out why.
Broken driver? Powercut? IO errors?

> [    0.921875] UBI error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan!
> [    0.929687] kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still has objects

I see.

Thanks,
//richard
Richard Genoud Sept. 26, 2013, 7:33 a.m. UTC | #3
2013/9/26 Richard Weinberger <richard@nod.at>:
> Am 26.09.2013 09:20, schrieb Richard Genoud:
>> On 25/09/2013 22:32, Richard Weinberger wrote:
>>> We have to set "ret", not "err" in case of an error.
>>>
>>> Reported-by: Richard Genoud <richard.genoud@gmail.com>
>>> Signed-off-by: Richard Weinberger <richard@nod.at>
>>> ---
>>>  drivers/mtd/ubi/fastmap.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
>>> index f5aa4b0..9b42add 100644
>>> --- a/drivers/mtd/ubi/fastmap.c
>>> +++ b/drivers/mtd/ubi/fastmap.c
>>> @@ -428,7 +428,7 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
>>>              if (be32_to_cpu(ech->image_seq) != ubi->image_seq) {
>>>                      ubi_err("bad image seq: 0x%x, expected: 0x%x",
>>>                              be32_to_cpu(ech->image_seq), ubi->image_seq);
>>> -                    err = UBI_BAD_FASTMAP;
>>> +                    ret = UBI_BAD_FASTMAP;
>>>                      goto out;
>>>              }
>>>
>>>
>>
>> Tested-by: Richard Genoud <richard.genoud@gmail.com>
>>
>> It reveals another bug though.
>>
>> [    0.812500] UBI: default fastmap pool size: 95
>> [    0.820312] UBI: default fastmap WL pool size: 25
>> [    0.820312] UBI: attaching mtd2 to ubi0
>> [    0.914062] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728
>
> This worries me more...
> The fastmap is corrupted, now we have to find out why.
> Broken driver? Powercut? IO errors?
I guess this is caused by u-boot when it deletes the UBI fastmap
internal volume.
I'm replaying the whole scenario with debug to see when it happens.

>
>> [    0.921875] UBI error: ubi_scan_fastmap: Attach by fastmap failed, doing a full scan!
>> [    0.929687] kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still has objects
>
> I see.
>
> Thanks,
> //richard
Richard Genoud Sept. 26, 2013, 9:25 a.m. UTC | #4
2013/9/26 Richard Genoud <richard.genoud@gmail.com>:
> 2013/9/26 Richard Weinberger <richard@nod.at>:
>> Am 26.09.2013 09:20, schrieb Richard Genoud:
>>> On 25/09/2013 22:32, Richard Weinberger wrote:
>>>> We have to set "ret", not "err" in case of an error.
>>>>
>>>> Reported-by: Richard Genoud <richard.genoud@gmail.com>
>>>> Signed-off-by: Richard Weinberger <richard@nod.at>
>>>> ---
>>>>  drivers/mtd/ubi/fastmap.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
>>>> index f5aa4b0..9b42add 100644
>>>> --- a/drivers/mtd/ubi/fastmap.c
>>>> +++ b/drivers/mtd/ubi/fastmap.c
>>>> @@ -428,7 +428,7 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
>>>>              if (be32_to_cpu(ech->image_seq) != ubi->image_seq) {
>>>>                      ubi_err("bad image seq: 0x%x, expected: 0x%x",
>>>>                              be32_to_cpu(ech->image_seq), ubi->image_seq);
>>>> -                    err = UBI_BAD_FASTMAP;
>>>> +                    ret = UBI_BAD_FASTMAP;
>>>>                      goto out;
>>>>              }
>>>>
>>>>
>>>
>>> Tested-by: Richard Genoud <richard.genoud@gmail.com>
>>>
>>> It reveals another bug though.
>>>
>>> [    0.812500] UBI: default fastmap pool size: 95
>>> [    0.820312] UBI: default fastmap WL pool size: 25
>>> [    0.820312] UBI: attaching mtd2 to ubi0
>>> [    0.914062] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728
>>
>> This worries me more...
>> The fastmap is corrupted, now we have to find out why.
>> Broken driver? Powercut? IO errors?
> I guess this is caused by u-boot when it deletes the UBI fastmap
> internal volume.
> I'm replaying the whole scenario with debug to see when it happens.

I added some traces and I found that : (dumping ec_header after
"ubi_io_read_ec_hdr(ubi, pnum, ech, 0); " in ubi_scan_fastmap())
[    0.812500] UBI: default fastmap pool size: 95
[    0.820312] UBI: default fastmap WL pool size: 25
[    0.828125] UBI: attaching mtd2 to ubi0
[    0.851562] ubi_scan_fastmap:
[    0.859375] Erase counter header dump:
[    0.859375]  magic          0x55424923
[    0.867187]  version        1
[    0.867187]  ec             1
[    0.867187]  vid_hdr_offset 2048
[    0.875000]  data_offset    4096
[    0.875000]  image_seq      891983656 <- image seq is good.
[    0.875000]  hdr_crc        0x13cc9a78
[    0.882812] erase counter header hexdump:
[    0.882812] 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00
01 00 00 08 00 00 00 10 00 35 2a 97
28 00 00 00 00  UBI#....................5*.(....
[    0.890625] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
00 13 cc 9a 78  ...............................x
[    0.960937] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728
[    0.968750] UBI error: ubi_scan_fastmap: Attach by fastmap failed,
doing a full scan!
[    0.976562] kmem_cache_destroy ubi_ainf_peb_slab: Slab cache still
has objects
[    0.976562] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.1 #20
[    0.984375] [<c0012e20>] (unwind_backtrace+0x0/0xe0) from
[<c0010bb8>] (show_stack+0x10/0x14)
[    0.992187] [<c0010bb8>] (show_stack+0x10/0x14) from [<c01a15e4>]
(destroy_ai+0x230/0x244)
[    1.000000] [<c01a15e4>] (destroy_ai+0x230/0x244) from [<c01a2c4c>]
(ubi_attach+0x1e8/0x34c)
[    1.007812] [<c01a2c4c>] (ubi_attach+0x1e8/0x34c) from [<c0197678>]
(ubi_attach_mtd_dev+0x2b8/0x86c)
[    1.015625] [<c0197678>] (ubi_attach_mtd_dev+0x2b8/0x86c) from
[<c0346630>] (ubi_init+0x1d8/0x29c)
[    1.023437] [<c0346630>] (ubi_init+0x1d8/0x29c) from [<c0008810>]
(do_one_initcall+0x94/0x144)
[    1.031250] [<c0008810>] (do_one_initcall+0x94/0x144) from
[<c0335a9c>] (kernel_init_freeable+0xe8/0x1a
c)
[    1.039062] [<c0335a9c>] (kernel_init_freeable+0xe8/0x1ac) from
[<c0251528>] (kernel_init+0x8/0xe4)
[    1.046875] [<c0251528>] (kernel_init+0x8/0xe4) from [<c000e590>]
(ret_from_fork+0x14/0x24)
[    1.804687] UBI: scanning is finished
Richard Weinberger Sept. 26, 2013, 9:59 a.m. UTC | #5
Am 26.09.2013 11:25, schrieb Richard Genoud:
> I added some traces and I found that : (dumping ec_header after
> "ubi_io_read_ec_hdr(ubi, pnum, ech, 0); " in ubi_scan_fastmap())

Please show me the diff to understand what exactly you did. :)

> [    0.812500] UBI: default fastmap pool size: 95
> [    0.820312] UBI: default fastmap WL pool size: 25
> [    0.828125] UBI: attaching mtd2 to ubi0
> [    0.851562] ubi_scan_fastmap:
> [    0.859375] Erase counter header dump:
> [    0.859375]  magic          0x55424923
> [    0.867187]  version        1
> [    0.867187]  ec             1
> [    0.867187]  vid_hdr_offset 2048
> [    0.875000]  data_offset    4096
> [    0.875000]  image_seq      891983656 <- image seq is good.
> [    0.875000]  hdr_crc        0x13cc9a78
> [    0.882812] erase counter header hexdump:
> [    0.882812] 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00
> 01 00 00 08 00 00 00 10 00 35 2a 97
> 28 00 00 00 00  UBI#....................5*.(....
> [    0.890625] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00
> 00 13 cc 9a 78  ...............................x

Hmm, maybe a memory corruption. Strange.

Thanks,
//richard
Richard Weinberger Sept. 27, 2013, 11:01 a.m. UTC | #6
Am 26.09.2013 11:25, schrieb Richard Genoud:
> I added some traces and I found that : (dumping ec_header after
> "ubi_io_read_ec_hdr(ubi, pnum, ech, 0); " in ubi_scan_fastmap())
> [    0.812500] UBI: default fastmap pool size: 95
> [    0.820312] UBI: default fastmap WL pool size: 25
> [    0.828125] UBI: attaching mtd2 to ubi0
> [    0.851562] ubi_scan_fastmap:
> [    0.859375] Erase counter header dump:
> [    0.859375]  magic          0x55424923
> [    0.867187]  version        1
> [    0.867187]  ec             1
> [    0.867187]  vid_hdr_offset 2048
> [    0.875000]  data_offset    4096
> [    0.875000]  image_seq      891983656 <- image seq is good.
> [    0.875000]  hdr_crc        0x13cc9a78
> [    0.882812] erase counter header hexdump:
> [    0.882812] 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00
> 01 00 00 08 00 00 00 10 00 35 2a 97
> 28 00 00 00 00  UBI#....................5*.(....
> [    0.890625] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00
> 00 13 cc 9a 78  ...............................x
> [    0.960937] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728

BTW: Do you really print the same PEB as scan_pool() complains about?
Do you know how many PEBs for one fastmap are needed on your target?

Thanks,
//richard
Richard Genoud Sept. 27, 2013, 11:12 a.m. UTC | #7
2013/9/27 Richard Weinberger <richard@nod.at>:
> Am 26.09.2013 11:25, schrieb Richard Genoud:
>> I added some traces and I found that : (dumping ec_header after
>> "ubi_io_read_ec_hdr(ubi, pnum, ech, 0); " in ubi_scan_fastmap())
>> [    0.812500] UBI: default fastmap pool size: 95
>> [    0.820312] UBI: default fastmap WL pool size: 25
>> [    0.828125] UBI: attaching mtd2 to ubi0
>> [    0.851562] ubi_scan_fastmap:
>> [    0.859375] Erase counter header dump:
>> [    0.859375]  magic          0x55424923
>> [    0.867187]  version        1
>> [    0.867187]  ec             1
>> [    0.867187]  vid_hdr_offset 2048
>> [    0.875000]  data_offset    4096
>> [    0.875000]  image_seq      891983656 <- image seq is good.
>> [    0.875000]  hdr_crc        0x13cc9a78
>> [    0.882812] erase counter header hexdump:
>> [    0.882812] 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00
>> 01 00 00 08 00 00 00 10 00 35 2a 97
>> 28 00 00 00 00  UBI#....................5*.(....
>> [    0.890625] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 13 cc 9a 78  ...............................x
>> [    0.960937] UBI error: scan_pool: bad image seq: 0x0, expected: 0x352a9728
>
> BTW: Do you really print the same PEB as scan_pool() complains about?
No, those debug traces were dumb, sorry...
I understand fastmap code better now :)
> Do you know how many PEBs for one fastmap are needed on your target?

Actually, I just found the problem.
U-boot has an old UBI implementation that was not aware of image_seq.
So when I write the ubifs volume, some PEB got erased by UBI(in uboot)
and written with a image_seq==0.
and fastmap doesn't pay attention to this use case.

I'll have to test a patch I made and I'll send it in a while.


Richard.
diff mbox

Patch

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index f5aa4b0..9b42add 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -428,7 +428,7 @@  static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
 		if (be32_to_cpu(ech->image_seq) != ubi->image_seq) {
 			ubi_err("bad image seq: 0x%x, expected: 0x%x",
 				be32_to_cpu(ech->image_seq), ubi->image_seq);
-			err = UBI_BAD_FASTMAP;
+			ret = UBI_BAD_FASTMAP;
 			goto out;
 		}