diff mbox

UBI: add ubi_err() to report the failure of leb read

Message ID 548FE51D.60707@huawei.com
State Changes Requested
Headers show

Commit Message

hujianyang Dec. 16, 2014, 7:54 a.m. UTC
If an error occur while reading from PEBs, for example, an ECC error,
ubi_io_read() will print some error messages. But it's not enough for
debugging. These messages don't show the mapping info for a read from
UBIFS layer.

Although UBIFS will soon print its error messages after catching the
return value from UBI layer,  multi-path reading will confuse the
relationship between LEBs and PEBs showed by these messages.

This patch adds an ubi_err() to report reading errors in the function
ubi_eba_read_leb(). The mapping info of LEB and PEB is showed by
this error message.

Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 drivers/mtd/ubi/eba.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

hujianyang Dec. 16, 2014, 8:02 a.m. UTC | #1
On 2014/12/16 15:54, hujianyang wrote:
> If an error occur while reading from PEBs, for example, an ECC error,
> ubi_io_read() will print some error messages. But it's not enough for
> debugging. These messages don't show the mapping info for a read from
> UBIFS layer.
> 
> Although UBIFS will soon print its error messages after catching the
> return value from UBI layer,  multi-path reading will confuse the
> relationship between LEBs and PEBs showed by these messages.
> 
> This patch adds an ubi_err() to report reading errors in the function
> ubi_eba_read_leb(). The mapping info of LEB and PEB is showed by
> this error message.
> 
> Signed-off-by: hujianyang <hujianyang@huawei.com>
> ---
>  drivers/mtd/ubi/eba.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
> index b698534..b4e69e1 100644
> --- a/drivers/mtd/ubi/eba.c
> +++ b/drivers/mtd/ubi/eba.c
> @@ -477,6 +477,8 @@ out_free:
>  	ubi_free_vid_hdr(ubi, vid_hdr);
>  out_unlock:
>  	leb_read_unlock(ubi, vol_id, lnum);
> +	ubi_err(ubi, "err %d while reading %d bytes from offset %d of LEB %d:%d, PEB %d",
> +		err, len, offset, vol_id, lnum, pnum);
>  	return err;
>  }
> 

Hi,

I met a problem that I was failed to mount a UBIFS partition.

[   38.442770] UBI error: ubi_io_read: error -74 (ECC error) while reading 26624 bytes from PEB 54:104448, read 26624 bytes
[   38.852461] UBI error: ubi_io_read: error -74 (ECC error) while reading 77824 bytes from PEB 346:53248, read 77824 bytes
[   38.864142] UBIFS error (pid 1444): ubifs_recover_leb: corruption -3
[   38.870487] UBIFS error (pid 1444): ubifs_scanned_corruption: corruption at LEB 928:55280
[   38.878625] UBIFS error (pid 1444): ubifs_scanned_corruption: first 8192 bytes from LEB 928:55280
[   38.892117] UBIFS error (pid 1444): ubifs_recover_leb: LEB 928 scanning failed
mount: mounting ubi1:bak on /HFFS2: failed: Structure needs cleaning

I think it is caused by an ECC error of nand flash. Do we have some methods
to mount this partition? Data losing is acceptable.


Thanks,

Hu
Richard Weinberger Dec. 16, 2014, 8:58 a.m. UTC | #2
Am 16.12.2014 um 08:54 schrieb hujianyang:
> If an error occur while reading from PEBs, for example, an ECC error,
> ubi_io_read() will print some error messages. But it's not enough for
> debugging. These messages don't show the mapping info for a read from
> UBIFS layer.
> 
> Although UBIFS will soon print its error messages after catching the
> return value from UBI layer,  multi-path reading will confuse the
> relationship between LEBs and PEBs showed by these messages.
> 
> This patch adds an ubi_err() to report reading errors in the function
> ubi_eba_read_leb(). The mapping info of LEB and PEB is showed by
> this error message.
> 
> Signed-off-by: hujianyang <hujianyang@huawei.com>
> ---
>  drivers/mtd/ubi/eba.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
> index b698534..b4e69e1 100644
> --- a/drivers/mtd/ubi/eba.c
> +++ b/drivers/mtd/ubi/eba.c
> @@ -477,6 +477,8 @@ out_free:
>  	ubi_free_vid_hdr(ubi, vid_hdr);
>  out_unlock:
>  	leb_read_unlock(ubi, vol_id, lnum);
> +	ubi_err(ubi, "err %d while reading %d bytes from offset %d of LEB %d:%d, PEB %d",
> +		err, len, offset, vol_id, lnum, pnum);

This label will also be reached if we run out of memory.
Please make sure that the new ubi_err() can only be reached in case of an MTD error.
Also make sure that the function prints only one message and not two.

Thanks,
//richard
Richard Weinberger Dec. 16, 2014, 9:21 a.m. UTC | #3
Am 16.12.2014 um 09:02 schrieb hujianyang:
> Hi,
> 
> I met a problem that I was failed to mount a UBIFS partition.
> 
> [   38.442770] UBI error: ubi_io_read: error -74 (ECC error) while reading 26624 bytes from PEB 54:104448, read 26624 bytes
> [   38.852461] UBI error: ubi_io_read: error -74 (ECC error) while reading 77824 bytes from PEB 346:53248, read 77824 bytes
> [   38.864142] UBIFS error (pid 1444): ubifs_recover_leb: corruption -3
> [   38.870487] UBIFS error (pid 1444): ubifs_scanned_corruption: corruption at LEB 928:55280
> [   38.878625] UBIFS error (pid 1444): ubifs_scanned_corruption: first 8192 bytes from LEB 928:55280
> [   38.892117] UBIFS error (pid 1444): ubifs_recover_leb: LEB 928 scanning failed
> mount: mounting ubi1:bak on /HFFS2: failed: Structure needs cleaning
> 
> I think it is caused by an ECC error of nand flash. Do we have some methods
> to mount this partition? Data losing is acceptable.

I don't think that UBIFS has such a mount option.
You can dump the raw data and inspect the corrupted data.
Maybe you can fix it by hand.

Thanks,
//richard
hujianyang Dec. 16, 2014, 9:52 a.m. UTC | #4
On 2014/12/16 17:21, Richard Weinberger wrote:
> Am 16.12.2014 um 09:02 schrieb hujianyang:
>> Hi,
>>
>> I met a problem that I was failed to mount a UBIFS partition.
>>
>> [   38.442770] UBI error: ubi_io_read: error -74 (ECC error) while reading 26624 bytes from PEB 54:104448, read 26624 bytes
>> [   38.852461] UBI error: ubi_io_read: error -74 (ECC error) while reading 77824 bytes from PEB 346:53248, read 77824 bytes
>> [   38.864142] UBIFS error (pid 1444): ubifs_recover_leb: corruption -3
>> [   38.870487] UBIFS error (pid 1444): ubifs_scanned_corruption: corruption at LEB 928:55280
>> [   38.878625] UBIFS error (pid 1444): ubifs_scanned_corruption: first 8192 bytes from LEB 928:55280
>> [   38.892117] UBIFS error (pid 1444): ubifs_recover_leb: LEB 928 scanning failed
>> mount: mounting ubi1:bak on /HFFS2: failed: Structure needs cleaning
>>
>> I think it is caused by an ECC error of nand flash. Do we have some methods
>> to mount this partition? Data losing is acceptable.
> 
> I don't think that UBIFS has such a mount option.

Er, I don't know it either. How about a mount option like --force?

> You can dump the raw data and inspect the corrupted data.
> Maybe you can fix it by hand.

Yes, I want a try~! If we have to introduce a new feature or new mount
option. So would you like to help me? Do you think it's a valuable
work?


Thanks~

Hu
Richard Weinberger Dec. 16, 2014, 9:57 a.m. UTC | #5
Am 16.12.2014 um 10:52 schrieb hujianyang:
> On 2014/12/16 17:21, Richard Weinberger wrote:
>> Am 16.12.2014 um 09:02 schrieb hujianyang:
>>> Hi,
>>>
>>> I met a problem that I was failed to mount a UBIFS partition.
>>>
>>> [   38.442770] UBI error: ubi_io_read: error -74 (ECC error) while reading 26624 bytes from PEB 54:104448, read 26624 bytes
>>> [   38.852461] UBI error: ubi_io_read: error -74 (ECC error) while reading 77824 bytes from PEB 346:53248, read 77824 bytes
>>> [   38.864142] UBIFS error (pid 1444): ubifs_recover_leb: corruption -3
>>> [   38.870487] UBIFS error (pid 1444): ubifs_scanned_corruption: corruption at LEB 928:55280
>>> [   38.878625] UBIFS error (pid 1444): ubifs_scanned_corruption: first 8192 bytes from LEB 928:55280
>>> [   38.892117] UBIFS error (pid 1444): ubifs_recover_leb: LEB 928 scanning failed
>>> mount: mounting ubi1:bak on /HFFS2: failed: Structure needs cleaning
>>>
>>> I think it is caused by an ECC error of nand flash. Do we have some methods
>>> to mount this partition? Data losing is acceptable.
>>
>> I don't think that UBIFS has such a mount option.
> 
> Er, I don't know it either. How about a mount option like --force?

Then every single embedded vendor will use this flag to keep the broken MTD/UBI/UBIFS setups running
as long as possible no mater of how corrupted the data is. :-)
IIRC UBIFS will either mount and work correctly as expected or fail hard.

>> You can dump the raw data and inspect the corrupted data.
>> Maybe you can fix it by hand.
> 
> Yes, I want a try~! If we have to introduce a new feature or new mount
> option. So would you like to help me? Do you think it's a valuable
> work?

I'm not a fan of such a mount option.
What we really need is a fsck.ubifs and a ubifs dump tool to fix and recover
broken UBIFS images.

Thanks,
//richard
hujianyang Dec. 16, 2014, 10:12 a.m. UTC | #6
On 2014/12/16 16:58, Richard Weinberger wrote:
> Am 16.12.2014 um 08:54 schrieb hujianyang:
>> If an error occur while reading from PEBs, for example, an ECC error,
>> ubi_io_read() will print some error messages. But it's not enough for
>> debugging. These messages don't show the mapping info for a read from
>> UBIFS layer.
>>
>> Although UBIFS will soon print its error messages after catching the
>> return value from UBI layer,  multi-path reading will confuse the
>> relationship between LEBs and PEBs showed by these messages.
>>
>> This patch adds an ubi_err() to report reading errors in the function
>> ubi_eba_read_leb(). The mapping info of LEB and PEB is showed by
>> this error message.
>>
>> Signed-off-by: hujianyang <hujianyang@huawei.com>
>> ---
>>  drivers/mtd/ubi/eba.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
>> index b698534..b4e69e1 100644
>> --- a/drivers/mtd/ubi/eba.c
>> +++ b/drivers/mtd/ubi/eba.c
>> @@ -477,6 +477,8 @@ out_free:
>>  	ubi_free_vid_hdr(ubi, vid_hdr);
>>  out_unlock:
>>  	leb_read_unlock(ubi, vol_id, lnum);
>> +	ubi_err(ubi, "err %d while reading %d bytes from offset %d of LEB %d:%d, PEB %d",
>> +		err, len, offset, vol_id, lnum, pnum);
> 
> This label will also be reached if we run out of memory.

Yes, but I think it's OK. Because a read operation is really failed
by some errors. We can print these errors not only in case of MTD
errors but also in case of CRC errors, ENOMEM or others.

> Please make sure that the new ubi_err() can only be reached in case of an MTD error.
> Also make sure that the function prints only one message and not two.

Er, At first, I want to print just MTD error, perform ubi_err() in
mtd_is_eccerr() case, for example. But I found it's not easy to print
it only once. The *retry* label makes it difficulty to determine when
this function, ubi_eba_read_leb() returns, this message will print
two times or none.

So at last, I move this ubi_err() to the end of error handling path
for easy. ^.^

What's your opinion?

Hu
hujianyang Dec. 16, 2014, 10:34 a.m. UTC | #7
On 2014/12/16 17:57, Richard Weinberger wrote:
> Then every single embedded vendor will use this flag to keep the broken MTD/UBI/UBIFS setups running
> as long as possible no mater of how corrupted the data is. :-)
> IIRC UBIFS will either mount and work correctly as expected or fail hard.

You are right~!

Maybe we can set filesystem to RO if it is mounted with --force, and
allow users to copy their data to other place.

How about this?

> 
>>> You can dump the raw data and inspect the corrupted data.
>>> Maybe you can fix it by hand.
>>
>> Yes, I want a try~! If we have to introduce a new feature or new mount
>> option. So would you like to help me? Do you think it's a valuable
>> work?
> 
> I'm not a fan of such a mount option.
> What we really need is a fsck.ubifs and a ubifs dump tool to fix and recover
> broken UBIFS images.

I think it's better, but a bit harder. As I know, my UBIDUMP is far
from what you expect. I should spend more time on it.

After all, I was asked to fix this error. My plan is do something
after an ECC error is detected, not directly breakout to allow
this partition to be mounted. I don't think this solution will be
easily accept by mainline.

However, I'd like to show my work if I succeed.

Thanks~!

Hu
diff mbox

Patch

diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
index b698534..b4e69e1 100644
--- a/drivers/mtd/ubi/eba.c
+++ b/drivers/mtd/ubi/eba.c
@@ -477,6 +477,8 @@  out_free:
 	ubi_free_vid_hdr(ubi, vid_hdr);
 out_unlock:
 	leb_read_unlock(ubi, vol_id, lnum);
+	ubi_err(ubi, "err %d while reading %d bytes from offset %d of LEB %d:%d, PEB %d",
+		err, len, offset, vol_id, lnum, pnum);
 	return err;
 }