UBI using outdated fastmaps

Message ID 20171130085642.smi3eu2khcrnrvud@pengutronix.de
State Not Applicable

Commit Message

Sascha Hauer Nov. 30, 2017, 8:56 a.m. UTC
Hi All,

We are chasing problems with corrupted UBI volumes here. With an
excessive load of power cuts we occasionally see UBI corruptions. Most
of the time we see that a LEB is unmapped although it should really be
mapped.

I finally found one place where such corruptions can happen. In
ubi_update_fastmap() the new fastmap is written. It can happen that
there is no free PEB to write the fastmap to. In this case the code
reuses the PEB where the old fastmap is. The critical place is when the
PEB with the old fastmap is erased but not updated with the new fastmap.
A power cut here can trick the fastmap attach code into using an outdated
fastmap during the next boot. If no fastmap is found during the next boot,
the code falls back to scanning and everything is fine. It can, however,
happen that an even older fastmap is found, which is then used in
the absence of the more recent one that just got erased.

It is illegal to erase the PEB with the latest fastmap when there are
still older fastmaps on the device and this is what happens here.
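
The failure mode can be illustrated with a small userspace toy model. This is only a sketch, not the kernel code: the names `toy_peb`, `toy_find_fastmap` and `toy_erase` are hypothetical, and the model assumes that attach simply picks the fastmap with the highest sequence number and falls back to a full scan when none is found.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: one entry per PEB; fm_sqnum == 0 means "no fastmap here". */
struct toy_peb {
	unsigned long long fm_sqnum; /* hypothetical fastmap sequence number */
};

/*
 * Model of attach: return the index of the PEB holding the fastmap with
 * the highest sequence number, or -1 to signal "fall back to full scan".
 */
static int toy_find_fastmap(const struct toy_peb *pebs, size_t n)
{
	unsigned long long best_sqnum = 0;
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (pebs[i].fm_sqnum > best_sqnum) {
			best_sqnum = pebs[i].fm_sqnum;
			best = (int)i;
		}
	}
	return best;
}

/* Model of an erase: the PEB no longer carries a fastmap afterwards. */
static void toy_erase(struct toy_peb *pebs, size_t n, size_t pnum)
{
	assert(pnum < n);
	pebs[pnum].fm_sqnum = 0;
}
```

With an older fastmap (sequence number 5) still on flash next to the latest one (9), erasing only the latest one makes the model attach to the stale fastmap instead of falling back to scanning.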

The problem can be reproduced relatively easily with the patch below.
It panics the kernel at the right point after having erased the old
block, but before it is written again. The ubi.fm_debug parameter during
next boot will then find inconsistencies in case there is an old fastmap
on the flash.

I haven't dug further yet; right now I have no idea how to fix this
properly.

Sascha

Comments

Richard Weinberger Nov. 30, 2017, 9:35 a.m. UTC | #1
Sascha,

On Thursday, 30 November 2017, 09:56:42 CET, Sascha Hauer wrote:
> Hi All,
> 
> We are chasing problems with corrupted UBI volumes here. With an
> excessive load of power cuts we occasionally see UBI corruptions. Most
> of the time we see that a LEB is unmapped although it should really be
> mapped.
> 
> I finally found one place where such corruptions can happen. In
> ubi_update_fastmap() the new fastmap is written. It can happen that
> there is no free PEB to write the fastmap to. In this case the code
> reuses the PEB where the old fastmap is. The critical place is when the
> PEB with the old fastmap is erased but not updated with the new fastmap.
> A power cut here can trick the fastmap attach code into using an outdated
> fastmap during the next boot. If no fastmap is found during the next boot,
> the code falls back to scanning and everything is fine. It can, however,
> happen that an even older fastmap is found, which is then used in
> the absence of the more recent one that just got erased.
> 
> It is illegal to erase the PEB with the latest fastmap when there are
> still older fastmaps on the device and this is what happens here.
> 
> The problem can be reproduced relatively easily with the patch below.
> It panics the kernel at the right point after having erased the old
> block, but before it is written again. The ubi.fm_debug parameter during
> next boot will then find inconsistencies in case there is an old fastmap
> on the flash.

*geeez*

> I haven't dug further yet; right now I have no idea how to fix this
> properly.

Upon attach, UBI schedules old fastmap PEBs for erasure.
So, before reusing (and therefore erasing) the current anchor PEB, we have to
flush all UBI work. Then we can erase the anchor. If a power cut happens in
between, UBI will do a full scan in the worst case.
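
The ordering argument can be sketched as a userspace toy model. It is not the kernel implementation: `toy_flash`, `toy_flush_work`, `toy_reuse_anchor` and `toy_attach` are hypothetical names, and the model assumes that pending erasures live only in RAM, that attach picks the highest sequence number, and that the power cut hits right after the anchor is erased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPEBS 4

/* Toy flash state; all names here are hypothetical, not UBI structures. */
struct toy_flash {
	unsigned long long fm_sqnum[TOY_NPEBS]; /* 0 = no fastmap in this PEB */
	bool erase_pending[TOY_NPEBS];          /* erasure scheduled, not yet done */
};

/* Stand-in for flushing all pending work: carry out scheduled erasures. */
static void toy_flush_work(struct toy_flash *f)
{
	for (size_t i = 0; i < TOY_NPEBS; i++) {
		if (f->erase_pending[i]) {
			f->fm_sqnum[i] = 0;
			f->erase_pending[i] = false;
		}
	}
}

/*
 * Erase the current anchor so it can be reused. The power cut is assumed
 * to happen right after this returns, before the new fastmap is written.
 */
static void toy_reuse_anchor(struct toy_flash *f, size_t anchor, bool flush_first)
{
	assert(anchor < TOY_NPEBS);
	if (flush_first)
		toy_flush_work(f);
	f->fm_sqnum[anchor] = 0;
}

/* What the next attach finds on flash: newest fastmap index, or -1 (full scan). */
static int toy_attach(const struct toy_flash *f)
{
	unsigned long long best_sqnum = 0;
	int best = -1;

	for (size_t i = 0; i < TOY_NPEBS; i++) {
		if (f->fm_sqnum[i] > best_sqnum) {
			best_sqnum = f->fm_sqnum[i];
			best = (int)i;
		}
	}
	return best;
}
```

Without the flush, an old fastmap whose erasure was only scheduled survives the power cut and is picked up by the next attach. With the flush, nothing stale is left and the model ends up in the full-scan fallback.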

Thanks,
//richard
Patch

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 2542a44f47f9..ece6dfb0b054 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -1619,6 +1619,8 @@  int ubi_update_fastmap(struct ubi_device *ubi)
 		/* no fresh anchor PEB was found, reuse the old one */
 		if (!tmp_e) {
 			ret = erase_block(ubi, old_fm->e[0]->pnum);
+			printk("%s: re-use old fm_anchor. Erased PEB %d\n", __func__, old_fm->e[0]->pnum);
+			panic("boom");
 			if (ret < 0) {
 				ubi_err(ubi, "could not erase old anchor PEB");