diff mbox series

[v6,12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

Message ID 20211227032246.2886878-13-chengzhihao1@huawei.com
State Under Review
Delegated to: Richard Weinberger
Headers show
Series Some bugfixs for ubi/ubifs | expand

Commit Message

Zhihao Cheng Dec. 27, 2021, 3:22 a.m. UTC
Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
while attaching ubi device if 'fm->used_blocks' is greater than 2, which
may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':

  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
  Call Trace:
    ubi_wl_init.cold+0xae/0x2af [ubi]
    ubi_attach+0x1b0/0x780 [ubi]
    ubi_init+0x23a/0x3ad [ubi]
    load_module+0x22d2/0x2430

Reproduce:
  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
  modprobe nandsim id_bytes=$ID
  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
  rmmod ubi
  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap

Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
be counted into 'found_pebs'.

Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

Comments

Richard Weinberger Jan. 10, 2022, 11:23 p.m. UTC | #1
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:43
> Betreff: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
> while attaching ubi device if 'fm->used_blocks' is greater than 2, which
> may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':
> 
>  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
>  Call Trace:
>    ubi_wl_init.cold+0xae/0x2af [ubi]
>    ubi_attach+0x1b0/0x780 [ubi]
>    ubi_init+0x23a/0x3ad [ubi]
>    load_module+0x22d2/0x2430
> 
> Reproduce:
>  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
>  modprobe nandsim id_bytes=$ID
>  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
>  rmmod ubi
>  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap
> 
> Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
> be counted into 'found_pebs'.
> 
> Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> ---
> drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
> 1 file changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index 6b5f1ffd961b..01dcdd94c9d2 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> /**
>  * ubi_scan_fastmap - scan the fastmap.
>  * @ubi: UBI device object
> @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	struct ubi_vid_hdr *vh;
> 	struct ubi_ec_hdr *ech;
> 	struct ubi_fastmap_layout *fm;
> -	struct ubi_ainf_peb *aeb;
> 	int i, used_blocks, pnum, fm_anchor, ret = 0;
> 	size_t fm_size;
> 	__be32 crc, tmp_crc;
> @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	if (fm_anchor < 0)
> 		return UBI_NO_FASTMAP;
> 
> -	/* Copy all (possible) fastmap blocks into our new attach structure. */
> -	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
> -		struct ubi_ainf_peb *new;
> -
> -		new = clone_aeb(ai, aeb);
> -		if (!new)
> -			return -ENOMEM;
> -
> -		list_add(&new->u.list, &ai->fastmap);
> -	}
> -

scan_ai->fastmap may contain also old fastmap PEBs.
In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
e.g. after power-cut.
That's why scan_ai->fastmap is copied into ai->fastmap.
Later in ubi_wl_init() these outdated PEBs will get erased.
So, you cannot remove this code.

But I fully agree with you that the fm->used_blocks > 1 case is not correct.
I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
we need to fall back to scanning mode while attaching.

Thanks,
//richard
Zhihao Cheng Jan. 11, 2022, 2:48 a.m. UTC | #2
Hi Richard,
> scan_ai->fastmap may contain also old fastmap PEBs.
> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
> e.g. after power-cut.
> That's why scan_ai->fastmap is copied into ai->fastmap.
> Later in ubi_wl_init() these outdated PEBs will get erased.
> So, you cannot remove this code.
I thought old fastmap PEBs(async erase works in ubi_update_fastmap()) 
will be counted into erase PEBs in the next attaching process, because I 
saw following code snippet in ubi_write_fastmap():
1260         list_for_each_entry(ubi_wrk, &ubi->works, list) { 

1261                 if (ubi_is_erase_work(ubi_wrk)) { 

1262                         wl_e = ubi_wrk->e; 

1263                         ubi_assert(wl_e); 

1264 

1265                         fec = (struct ubi_fm_ec *)(fm_raw + 
fm_pos);
1266 

1267                         fec->pnum = cpu_to_be32(wl_e->pnum); 

1268                         set_seen(ubi, wl_e->pnum, seen_pebs); 

1269                         fec->ec = cpu_to_be32(wl_e->ec); 

1270 

1271                         erase_peb_count++; 

1272                         fm_pos += sizeof(*fec); 

1273                         ubi_assert(fm_pos <= ubi->fm_size); 

1274                 } 

1275         } 

1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
Half-writing on fastmap will be recognized in scanning, and UBI 
fallbacks full scanning, So, I come up with two situations:
1. power-cut before new fastmap written, the old fastmap is completely 
saved until next attaching, and some free PEBs are written with new 
fastmap data. Luckly, fastmap anchor PEB's vid header is written first 
of all, bad fastmap will be returned by ubi_attach_fastmap() in next 
attaching.
2. power-cut after new fastmap written, the old fastmap PEBs will be 
added into 'ai->erase' list in next attaching.
Did I miss other possible circumstances?
> 
> But I fully agree with you that the fm->used_blocks > 1 case is not correct.
> I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
> we need to fall back to scanning mode while attaching.
Richard Weinberger Jan. 11, 2022, 7:27 a.m. UTC | #3
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 03:48:24
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> scan_ai->fastmap may contain also old fastmap PEBs.
>> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
>> e.g. after power-cut.
>> That's why scan_ai->fastmap is copied into ai->fastmap.
>> Later in ubi_wl_init() these outdated PEBs will get erased.
>> So, you cannot remove this code.
> I thought old fastmap PEBs(async erase works in ubi_update_fastmap())
> will be counted into erase PEBs in the next attaching process, because I
> saw following code snippet in ubi_write_fastmap():
> 1260         list_for_each_entry(ubi_wrk, &ubi->works, list) {
> 
> 1261                 if (ubi_is_erase_work(ubi_wrk)) {
> 
> 1262                         wl_e = ubi_wrk->e;
> 
> 1263                         ubi_assert(wl_e);
> 
> 1264
> 
> 1265                         fec = (struct ubi_fm_ec *)(fm_raw +
> fm_pos);
> 1266
> 
> 1267                         fec->pnum = cpu_to_be32(wl_e->pnum);
> 
> 1268                         set_seen(ubi, wl_e->pnum, seen_pebs);
> 
> 1269                         fec->ec = cpu_to_be32(wl_e->ec);
> 
> 1270
> 
> 1271                         erase_peb_count++;
> 
> 1272                         fm_pos += sizeof(*fec);
> 
> 1273                         ubi_assert(fm_pos <= ubi->fm_size);
> 
> 1274                 }
> 
> 1275         }
> 
> 1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
> Half-writing on fastmap will be recognized in scanning, and UBI
> fallbacks full scanning, So, I come up with two situations:
> 1. power-cut before new fastmap written, the old fastmap is completely
> saved until next attaching, and some free PEBs are written with new
> fastmap data. Luckly, fastmap anchor PEB's vid header is written first
> of all, bad fastmap will be returned by ubi_attach_fastmap() in next
> attaching.
> 2. power-cut after new fastmap written, the old fastmap PEBs will be
> added into 'ai->erase' list in next attaching.
> Did I miss other possible circumstances?

In ubi_wl_init() there is another corner case documented:
                        /*
                         * The fastmap update code might not find a free PEB for
                         * writing the fastmap anchor to and then reuses the
                         * current fastmap anchor PEB. When this PEB gets erased
                         * and a power cut happens before it is written again we
                         * must make sure that the fastmap attach code doesn't
                         * find any outdated fastmap anchors, hence we erase the
                         * outdated fastmap anchor PEBs synchronously here.
                         */
                        if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
                                sync = true;

So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
starts to operate. With your change this is no longer satisfied.

Thanks,
//richard
Zhihao Cheng Jan. 11, 2022, 11:44 a.m. UTC | #4
Hi Richard,
> In ubi_wl_init() there is another corner case documented:
>                          /*
>                           * The fastmap update code might not find a free PEB for
>                           * writing the fastmap anchor to and then reuses the
>                           * current fastmap anchor PEB. When this PEB gets erased
>                           * and a power cut happens before it is written again we
>                           * must make sure that the fastmap attach code doesn't
>                           * find any outdated fastmap anchors, hence we erase the
>                           * outdated fastmap anchor PEBs synchronously here.
>                           */
>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>                                  sync = true;
> 
> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
> starts to operate. With your change this is no longer satisfied
I seem to understand the another case. But I'm still confused that why 
outdated fastmap PEBs cannot be erased. When UBI comes to this point, it 
means UBI is attached by **full scanning mode**, scan_fast() returns two 
values:
   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is 
assigned with scan_ai, at last, all fastmap pebs are added into 
'ai->fastmap'
   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are 
added into 'ai->fastmap'
Richard Weinberger Jan. 11, 2022, 11:57 a.m. UTC | #5
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 12:44:49
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> In ubi_wl_init() there is another corner case documented:
>>                          /*
>>                           * The fastmap update code might not find a free PEB for
>>                           * writing the fastmap anchor to and then reuses the
>>                           * current fastmap anchor PEB. When this PEB gets erased
>>                           * and a power cut happens before it is written again we
>>                           * must make sure that the fastmap attach code doesn't
>>                           * find any outdated fastmap anchors, hence we erase the
>>                           * outdated fastmap anchor PEBs synchronously here.
>>                           */
>>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>>                                  sync = true;
>> 
>> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
>> starts to operate. With your change this is no longer satisfied
> I seem to understand the another case. But I'm still confused that why
> outdated fastmap PEBs cannot be erased. When UBI comes to this point, it
> means UBI is attached by **full scanning mode**, scan_fast() returns two
> values:
>   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is
> assigned with scan_ai, at last, all fastmap pebs are added into
> 'ai->fastmap'
>   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are
> added into 'ai->fastmap'

I fear, I'm unable to follow your thoughts.
ubi_wl_init() is called in both cases, with and without fastmap.
And ai->fastmap contains all anchor PEBs that scan_fast() found.
This can be the most recent but also outdated anchor PEBs.

Thanks,
//richard
Zhihao Cheng Jan. 11, 2022, 1:23 p.m. UTC | #6
在 2022/1/11 19:57, Richard Weinberger 写道:
> ubi_wl_init() is called in both cases, with and without fastmap.
I agree.

> And ai->fastmap contains all anchor PEBs that scan_fast() found.
> This can be the most recent but also outdated anchor PEBs.
Is it exists a case that outdated fastmap PEBs are neither counted into 
'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching 
by fastmap.

1853                         if (ubi->lookuptbl[aeb->pnum]) 

1854                                 continue; 

1855 

1856                         /* 

1857                          * The fastmap update code might not find a 
free PEB for
1858                          * writing the fastmap anchor to and then 
reuses the
1859                          * current fastmap anchor PEB. When this 
PEB gets erased
1860                          * and a power cut happens before it is 
written again we
1861                          * must make sure that the fastmap attach 
code doesn't
1862                          * find any outdated fastmap anchors, hence 
we erase the
1863                          * outdated fastmap anchor PEBs 
synchronously here.
1864                          */ 

1865                         if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) 

1866                                 sync = true;

I think UBI attaches failed by fastmap if kernel goes here.
1870                         err = erase_aeb(ubi, aeb, sync);
Richard Weinberger Jan. 11, 2022, 1:56 p.m. UTC | #7
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 14:23:52
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> 在 2022/1/11 19:57, Richard Weinberger 写道:
>> ubi_wl_init() is called in both cases, with and without fastmap.
> I agree.
> 
>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>> This can be the most recent but also outdated anchor PEBs.
> Is it exists a case that outdated fastmap PEBs are neither counted into
> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
> by fastmap.
> 

[...]
 
> I think UBI attaches failed by fastmap if kernel goes here.
> 1870                         err = erase_aeb(ubi, aeb, sync);

Hmm, I think the paranoia check in fastmap.c is too strict these days.
        if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
                    ai->bad_peb_count - fm->used_blocks))
                goto fail_bad;

It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
this check will trigger and force falling back to scanning mode.
With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.

So I agree that this code path is wonky and can be cleaned up. But please be
extremely careful and give all your changes excessive testing with real workload
and power-cuts.

Thanks,
//richard
Zhihao Cheng Jan. 12, 2022, 3:46 a.m. UTC | #8
>>> ubi_wl_init() is called in both cases, with and without fastmap.
>> I agree.
>>
>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>> This can be the most recent but also outdated anchor PEBs.
>> Is it exists a case that outdated fastmap PEBs are neither counted into
>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>> by fastmap.
>>
> 
> [...]
>   
>> I think UBI attaches failed by fastmap if kernel goes here.
>> 1870                         err = erase_aeb(ubi, aeb, sync);
> 
> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>                      ai->bad_peb_count - fm->used_blocks))
>                  goto fail_bad;
> 
> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
> this check will trigger and force falling back to scanning mode.
> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
Forgive my stubbornness, I think this strict check is good, could you 
show me a process to trigger this WARN_ON, it would be nice to provide a 
reproducer.
I still insist the point(after my fix patch applied): All outdated 
fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted 
into 'fmhdr->erase_peb_count'(fast attached case).
> 
> So I agree that this code path is wonky and can be cleaned up. But please be
> extremely careful and give all your changes excessive testing with real workload
> and power-cuts.Let's list all power-cut cases during fastmap updateing(I tried to 
simulate some of them on nandsim), in theory, I think the WARN_ON check 
is okay and all outdated fastmap PEBs can be erased in next attaching:

ubi_update_fastmap:
1565         for (i = 1; i < new_fm->used_blocks; i++) {
1570                 if (!tmp_e) {
1571                         if (old_fm && old_fm->e[i]) {
                              /* sync erase old fm data PEBs */
power-cut!!!
1585                         } else {
                              /* async erase old fm data PEBs */
power-cut!!!
1595                         }
1596                 } else {
1599                         if (old_fm && old_fm->e[i]) {
			     /* async erase old fm data PEBs */
power-cut!!!
1603                         }
1604                 }
1605         }
1621         if (old_fm) {
1623                 if (!tmp_e) {
  		     /* sync erase old fm anchor PEB */
power-cut!!!
1638                 } else {
		     /* async erase old fm anchor PEB */
power-cut!!!
1644                 }
1645         } else {
1658         }
1660         ret = ubi_write_fastmap(ubi, new_fm);

ubi_write_fastmap:
1324         ret = ubi_io_write_vid_hdr(ubi, new_fm->e[0]->pnum, avbuf);
power-cut!!!
1345         for (i = 1; i < new_fm->used_blocks; i++) {
1350                 ret = ubi_io_write_vid_hdr(...);
power-cut!!!
1356         }

1358         for (i = 0; i < new_fm->used_blocks; i++) {
1359                 ret = ubi_io_write_data(...),
power-cut!!!
1366         }
Richard Weinberger Jan. 14, 2022, 6:45 p.m. UTC | #9
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Mittwoch, 12. Januar 2022 04:46:28
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>>> ubi_wl_init() is called in both cases, with and without fastmap.
>>> I agree.
>>>
>>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>>> This can be the most recent but also outdated anchor PEBs.
>>> Is it exists a case that outdated fastmap PEBs are neither counted into
>>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>>> by fastmap.
>>>
>> 
>> [...]
>>   
>>> I think UBI attaches failed by fastmap if kernel goes here.
>>> 1870                         err = erase_aeb(ubi, aeb, sync);
>> 
>> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>                      ai->bad_peb_count - fm->used_blocks))
>>                  goto fail_bad;
>> 
>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>> this check will trigger and force falling back to scanning mode.
>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
> Forgive my stubbornness, I think this strict check is good, could you
> show me a process to trigger this WARN_ON, it would be nice to provide a
> reproducer.

You can trigger this by interrupting UBI.
e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
for a very long time.
While developing UBI fastmap and performing powercut tests I saw this often
on targets.

> I still insist the point(after my fix patch applied): All outdated
> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
> into 'fmhdr->erase_peb_count'(fast attached case).

Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
get erases synchronously(!). The comment before the erasure explains why.
To complicate things, this code is currently unreachable because the WARN_ON()
is not right. I misses to count ai->fastmap.
So, when there are old fastmap PEBs found, the counter does not match
and UBI falls back to full scanning while it could to an attach by fastmap.

Fastmap is full with corner cases that have been found by massive amount of testing, sadly.

Thanks,
//richard
Zhihao Cheng Jan. 15, 2022, 8:22 a.m. UTC | #10
>>>           if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>>                       ai->bad_peb_count - fm->used_blocks))
>>>                   goto fail_bad;
>>>
>>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>>> this check will trigger and force falling back to scanning mode.
>>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
>> Forgive my stubbornness, I think this strict check is good, could you
>> show me a process to trigger this WARN_ON, it would be nice to provide a
>> reproducer.
> 
> You can trigger this by interrupting UBI.
> e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
> PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
> for a very long time.
I tried with following modifications:

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 01dcdd94c9d2..253e76e2a7d7 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -679,6 +679,7 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
  		if (fm_pos >= fm_size)
  			goto fail_bad;

+		pr_err("%s: add %d to erase list\n", __func__, be32_to_cpu(fmec->pnum));
  		ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
  			      be32_to_cpu(fmec->ec), 1);
  		if (ret)
@@ -1103,6 +1104,7 @@ void ubi_fastmap_destroy_checkmap(struct 
ubi_volume *vol)
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+#include <linux/delay.h>
  static int ubi_write_fastmap(struct ubi_device *ubi,
  			     struct ubi_fastmap_layout *new_fm)
  {
@@ -1148,6 +1150,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  		goto out_free_dvbuf;
  	}

+	pr_err("%s: wait all erase workers deleted from list\n", __func__);
+	msleep(500);
  	spin_lock(&ubi->volumes_lock);
  	spin_lock(&ubi->wl_lock);

@@ -1196,6 +1200,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  	ubi_for_each_free_peb(ubi, wl_e, tmp_rb) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
+		if (global_old_fm_peb == wl_e->pnum)
+			pr_err("%s: count peb %d as free(cannot happen)!!!\n", __func__, 
wl_e->pnum);

  		fec->pnum = cpu_to_be32(wl_e->pnum);
  		set_seen(ubi, wl_e->pnum, seen_pebs);
@@ -1208,6 +1214,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  	if (ubi->fm_next_anchor) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+		if (global_old_fm_peb == ubi->fm_next_anchor->pnum)
+			pr_err("%s: count peb %d as next anchor(cannot happen)!!!\n", 
__func__, ubi->fm_next_anchor->pnum);
  		fec->pnum = cpu_to_be32(ubi->fm_next_anchor->pnum);
  		set_seen(ubi, ubi->fm_next_anchor->pnum, seen_pebs);
  		fec->ec = cpu_to_be32(ubi->fm_next_anchor->ec);
@@ -1264,6 +1272,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+			if (wl_e->pnum == global_old_fm_peb)
+				pr_err("%s: count fm anchor %d in erase work(should happen)!!!\n", 
__func__, wl_e->pnum);
  			fec->pnum = cpu_to_be32(wl_e->pnum);
  			set_seen(ubi, wl_e->pnum, seen_pebs);
  			fec->ec = cpu_to_be32(wl_e->ec);
@@ -1520,12 +1530,14 @@ static void return_fm_pebs(struct ubi_device *ubi,
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+int global_old_fm_peb = -1;
  int ubi_update_fastmap(struct ubi_device *ubi)
  {
  	int ret, i, j;
  	struct ubi_fastmap_layout *new_fm, *old_fm;
  	struct ubi_wl_entry *tmp_e;

+	pr_err("%s: start\n", __func__);
  	down_write(&ubi->fm_protect);
  	down_write(&ubi->work_sem);
  	down_write(&ubi->fm_eba_sem);
@@ -1634,6 +1646,9 @@ int ubi_update_fastmap(struct ubi_device *ubi)
  			/* we've got a new anchor PEB, return the old one */
  			ubi_wl_put_fm_peb(ubi, old_fm->e[0], 0,
  					  old_fm->to_be_tortured[0]);
+			global_old_fm_peb = old_fm->e[0]->pnum;
+			pr_err("%s: put old fastmap peb %d\n", __func__, old_fm->e[0]->pnum);
+			/* Not check return value ? Fault injection may cause the WARNON() 
too! */
  			new_fm->e[0] = tmp_e;
  			old_fm->e[0] = NULL;
  		}
@@ -1665,6 +1680,7 @@ int ubi_update_fastmap(struct ubi_device *ubi)

  	ubi_ensure_anchor_pebs(ubi);

+	pr_err("%s: done\n", __func__);
  	return ret;

  err:
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 7c083ad58274..17df8dcebbe9 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -37,6 +37,7 @@
  #define UBI_NAME_STR "ubi"

  struct ubi_device;
+extern int global_old_fm_peb;

  /* Normal UBI messages */
  __printf(2, 3)
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 8455f1d47f3c..a148280f4ce8 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -198,8 +198,12 @@ static int do_work(struct ubi_device *ubi)
  	 * the queue flush code has to be sure the whole queue of works is
  	 * done, and it takes the mutex in write mode.
  	 */
+	if (global_old_fm_peb != -1)
+		pr_err("-----\n");
  	down_read(&ubi->work_sem);
  	spin_lock(&ubi->wl_lock);
+	if (global_old_fm_peb != -1)
+		pr_err("=====\n");
  	if (list_empty(&ubi->works)) {
  		spin_unlock(&ubi->wl_lock);
  		up_read(&ubi->work_sem);
@@ -1070,6 +1074,7 @@ static int ensure_wear_leveling(struct ubi_device 
*ubi, int nested)
   * needed. Returns zero in case of success and a negative error code 
in case of
   * failure.
   */
+#include <linux/delay.h>
  static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
  {
  	struct ubi_wl_entry *e = wl_wrk->e;
@@ -1078,6 +1083,12 @@ static int __erase_worker(struct ubi_device *ubi, 
struct ubi_work *wl_wrk)
  	int lnum = wl_wrk->lnum;
  	int err, available_consumed = 0;

+	if (pnum == global_old_fm_peb) {
+		pr_err("%s: erase worker(erase %d) has been deleted from list!", 
__func__, pnum);
+		pr_err("Dump ubi image to file\n");
+		ubi_ro_mode(ubi);
+	}
+
  	dbg_wl("erase PEB %d EC %d LEB %d:%d",
  	       pnum, e->ec, wl_wrk->vol_id, wl_wrk->lnum);

modprobe nandsim id_bytes="0xec,0xa1,0x00,0x15"
modprobe ubi mtd="0,2048" fm_autoconvert
sleep 1   # Wait until all erase works done
ubimkvol -N vol_a -m -n 0 /dev/ubi0  # trigger fastmap updating
dd if=/dev/mtd0 of=flash
rmmod ubi
nandwrite /dev/mtd0 flash > /dev/null
modprobe ubi mtd="0,2048" fm_autoconvert

Kernel will print:
[13198.624433] ubi_update_fastmap: start
[13198.625164] ubi_write_fastmap: wait all erase workers deleted from list
[13199.134011] ubi_update_fastmap: done		# first attaching triggers 
wearleveling
[13199.623438] ubi_update_fastmap: start	# volume creation triggers 
updating fastmap
[13199.624732] ubi_update_fastmap: put old fastmap peb 2	# schedule old 
fastmap PEB to erase
[13199.624757] -----
[13199.626532] ubi_write_fastmap: wait all erase workers deleted from list
[13200.133059] ubi_write_fastmap: count fm anchor 2 in erase work(should 
happen)!!!	# Count PEB(in erase work) into 'fmh->erase_peb_count'
[13200.136996] ubi_update_fastmap: done
[13200.137000] =====
[13200.138833] __erase_worker: erase worker(erase 2) has been deleted 
from list!
[13200.138842] Dump ubi image to file
[13203.091720] ubi0: attaching mtd0
[13203.111179] ubi_attach_fastmap: add 2 to erase list
[13203.113013] ubi0: attached by fastmap

I think the WARNON cannot be triggered because old fastmap PEBs are 
always counted into 'fmh->erase_peb_count', and 'ubi->work_sem' 
serilizes ubi_update_fastmap() and do_work():
             ubi_update_fastmap                       ubi_thread
     down_write(&ubi->work_sem)
       ubi_wl_put_fm_peb(old fastmap PEB)
         schedule_erase
           list_add_tail(&wrk->list, &ubi->works)
                                                        do_work
 
down_read(&ubi->work_sem)
                                                          // Stuck here!
       ubi_write_fastmap
         list_for_each_entry(ubi_wrk, &ubi->works, list)
           fec->pnum = cpu_to_be32(wl_e->pnum)
     up_write(&ubi->work_sem)
 
list_del(&wrk->list)
                                                          erase_worker

> While developing UBI fastmap and performing powercut tests I saw this often
> on targets.
Was commit 2e8f08deabbc7ee("ubi: Fix races around ubi_refill_pools()") 
applied at that time? The WANRON can be triggered without this commit, 
because this commit makes sure do_work() cannot happen between 
ubi_wl_put_fm_peb() and ubi_write_fastmap().
> 
>> I still insist the point(after my fix patch applied): All outdated
>> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
>> into 'fmhdr->erase_peb_count'(fast attached case).
> 
> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
> get erases synchronously(!). The comment before the erasure explains why.
About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
implementation cannot satisfy it, even with my fix applied. Wait, it 
seems that UBI has never made it sure. Old fastmap PEBs could be erased 
asynchronously, they could be counted into 'fmh->erase_peb_count' even 
in early UBI implementation code, so old fastmap anchor PEB will be 
added into 'ai->erase' and be erased asynchronously in next attaching. 
But, I feel it is not a problem, find_fm_anchor() can help us find out 
the right fastmap anchor PEB according seqnum.
> To complicate things, this code is currently unreachable because the WARN_ON()
> is not right. I misses to count ai->fastmap.
> So, when there are old fastmap PEBs found, the counter does not match
> and UBI falls back to full scanning while it could to an attach by fastmap.
>I think the mainly cause of failed fastmap attaching is half-written of 
fastmap. The counter check(eg. WARNON) is okay.
> Fastmap is full with corner cases that have been found by massive amount of testing, sadly.
Zhihao Cheng Jan. 15, 2022, 8:46 a.m. UTC | #11
>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>> get erases synchronously(!). The comment before the erasure explains why.
> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
> implementation cannot satisfy it, even with my fix applied. Wait, it 
> seems that UBI has never made it sure. Old fastmap PEBs could be erased 
> asynchronously, they could be counted into 'fmh->erase_peb_count' even 
> in early UBI implementation code, so old fastmap anchor PEB will be 
> added into 'ai->erase' and be erased asynchronously in next attaching. 
In next attaching old fastmap PEBs will be processed as following:
ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
ubi_wl_init
   list_for_each_entry_safe(aeb, tmp, &ai->erase)
     erase_aeb       // erase asynchronously
       ubi->lookuptbl[e->pnum] = e
   list_for_each_entry(aeb, &ai->fastmap, u.list)
     e = ubi_find_fm_block(ubi, aeb->pnum)
     if (e) {
        ...
     } else {
       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are 
assigned to 'ubi->lookuptbl'
         continue;
     }
> But, I feel it is not a problem, find_fm_anchor() can help us find out 
> the right fastmap anchor PEB according seqnum.
Richard Weinberger Jan. 15, 2022, 10:01 a.m. UTC | #12
----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Samstag, 15. Januar 2022 09:46:07
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>>> get erases synchronously(!). The comment before the erasure explains why.
>> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI
>> implementation cannot satisfy it, even with my fix applied. Wait, it
>> seems that UBI has never made it sure. Old fastmap PEBs could be erased
>> asynchronously, they could be counted into 'fmh->erase_peb_count' even
>> in early UBI implementation code, so old fastmap anchor PEB will be
>> added into 'ai->erase' and be erased asynchronously in next attaching.
> In next attaching old fastmap PEBs will be processed as following:
> ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
> ubi_wl_init
>   list_for_each_entry_safe(aeb, tmp, &ai->erase)
>     erase_aeb       // erase asynchronously
>       ubi->lookuptbl[e->pnum] = e
>   list_for_each_entry(aeb, &ai->fastmap, u.list)
>     e = ubi_find_fm_block(ubi, aeb->pnum)
>     if (e) {
>        ...
>     } else {
>       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are
> assigned to 'ubi->lookuptbl'
>         continue;
>     }
>> But, I feel it is not a problem, find_fm_anchor() can help us find out
> > the right fastmap anchor PEB according seqnum.

FYI, I think I understand now our disagreement.
You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
That's okay and this is what Linux as of today does.

My point is that we need to be paranoid and check carefully for old Fastmap PEBs
which might be *not* on the erase list.
I saw such issues in the wild. These were causes by old and/or buggy Fastmap
implementations.
Keep in mind that Linux is not the only system that implements UBI (and fastmap).

So let me give the whole situation another thought on how to improve it.
I totally agree with you that currently there is a problem with fm->used_blocks > 1.
I'm just careful (maybe too careful) about changing Fastmap code.

Thanks,
//richard
Zhihao Cheng Jan. 17, 2022, 1:31 a.m. UTC | #13
> FYI, I think I understand now our disagreement.
> You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
> That's okay and this is what Linux as of today does.
> 
> My point is that we need to be paranoid and check carefully for old Fastmap PEBs
> which might be *not* on the erase list.
> I saw such issues in the wild. These were causes by old and/or buggy Fastmap
> implementations.
> Keep in mind that Linux is not the only system that implements UBI (and fastmap).
Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
you convinced me, we should process fastmap with considering bad 
images(caused by other implementations). Let's keep this wonky assertion 
until a better fix.
> 
> So let me give the whole situation another thought on how to improve it.
> I totally agree with you that currently there is a problem with fm->used_blocks > 1.
> I'm just careful (maybe too careful) about changing Fastmap code.
> 
> Thanks,
> //richard
> .
>
Zhihao Cheng Jan. 17, 2022, 2:52 a.m. UTC | #14
Hi Richard,
>> FYI, I think I understand now our disagreement.
>> You assume that old Fastmap PEBs are *guaranteed* to be part of 
>> Fastmap's erase list.
>> That's okay and this is what Linux as of today does.
>>
>> My point is that we need to be paranoid and check carefully for old 
>> Fastmap PEBs
>> which might be *not* on the erase list.
>> I saw such issues in the wild. These were causes by old and/or buggy 
>> Fastmap
>> implementations.
>> Keep in mind that Linux is not the only system that implements UBI 
>> (and fastmap).
> Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
> you convinced me, we should process fastmap with considering bad 
> images(caused by other implementations). Let's keep this wonky assertion 
> until a better fix.
>>
>> So let me give the whole situation another thought on how to improve it.
>> I totally agree with you that currently there is a problem with 
>> fm->used_blocks > 1.
>> I'm just careful (maybe too careful) about changing Fastmap code.
>>I reconsider the WARNON, it can recognize the bad images and fall back 
full scanning mode early. If linux kernel encounters the WARNON, it 
means something wrong with your image(caused by bad UBI implementation). 
I begin to like this strict check.
After comparing WARNON with the assertion:
   WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - ai->bad_peb_count 
- fm->used_blocks)
   ubi_assert(ubi->good_peb_count == found_pebs)
The 'found_pebs' consists of 'ai->erase', 'ai->free', 'ai->volumes' and 
'ai->fastmap'. The count_fastmap_pebs() includes 'ai->erase', 'ai->free' 
and 'ai->volumes'. This means the number of 'ai->fastmap' equals to 
'fm->used_blocks'. So, the 'ai->fastmap' adding position in my fix is right.
In a word, can you accept the point that 'WARNON' can help us recognize 
the bad fastmap images?
diff mbox series

Patch

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 6b5f1ffd961b..01dcdd94c9d2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -828,24 +828,6 @@  static int find_fm_anchor(struct ubi_attach_info *ai)
 	return ret;
 }
 
-static struct ubi_ainf_peb *clone_aeb(struct ubi_attach_info *ai,
-				      struct ubi_ainf_peb *old)
-{
-	struct ubi_ainf_peb *new;
-
-	new = ubi_alloc_aeb(ai, old->pnum, old->ec);
-	if (!new)
-		return NULL;
-
-	new->vol_id = old->vol_id;
-	new->sqnum = old->sqnum;
-	new->lnum = old->lnum;
-	new->scrub = old->scrub;
-	new->copy_flag = old->copy_flag;
-
-	return new;
-}
-
 /**
  * ubi_scan_fastmap - scan the fastmap.
  * @ubi: UBI device object
@@ -865,7 +847,6 @@  int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	struct ubi_vid_hdr *vh;
 	struct ubi_ec_hdr *ech;
 	struct ubi_fastmap_layout *fm;
-	struct ubi_ainf_peb *aeb;
 	int i, used_blocks, pnum, fm_anchor, ret = 0;
 	size_t fm_size;
 	__be32 crc, tmp_crc;
@@ -875,17 +856,6 @@  int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	if (fm_anchor < 0)
 		return UBI_NO_FASTMAP;
 
-	/* Copy all (possible) fastmap blocks into our new attach structure. */
-	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
-		struct ubi_ainf_peb *new;
-
-		new = clone_aeb(ai, aeb);
-		if (!new)
-			return -ENOMEM;
-
-		list_add(&new->u.list, &ai->fastmap);
-	}
-
 	down_write(&ubi->fm_protect);
 	memset(ubi->fm_buf, 0, ubi->fm_size);
 
@@ -1029,6 +999,11 @@  int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 				"err: %i)", i, pnum, ret);
 			goto free_hdr;
 		}
+
+		/* Add all fastmap blocks into attach structure. */
+		ret = add_aeb(ai, &ai->fastmap, pnum, be64_to_cpu(ech->ec), 0);
+		if (ret)
+			goto free_hdr;
 	}
 
 	kfree(fmsb);