Message ID | 20220429143757.2691490-1-bjorn@mork.no |
---|---|
State | Superseded |
Delegated to: | Daniel Golle |
Headers | show |
Series | base-files: fix sysupgrade with ubi and kernel sharing partition | expand |
hi, sorry for the delay. I didn't expect that kind of sharing. your fix is not enough: when later the partition is written, it is via 'mtd -n write' which expects an erased partition. i will do a PR ASAP to fix this by invalidating the start of the kernel partition instead of erasing it, this is enough to invalidate the kernel CRC. thanks! On 4/29/22 11:37, Bjørn Mork wrote: > Commit ecbcc0b59551 bricks devices where a ubi rootfs and a raw > kernel shares the same mtd partition. > > This is the case for the ZyXEL NR7101 for example. The OEM bootloader > has no UBI support. OpenWrt splits the "Kernel" mtd partition in a raw > kernel part used by the bootloader and a UBI part used for the OpenWrt > rootfs and rootfs_data. Running mtd erase on this partition during > during upgrade erases the UBI part and results in a soft brick. > > Fixes: ecbcc0b59551 ("base-files: safer sysupgrade.tar for kernel-out-of-UBI") > Signed-off-by: Bjørn Mork <bjorn@mork.no> > --- > > I'm not sure what the proper fix for this is. I believe the intended > functionality of commit ecbcc0b59551 should be re-implemented somehow. > > I guess the real bug might be the dual usage of this partition? But > I'm pretty sure I found that as an example in some other device, > without being able to point it out now. > > But I believe we need this immediate fix in any case, since this is a > bricking regression. > > > package/base-files/files/lib/upgrade/nand.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/package/base-files/files/lib/upgrade/nand.sh b/package/base-files/files/lib/upgrade/nand.sh > index 5ecdb0ff2363..5e5607d35cd8 100644 > --- a/package/base-files/files/lib/upgrade/nand.sh > +++ b/package/base-files/files/lib/upgrade/nand.sh > @@ -305,7 +305,7 @@ nand_upgrade_tar() { > local ubi_kernel_length > if [ "$kernel_length" ]; then > if [ "$kernel_mtd" ]; then > - mtd erase "$CI_KERNPART" > + : # mtd erase "$CI_KERNPART" > else > ubi_kernel_length="$kernel_length" > fi
Lanchon <lanchon@gmail.com> writes: > hi, > > sorry for the delay. I didn't expect that kind of sharing. I can understand that. I have not been able to find any other examples, so I have to take full responsibility for this unexpected configuration. I guess I thought it would make sense to have a "Kernel" partition large enough to allow reverting to OEM firmware. But I recently found out that the bootloader can be configured to boot from "Kernel2", which means that we can document a "revert to OEM" procedure using that partition instead. Will fail if/when we add dual image support to OpenWrt though. Another alternative is of course to add another partition for the OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an unused container only. Maybe the safest solution? > your fix is not enough: when later the partition is written, it is via > 'mtd -n write' which expects an erased partition. > > i will do a PR ASAP to fix this by invalidating the start of the > kernel partition instead of erasing it, this is enough to invalidate > the kernel CRC. OK, thanks for taking care of this. Bjørn
On 2022-05-02 17:10, Bjørn Mork wrote: > Lanchon <lanchon@gmail.com> writes: > >> hi, >> >> sorry for the delay. I didn't expect that kind of sharing. > > I can understand that. I have not been able to find any other examples, > so I have to take full responsibility for this unexpected configuration. > I guess I thought it would make sense to have a "Kernel" partition large > enough to allow reverting to OEM firmware. > > But I recently found out that the bootloader can be configured to boot > from "Kernel2", which means that we can document a "revert to OEM" > procedure using that partition instead. Will fail if/when we add dual > image support to OpenWrt though. > > Another alternative is of course to add another partition for the > OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an > unused container only. Maybe the safest solution? How about using partition nesting here? I've started to use it for NAND devices recently. One example: https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79 Sungbo
Sungbo Eo <mans0n@gorani.run> writes: > On 2022-05-02 17:10, Bjørn Mork wrote: > >> Another alternative is of course to add another partition for the >> OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an >> unused container only. Maybe the safest solution? > > How about using partition nesting here? I've started to use it for NAND > devices recently. One example: > https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79 Yes, that was what I meant. I believe it's a good solution. Bjørn
On 5/4/22 12:09, Sungbo Eo wrote: > On 2022-05-02 17:10, Bjørn Mork wrote: >> >> I can understand that. I have not been able to find any other examples, >> so I have to take full responsibility for this unexpected configuration. >> I guess I thought it would make sense to have a "Kernel" partition large >> enough to allow reverting to OEM firmware. >> >> But I recently found out that the bootloader can be configured to boot >> from "Kernel2", which means that we can document a "revert to OEM" >> procedure using that partition instead. Will fail if/when we add dual >> image support to OpenWrt though. >> >> Another alternative is of course to add another partition for the >> OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an >> unused container only. Maybe the safest solution? > How about using partition nesting here? I've started to use it for NAND > devices recently. One example: > https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79 > > Sungbo that's interesting, i didn't know that was possible. in my extensive experience modding the partitioning of devices, which encompasses exactly *one* device, this is what i take home: in order to facilitate return to stock, i prefer keeping the complete set of OEM partitions and add new, overlapping partitions. the deprecated OEM partitions (those with their space repurposed) are renamed to "stock_<OEM-name>" and marked read-write. also the new partitions are added at the end of the list, so that OEM partition numbering is not affected and scripts designed to work on them work under OEM and custom kernels. this was important on my device which included a partition with MACs and stuff that had to be relocated to consolidate reusable space. nested partitioning is interesting but does not generalize: it can't coalesce partitions. it would also not conserve the partition numbering. it seems partition nesting is a great mechanism to support reading and writing a set of related partitions in one go, and applies to NOR. for NAND, where bad block skipping occurs, this doesn't work. and nesting doesn't appear to offer any advantage over plain overlapping partitions, which is a more general solution. PS. the original concern of this thread has been worked around by patching the code (merged into master).
diff --git a/package/base-files/files/lib/upgrade/nand.sh b/package/base-files/files/lib/upgrade/nand.sh index 5ecdb0ff2363..5e5607d35cd8 100644 --- a/package/base-files/files/lib/upgrade/nand.sh +++ b/package/base-files/files/lib/upgrade/nand.sh @@ -305,7 +305,7 @@ nand_upgrade_tar() { local ubi_kernel_length if [ "$kernel_length" ]; then if [ "$kernel_mtd" ]; then - mtd erase "$CI_KERNPART" + : # mtd erase "$CI_KERNPART" else ubi_kernel_length="$kernel_length" fi
Commit ecbcc0b59551 bricks devices where a ubi rootfs and a raw kernel shares the same mtd partition. This is the case for the ZyXEL NR7101 for example. The OEM bootloader has no UBI support. OpenWrt splits the "Kernel" mtd partition in a raw kernel part used by the bootloader and a UBI part used for the OpenWrt rootfs and rootfs_data. Running mtd erase on this partition during during upgrade erases the UBI part and results in a soft brick. Fixes: ecbcc0b59551 ("base-files: safer sysupgrade.tar for kernel-out-of-UBI") Signed-off-by: Bjørn Mork <bjorn@mork.no> --- I'm not sure what the proper fix for this is. I believe the intended functionality of commit ecbcc0b59551 should be re-implemented somehow. I guess the real bug might be the dual usage of this partition? But I'm pretty sure I found that as an example in some other device, without being able to point it out now. But I believe we need this immediate fix in any case, since this is a bricking regression. package/base-files/files/lib/upgrade/nand.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)