diff mbox series

base-files: fix sysupgrade with ubi and kernel sharing partition

Message ID 20220429143757.2691490-1-bjorn@mork.no
State Superseded
Delegated to: Daniel Golle
Headers show
Series base-files: fix sysupgrade with ubi and kernel sharing partition | expand

Commit Message

Bjørn Mork April 29, 2022, 2:37 p.m. UTC
Commit ecbcc0b59551 bricks devices where a ubi rootfs and a raw
kernel shares the same mtd partition.

This is the case for the ZyXEL NR7101 for example.  The OEM bootloader
has no UBI support.  OpenWrt splits the "Kernel" mtd partition in a raw
kernel part used by the bootloader and a UBI part used for the OpenWrt
rootfs and rootfs_data.  Running mtd erase on this partition during
during upgrade erases the UBI part and results in a soft brick.

Fixes: ecbcc0b59551 ("base-files: safer sysupgrade.tar for kernel-out-of-UBI")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---

I'm not sure what the proper fix for this is.  I believe the intended
functionality of commit ecbcc0b59551 should be re-implemented somehow.

I guess the real bug might be the dual usage of this partition?  But
I'm pretty sure I found that as an example in some other device,
without being able to point it out now.

But I believe we need this immediate fix in any case, since this is a
bricking regression.


 package/base-files/files/lib/upgrade/nand.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lanchon May 2, 2022, 7:51 a.m. UTC | #1
hi,

sorry for the delay. I didn't expect that kind of sharing.

your fix is not enough: when later the partition is written, it is via 
'mtd -n write' which expects an erased partition.

i will do a PR ASAP to fix this by invalidating the start of the kernel 
partition instead of erasing it, this is enough to invalidate the kernel 
CRC.

thanks!



On 4/29/22 11:37, Bjørn Mork wrote:
> Commit ecbcc0b59551 bricks devices where a ubi rootfs and a raw
> kernel shares the same mtd partition.
>
> This is the case for the ZyXEL NR7101 for example.  The OEM bootloader
> has no UBI support.  OpenWrt splits the "Kernel" mtd partition in a raw
> kernel part used by the bootloader and a UBI part used for the OpenWrt
> rootfs and rootfs_data.  Running mtd erase on this partition during
> during upgrade erases the UBI part and results in a soft brick.
>
> Fixes: ecbcc0b59551 ("base-files: safer sysupgrade.tar for kernel-out-of-UBI")
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
>
> I'm not sure what the proper fix for this is.  I believe the intended
> functionality of commit ecbcc0b59551 should be re-implemented somehow.
>
> I guess the real bug might be the dual usage of this partition?  But
> I'm pretty sure I found that as an example in some other device,
> without being able to point it out now.
>
> But I believe we need this immediate fix in any case, since this is a
> bricking regression.
>
>
>   package/base-files/files/lib/upgrade/nand.sh | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/package/base-files/files/lib/upgrade/nand.sh b/package/base-files/files/lib/upgrade/nand.sh
> index 5ecdb0ff2363..5e5607d35cd8 100644
> --- a/package/base-files/files/lib/upgrade/nand.sh
> +++ b/package/base-files/files/lib/upgrade/nand.sh
> @@ -305,7 +305,7 @@ nand_upgrade_tar() {
>   	local ubi_kernel_length
>   	if [ "$kernel_length" ]; then
>   		if [ "$kernel_mtd" ]; then
> -			mtd erase "$CI_KERNPART"
> +			: # mtd erase "$CI_KERNPART"
>   		else
>   			ubi_kernel_length="$kernel_length"
>   		fi
Bjørn Mork May 2, 2022, 8:10 a.m. UTC | #2
Lanchon <lanchon@gmail.com> writes:

> hi,
>
> sorry for the delay. I didn't expect that kind of sharing.

I can understand that. I have not been able to find any other examples,
so I have to take full responsibility for this unexpected configuration.
I guess I thought it would make sense to have a "Kernel" partition large
enough to allow reverting to OEM firmware.

But I recently found out that the bootloader can be configured to boot
from "Kernel2", which means that we can document a "revert to OEM"
procedure using that partition instead.  Will fail if/when we add dual
image support to OpenWrt though.

Another alternative is of course to add another partition for the
OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an
unused container only.  Maybe the safest solution?


> your fix is not enough: when later the partition is written, it is via
> 'mtd -n write' which expects an erased partition.
>
> i will do a PR ASAP to fix this by invalidating the start of the
> kernel partition instead of erasing it, this is enough to invalidate
> the kernel CRC.

OK, thanks for taking care of this.


Bjørn
Sungbo Eo May 4, 2022, 3:09 p.m. UTC | #3
On 2022-05-02 17:10, Bjørn Mork wrote:
> Lanchon <lanchon@gmail.com> writes:
> 
>> hi,
>>
>> sorry for the delay. I didn't expect that kind of sharing.
> 
> I can understand that. I have not been able to find any other examples,
> so I have to take full responsibility for this unexpected configuration.
> I guess I thought it would make sense to have a "Kernel" partition large
> enough to allow reverting to OEM firmware.
> 
> But I recently found out that the bootloader can be configured to boot
> from "Kernel2", which means that we can document a "revert to OEM"
> procedure using that partition instead.  Will fail if/when we add dual
> image support to OpenWrt though.
> 
> Another alternative is of course to add another partition for the
> OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an
> unused container only.  Maybe the safest solution?

How about using partition nesting here? I've started to use it for NAND
devices recently. One example:
https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79

Sungbo
Bjørn Mork May 4, 2022, 3:25 p.m. UTC | #4
Sungbo Eo <mans0n@gorani.run> writes:
> On 2022-05-02 17:10, Bjørn Mork wrote:
>
>> Another alternative is of course to add another partition for the
>> OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an
>> unused container only.  Maybe the safest solution?
>
> How about using partition nesting here? I've started to use it for NAND
> devices recently. One example:
> https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79

Yes, that was what I meant. I believe it's a good solution.


Bjørn
Lanchon May 4, 2022, 7:53 p.m. UTC | #5
On 5/4/22 12:09, Sungbo Eo wrote:
> On 2022-05-02 17:10, Bjørn Mork wrote:
>>
>> I can understand that. I have not been able to find any other examples,
>> so I have to take full responsibility for this unexpected configuration.
>> I guess I thought it would make sense to have a "Kernel" partition large
>> enough to allow reverting to OEM firmware.
>>
>> But I recently found out that the bootloader can be configured to boot
>> from "Kernel2", which means that we can document a "revert to OEM"
>> procedure using that partition instead.  Will fail if/when we add dual
>> image support to OpenWrt though.
>>
>> Another alternative is of course to add another partition for the
>> OpenWrt kernel (pointed to by $CI_KERNPART), keeping the "Kernel" as an
>> unused container only.  Maybe the safest solution?
> How about using partition nesting here? I've started to use it for NAND
> devices recently. One example:
> https://github.com/openwrt/openwrt/blob/65258f5d6093/target/linux/ramips/dts/mt7621_iptime_t5004.dts#L62-L79
>
> Sungbo

that's interesting, i didn't know that was possible.

in my extensive experience modding the partitioning of devices, which
encompasses exactly *one* device, this is what i take home:

in order to facilitate return to stock, i prefer keeping the complete set
of OEM partitions and add new, overlapping partitions. the deprecated OEM
partitions (those with their space repurposed) are renamed to
"stock_<OEM-name>" and marked read-write. also the new partitions are
added at the end of the list, so that OEM partition numbering is not
affected and scripts designed to work on them work under OEM and custom
kernels. this was important on my device which included a partition with
MACs and stuff that had to be relocated to consolidate reusable space.

nested partitioning is interesting but does not generalize: it can't
coalesce partitions. it would also not conserve the partition numbering.

it seems partition nesting is a great mechanism to support reading and
writing a set of related partitions in one go, and applies to NOR. for
NAND, where bad block skipping occurs, this doesn't work. and nesting
doesn't appear to offer any advantage over plain overlapping partitions,
which is a more general solution.


PS. the original concern of this thread has been worked around by
patching the code (merged into master).
diff mbox series

Patch

diff --git a/package/base-files/files/lib/upgrade/nand.sh b/package/base-files/files/lib/upgrade/nand.sh
index 5ecdb0ff2363..5e5607d35cd8 100644
--- a/package/base-files/files/lib/upgrade/nand.sh
+++ b/package/base-files/files/lib/upgrade/nand.sh
@@ -305,7 +305,7 @@  nand_upgrade_tar() {
 	local ubi_kernel_length
 	if [ "$kernel_length" ]; then
 		if [ "$kernel_mtd" ]; then
-			mtd erase "$CI_KERNPART"
+			: # mtd erase "$CI_KERNPART"
 		else
 			ubi_kernel_length="$kernel_length"
 		fi