diff mbox series

kernel: fix mtd partition erase<parent_erasesize writing to wrong address

Message ID 20200805211354.3922-1-git@johnthomson.fastmail.com.au
State Accepted
Delegated to: Adrian Schmutzler
Headers show
Series kernel: fix mtd partition erase<parent_erasesize writing to wrong address | expand

Commit Message

John Thomson Aug. 5, 2020, 9:13 p.m. UTC
This bug applied where mtd partition end address,
or erase start address, was not cleanly divisible by parent mtd erasesize.

This error would cause the bits following the end of the partition
to the next erasesize block boundary to be erased,
and this partition-overflow data to be written to the partition erase
address (missing additional partition offset address)
of the mtd (top) parent device.

Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>

--

4.19 also requires this fix

A little discussion here:
https://github.com/openwrt/openwrt/pull/3103#issuecomment-667610510

mtdpart.c should be made to work with 4K erase sectors, where available
Considering this here:
https://github.com/openwrt/openwrt/pull/3271
---
 .../411-mtd-partial_eraseblock_write.patch          | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

John Thomson Aug. 7, 2020, 7:51 a.m. UTC | #1
On Wed, 5 Aug 2020, at 21:13, John Thomson wrote:

> -@@ -213,6 +262,24 @@ static int part_erase(struct mtd_info *m
> 	if (instr->fail_addr != MTD_FAIL_ADDR_UNKNOWN)
>   		instr->fail_addr -= part->offset;

I should modify this to avoid the _write, if the _erase fails before the write address?

> 	if (mtd->flags & MTD_ERASE_PARTIAL) {
> 		if (partial_start) {
> 			part->parent->_write(part->parent,
Tomasz Maciej Nowak Aug. 14, 2020, 11:04 a.m. UTC | #2
Hi John.
W dniu 05.08.2020 o 23:13, John Thomson pisze:
> This bug applied where mtd partition end address,
> or erase start address, was not cleanly divisible by parent mtd erasesize.
> 
> This error would cause the bits following the end of the partition
> to the next erasesize block boundary to be erased,
> and this partition-overflow data to be written to the partition erase
> address (missing additional partition offset address)
> of the mtd (top) parent device.
> 
> Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>

Yay!
The description in this patch matches the symptoms I described Year ago in this bug report:
https://bugs.openwrt.org/index.php?do=details&task_id=2428
and this patch fixes it, Thank You very much.

Tested-by: Tomasz Maciej Nowak <tomek_n@o2.pl>
Fixes: FS#2428

> 
> --
> 
> 4.19 also requires this fix
> 
> A little discussion here:
> https://github.com/openwrt/openwrt/pull/3103#issuecomment-667610510
> 
> mtdpart.c should be made to work with 4K erase sectors, where available
> Considering this here:
> https://github.com/openwrt/openwrt/pull/3271
> ---
>  .../411-mtd-partial_eraseblock_write.patch          | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch b/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
> index b46c3f5ed4..c48a144d3d 100644
> --- a/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
> +++ b/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
> @@ -19,7 +19,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
>   /* Our partition linked list */
>   static LIST_HEAD(mtd_partitions);
>   static DEFINE_MUTEX(mtd_partitions_mutex);
> -@@ -206,6 +208,53 @@ static int part_erase(struct mtd_info *m
> +@@ -206,11 +208,77 @@ static int part_erase(struct mtd_info *m
>   {
>   	struct mtd_part *part = mtd_to_part(mtd);
>   	int ret;
> @@ -73,10 +73,9 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
>   
>   	instr->addr += part->offset;
>   	ret = part->parent->_erase(part->parent, instr);
> -@@ -213,6 +262,24 @@ static int part_erase(struct mtd_info *m
> + 	if (instr->fail_addr != MTD_FAIL_ADDR_UNKNOWN)
>   		instr->fail_addr -= part->offset;
> - 	instr->addr -= part->offset;
> - 
> ++
>  +	if (mtd->flags & MTD_ERASE_PARTIAL) {
>  +		if (partial_start) {
>  +			part->parent->_write(part->parent,
> @@ -95,10 +94,10 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
>  +		kfree(erase_buf);
>  +	}
>  +
> - 	return ret;
> - }
> + 	instr->addr -= part->offset;
>   
> -@@ -525,19 +592,22 @@ static struct mtd_part *allocate_partiti
> + 	return ret;
> +@@ -525,19 +593,22 @@ static struct mtd_part *allocate_partiti
>   	remainder = do_div(tmp, wr_alignment);
>   	if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
>   		/* Doesn't start on a boundary of major erase size */
>
Adrian Schmutzler Sept. 19, 2020, 11:09 a.m. UTC | #3
Hi,

> -----Original Message-----
> From: openwrt-devel [mailto:openwrt-devel-bounces@lists.openwrt.org]
> On Behalf Of John Thomson
> Sent: Mittwoch, 5. August 2020 23:14
> To: openwrt-devel@lists.openwrt.org
> Cc: John Thomson <git@johnthomson.fastmail.com.au>
> Subject: [PATCH] kernel: fix mtd partition erase<parent_erasesize writing to
> wrong address
> 
> This bug applied where mtd partition end address, or erase start address,
> was not cleanly divisible by parent mtd erasesize.
> 
> This error would cause the bits following the end of the partition to the next
> erasesize block boundary to be erased, and this partition-overflow data to be
> written to the partition erase address (missing additional partition offset
> address) of the mtd (top) parent device.

We had a longer discussion about that on IRC yesterday, where we also discussed (stylistically) different solutions.

As it appears, kernel did a major overhaul of mtdpart.c in a more recent major kernel version:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef

So, any cosmetic improvement would probably be in vain anyway. Therefore, I decided to merge the patch in its current (tested) state.

However, this essentially means we will meet the same problem again with the next kernel bump, and it won't become easier.
In the discussion, jow had the idea to circumvent these problems and handle the non-aligned partitions in user-space. Maybe one can exploit the mtd-rw tool (available as kmod-mtd-rw in packages repo) for unlocking the relevant areas and then do everything else in user-space.

If that seems suitable, we could use the merged patch for the 20.xx release, and then migrate master to a better solution when we have one.

I put the discussion below in case somebody feels inspired to play around with that (first half is the general discussion, second half mentions the user-space solution).

Best

Adrian

---

[...]
12:31 jow: adrianschmutzler: I took a look at the difference and in hindsight it seems to make a lot of sense
12:33 jow: adrianschmutzler: plain diff: (Link: https://pastebin.com/AqE89TXB)https://pastebin.com/AqE89TXB - code context: (Link: https://pastebin.com/Hn2ny4Lt)https://pastebin.com/Hn2ny4Lt
12:34 jow: adrianschmutzler: so the "instr->addr -= part->offset;" address manipulation is moved after the _write() which makes sense since we're supposed to both erase and (partially) re-write the same block
12:34 jow: (my second paste shows the old, not the patched code)
12:35 jow: the way the code is right now, it would erase the entire block partly occupied by a partition, then overwrite the first block of the mtd device
12:35 jow: this is my interpretation at least, I am not an mtd subsystem expert
12:38 adrianschmutzler: okay, thanks. will have a look at the older kernel now, as in the PR it is called "a merge bug  in 4.19 & 5.4" ...
12:39 jow: yeah, my first thought was the same, probably a merge/patch rebase quirk
12:41 jow: tbh, I'd refactor the code while at it, directly substract instr->addr after part->parent->_erase(part->parent, instr); like it is done in the current broken code and explicitly pass instr->addr + part->offset and instr->addr + instr->len + part->offset respectively to part->parent->_write() to mirror what is done for mtd_read() a few further lines up
12:41 adrianschmutzler: unfortunately upstream also doesn't seem to be interested in commenting on the patches
12:43 jow: mtd_read() and part->parent->_write() take explicit offsets as arguments (part->offset + instr->addr + ...) while part->parent->_erase() uses the ->addr member from instr, hence the address manipulation
12:43 jow: and I'd revert the address to its original value as soon as possible and continue passing calculated offsets later
12:44 adrianschmutzler: jow: So, it's actually bad design after all?
12:45 jow: this would be my fix on top of the current broken code: (Link: https://pastebin.com/diff/EjUdXshw)https://pastebin.com/diff/EjUdXshw
12:45 adrianschmutzler: the need to increase/decrease instr->addr ...
12:47 adrianschmutzler: jow: okay. IIRC, they stirred around mtd in one of the recent kernel anyway, and also did some fairly similarly looking changes with offsets there as well
12:47 jow: I do not know enough about the struct erase_info and part->parent->_erase() semantics to understand why _erase() apparently uses instr->addr relative to the partition offset while part->parent->_write() expects an absolute offset relative to the mtd block
12:48 jow: that seems inconsitent to me, maybe the _erase() operates on a wrong offset as well
12:53 jow: adrianschmutzler: I took at the upstream code as well: (Link: https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197)https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197 and I think I got a better idea now. So the various part_*() functions expect an erase_info struct with partition-relative addresses, they then interally manipulate those addresses to be mtd-relative, invoke the lowlevel whole mtd ops, then revert the address back to their
12:53 jow: original partition-relative value
12:54 jow: so the originally proposed patch which simply moves the reversal (instr->addr -= part->offset;) directly before the `return ret;` is fine and matches the style of the kernel part_*() functions
12:58 jow: adrianschmutzler: just for completeness, and even better fix would be (Link: https://pastebin.com/diff/mFSb8Uqp)https://pastebin.com/diff/mFSb8Uqp
13:02 adrianschmutzler: jow: why is the additional mtd_to_part(mtd) superior?
13:03 jow: it should be optimized away by the compiler, the code is superior because the intention is clearer
13:03 jow: likewise the mtd_read() at the beginning could be changed to part_read()  and the `part->offset + ` removed in each invocation
13:04 jow: this way the only "address fiddling" that remains is in the original function body of part_erase()
13:04 jow: (we can't obviously call part_erase() from within part_erase() since that'd recurse forever)
13:05 jow: but this is all just a matter of style and taste, not sure what the subsystem maintainers would prefer
13:06 jow: I guess however that this partial erase will likely will never go upstream
13:06 adrianschmutzler: okay, I see your point now
13:06 adrianschmutzler: probably that's why they remained silent on the other issues
13:07 jow: I actually wonder why this can't be done in userspace
13:08 jow: or in wahtever code using the mtd api
13:08 jow: after all it is just a matter of reading an entire block, holding it in memory, erasing the entire block and then simply partially write back the parts we did not want to erase
13:09 jow: the only obstacle would be the write protection for non-eb aligned parts I guess
13:10 adrianschmutzler: no idea; but I will throw that idea towards the affected people, maybe they want to go into that direction
13:11 jow: in the meanwhile I'd say we should apply the proposed fix
13:11 jow: it makes sense
13:11 jow: and its not the submitters fault that the orginal code is not beautiful to begin with
13:11 adrianschmutzler: sysupgrade won't be a problem for a user-space solution, as other stuff there happens in user-space as well?
13:12 jow: if I remember correctly, that partial erase thing was introduced years ago to support sysupgrade on Redboot systems
13:13 adrianschmutzler: that's one of currently broken systems and the patch fixes them apparently
13:13 jow: there we wanted to write accross the FIS table dictated kernel/rootfs boundary using the mtd util and nbd implemented a kernel patch and corresponding functionality in mtd to do exactly that
13:13 jow: and I believe one of (the?) problem was that the kernel mtd subsystem write-protects all non-aligned partitions by default
13:14 adrianschmutzler: but that sounds more like the subsequent patch:
13:14 adrianschmutzler: 412-mtd-partial_eraseblock_unlock.patch
13:15 adrianschmutzler: which states it's for ixp4xx only, which probably isn't true anymore ...
13:16 jow: adrianschmutzler: I think both is needed
13:19 jow: adrianschmutzler: I think the partial erase was needed to rewrite the FIS partition while keeping the Redboot config (often both share one eraseblock)
13:20 adrianschmutzler: okay, then since this is tested and waiting some time already, and since part_write is removed with newer kernel anyway, I merge the existing patch.
13:20 jow: adrianschmutzler: and the partial erase unlock patch covers cases where the partition you manipulate is locked because its not spanning an entire eb
13:21 jow: at least this is my understanding
13:22 adrianschmutzler: jow: okay. but then the 412 patch is probably relevant to all devices with redboot partitions and the ixp4xx references should be removed from the description ...
13:22 jow: or rather, it allows unlocking non-aligned parts by simply tweaking the flash area begin and end offsets to start at the previous eb boundary and end on the next
13:22 jow: so you're not only unlocking the target area but a little bit of stuff before and after too
13:23 jow: otherwise the mtd subsystem would reject the unlock
13:24 adrianschmutzler: so, it's effectively a hack
13:24 jow: yes
13:24 adrianschmutzler: btw just found that Rafał wanted to remove these scripts already in March: (Link: https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/)https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/
13:25 adrianschmutzler: scripts->patches
13:26 jow: I think that if we somehow have a mechanism to forcibly unlock all flash areas (which could be done by a much leaner and more isolated patch I suppose) we could handle all the other edge cases in userspace
13:26 jow: by disregarding any partitioning and simply treating the entire mtd as continuous block device with offsets
13:27 jow: like you would e.g. dd an x86 image to the blockdev
13:28 jow: ah, just remebered this: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw
13:28 jow: something like this could be a solution for sysupgrade
13:28 jow: not sure if its still compatible to 5.4 but if yes the we wouldn't even need a patch at all
13:29 adrianschmutzler: mtd-rw is in packages repo and works fine AFAIK
13:29 jow: a slightly less dangerous solution would be a kernel module or some other facility which would allow us to create mtd partitions on demand, not sure if something like this is possible
13:30 jow: one could thne simply "spwan" a new mtd partition for the flash area relevant for sysupgrade, then "mtd write" in there
13:30 adrianschmutzler: I just hesitate to include and enable a tool like that by default
13:30 jow: while "masking" other sensible parts like bootloader etc. to avoid bricks if something goes wrong
13:33 jow: hm well, yeah I understand your concerns
13:33 jow: but on x86 for example you can also happily dd over your entire disk
13:33 jow: granted, it is way easier to recover, but still...
13:34 adrianschmutzler: I will push that forward to the Mikrotik people anyway, as they seem to be the most affected at the moment. Maybe they have time to play around with idea going into that direction
13:34 adrianschmutzler: "Mikrotik people" -> contributors working with Mikrotik devices, not the vendor of course
13:37 jow: I could imagine that adding mtd-rw by defualt and adding some kind of sysctl to trigger it would be good enough
13:37 adrianschmutzler: together with Tomasz on the Redboot front there should be enough people interested
13:37 adrianschmutzler: hmm, I really have to think about that
13:37 jow: that sysctl could then be toggled by "mtd unlock" which never actually worked for me when I needed it
13:38 jow: or some other equivalent commend to not break existing "mtd unlock" users
13:38 jow: *command
13:38 adrianschmutzler: I fear this may end up in a lot of scripts then, where people will use it to take shortcuts ...
13:39 adrianschmutzler: but if it solves the problem, we should consider it
13:40 adrianschmutzler: because maintaining these patches won't become easier
13:41 jow: I agree
14:12 rmilecki: adrianschmutzler: jow: omg, that parial erase size crap again?
14:13 jow: rmilecki: we've been discussing its original intnetion and possible, non-invasive alternatives
14:14 jow: rmilecki: current working idea is to ship some kind of mtd unlock mechanism and handling the overlapping write/erase things in userspace
14:15 jow: rmilecki: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw plus some clever scripting should already achive that
14:15 adrianschmutzler: rmilecki: the problem is that this partial erase stuff is needed for the redboot devices and all of the mikrotik ones as well (which currently all have "broken" sysupgrade)
14:16 rmilecki: the problem is it exists in a current form
14:16 rmilecki: afair it enables those partial wries for all ops, without checking if access is aligned or not
14:18 rmilecki: adrianschmutzler: i see jow posted few improvement ideas, do you want to implement them? if so, as part of submitted patch? or on top of that (later)?
14:21 adrianschmutzler: rmilecki: most of jow's improvement were about style. however, in (Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef)https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef kernel changes partition handling anyway
14:21 rmilecki: oh, that one is a hure change
14:21 adrianschmutzler: so, I don't think it's worth to care too much about style when we have to do a different implementation for next stable kernel anyway.
14:21 rmilecki: it's in some recent kernel though, right?
14:21 adrianschmutzler: and the existing patch is tested and known to be working on the affected devices
14:22 rmilecki: adrianschmutzler: ok, i agree, let's just make it work
14:22 adrianschmutzler: haven't checked explicitly, but somewhere after 5.4 of course
Thibaut Sept. 19, 2020, 11:46 a.m. UTC | #4
Hi,

Thanks for the summary.

I’m however under the impression that two distinct things are being confused here: partial erase and 4K_LIMIT.

Specifically, prior to the recent (4.17) changes in the mtd subsystem, Mikrotik devices made use of two separate hacks:

1) the ability to set different EB sizes (e.g. 64K or 4K) for different mtd partitions (which AIUI https://lists.infradead.org/pipermail/linux-mtd/2020-August/081636.html is trying to bring back to life)
2) the ability to perform partial erases when e.g. a small (4KB) partition with 4K EB was overlapping the boundary of an adjacent 64K EB

As far as MikroTik devices are concerned, the only reason we need 1) & 2) above is to be able to edit the bootloader configuration which is stored in “soft_config”, a 4K partition (registered with 4K EB) that is typically adjacent to a larger, read-only, bootloader (using 64K EB).

However, the other issue that has arisen and the reason for the sysupgrade borkage is somewhat different: until said recent changes, Mikrotik (and others AIUI) devices used the CONFIG_MTD_SPI_NOR_USE_4K_SECTORS_LIMIT hack so that the small r/w partition (such as soft_config) properly used 4K EB, while the larger partitions (e.g. rootfs) kept using 64K EB (for performance reasons). That hack is also broken.

On Mikrotik devices, the sysupgrade failures are due to the fact that when 4K_SECTORS is turned on, 4K_SECTORS_LIMIT is ignored and so _all_ partitions end up using 4K EB. However rootfs is properly 64K-aligned and thus can (and should) use 64K EB. That's also what the build system expects, which results in jffs2 failure to recover sysupgrade saved data.

That second problem is AIUI addressed by John’s attempt to fix upstream so that mtd-spi-nor-attempt-4K-erase.patch works.

HTH,
Thibaut

> On 19 Sep 2020, at 13:09, Adrian Schmutzler <mail@adrianschmutzler.de> wrote:
> 
> Hi,
> 
>> -----Original Message-----
>> From: openwrt-devel [mailto:openwrt-devel-bounces@lists.openwrt.org]
>> On Behalf Of John Thomson
>> Sent: Mittwoch, 5. August 2020 23:14
>> To: openwrt-devel@lists.openwrt.org
>> Cc: John Thomson <git@johnthomson.fastmail.com.au>
>> Subject: [PATCH] kernel: fix mtd partition erase<parent_erasesize writing to
>> wrong address
>> 
>> This bug applied where mtd partition end address, or erase start address,
>> was not cleanly divisible by parent mtd erasesize.
>> 
>> This error would cause the bits following the end of the partition to the next
>> erasesize block boundary to be erased, and this partition-overflow data to be
>> written to the partition erase address (missing additional partition offset
>> address) of the mtd (top) parent device.
> 
> We had a longer discussion about that on IRC yesterday, where we also discussed (stylistically) different solutions.
> 
> As it appears, kernel did a major overhaul of mtdpart.c in a more recent major kernel version:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef
> 
> So, any cosmetic improvement would probably be in vain anyway. Therefore, I decided to merge the patch in its current (tested) state.
> 
> However, this essentially means we will meet the same problem again with the next kernel bump, and it won't become easier.
> In the discussion, jow had the idea to circumvent these problems and handle the non-aligned partitions in user-space. Maybe one can exploit the mtd-rw tool (available as kmod-mtd-rw in packages repo) for unlocking the relevant areas and then do everything else in user-space.
> 
> If that seems suitable, we could use the merged patch for the 20.xx release, and then migrate master to a better solution when we have one.
> 
> I put the discussion below in case somebody feels inspired to play around with that (first half is the general discussion, second half mentions the user-space solution).
> 
> Best
> 
> Adrian
> 
> ---
> 
> [...]
> 12:31 jow: adrianschmutzler: I took a look at the difference and in hindsight it seems to make a lot of sense
> 12:33 jow: adrianschmutzler: plain diff: (Link: https://pastebin.com/AqE89TXB)https://pastebin.com/AqE89TXB - code context: (Link: https://pastebin.com/Hn2ny4Lt)https://pastebin.com/Hn2ny4Lt
> 12:34 jow: adrianschmutzler: so the "instr->addr -= part->offset;" address manipulation is moved after the _write() which makes sense since we're supposed to both erase and (partially) re-write the same block
> 12:34 jow: (my second paste shows the old, not the patched code)
> 12:35 jow: the way the code is right now, it would erase the entire block partly occupied by a partition, then overwrite the first block of the mtd device
> 12:35 jow: this is my interpretation at least, I am not an mtd subsystem expert
> 12:38 adrianschmutzler: okay, thanks. will have a look at the older kernel now, as in the PR it is called "a merge bug  in 4.19 & 5.4" ...
> 12:39 jow: yeah, my first thought was the same, probably a merge/patch rebase quirk
> 12:41 jow: tbh, I'd refactor the code while at it, directly substract instr->addr after part->parent->_erase(part->parent, instr); like it is done in the current broken code and explicitly pass instr->addr + part->offset and instr->addr + instr->len + part->offset respectively to part->parent->_write() to mirror what is done for mtd_read() a few further lines up
> 12:41 adrianschmutzler: unfortunately upstream also doesn't seem to be interested in commenting on the patches
> 12:43 jow: mtd_read() and part->parent->_write() take explicit offsets as arguments (part->offset + instr->addr + ...) while part->parent->_erase() uses the ->addr member from instr, hence the address manipulation
> 12:43 jow: and I'd revert the address to its original value as soon as possible and continue passing calculated offsets later
> 12:44 adrianschmutzler: jow: So, it's actually bad design after all?
> 12:45 jow: this would be my fix on top of the current broken code: (Link: https://pastebin.com/diff/EjUdXshw)https://pastebin.com/diff/EjUdXshw
> 12:45 adrianschmutzler: the need to increase/decrease instr->addr ...
> 12:47 adrianschmutzler: jow: okay. IIRC, they stirred around mtd in one of the recent kernel anyway, and also did some fairly similarly looking changes with offsets there as well
> 12:47 jow: I do not know enough about the struct erase_info and part->parent->_erase() semantics to understand why _erase() apparently uses instr->addr relative to the partition offset while part->parent->_write() expects an absolute offset relative to the mtd block
> 12:48 jow: that seems inconsitent to me, maybe the _erase() operates on a wrong offset as well
> 12:53 jow: adrianschmutzler: I took at the upstream code as well: (Link: https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197)https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197 and I think I got a better idea now. So the various part_*() functions expect an erase_info struct with partition-relative addresses, they then interally manipulate those addresses to be mtd-relative, invoke the lowlevel whole mtd ops, then revert the address back to their
> 12:53 jow: original partition-relative value
> 12:54 jow: so the originally proposed patch which simply moves the reversal (instr->addr -= part->offset;) directly before the `return ret;` is fine and matches the style of the kernel part_*() functions
> 12:58 jow: adrianschmutzler: just for completeness, and even better fix would be (Link: https://pastebin.com/diff/mFSb8Uqp)https://pastebin.com/diff/mFSb8Uqp
> 13:02 adrianschmutzler: jow: why is the additional mtd_to_part(mtd) superior?
> 13:03 jow: it should be optimized away by the compiler, the code is superior because the intention is clearer
> 13:03 jow: likewise the mtd_read() at the beginning could be changed to part_read()  and the `part->offset + ` removed in each invocation
> 13:04 jow: this way the only "address fiddling" that remains is in the original function body of part_erase()
> 13:04 jow: (we can't obviously call part_erase() from within part_erase() since that'd recurse forever)
> 13:05 jow: but this is all just a matter of style and taste, not sure what the subsystem maintainers would prefer
> 13:06 jow: I guess however that this partial erase will likely will never go upstream
> 13:06 adrianschmutzler: okay, I see your point now
> 13:06 adrianschmutzler: probably that's why they remained silent on the other issues
> 13:07 jow: I actually wonder why this can't be done in userspace
> 13:08 jow: or in wahtever code using the mtd api
> 13:08 jow: after all it is just a matter of reading an entire block, holding it in memory, erasing the entire block and then simply partially write back the parts we did not want to erase
> 13:09 jow: the only obstacle would be the write protection for non-eb aligned parts I guess
> 13:10 adrianschmutzler: no idea; but I will throw that idea towards the affected people, maybe they want to go into that direction
> 13:11 jow: in the meanwhile I'd say we should apply the proposed fix
> 13:11 jow: it makes sense
> 13:11 jow: and its not the submitters fault that the orginal code is not beautiful to begin with
> 13:11 adrianschmutzler: sysupgrade won't be a problem for a user-space solution, as other stuff there happens in user-space as well?
> 13:12 jow: if I remember correctly, that partial erase thing was introduced years ago to support sysupgrade on Redboot systems
> 13:13 adrianschmutzler: that's one of currently broken systems and the patch fixes them apparently
> 13:13 jow: there we wanted to write accross the FIS table dictated kernel/rootfs boundary using the mtd util and nbd implemented a kernel patch and corresponding functionality in mtd to do exactly that
> 13:13 jow: and I believe one of (the?) problem was that the kernel mtd subsystem write-protects all non-aligned partitions by default
> 13:14 adrianschmutzler: but that sounds more like the subsequent patch:
> 13:14 adrianschmutzler: 412-mtd-partial_eraseblock_unlock.patch
> 13:15 adrianschmutzler: which states it's for ixp4xx only, which probably isn't true anymore ...
> 13:16 jow: adrianschmutzler: I think both is needed
> 13:19 jow: adrianschmutzler: I think the partial erase was needed to rewrite the FIS partition while keeping the Redboot config (often both share one eraseblock)
> 13:20 adrianschmutzler: okay, then since this is tested and waiting some time already, and since part_write is removed with newer kernel anyway, I merge the existing patch.
> 13:20 jow: adrianschmutzler: and the partial erase unlock patch covers cases where the partition you manipulate is locked because its not spanning an entire eb
> 13:21 jow: at least this is my understanding
> 13:22 adrianschmutzler: jow: okay. but then the 412 patch is probably relevant to all devices with redboot partitions and the ixp4xx references should be removed from the description ...
> 13:22 jow: or rather, it allows unlocking non-aligned parts by simply tweaking the flash area begin and end offsets to start at the previous eb boundary and end on the next
> 13:22 jow: so you're not only unlocking the target area but a little bit of stuff before and after too
> 13:23 jow: otherwise the mtd subsystem would reject the unlock
> 13:24 adrianschmutzler: so, it's effectively a hack
> 13:24 jow: yes
> 13:24 adrianschmutzler: btw just found that Rafał wanted to remove these scripts already in March: (Link: https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/)https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/
> 13:25 adrianschmutzler: scripts->patches
> 13:26 jow: I think that if we somehow have a mechanism to forcibly unlock all flash areas (which could be done by a much leaner and more isolated patch I suppose) we could handle all the other edge cases in userspace
> 13:26 jow: by disregarding any partitioning and simply treating the entire mtd as continuous block device with offsets
> 13:27 jow: like you would e.g. dd an x86 image to the blockdev
> 13:28 jow: ah, just remebered this: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw
> 13:28 jow: something like this could be a solution for sysupgrade
> 13:28 jow: not sure if its still compatible to 5.4 but if yes the we wouldn't even need a patch at all
> 13:29 adrianschmutzler: mtd-rw is in packages repo and works fine AFAIK
> 13:29 jow: a slightly less dangerous solution would be a kernel module or some other facility which would allow us to create mtd partitions on demand, not sure if something like this is possible
> 13:30 jow: one could thne simply "spwan" a new mtd partition for the flash area relevant for sysupgrade, then "mtd write" in there
> 13:30 adrianschmutzler: I just hesitate to include and enable a tool like that by default
> 13:30 jow: while "masking" other sensible parts like bootloader etc. to avoid bricks if something goes wrong
> 13:33 jow: hm well, yeah I understand your concerns
> 13:33 jow: but on x86 for example you can also happily dd over your entire disk
> 13:33 jow: granted, it is way easier to recover, but still...
> 13:34 adrianschmutzler: I will push that forward to the Mikrotik people anyway, as they seem to be the most affected at the moment. Maybe they have time to play around with idea going into that direction
> 13:34 adrianschmutzler: "Mikrotik people" -> contributors working with Mikrotik devices, not the vendor of course
> 13:37 jow: I could imagine that adding mtd-rw by defualt and adding some kind of sysctl to trigger it would be good enough
> 13:37 adrianschmutzler: together with Tomasz on the Redboot front there should be enough people interested
> 13:37 adrianschmutzler: hmm, I really have to think about that
> 13:37 jow: that sysctl could then be toggled by "mtd unlock" which never actually worked for me when I needed it
> 13:38 jow: or some other equivalent commend to not break existing "mtd unlock" users
> 13:38 jow: *command
> 13:38 adrianschmutzler: I fear this may end up in a lot of scripts then, where people will use it to take shortcuts ...
> 13:39 adrianschmutzler: but if it solves the problem, we should consider it
> 13:40 adrianschmutzler: because maintaining these patches won't become easier
> 13:41 jow: I agree
> 14:12 rmilecki: adrianschmutzler: jow: omg, that parial erase size crap again?
> 14:13 jow: rmilecki: we've been discussing its original intnetion and possible, non-invasive alternatives
> 14:14 jow: rmilecki: current working idea is to ship some kind of mtd unlock mechanism and handling the overlapping write/erase things in userspace
> 14:15 jow: rmilecki: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw plus some clever scripting should already achive that
> 14:15 adrianschmutzler: rmilecki: the problem is that this partial erase stuff is needed for the redboot devices and all of the mikrotik ones as well (which currently all have "broken" sysupgrade)
> 14:16 rmilecki: the problem is it exists in a current form
> 14:16 rmilecki: afair it enables those partial wries for all ops, without checking if access is aligned or not
> 14:18 rmilecki: adrianschmutzler: i see jow posted few improvement ideas, do you want to implement them? if so, as part of submitted patch? or on top of that (later)?
> 14:21 adrianschmutzler: rmilecki: most of jow's improvement were about style. however, in (Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef)https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef kernel changes partition handling anyway
> 14:21 rmilecki: oh, that one is a hure change
> 14:21 adrianschmutzler: so, I don't think it's worth to care too much about style when we have to do a different implementation for next stable kernel anyway.
> 14:21 rmilecki: it's in some recent kernel though, right?
> 14:21 adrianschmutzler: and the existing patch is tested and known to be working on the affected devices
> 14:22 rmilecki: adrianschmutzler: ok, i agree, let's just make it work
> 14:22 adrianschmutzler: haven't checked explicitly, but somewhere after 5.4 of course
diff mbox series

Patch

diff --git a/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch b/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
index b46c3f5ed4..c48a144d3d 100644
--- a/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
+++ b/target/linux/generic/pending-5.4/411-mtd-partial_eraseblock_write.patch
@@ -19,7 +19,7 @@  Signed-off-by: Felix Fietkau <nbd@nbd.name>
  /* Our partition linked list */
  static LIST_HEAD(mtd_partitions);
  static DEFINE_MUTEX(mtd_partitions_mutex);
-@@ -206,6 +208,53 @@ static int part_erase(struct mtd_info *m
+@@ -206,11 +208,77 @@ static int part_erase(struct mtd_info *m
  {
  	struct mtd_part *part = mtd_to_part(mtd);
  	int ret;
@@ -73,10 +73,9 @@  Signed-off-by: Felix Fietkau <nbd@nbd.name>
  
  	instr->addr += part->offset;
  	ret = part->parent->_erase(part->parent, instr);
-@@ -213,6 +262,24 @@ static int part_erase(struct mtd_info *m
+ 	if (instr->fail_addr != MTD_FAIL_ADDR_UNKNOWN)
  		instr->fail_addr -= part->offset;
- 	instr->addr -= part->offset;
- 
++
 +	if (mtd->flags & MTD_ERASE_PARTIAL) {
 +		if (partial_start) {
 +			part->parent->_write(part->parent,
@@ -95,10 +94,10 @@  Signed-off-by: Felix Fietkau <nbd@nbd.name>
 +		kfree(erase_buf);
 +	}
 +
- 	return ret;
- }
+ 	instr->addr -= part->offset;
  
-@@ -525,19 +592,22 @@ static struct mtd_part *allocate_partiti
+ 	return ret;
+@@ -525,19 +593,22 @@ static struct mtd_part *allocate_partiti
  	remainder = do_div(tmp, wr_alignment);
  	if ((slave->mtd.flags & MTD_WRITEABLE) && remainder) {
  		/* Doesn't start on a boundary of major erase size */