diff mbox series

ubifs: wbuf->offs must be aligned to max_write_size

Message ID 20190430142823.28044-1-xoxyuxu@gmail.com
State Under Review, archived
Delegated to: Richard Weinberger
Headers show
Series ubifs: wbuf->offs must be aligned to max_write_size | expand

Commit Message

ぉゅぅ April 30, 2019, 2:28 p.m. UTC
UBIFS has a journal recovery function.
It is useful for devices that experience a power failure.

And as per comment of ubifs_wbuf_sync_nolock(), wbuf is optimized for
performance by writing aligned max_write_size.

In following environment, checking offset is not aligned to max_write_buffer.

- Using a SPI-NOR device with a write buffer size over 256 bytes
  For example: Micron MT28EW01GABA, its write buffer is 512 words
- LEB hedaer size is 64 bytes
- UBI header size is 64 bytes

So if write buffer command make a crrupt data in a block,
is_last_write() and no_more_nodes() can not check correctly.

This patch adjusts wbuf writes to max_write_size, taking into account
leb_start. The recovery process also checks the data at the corrected
alignment position.

Signed-off-by: Yuichi Nakai <xoxyuxu@gmail.com>
---
 fs/ubifs/debug.c    |  2 +-
 fs/ubifs/io.c       | 16 +++++++++-------
 fs/ubifs/misc.h     | 13 +++++++++++++
 fs/ubifs/recovery.c |  6 +++---
 4 files changed, 26 insertions(+), 11 deletions(-)

Comments

Richard Weinberger May 5, 2019, 10:08 p.m. UTC | #1
On Tue, Apr 30, 2019 at 4:29 PM Yuichi Nakai <xoxyuxu@gmail.com> wrote:
>
> UBIFS has a journal recovery function.
> It is useful for devices that experience a power failure.
>
> And as per comment of ubifs_wbuf_sync_nolock(), wbuf is optimized for
> performance by writing aligned max_write_size.
>
> In following environment, checking offset is not aligned to max_write_buffer.
>
> - Using a SPI-NOR device with a write buffer size over 256 bytes
>   For example: Micron MT28EW01GABA, its write buffer is 512 words
> - LEB hedaer size is 64 bytes
> - UBI header size is 64 bytes

There are no such things are LEB or UBI header.
Do you mean UBI EC und UBI VID headers?

What is c->leb_start in your case?

> So if write buffer command make a crrupt data in a block,
> is_last_write() and no_more_nodes() can not check correctly.
>
> This patch adjusts wbuf writes to max_write_size, taking into account
> leb_start. The recovery process also checks the data at the corrected
> alignment position.

I have a hard time understanding your patch.
Are you facing a failure? If so, please share it.
ぉゅぅ May 6, 2019, 8:54 a.m. UTC | #2
Thanks for your reply!

I am sorry for the patch that is hard to understand.
This is my first patch post. I will do my best to convey my intentions.

2019年5月6日(月) 7:08 Richard Weinberger <richard.weinberger@gmail.com>:
>
> On Tue, Apr 30, 2019 at 4:29 PM Yuichi Nakai <xoxyuxu@gmail.com> wrote:
> >
> > UBIFS has a journal recovery function.
> > It is useful for devices that experience a power failure.
> >
> > And as per comment of ubifs_wbuf_sync_nolock(), wbuf is optimized for
> > performance by writing aligned max_write_size.
> >
> > In following environment, checking offset is not aligned to max_write_buffer.
> >
> > - Using a SPI-NOR device with a write buffer size over 256 bytes
> >   For example: Micron MT28EW01GABA, its write buffer is 512 words
> > - LEB hedaer size is 64 bytes
> > - UBI header size is 64 bytes
>
> There are no such things are LEB or UBI header.
> Do you mean UBI EC und UBI VID headers?

I missed writing a correct word..
Yes, UBI EC header and UBI VID header are both 64bytes in our environment.

> What is c->leb_start in your case?

It is 128 Bytes where 64+64.
There is no padding / alignment  because NOR-Flash. (min. IO size is 1.)

Following ASCII art is an erase block, tick(+) is writer buffer boundary.
|------------+------------+------------+------------+------------|
|<->|     ; EC header
|   |<->| ; VID header
|<----->| ; c->leb_start : LEB start offset from an erase block
        |<---+------------+------------+------------+----------->| ;
LEB(c->leb_size)

wbuf takes into account max write size bounds, so references leb_start.
This implementation can be found as follows:
----
FILE: fs/ubifs/io.c
int ubifs_wbuf_init(struct ubifs_info *c, struct ubifs_wbuf *wbuf)
/*
 * If the LEB starts at the max. write size aligned address, then
 * write-buffer size has to be set to @c->max_write_size. Otherwise,
 * set it to something smaller so that it ends at the closest max.
 * write size boundary.
 */
size = c->max_write_size - (c->leb_start % c->max_write_size);
----

Currently, is_last_write() rounds up 'offset' to c->max_write_size.
But the offset is start from c->leb_start.
It is NOT aligned write buffer in our case (shown as above AA).

I think that the implementation of round-up is for write buffer boundary

My understanding is...
The reason the implementation rounds up to a write buffer boundary is that
the effects of NOR Flash writes are considered to fall on a buffer boundary.


> > So if write buffer command make a crrupt data in a block,
> > is_last_write() and no_more_nodes() can not check correctly.
> >
> > This patch adjusts wbuf writes to max_write_size, taking into account
> > leb_start. The recovery process also checks the data at the corrected
> > alignment position.
>
> I have a hard time understanding your patch.
> Are you facing a failure? If so, please share it.

Yes.
But the data can not be sent because it was found in our company.
As I work for traditional Japanese company, the data can not send.
In our case, a journal LEB has a broken data in last empty area.
It should treat as padding data, I think.
And subsequent nodes were not broken.
So we want to recover subsequent nodes, too.
Although it is not the end of the journal data, I do not know why the
data is broken.

The padding data which is 0xFFs in the padding area is checked in
is_last_write().
The current implementation does not consider leb_start, so it starts checking
from an offset different from the true max write buffer boundary.

So the last broken data in a write buffer boundary is not skipped, so
ubifs_recover_leb() detects as a crrupted.


Reproduce scenario:
1. system is booting up
2. UBI attached mtdx
3. UBI can get some volume
4. UBIFS tries to mount the ubi partition
5. Its flag indicate 'need recovery'. So UBIFS executes recovery
sequence by calling
    ubifs_replay_journal() from mount_ubifs() which is called from
mounting with UBIFS.

call trace is as follows:
mount_ubifs()
-> ubifs_replay_journal()
 -> replay_buds()
  -> replay_bud()
   -> ubifs_recover_leb()
    -> ubifs_scan_a_node()
    -> is_last_write()

The above description is for is_last_write(). Others are also same.

For NAND, max_write_size and size of headers are
c->min_io_size(perhaps a sector size).
So this modification is not effected.

> --
> Thanks,
> //richard
Richard Weinberger May 7, 2019, 8:09 a.m. UTC | #3
On Mon, May 6, 2019 at 10:54 AM ぉゅぅ <xoxyuxu@gmail.com> wrote:
>
> Thanks for your reply!
>
> I am sorry for the patch that is hard to understand.
> This is my first patch post. I will do my best to convey my intentions.

Thanks, now I understand the problem. :-)
I'm currently checking whether there are more such cases we need to fix.
diff mbox series

Patch

diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c
index c49ff50fdceb..d8c1fa6d182d 100644
--- a/fs/ubifs/debug.c
+++ b/fs/ubifs/debug.c
@@ -2565,7 +2565,7 @@  static int corrupt_data(const struct ubifs_info *c, const void *buf,
 
 	from = prandom_u32() % len;
 	/* Corruption span max to end of write unit */
-	to = min(len, ALIGN(from + 1, c->max_write_size));
+	to = min(len, ubifs_align_max_write(c, from + 1));
 
 	ubifs_warn(c, "filled bytes %u-%u with %s", from, to - 1,
 		   ffs ? "0xFFs" : "random data");
diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index d124117efd42..06ccaeb4c5d9 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -564,7 +564,8 @@  int ubifs_wbuf_sync_nolock(struct ubifs_wbuf *wbuf)
 	ubifs_assert(c, wbuf->size % c->min_io_size == 0);
 	ubifs_assert(c, !c->ro_media && !c->ro_mount);
 	if (c->leb_size - wbuf->offs >= c->max_write_size)
-		ubifs_assert(c, !((wbuf->offs + wbuf->size) % c->max_write_size));
+		ubifs_assert(c, !((c->leb_start + wbuf->offs + wbuf->size)
+					% c->max_write_size));
 
 	if (c->ro_error)
 		return -EROFS;
@@ -595,8 +596,8 @@  int ubifs_wbuf_sync_nolock(struct ubifs_wbuf *wbuf)
 	 */
 	if (c->leb_size - wbuf->offs < c->max_write_size)
 		wbuf->size = c->leb_size - wbuf->offs;
-	else if (wbuf->offs & (c->max_write_size - 1))
-		wbuf->size = ALIGN(wbuf->offs, c->max_write_size) - wbuf->offs;
+	else if ((c->leb_start + wbuf->offs) & (c->max_write_size - 1))
+		wbuf->size = ubifs_align_max_write(wbuf->offs) - wbuf->offs;
 	else
 		wbuf->size = c->max_write_size;
 	wbuf->avail = wbuf->size;
@@ -636,8 +637,8 @@  int ubifs_wbuf_seek_nolock(struct ubifs_wbuf *wbuf, int lnum, int offs)
 	wbuf->offs = offs;
 	if (c->leb_size - wbuf->offs < c->max_write_size)
 		wbuf->size = c->leb_size - wbuf->offs;
-	else if (wbuf->offs & (c->max_write_size - 1))
-		wbuf->size = ALIGN(wbuf->offs, c->max_write_size) - wbuf->offs;
+	else if ((c->leb_start + wbuf->offs) & (c->max_write_size - 1))
+		wbuf->size = ubifs_align_max_write(wbuf->offs) - wbuf->offs;
 	else
 		wbuf->size = c->max_write_size;
 	wbuf->avail = wbuf->size;
@@ -746,7 +747,8 @@  int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 	ubifs_assert(c, !c->ro_media && !c->ro_mount);
 	ubifs_assert(c, !c->space_fixup);
 	if (c->leb_size - wbuf->offs >= c->max_write_size)
-		ubifs_assert(c, !((wbuf->offs + wbuf->size) % c->max_write_size));
+		ubifs_assert(c, !((c->leb_start + wbuf->offs + wbuf->size)
+					% c->max_write_size));
 
 	if (c->leb_size - wbuf->offs - wbuf->used < aligned_len) {
 		err = -ENOSPC;
@@ -813,7 +815,7 @@  int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 		len -= wbuf->avail;
 		aligned_len -= wbuf->avail;
 		written += wbuf->avail;
-	} else if (wbuf->offs & (c->max_write_size - 1)) {
+	} else if ((c->leb_start + wbuf->offs) & (c->max_write_size - 1)) {
 		/*
 		 * The write-buffer offset is not aligned to
 		 * @c->max_write_size and @wbuf->size is less than
diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h
index 6f87237fdbf4..269350749ce7 100644
--- a/fs/ubifs/misc.h
+++ b/fs/ubifs/misc.h
@@ -290,4 +290,17 @@  static inline int ubifs_next_log_lnum(const struct ubifs_info *c, int lnum)
 
 const char *ubifs_assert_action_name(struct ubifs_info *c);
 
+/**
+ * ubifs_align_max_write -
+ * @c: UBIFS file-system description object
+ * @offs: logical eraseblock offset to aligned to
+ *
+ * This function calcurates offset which aligned to size of max_write_size
+ * from start of LEB .
+ */
+static inline int ubifs_align_max_write(const struct ubifs_info *c, int offs)
+{
+	return (c->leb_start + offs) & (c->max_write_size - 1) - c->leb_start ;
+}
+
 #endif /* __UBIFS_MISC_H__ */
diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 8526b7ec4707..33fbfb5921ed 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -420,7 +420,7 @@  static int is_last_write(const struct ubifs_info *c, void *buf, int offs)
 	 * Round up to the next @c->max_write_size boundary i.e. @offs is in
 	 * the last wbuf written. After that should be empty space.
 	 */
-	empty_offs = ALIGN(offs + 1, c->max_write_size);
+	empty_offs = ubifs_align_max_write(offs + 1);
 	check_len = c->leb_size - empty_offs;
 	p = buf + empty_offs - offs;
 	return is_empty(p, check_len);
@@ -474,7 +474,7 @@  static int no_more_nodes(const struct ubifs_info *c, void *buf, int len,
 	int skip, dlen = le32_to_cpu(ch->len);
 
 	/* Check for empty space after the corrupt node's common header */
-	skip = ALIGN(offs + UBIFS_CH_SZ, c->max_write_size) - offs;
+	skip = ubifs_align_max_write(offs + UBIFS_CH_SZ) - offs;
 	if (is_empty(buf + skip, len - skip))
 		return 1;
 	/*
@@ -486,7 +486,7 @@  static int no_more_nodes(const struct ubifs_info *c, void *buf, int len,
 		return 0;
 	}
 	/* Now we know the corrupt node's length we can skip over it */
-	skip = ALIGN(offs + dlen, c->max_write_size) - offs;
+	skip = ubifs_align_max_write(offs + dlen) - offs;
 	/* After which there should be empty space */
 	if (is_empty(buf + skip, len - skip))
 		return 1;