diff mbox

[1/2] UBIFS: fix free log space calculation

Message ID 1405514528-14649-1-git-send-email-dedekind1@gmail.com
State Accepted
Commit 84ad611a16b5a53cdc12b6d0522d0a6e08eee394
Headers show

Commit Message

Artem Bityutskiy July 16, 2014, 12:42 p.m. UTC
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the
'empty_log_bytes()' function, which calculates how many bytes are left in the
log:

"
If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h'
would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes'
instead of 0.
"

At this point it is not clear what would be the consequences of this, and
whether this may lead to any problems, but this patch addresses the issue just
in case.

Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
---
 fs/ubifs/log.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

hujianyang Aug. 11, 2014, 3:21 a.m. UTC | #1
> ---
>  fs/ubifs/log.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
> index ed24422..7e818ec 100644
> --- a/fs/ubifs/log.c
> +++ b/fs/ubifs/log.c
> @@ -106,10 +106,14 @@ static inline long long empty_log_bytes(const struct ubifs_info *c)
>  	h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
>  	t = (long long)c->ltail_lnum * c->leb_size;
>  
> -	if (h >= t)
> +	if (h > t)
>  		return c->log_bytes - h + t;
> -	else
> +	else if (h != t)
>  		return t - h;
> +	else if (c->lhead_lnum != c->ltail_lnum)
> +		return 0;
> +	else
> +		return c->log_bytes;
>  }
>  
>  /**
> 

Hi Artem,

After almost one month test, the error I've reported

UBIFS: background thread "ubifs_bgt0_0" started, PID 2222
UBIFS: recovery needed
UBIFS error (pid 2220): replay_log_leb: first log node at LEB 4:0 is not CS node
UBIFS error (pid 2220): replay_log_leb: log error detected while replaying the log at LEB 4:0
 magic          0x6101831
 crc            0x81e22cd5
 node_type      8 (reference node)
 group_type     0 (no node group)
 sqnum          2375337
 len            64
 lnum           5273
 offs           122880
 jhead          2
UBIFS: background thread "ubifs_bgt0_0" stops

never came out again.

I think this patch really solve that problem. Now I want to show my
analysis and wish you add "Cc stable" for this patch.

With the help of ubidump tool, I got the MST NODEs and LOG NODEs of
the image which is broken.

master node:

look at LEB 1:20480 (106496 bytes left)
scanning master node at LEB 1:20480
	magic          0x6101831
	crc            0x38ff5c67
	node_type      7 (master node)
	group_type     0 (no node group)
	sqnum          15799802
	len            512
	highest_inum   161
	commit number  7954
	flags          0x3
	log_lnum       4
	root_lnum      2258
	root_offs      7048
	root_len       148
	gc_lnum        2324
	ihead_lnum     2258
	ihead_offs     8192
	index_size     2511408
	lpt_lnum       15
	lpt_offs       113571
	nhead_lnum     15
	nhead_offs     114688
	ltab_lnum      15
	ltab_offs      110592
	lsave_lnum     0
	lsave_offs     0
	lscan_lnum     2930
	leb_cnt        6844
	empty_lebs     3362
	idx_lebs       23
	total_free     430188544
	total_dirty    56069504
	total_used     377968720
	total_dead     7032
	total_dark     31196320
look at LEB 1:20992 (105984 bytes left)
scanning padding node at LEB 1:20992
1508 bytes padded at LEB 1:20992, offset now 22528
look at LEB 1:22528 (104448 bytes left)
hit empty space at LEB 1:22528
stop scanning LEB 1 at offset 126976

log nodes:

leb number     first sqnum   first node type    full/not full
   3            15801163         ref node           full
   4            15801288         ref node           full
   5            15801433         ref node           full
   6            15801598         ref node           full
   7            15801966         ref node         not full
   8            15800414         ref node           full
   9            15800538         ref node           full
  10            15800664         ref node           full
  11            15800789         ref node           full
  12            15800915         ref node           full
  13            15801039         ref node           full

The min_io_size of this device is 2048 and the leb_size of it is
126976. Each of leb is full except 7 means no frequently power
cutting happen because log leb will change into next one if the
volume is remount.

We know cs node must in leb 4 as @log_lnum is 4 in mst node. But
it is a ref node as we see it in leb 4. What happened?

Leb 7 has the highest sqnum and it is not full, that means it is
the last log leb before an unclean power off. So we know leb 8 is
the oldest log leb exist. But it has a higher sqnum than mst node,
15799802, that means mst node is written before the oldest node
in current log area.

Each time we write a cs node, we write a new mst node. The sqnum
of this mst node should higher than cs node. So the cs node in
leb 4 is overlapped.

We keep an empty leb in log area by @min_log_bytes to start-up a
new commit. When commit start, we may use this leb and set
@min_log_bytes to zero.

Suppose old ltail_lnum is 5 and lhead_lnum is 3, leb 3 is full and
a new commit start. We write a new cs node at the beginning of leb
4 and set @min_log_bytes to zero. Now, as empty_log_bytes() is wrong,
ubifs_add_bud_to_log() may wrap leb 5, leb 6... At last, overlap
the leb 4 which has the newest cs node and mst_node would point to,
why not? we don't have any assertion or any protection to void this
except the check of empty_log_bytes() with an error shows above.

Now an unfortunately power cut happens, this volume is broken.

I think this patch fixed this problem by correctly protect the log
area. So maybe you can add "Cc stable" before you send it to mainline.

By the way, commit 8989cd69b9d2 "UBIFS: fix a race condition" fix
a real potential race and useful for avoiding errors in log area.
Could you please add "Cc stable" for it too?


Thanks,

Hu
Artem Bityutskiy Sept. 8, 2014, 9:43 a.m. UTC | #2
On Mon, 2014-08-11 at 11:21 +0800, hujianyang wrote:
> > ---
> >  fs/ubifs/log.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
> > index ed24422..7e818ec 100644
> > --- a/fs/ubifs/log.c
> > +++ b/fs/ubifs/log.c
> > @@ -106,10 +106,14 @@ static inline long long empty_log_bytes(const struct ubifs_info *c)
> >  	h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
> >  	t = (long long)c->ltail_lnum * c->leb_size;
> >  
> > -	if (h >= t)
> > +	if (h > t)
> >  		return c->log_bytes - h + t;
> > -	else
> > +	else if (h != t)
> >  		return t - h;
> > +	else if (c->lhead_lnum != c->ltail_lnum)
> > +		return 0;
> > +	else
> > +		return c->log_bytes;
> >  }
> >  
> >  /**
> > 
> 
> Hi Artem,
> 
> After almost one month test, the error I've reported
> 
> UBIFS: background thread "ubifs_bgt0_0" started, PID 2222
> UBIFS: recovery needed
> UBIFS error (pid 2220): replay_log_leb: first log node at LEB 4:0 is not CS node
> UBIFS error (pid 2220): replay_log_leb: log error detected while replaying the log at LEB 4:0
>  magic          0x6101831
>  crc            0x81e22cd5
>  node_type      8 (reference node)
>  group_type     0 (no node group)
>  sqnum          2375337
>  len            64
>  lnum           5273
>  offs           122880
>  jhead          2
> UBIFS: background thread "ubifs_bgt0_0" stops
> 
> never came out again.
> 
> I think this patch really solve that problem. Now I want to show my
> analysis and wish you add "Cc stable" for this patch.
> 
> With the help of ubidump tool, I got the MST NODEs and LOG NODEs of
> the image which is broken.
> 
> master node:
> 
> look at LEB 1:20480 (106496 bytes left)
> scanning master node at LEB 1:20480
> 	magic          0x6101831
> 	crc            0x38ff5c67
> 	node_type      7 (master node)
> 	group_type     0 (no node group)
> 	sqnum          15799802
> 	len            512
> 	highest_inum   161
> 	commit number  7954
> 	flags          0x3
> 	log_lnum       4
> 	root_lnum      2258
> 	root_offs      7048
> 	root_len       148
> 	gc_lnum        2324
> 	ihead_lnum     2258
> 	ihead_offs     8192
> 	index_size     2511408
> 	lpt_lnum       15
> 	lpt_offs       113571
> 	nhead_lnum     15
> 	nhead_offs     114688
> 	ltab_lnum      15
> 	ltab_offs      110592
> 	lsave_lnum     0
> 	lsave_offs     0
> 	lscan_lnum     2930
> 	leb_cnt        6844
> 	empty_lebs     3362
> 	idx_lebs       23
> 	total_free     430188544
> 	total_dirty    56069504
> 	total_used     377968720
> 	total_dead     7032
> 	total_dark     31196320
> look at LEB 1:20992 (105984 bytes left)
> scanning padding node at LEB 1:20992
> 1508 bytes padded at LEB 1:20992, offset now 22528
> look at LEB 1:22528 (104448 bytes left)
> hit empty space at LEB 1:22528
> stop scanning LEB 1 at offset 126976
> 
> log nodes:
> 
> leb number     first sqnum   first node type    full/not full
>    3            15801163         ref node           full
>    4            15801288         ref node           full
>    5            15801433         ref node           full
>    6            15801598         ref node           full
>    7            15801966         ref node         not full
>    8            15800414         ref node           full
>    9            15800538         ref node           full
>   10            15800664         ref node           full
>   11            15800789         ref node           full
>   12            15800915         ref node           full
>   13            15801039         ref node           full
> 
> The min_io_size of this device is 2048 and the leb_size of it is
> 126976. Each of leb is full except 7 means no frequently power
> cutting happen because log leb will change into next one if the
> volume is remount.
> 
> We know cs node must in leb 4 as @log_lnum is 4 in mst node. But
> it is a ref node as we see it in leb 4. What happened?
> 
> Leb 7 has the highest sqnum and it is not full, that means it is
> the last log leb before an unclean power off. So we know leb 8 is
> the oldest log leb exist. But it has a higher sqnum than mst node,
> 15799802, that means mst node is written before the oldest node
> in current log area.
> 
> Each time we write a cs node, we write a new mst node. The sqnum
> of this mst node should higher than cs node. So the cs node in
> leb 4 is overlapped.
> 
> We keep an empty leb in log area by @min_log_bytes to start-up a
> new commit. When commit start, we may use this leb and set
> @min_log_bytes to zero.
> 
> Suppose old ltail_lnum is 5 and lhead_lnum is 3, leb 3 is full and
> a new commit start. We write a new cs node at the beginning of leb
> 4 and set @min_log_bytes to zero. Now, as empty_log_bytes() is wrong,
> ubifs_add_bud_to_log() may wrap leb 5, leb 6... At last, overlap
> the leb 4 which has the newest cs node and mst_node would point to,
> why not? we don't have any assertion or any protection to void this
> except the check of empty_log_bytes() with an error shows above.
> 
> Now an unfortunately power cut happens, this volume is broken.
> 
> I think this patch fixed this problem by correctly protect the log
> area. So maybe you can add "Cc stable" before you send it to mainline.
> 
> By the way, commit 8989cd69b9d2 "UBIFS: fix a race condition" fix
> a real potential race and useful for avoiding errors in log area.
> Could you please add "Cc stable" for it too?

OK, I am adding these tags:

    Cc: stable@vger.kernel.org
    Tested-by: hujianyang <hujianyang@huawei.com>

And putting the patches to the linux-next queue.
diff mbox

Patch

diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
index ed24422..7e818ec 100644
--- a/fs/ubifs/log.c
+++ b/fs/ubifs/log.c
@@ -106,10 +106,14 @@  static inline long long empty_log_bytes(const struct ubifs_info *c)
 	h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
 	t = (long long)c->ltail_lnum * c->leb_size;
 
-	if (h >= t)
+	if (h > t)
 		return c->log_bytes - h + t;
-	else
+	else if (h != t)
 		return t - h;
+	else if (c->lhead_lnum != c->ltail_lnum)
+		return 0;
+	else
+		return c->log_bytes;
 }
 
 /**