diff mbox

UBIFS: Fix empty_log_bytes() computing error

Message ID 1405508618.1906.26.camel@sauron.fi.intel.com
State RFC
Headers show

Commit Message

Artem Bityutskiy July 16, 2014, 11:03 a.m. UTC
On Fri, 2014-07-11 at 10:13 +0800, hujianyang wrote:
> Hi Artem,
> 
> After several days testing, the error we fixed in commit 642f63ed10,
> "UBIFS: fix fatal race condition" comes out again. This commit seems
> not working for my problem and I have to research into it again.

Do you say that your original patch fixed the problem, and the re-worked
"fix fatal race condition" patch does not fix it, so it is incorrect?

Or do you say that probably the original patch also did not fix the
problem, so both do not really fix the issue?

Or do you say that "fix fatal race condition" fixes only part of the
problem, and there is another problem, which needs fixing too?

Or you are not sure?

> With the help of my ubidump tool this time, I got the detailed info
> of log area and mst nodes and found a terrible mistake in function
> empty_log_bytes().

OK.

> I used to think this helper function was correct but it truns out I
> was wrong. We set @c->min_log_bytes to 0 in ubifs_log_start_commit()
> and suppose empty_log_bytes() will return 0 when the log area is full.

Yes.

> But it's not true.

> If lhead_lnum + 1 == ltail_lnum and c->lhead_offs == c->leb_size, @h
> would equal to @t and empty_log_bytes() will return log_bytes expect 0.

Well-spotted!

> 
> I prefer to add a patch like:
> 
> +       if (h == t && c->lhead_offs == c->leb_size)
> +               return 0;

Yeah. Or something which may be slightly easier to read:

> So just make this helper function return 0 if @h equal to @t.

Even though there is always something in the log, it is safer to make
sure the function works correctly for empty log too...

> -	if (h >= t)
> +	if (h > t)
>  		return c->log_bytes - h + t;

... so I think this is not good enough.

>  	else
>  		return t - h;
> @@ -240,6 +240,7 @@ int ubifs_add_bud_to_log(struct ubifs_info *c, int jhead, int lnum, int offs)
> 
>  	if (c->lhead_offs > c->leb_size - c->ref_node_alsz) {
>  		c->lhead_lnum = ubifs_next_log_lnum(c, c->lhead_lnum);
> +		ubifs_assert(c->lhead_lnum != c->ltail_lnum);
>  		c->lhead_offs = 0;
>  	}

Let's have it as a separate patch.

Do you want me to create 2 patches based on this information and send
them to you, our you submit them? They should have 'Cc:
stable@vger.kernel.org'.

Thanks!

Comments

hujianyang July 16, 2014, 11:44 a.m. UTC | #1
On 2014/7/16 19:03, Artem Bityutskiy wrote:
> On Fri, 2014-07-11 at 10:13 +0800, hujianyang wrote:
>> Hi Artem,
>>
>> After several days testing, the error we fixed in commit 642f63ed10,
>> "UBIFS: fix fatal race condition" comes out again. This commit seems
>> not working for my problem and I have to research into it again.
> 
> Do you say that your original patch fixed the problem, and the re-worked
> "fix fatal race condition" patch does not fix it, so it is incorrect?
> 
> Or do you say that probably the original patch also did not fix the
> problem, so both do not really fix the issue?
> 
> Or do you say that "fix fatal race condition" fixes only part of the
> problem, and there is another problem, which needs fixing too?
> 
> Or you are not sure?
> 

First, I think the re-worked patch and my origin patch fix same
problem. Once I think the error 'not CS node' is caused by power
cut, so I research the code and find this race.

But, after several days test, this error came out again and I
think we don't catch the true cause. Then I find the error below.

I think previous fixes(commit 642f63ed10) are useful but not solve
my problem. Maybe you can change the commit log or just keep them
because they can really fix a problem like that.

> 
> Do you want me to create 2 patches based on this information and send
> them to you, our you submit them? They should have 'Cc:
> stable@vger.kernel.org'.
> 
> Thanks!
> 

Yes, I'm not sure how to fix it is better and I was busy these
days. You can fix this error in your way and cc the patch to me.

But I don't know if this error is the true cause of my problem. I
would like to report to you if I get something new.

Thanks~!

Hu
Artem Bityutskiy July 16, 2014, 12:38 p.m. UTC | #2
On Wed, 2014-07-16 at 19:44 +0800, hujianyang wrote:
> First, I think the re-worked patch and my origin patch fix same
> problem. Once I think the error 'not CS node' is caused by power
> cut, so I research the code and find this race.
> 
> But, after several days test, this error came out again and I
> think we don't catch the true cause. Then I find the error below.
> 
> I think previous fixes(commit 642f63ed10) are useful but not solve
> my problem. Maybe you can change the commit log or just keep them
> because they can really fix a problem like that.

OK, I'll keep the patches, but will amend the commit message a bit and
tell there that they are fixing a theoretical issue. And I'll remove Cc:
stable so far.

> Yes, I'm not sure how to fix it is better and I was busy these
> days. You can fix this error in your way and cc the patch to me.
> 
> But I don't know if this error is the true cause of my problem. I
> would like to report to you if I get something new.

I've created 2 patches based on your e-mail. I did not test them at all.
The assertion - I move it to 'ubifs_next_log_lnum()'. May be this is
incorrect, but looks like a good place for me on the first glance.

I guess you will continue testing, right? Please, use these patches and
let me know if they are wrong, if they help or not, etc. I'll send the 2
patches shortly.

Thanks!
Artem Bityutskiy July 16, 2014, 12:43 p.m. UTC | #3
On Wed, 2014-07-16 at 15:38 +0300, Artem Bityutskiy wrote:
> I guess you will continue testing, right? Please, use these patches and
> let me know if they are wrong, if they help or not, etc. I'll send the 2
> patches shortly.

By the way, you may try making the log smaller or larger, instead of
using the default size - may be it'll help making the issue easier to
reproduce?
diff mbox

Patch

diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
index ed24422..7e818ec 100644
--- a/fs/ubifs/log.c
+++ b/fs/ubifs/log.c
@@ -106,10 +106,14 @@  static inline long long empty_log_bytes(const struct ubifs_info *c)
        h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
        t = (long long)c->ltail_lnum * c->leb_size;
 
-       if (h >= t)
+       if (h > t)
                return c->log_bytes - h + t;
-       else
+       else if (h != t)
                return t - h;
+       else if (c->lhead_lnum != c->ltail_lnum)
+               return 0;
+       else
+               return c->log_bytes;
 }

> but I think there may be no other situation that lead @h equal to @t.