Message ID | 1405508618.1906.26.camel@sauron.fi.intel.com |
---|---|
State | RFC |
Headers | show |
On 2014/7/16 19:03, Artem Bityutskiy wrote: > On Fri, 2014-07-11 at 10:13 +0800, hujianyang wrote: >> Hi Artem, >> >> After several days testing, the error we fixed in commit 642f63ed10, >> "UBIFS: fix fatal race condition" comes out again. This commit seems >> not working for my problem and I have to research into it again. > > Do you say that your original patch fixed the problem, and the re-worked > "fix fatal race condition" patch does not fix it, so it is incorrect? > > Or do you say that probably the original patch also did not fix the > problem, so both do not really fix the issue? > > Or do you say that "fix fatal race condition" fixes only part of the > problem, and there is another problem, which needs fixing too? > > Or you are not sure? > First, I think the re-worked patch and my origin patch fix same problem. Once I think the error 'not CS node' is caused by power cut, so I research the code and find this race. But, after several days test, this error came out again and I think we don't catch the true cause. Then I find the error below. I think previous fixes(commit 642f63ed10) are useful but not solve my problem. Maybe you can change the commit log or just keep them because they can really fix a problem like that. > > Do you want me to create 2 patches based on this information and send > them to you, our you submit them? They should have 'Cc: > stable@vger.kernel.org'. > > Thanks! > Yes, I'm not sure how to fix it is better and I was busy these days. You can fix this error in your way and cc the patch to me. But I don't know if this error is the true cause of my problem. I would like to report to you if I get something new. Thanks~! Hu
On Wed, 2014-07-16 at 19:44 +0800, hujianyang wrote: > First, I think the re-worked patch and my origin patch fix same > problem. Once I think the error 'not CS node' is caused by power > cut, so I research the code and find this race. > > But, after several days test, this error came out again and I > think we don't catch the true cause. Then I find the error below. > > I think previous fixes(commit 642f63ed10) are useful but not solve > my problem. Maybe you can change the commit log or just keep them > because they can really fix a problem like that. OK, I'll keep the patches, but will amend the commit message a bit and tell there that they are fixing a theoretical issue. And I'll remove Cc: stable so far. > Yes, I'm not sure how to fix it is better and I was busy these > days. You can fix this error in your way and cc the patch to me. > > But I don't know if this error is the true cause of my problem. I > would like to report to you if I get something new. I've created 2 patches based on your e-mail. I did not test them at all. The assertion - I move it to 'ubifs_next_log_lnum()'. May be this is incorrect, but looks like a good place for me on the first glance. I guess you will continue testing, right? Please, use these patches and let me know if they are wrong, if they help or not, etc. I'll send the 2 patches shortly. Thanks!
On Wed, 2014-07-16 at 15:38 +0300, Artem Bityutskiy wrote: > I guess you will continue testing, right? Please, use these patches and > let me know if they are wrong, if they help or not, etc. I'll send the 2 > patches shortly. By the way, you may try making the log smaller or larger, instead of using the default size - may be it'll help making the issue easier to reproduce?
diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c index ed24422..7e818ec 100644 --- a/fs/ubifs/log.c +++ b/fs/ubifs/log.c @@ -106,10 +106,14 @@ static inline long long empty_log_bytes(const struct ubifs_info *c) h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs; t = (long long)c->ltail_lnum * c->leb_size; - if (h >= t) + if (h > t) return c->log_bytes - h + t; - else + else if (h != t) return t - h; + else if (c->lhead_lnum != c->ltail_lnum) + return 0; + else + return c->log_bytes; } > but I think there may be no other situation that lead @h equal to @t.