Patchwork UBIFS: fix master node recovery

login
register
mail settings
Submitter Anatolij Gustschin
Date July 6, 2011, 9:30 a.m.
Message ID <1309944626-30195-1-git-send-email-agust@denx.de>
Download mbox | patch
Permalink /patch/103445/
State New
Headers show

Comments

Anatolij Gustschin - July 6, 2011, 9:30 a.m.
When the 1st LEB was unmapped and written but 2nd LEB not,
the master node recovery doesn't succeed after power cut.
We see following error when mounting UBIFS partition on NOR
flash:

UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node

Additional 2nd master node offset check is needed to fix the
problem. If the 2nd master node is at the end in the 2nd LEB,
first master node is used for recovery.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
---
 fs/ubifs/recovery.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
Artem Bityutskiy - July 7, 2011, 6:15 a.m.
Anatolij,

On Wed, 2011-07-06 at 11:30 +0200, Anatolij Gustschin wrote:
> When the 1st LEB was unmapped and written but 2nd LEB not,
> the master node recovery doesn't succeed after power cut.
> We see following error when mounting UBIFS partition on NOR
> flash:
> 
> UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node

Could you please provide some numbers - when this error happens, what is
offs2, sz, and c->leb_size when this error happens?

Frankly, I do not understand this change so far...
Artem Bityutskiy - July 7, 2011, 6:26 a.m.
On Wed, 2011-07-06 at 11:30 +0200, Anatolij Gustschin wrote:
> When the 1st LEB was unmapped and written but 2nd LEB not,
> the master node recovery doesn't succeed after power cut.
> We see following error when mounting UBIFS partition on NOR
> flash:
> 
> UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node
> 
> Additional 2nd master node offset check is needed to fix the
> problem. If the 2nd master node is at the end in the 2nd LEB,
> first master node is used for recovery.
> 
> Signed-off-by: Anatolij Gustschin <agust@denx.de>
> ---
>  fs/ubifs/recovery.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
> index 783d8e0..0e951f0 100644
> --- a/fs/ubifs/recovery.c
> +++ b/fs/ubifs/recovery.c
> @@ -274,7 +274,9 @@ int ubifs_recover_master_node(struct ubifs_info *c)
>  				if (cor1)
>  					goto out_err;
>  				mst = mst1;
> -			} else if (offs1 == 0 && offs2 + sz >= c->leb_size) {
> +			} else if (offs1 == 0 &&
> +				   (offs2 + sz >= c->leb_size ||
> +				    (c->leb_size - offs2 - sz) < sz)) {

Hmm, ok, so we have situation like this I guess:

First LEB:   |M_______|
Second LEB:  |MMMMMMM_|
              1234567      

Where M - valid master node, _ - just free space.

So offs1 is 0, and offs2 is the position where the 7th M _starts_.

What we want to check is that there is no room for another master node
in the second LEB. We have to take offs2, add sz, and make sure that LEB
size minus that is less than sz, i.e., exactly what you have done.

And  offs2 + sz >= c->leb_size seems to be completely incorrect and
should be removed, AFAICS. Can you confirm that?

>  				/* 1st LEB was unmapped and written, 2nd not */
>  				if (cor1)
>  					goto out_err;
Anatolij Gustschin - July 7, 2011, 7:36 a.m.
On Thu, 07 Jul 2011 09:15:33 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> Anatolij,
> 
> On Wed, 2011-07-06 at 11:30 +0200, Anatolij Gustschin wrote:
> > When the 1st LEB was unmapped and written but 2nd LEB not,
> > the master node recovery doesn't succeed after power cut.
> > We see following error when mounting UBIFS partition on NOR
> > flash:
> > 
> > UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node
> 
> Could you please provide some numbers - when this error happens, what is
> offs2, sz, and c->leb_size when this error happens?

offs2 = 261120, sz = 512, c-leb_size = 262016. There are 384 unused
bytes at the end of the LEB 2: 262016 - 261120 - 512 = 384.

Anatolij
Anatolij Gustschin - July 7, 2011, 7:43 a.m.
On Thu, 07 Jul 2011 09:26:26 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Wed, 2011-07-06 at 11:30 +0200, Anatolij Gustschin wrote:
> > When the 1st LEB was unmapped and written but 2nd LEB not,
> > the master node recovery doesn't succeed after power cut.
> > We see following error when mounting UBIFS partition on NOR
> > flash:
> > 
> > UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node
> > 
> > Additional 2nd master node offset check is needed to fix the
> > problem. If the 2nd master node is at the end in the 2nd LEB,
> > first master node is used for recovery.
> > 
> > Signed-off-by: Anatolij Gustschin <agust@denx.de>
> > ---
> >  fs/ubifs/recovery.c |    4 +++-
> >  1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
> > index 783d8e0..0e951f0 100644
> > --- a/fs/ubifs/recovery.c
> > +++ b/fs/ubifs/recovery.c
> > @@ -274,7 +274,9 @@ int ubifs_recover_master_node(struct ubifs_info *c)
> >  				if (cor1)
> >  					goto out_err;
> >  				mst = mst1;
> > -			} else if (offs1 == 0 && offs2 + sz >= c->leb_size) {
> > +			} else if (offs1 == 0 &&
> > +				   (offs2 + sz >= c->leb_size ||
> > +				    (c->leb_size - offs2 - sz) < sz)) {
> 
> Hmm, ok, so we have situation like this I guess:
> 
> First LEB:   |M_______|
> Second LEB:  |MMMMMMM_|
>               1234567      
> 
> Where M - valid master node, _ - just free space.
> 
> So offs1 is 0, and offs2 is the position where the 7th M _starts_.

Yes, in our case we have 511 valid master nodes + 384 bytes free
space in LEB 2 and one valid master node in LEB 1 + free space.

> What we want to check is that there is no room for another master node
> in the second LEB. We have to take offs2, add sz, and make sure that LEB
> size minus that is less than sz, i.e., exactly what you have done.
> 
> And  offs2 + sz >= c->leb_size seems to be completely incorrect and
> should be removed, AFAICS. Can you confirm that?

Yes, offs2 + sz >= c->leb_size check is not correct, I think.

Anatolij
Artem Bityutskiy - July 7, 2011, 7:57 a.m.
On Thu, 2011-07-07 at 09:43 +0200, Anatolij Gustschin wrote:
> Yes, in our case we have 511 valid master nodes + 384 bytes free
> space in LEB 2 and one valid master node in LEB 1 + free space.
> 
> > What we want to check is that there is no room for another master node
> > in the second LEB. We have to take offs2, add sz, and make sure that LEB
> > size minus that is less than sz, i.e., exactly what you have done.
> > 
> > And  offs2 + sz >= c->leb_size seems to be completely incorrect and
> > should be removed, AFAICS. Can you confirm that?
> 
> Yes, offs2 + sz >= c->leb_size check is not correct, I think.

I also think so. Could you send v2 of your patch - kill the incorrect
check and add your correct check instead.

I do not have time for testing this now, so I'd ask you go test it. But
I realize that power cut testing can take a lot of time, and your patch
looks right to me, so I'm happy to take your patch right away, but I'd
anyway ask you to conduct some testing anyway and inform about your
results later.

BTW, you can try the power cut emulation testing. If you care a lot
about power-cut robustness, it is helpful. The integck test now can do
power cut emulation testing with UBIFS. But there still some small
issues, though.
Anatolij Gustschin - July 7, 2011, 8:44 a.m.
On Thu, 07 Jul 2011 10:57:05 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Thu, 2011-07-07 at 09:43 +0200, Anatolij Gustschin wrote:
> > Yes, in our case we have 511 valid master nodes + 384 bytes free
> > space in LEB 2 and one valid master node in LEB 1 + free space.
> > 
> > > What we want to check is that there is no room for another master node
> > > in the second LEB. We have to take offs2, add sz, and make sure that LEB
> > > size minus that is less than sz, i.e., exactly what you have done.
> > > 
> > > And  offs2 + sz >= c->leb_size seems to be completely incorrect and
> > > should be removed, AFAICS. Can you confirm that?
> > 
> > Yes, offs2 + sz >= c->leb_size check is not correct, I think.
> 
> I also think so. Could you send v2 of your patch - kill the incorrect
> check and add your correct check instead.

Thinking more about this I can imagine the situation where
off2 + sz == c->leb_size condition is true (depending on LEB size,
node alignment, etc.), so we should check for this condition also.
off2 + sz > c->leb_size can't be the case since off2 is the offset
of a valid node. So we should check for
off2 + sz == c->leb_size || (c->leb_size - offs2 - sz) < sz.

> I do not have time for testing this now, so I'd ask you go test it. But
> I realize that power cut testing can take a lot of time, and your patch
> looks right to me, so I'm happy to take your patch right away, but I'd
> anyway ask you to conduct some testing anyway and inform about your
> results later.
> 
> BTW, you can try the power cut emulation testing. If you care a lot
> about power-cut robustness, it is helpful. The integck test now can do
> power cut emulation testing with UBIFS. But there still some small
> issues, though.

Thanks,
Anatolij
Artem Bityutskiy - July 7, 2011, 8:57 a.m.
On Thu, 2011-07-07 at 10:44 +0200, Anatolij Gustschin wrote:
> On Thu, 07 Jul 2011 10:57:05 +0300
> Artem Bityutskiy <dedekind1@gmail.com> wrote:
> 
> > On Thu, 2011-07-07 at 09:43 +0200, Anatolij Gustschin wrote:
> > > Yes, in our case we have 511 valid master nodes + 384 bytes free
> > > space in LEB 2 and one valid master node in LEB 1 + free space.
> > > 
> > > > What we want to check is that there is no room for another master node
> > > > in the second LEB. We have to take offs2, add sz, and make sure that LEB
> > > > size minus that is less than sz, i.e., exactly what you have done.
> > > > 
> > > > And  offs2 + sz >= c->leb_size seems to be completely incorrect and
> > > > should be removed, AFAICS. Can you confirm that?
> > > 
> > > Yes, offs2 + sz >= c->leb_size check is not correct, I think.
> > 
> > I also think so. Could you send v2 of your patch - kill the incorrect
> > check and add your correct check instead.
> 
> Thinking more about this I can imagine the situation where
> off2 + sz == c->leb_size condition is true (depending on LEB size,
> node alignment, etc.), so we should check for this condition also.
> off2 + sz > c->leb_size can't be the case since off2 is the offset
> of a valid node. So we should check for
> off2 + sz == c->leb_size || (c->leb_size - offs2 - sz) < sz.

If offs2 + sz == c->leb_size, then we end up with

0 < sz, which will be true, and everything should be fine, no?
Anatolij Gustschin - July 7, 2011, 9:05 a.m.
On Thu, 07 Jul 2011 11:57:04 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:
...
> > Thinking more about this I can imagine the situation where
> > off2 + sz == c->leb_size condition is true (depending on LEB size,
> > node alignment, etc.), so we should check for this condition also.
> > off2 + sz > c->leb_size can't be the case since off2 is the offset
> > of a valid node. So we should check for
> > off2 + sz == c->leb_size || (c->leb_size - offs2 - sz) < sz.
> 
> If offs2 + sz == c->leb_size, then we end up with
> 
> 0 < sz, which will be true, and everything should be fine, no?

Yes, you are right, I didn't though about this.

Thanks,
Anatolij

Patch

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 783d8e0..0e951f0 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -274,7 +274,9 @@  int ubifs_recover_master_node(struct ubifs_info *c)
 				if (cor1)
 					goto out_err;
 				mst = mst1;
-			} else if (offs1 == 0 && offs2 + sz >= c->leb_size) {
+			} else if (offs1 == 0 &&
+				   (offs2 + sz >= c->leb_size ||
+				    (c->leb_size - offs2 - sz) < sz)) {
 				/* 1st LEB was unmapped and written, 2nd not */
 				if (cor1)
 					goto out_err;