diff mbox series

[U-Boot] fw_printenv: Don't bail out directly after one env read error

Message ID 20180626093759.24018-1-adrian.ratiu@ni.com
State Accepted
Commit 3925b2ac97b50b1facab096ac98243615683c295
Delegated to: Wolfgang Denk
Headers show
Series [U-Boot] fw_printenv: Don't bail out directly after one env read error | expand

Commit Message

Ioan-Adrian Ratiu June 26, 2018, 9:37 a.m. UTC
From: Joe Hershberger <joe.hershberger@ni.com>

When using a redundant environment a read error should simply mean to
not use that copy instead of giving up completely. The other copy may
be just fine.

Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
---
 tools/env/fw_env.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Comments

Joe Hershberger June 26, 2018, 3:16 p.m. UTC | #1
On Tue, Jun 26, 2018 at 4:37 AM, Ioan-Adrian Ratiu <adrian.ratiu@ni.com> wrote:
> From: Joe Hershberger <joe.hershberger@ni.com>
>
> When using a redundant environment a read error should simply mean to
> not use that copy instead of giving up completely. The other copy may
> be just fine.
>
> Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
> Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>

Hey Tom, can you pull this in?

Thanks,
-Joe

> ---
>  tools/env/fw_env.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/tools/env/fw_env.c b/tools/env/fw_env.c
> index a5d75958e1..3a5ad026f0 100644
> --- a/tools/env/fw_env.c
> +++ b/tools/env/fw_env.c
> @@ -1427,14 +1427,21 @@ int fw_env_open(struct env_opts *opts)
>         }
>
>         dev_current = 0;
> -       if (flash_io(O_RDONLY)) {
> +
> +       if (!flash_io(O_RDONLY)) {
> +               crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
> +               crc0_ok = (crc0 == *environment.crc);
> +       } else if (have_redund_env) {
> +               /*
> +                * to give the redundant env a chance, maybe it's good:
> +                * mark env crc0 invalid then test below if crc1 is ok
> +                */
> +               crc0_ok = 0;
> +       } else {
>                 ret = -EIO;
>                 goto open_cleanup;
>         }
>
> -       crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
> -
> -       crc0_ok = (crc0 == *environment.crc);
>         if (!have_redund_env) {
>                 if (!crc0_ok) {
>                         fprintf(stderr,
> @@ -1462,8 +1469,10 @@ int fw_env_open(struct env_opts *opts)
>                  */
>                 environment.image = addr1;
>                 if (flash_io(O_RDONLY)) {
> -                       ret = -EIO;
> -                       goto open_cleanup;
> +                       crc1_ok = 0;
> +               } else {
> +                       crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
> +                       crc1_ok = (crc1 == redundant->crc);
>                 }
>
>                 /* Check flag scheme compatibility */
> @@ -1489,9 +1498,6 @@ int fw_env_open(struct env_opts *opts)
>                         goto open_cleanup;
>                 }
>
> -               crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
> -
> -               crc1_ok = (crc1 == redundant->crc);
>                 flag1 = redundant->flags;
>
>                 if (crc0_ok && !crc1_ok) {
> --
> 2.17.1
>
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> https://lists.denx.de/listinfo/u-boot
Tom Rini June 28, 2018, 2:20 a.m. UTC | #2
On Tue, Jun 26, 2018 at 12:37:59PM +0300, Ioan-Adrian Ratiu wrote:

> From: Joe Hershberger <joe.hershberger@ni.com>
> 
> When using a redundant environment a read error should simply mean to
> not use that copy instead of giving up completely. The other copy may
> be just fine.
> 
> Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
> Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>

Applied to u-boot/master, thanks!
Wolfgang Denk June 29, 2018, 10:53 a.m. UTC | #3
Dear Adrian,

In message <20180626093759.24018-1-adrian.ratiu@ni.com> you wrote:
> From: Joe Hershberger <joe.hershberger@ni.com>
> 
> When using a redundant environment a read error should simply mean to
> not use that copy instead of giving up completely. The other copy may
> be just fine.

While the general idea is fine, I think we should NOT automatically
read data from the backup copy, at least not without clearly letting
the user know about this - and such notification should also work in
automated scripts or cod calling these routines, so a plain warning
message is NOT sufficient.

I suggest that the default remains as is: environment read errors
cause an error return of this function.


But it would probably nice for recovery purposes or such to add an
option to switch into some "permissive" mode - here the fall-back to
the redundant copy would be permitted, and the return code should
indicate what happened (read of primary env copy OK; read failed,
but redundant copy could ne read; all read attemtps failed).


Thanks!

Best regards,

Wolfgang Denk
Wolfgang Denk June 29, 2018, 10:57 a.m. UTC | #4
Dear Joe,

In message <CANr=Z=atFzdNO6gNhMgopCHaQ-KXPGMfaOz2+_KCVrKwkMOhuw@mail.gmail.com> you wrote:
>
> > When using a redundant environment a read error should simply mean to
> > not use that copy instead of giving up completely. The other copy may
> > be just fine.
> >
> > Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
> > Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
> 
> Hey Tom, can you pull this in?

NO!  Please don't!!

NAK!!

This patch can lead to reading incorrect (old, no longer valid)
values without any way for the user to see what is happening.

This must not be done!

Best regards,

Wolfgang Denk
Wolfgang Denk June 29, 2018, 10:59 a.m. UTC | #5
Dear Tom,

In message <20180628022059.GR4609@bill-the-cat.ec.rr.com> you wrote:
> 
> > From: Joe Hershberger <joe.hershberger@ni.com>
> >=20
> > When using a redundant environment a read error should simply mean to
> > not use that copy instead of giving up completely. The other copy may
> > be just fine.
> >=20
> > Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
> > Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
>
> Applied to u-boot/master, thanks!

Please revert.  This is introducing a source for nasty, almost
impossible to detect bugs!

Read errors must never be ignored.

Best regards,

Wolfgang Denk
Tom Rini June 29, 2018, 6:15 p.m. UTC | #6
On Fri, Jun 29, 2018 at 12:57:45PM +0200, Wolfgang Denk wrote:
> Dear Joe,
> 
> In message <CANr=Z=atFzdNO6gNhMgopCHaQ-KXPGMfaOz2+_KCVrKwkMOhuw@mail.gmail.com> you wrote:
> >
> > > When using a redundant environment a read error should simply mean to
> > > not use that copy instead of giving up completely. The other copy may
> > > be just fine.
> > >
> > > Signed-off-by: Joe Hershberger <joe.hershberger@ni.com>
> > > Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
> > 
> > Hey Tom, can you pull this in?
> 
> NO!  Please don't!!
> 
> NAK!!
> 
> This patch can lead to reading incorrect (old, no longer valid)
> values without any way for the user to see what is happening.
> 
> This must not be done!

I'm not 100% sure, after reading all of the code, if there's a problem.
What we indeed do not want to do is be silent in seeing that the first
environment location we read from failed.  But AFAICT if flash_io
returns non-zero we also output something useful to stderr, so it should
be visible to the user that something went wrong.  The next question is,
if half of the redundant environment has failed, is the other half
considered valid (so long as the crc passes) or would only the built-in
be valid?  I would think the other half is the valid one.
Wolfgang Denk June 30, 2018, 10:35 a.m. UTC | #7
Dear Tom,

In message <20180629181550.GG18596@bill-the-cat> you wrote:
> 
> > This patch can lead to reading incorrect (old, no longer valid)
> > values without any way for the user to see what is happening.
> > 
> > This must not be done!
>
> I'm not 100% sure, after reading all of the code, if there's a problem.

There is.

> What we indeed do not want to do is be silent in seeing that the first
> environment location we read from failed.  But AFAICT if flash_io
> returns non-zero we also output something useful to stderr, so it should
> be visible to the user that something went wrong.

But there is no indication about the problem in the return code, so
any tools using this have no way to determine there was a problem.

> The next question is,
> if half of the redundant environment has failed, is the other half
> considered valid (so long as the crc passes) or would only the built-in
> be valid?  I would think the other half is the valid one.

It may be valid, but it is still not what we want.

redundant environment is used to provide a fall-back in case of
poblems when _writing_ the environment, i. e. like a power-off while
a "saveenv" is running.  The purpose is to make sute that even then
we will always have a valid environment, with correct checksum.

But as soon as the saveenv has succesfully completed, we do not have
to equal, exchangable copies - instead, we have one which holds the
current data (and only this is what represents the "correct" state),
and another copy, which holds the old state, i. e. which is
obsolete.

Sielntly swapping the current and the obsolete data is a serious
bug, which must never happen.  It may be OK as part of some recovery
procedure to limit the damage, but it must never ever go unnoticed.

Assume you switch to another boot device / boot partion as part of a
system update procedure.  Then booting the wrong system is
definitely not what you want.

Please revert!

Best regards,

Wolfgang Denk
Tom Rini July 2, 2018, 3:16 p.m. UTC | #8
On Sat, Jun 30, 2018 at 12:35:49PM +0200, Wolfgang Denk wrote:
> Dear Tom,
> 
> In message <20180629181550.GG18596@bill-the-cat> you wrote:
> > 
> > > This patch can lead to reading incorrect (old, no longer valid)
> > > values without any way for the user to see what is happening.
> > > 
> > > This must not be done!
> >
> > I'm not 100% sure, after reading all of the code, if there's a problem.
> 
> There is.
> 
> > What we indeed do not want to do is be silent in seeing that the first
> > environment location we read from failed.  But AFAICT if flash_io
> > returns non-zero we also output something useful to stderr, so it should
> > be visible to the user that something went wrong.
> 
> But there is no indication about the problem in the return code, so
> any tools using this have no way to determine there was a problem.

Ah, true, thanks!

> > The next question is,
> > if half of the redundant environment has failed, is the other half
> > considered valid (so long as the crc passes) or would only the built-in
> > be valid?  I would think the other half is the valid one.
> 
> It may be valid, but it is still not what we want.
> 
> redundant environment is used to provide a fall-back in case of
> poblems when _writing_ the environment, i. e. like a power-off while
> a "saveenv" is running.  The purpose is to make sute that even then
> we will always have a valid environment, with correct checksum.
> 
> But as soon as the saveenv has succesfully completed, we do not have
> to equal, exchangable copies - instead, we have one which holds the
> current data (and only this is what represents the "correct" state),
> and another copy, which holds the old state, i. e. which is
> obsolete.
> 
> Sielntly swapping the current and the obsolete data is a serious
> bug, which must never happen.  It may be OK as part of some recovery
> procedure to limit the damage, but it must never ever go unnoticed.
> 
> Assume you switch to another boot device / boot partion as part of a
> system update procedure.  Then booting the wrong system is
> definitely not what you want.
> 
> Please revert!

Joe, Ioan-Adrian, I've reverted this.  Please re-work the changes such
that the use case you need to support is handled in some manner (likely
a new flag being passed to fw_printenv/setenv) such that the current
behavior is still the default, given Wolfgang's explanations above,
thanks!
diff mbox series

Patch

diff --git a/tools/env/fw_env.c b/tools/env/fw_env.c
index a5d75958e1..3a5ad026f0 100644
--- a/tools/env/fw_env.c
+++ b/tools/env/fw_env.c
@@ -1427,14 +1427,21 @@  int fw_env_open(struct env_opts *opts)
 	}
 
 	dev_current = 0;
-	if (flash_io(O_RDONLY)) {
+
+	if (!flash_io(O_RDONLY)) {
+		crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
+		crc0_ok = (crc0 == *environment.crc);
+	} else if (have_redund_env) {
+		/*
+		 * to give the redundant env a chance, maybe it's good:
+		 * mark env crc0 invalid then test below if crc1 is ok
+		 */
+		crc0_ok = 0;
+	} else {
 		ret = -EIO;
 		goto open_cleanup;
 	}
 
-	crc0 = crc32(0, (uint8_t *)environment.data, ENV_SIZE);
-
-	crc0_ok = (crc0 == *environment.crc);
 	if (!have_redund_env) {
 		if (!crc0_ok) {
 			fprintf(stderr,
@@ -1462,8 +1469,10 @@  int fw_env_open(struct env_opts *opts)
 		 */
 		environment.image = addr1;
 		if (flash_io(O_RDONLY)) {
-			ret = -EIO;
-			goto open_cleanup;
+			crc1_ok = 0;
+		} else {
+			crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
+			crc1_ok = (crc1 == redundant->crc);
 		}
 
 		/* Check flag scheme compatibility */
@@ -1489,9 +1498,6 @@  int fw_env_open(struct env_opts *opts)
 			goto open_cleanup;
 		}
 
-		crc1 = crc32(0, (uint8_t *)redundant->data, ENV_SIZE);
-
-		crc1_ok = (crc1 == redundant->crc);
 		flag1 = redundant->flags;
 
 		if (crc0_ok && !crc1_ok) {