diff mbox

[U-Boot,v2,3/7] spl: nand: support redundant u-boot image

Message ID 1464780204-17737-4-git-send-email-boris.brezillon@free-electrons.com
State Superseded
Delegated to: Scott Wood
Headers show

Commit Message

Boris Brezillon June 1, 2016, 11:23 a.m. UTC
On modern NAND it's more than recommended to have a backup copy of the
u-boot binary to recover from corruption: bitflips are quite common on
MLC NANDs, and the read-disturbance will corrupt your u-boot partitition
more quickly than what you would see on an SLC NAND.

Add an extra Kconfig option to specify the offset of the redundant u-boot
image.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
---
 common/spl/spl_nand.c    | 8 ++++++++
 drivers/mtd/nand/Kconfig | 6 ++++++
 2 files changed, 14 insertions(+)

Comments

Crystal Wood June 4, 2016, 1:15 a.m. UTC | #1
On Wed, 2016-06-01 at 13:23 +0200, Boris Brezillon wrote:
> On modern NAND it's more than recommended to have a backup copy of the
> u-boot binary to recover from corruption: bitflips are quite common on
> MLC NANDs, and the read-disturbance will corrupt your u-boot partitition
> more quickly than what you would see on an SLC NAND.

Wouldn't the same happen to the SPL itself?  Or is the boot block implemented
in a different, more robust manner?

> Add an extra Kconfig option to specify the offset of the redundant u-boot
> image.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> Acked-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  common/spl/spl_nand.c    | 8 ++++++++
>  drivers/mtd/nand/Kconfig | 6 ++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/common/spl/spl_nand.c b/common/spl/spl_nand.c
> index 612bd4a..0bf0848 100644
> --- a/common/spl/spl_nand.c
> +++ b/common/spl/spl_nand.c
> @@ -12,6 +12,9 @@
>  
>  #ifndef CONFIG_SYS_NAND_U_BOOT_OFFS
>  #define CONFIG_SYS_NAND_U_BOOT_OFFS CONFIG_SPL_NAND_U_BOOT_OFFS
> +#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND
> CONFIG_SPL_NAND_U_BOOT_OFFS_REDUND
> +#else
> +#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND CONFIG_SYS_NAND_U_BOOT_OFFS
>  #endif
>  
>  #if defined(CONFIG_SPL_NAND_RAW_ONLY)
> @@ -111,6 +114,11 @@ int spl_nand_load_image(void)
>  #endif
>  	/* Load u-boot */
>  	err = spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS, header);
> +#if CONFIG_SYS_NAND_U_BOOT_OFFS != CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND
> +	if (err)
> +		err =
> spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND,
> +					    header);
> +#endif

If one of the images has failed, doesn't it need to be reflashed before the
other one goes bad as well?  How does the failure get communicated to later
parts of the system that would be responsible for such reflashing?

-Scott
Boris Brezillon June 4, 2016, 6:15 a.m. UTC | #2
On Fri, 03 Jun 2016 20:15:16 -0500
Scott Wood <oss@buserror.net> wrote:

> On Wed, 2016-06-01 at 13:23 +0200, Boris Brezillon wrote:
> > On modern NAND it's more than recommended to have a backup copy of the
> > u-boot binary to recover from corruption: bitflips are quite common on
> > MLC NANDs, and the read-disturbance will corrupt your u-boot partitition
> > more quickly than what you would see on an SLC NAND.  
> 
> Wouldn't the same happen to the SPL itself?  Or is the boot block implemented
> in a different, more robust manner?

Nope, the same happens to the SPL image, and we're actually using the
same trick: the brom code search for a valid SPL image every 64
pages is duplicated every 64 pages (it tests the first 8 locations:
page 0, page 64, page 128, ..., page 448).

We usually fill 2 blocks with SPL images (repeating it several times in
each block).

> 
> > Add an extra Kconfig option to specify the offset of the redundant u-boot
> > image.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> > Acked-by: Hans de Goede <hdegoede@redhat.com>
> > ---
> >  common/spl/spl_nand.c    | 8 ++++++++
> >  drivers/mtd/nand/Kconfig | 6 ++++++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/common/spl/spl_nand.c b/common/spl/spl_nand.c
> > index 612bd4a..0bf0848 100644
> > --- a/common/spl/spl_nand.c
> > +++ b/common/spl/spl_nand.c
> > @@ -12,6 +12,9 @@
> >  
> >  #ifndef CONFIG_SYS_NAND_U_BOOT_OFFS
> >  #define CONFIG_SYS_NAND_U_BOOT_OFFS CONFIG_SPL_NAND_U_BOOT_OFFS
> > +#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND
> > CONFIG_SPL_NAND_U_BOOT_OFFS_REDUND
> > +#else
> > +#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND CONFIG_SYS_NAND_U_BOOT_OFFS
> >  #endif
> >  
> >  #if defined(CONFIG_SPL_NAND_RAW_ONLY)
> > @@ -111,6 +114,11 @@ int spl_nand_load_image(void)
> >  #endif
> >  	/* Load u-boot */
> >  	err = spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS, header);
> > +#if CONFIG_SYS_NAND_U_BOOT_OFFS != CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND
> > +	if (err)
> > +		err =
> > spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND,
> > +					    header);
> > +#endif  
> 
> If one of the images has failed, doesn't it need to be reflashed before the
> other one goes bad as well?

Yes, that's the idea.

> How does the failure get communicated to later
> parts of the system that would be responsible for such reflashing?

Linux is taking care of that (a script tries to read the u-boot
partition, and if fails it reflashes it with the content of the
u-boot-backup partition, or with a reference u-boot.bin file).
I guess u-boot could do it too.

Anyway, that's a different story ;).
Crystal Wood June 4, 2016, 7:17 a.m. UTC | #3
On Sat, 2016-06-04 at 08:15 +0200, Boris Brezillon wrote:
> On Fri, 03 Jun 2016 20:15:16 -0500
> Scott Wood <oss@buserror.net> wrote:
> 
> > How does the failure get communicated to later
> > parts of the system that would be responsible for such reflashing?
> 
> Linux is taking care of that (a script tries to read the u-boot
> partition, and if fails it reflashes it with the content of the
> u-boot-backup partition, or with a reference u-boot.bin file).
> I guess u-boot could do it too.
> 
> Anyway, that's a different story ;).

It seemed like it would be good to export the information if possible (e.g. an
environment variable) rather than rereading and thus making failures happen
twice as quickly.  But that can wait until someone actually wants to use it
that way.

-Scott
diff mbox

Patch

diff --git a/common/spl/spl_nand.c b/common/spl/spl_nand.c
index 612bd4a..0bf0848 100644
--- a/common/spl/spl_nand.c
+++ b/common/spl/spl_nand.c
@@ -12,6 +12,9 @@ 
 
 #ifndef CONFIG_SYS_NAND_U_BOOT_OFFS
 #define CONFIG_SYS_NAND_U_BOOT_OFFS CONFIG_SPL_NAND_U_BOOT_OFFS
+#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND CONFIG_SPL_NAND_U_BOOT_OFFS_REDUND
+#else
+#define CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND CONFIG_SYS_NAND_U_BOOT_OFFS
 #endif
 
 #if defined(CONFIG_SPL_NAND_RAW_ONLY)
@@ -111,6 +114,11 @@  int spl_nand_load_image(void)
 #endif
 	/* Load u-boot */
 	err = spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS, header);
+#if CONFIG_SYS_NAND_U_BOOT_OFFS != CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND
+	if (err)
+		err = spl_nand_load_element(CONFIG_SYS_NAND_U_BOOT_OFFS_REDUND,
+					    header);
+#endif
 	nand_deselect();
 	return err;
 }
diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
index 4b0f92c..2f1d1f7 100644
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@ -110,6 +110,12 @@  config SPL_NAND_U_BOOT_OFFS
 	Set the offset from the start of the nand where u-boot should be
 	loaded from.
 
+config SPL_NAND_U_BOOT_OFFS_REDUND
+	hex "Location in NAND to read U-Boot from"
+	default SPL_NAND_U_BOOT_OFFS
+	help
+	Set the offset from the start of the nand where u-boot should be
+	loaded from.
 
 config SPL_NAND_DENALI
 	bool "Support Denali NAND controller for SPL"