diff mbox

[U-Boot] Add 'sf update' command to do smart SPI flash update

Message ID 1313761624-11129-1-git-send-email-sjg@chromium.org
State Superseded, archived
Headers show

Commit Message

Simon Glass Aug. 19, 2011, 1:47 p.m. UTC
This adds a new SPI flash command which only rewrites blocks if the contents
need to change. This can speed up SPI flash programming when much of the
data is unchanged from what is already there.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 common/cmd_sf.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 73 insertions(+), 2 deletions(-)

Comments

Mike Frysinger Aug. 19, 2011, 3:14 p.m. UTC | #1
On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
> This adds a new SPI flash command which only rewrites blocks if the
> contents need to change. This can speed up SPI flash programming when much
> of the data is unchanged from what is already there.

sounds good

> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
> +		size_t len, const char *buf)
> +{
> +	const char *oper;

i wonder if there'd be any code size difference if we had:
	static const char * const opers[] = {
		"malloc",
		"read",
		...
	};
	const char *oper;
	...
		oper = 0;
	...
		++oper;
	...
		++oper;
	...
	printf("...%s...", ..., opers[oper]);

i hope it'd be slightly smaller as you would be loading up a small int in the 
for loop and not an entire pointer ...

> +	int err = 0;
> +	char *cmp_buf = NULL;
> +	size_t cmp_size = 0;
> +	size_t todo;	/* number of bytes to do in this pass */
> +	size_t skipped, written;		/* statistics */
> +
> +	for (skipped = written = 0; !err && len > 0;
> +			buf += todo, offset += todo, len -= todo) {
> +		todo = min(len, flash->sector_size);
> +		debug("offset=%#x, sector_size=%#x, todo=%#x\n",
> +		      offset, flash->sector_size, todo);
> +		if (!err) {

why do you need to check err here ?  you already have "!err" in the for loop 
constraint, err starts off at 0, and there is no setting of err between the 
top of the for loop and this point.

> +		if (!err) {
> +			oper = "read";
> +			err = spi_flash_read(flash, offset, todo, cmp_buf);
> +		}

all this error checking makes my head spin.  i wonder if it'd look any better 
using the (more normal?) style:
err = ...;
if (err)
	break;

err = ...;
if (err)
	break;

you could then also drop "!err" from the for loop constraint, and the initial 
"err = 0".  not that big of a deal if the code size is the same ...

> +		if (!err) {
> +			if (0 == memcmp(cmp_buf, buf, todo)) {

please reverse the sides of this compare.  this is unusual style for the 
linux/u-boot projects.

> +	if (cmp_buf)
> +		free(cmp_buf);

the if() isn't necessary ... free(NULL) works fine and would add extra 
overhead only in the (uncommon) failure case.  same goes for the malloc() 
logic earlier in this func.

> +	"sf update addr offset len	- erase and write `len' bytes from "
> +		"memory\n"

please don't split the string constant on non-\n boundaries
-mike
Simon Glass Aug. 19, 2011, 6:07 p.m. UTC | #2
Hi Mike,

On Fri, Aug 19, 2011 at 9:14 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
>> This adds a new SPI flash command which only rewrites blocks if the
>> contents need to change. This can speed up SPI flash programming when much
>> of the data is unchanged from what is already there.
>
> sounds good
>
>> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
>> +             size_t len, const char *buf)
>> +{
>> +     const char *oper;
>
> i wonder if there'd be any code size difference if we had:
>        static const char * const opers[] = {
>                "malloc",
>                "read",
>                ...
>        };
>        const char *oper;
>        ...
>                oper = 0;
>        ...
>                ++oper;
>        ...
>                ++oper;
>        ...
>        printf("...%s...", ..., opers[oper]);
>
> i hope it'd be slightly smaller as you would be loading up a small int in the
> for loop and not an entire pointer ...

It's exactly the same code/data size on ARM, but perhaps smaller on a
different architecture. I like that approach anyway so will change it.

>
>> +     int err = 0;
>> +     char *cmp_buf = NULL;
>> +     size_t cmp_size = 0;
>> +     size_t todo;    /* number of bytes to do in this pass */
>> +     size_t skipped, written;                /* statistics */
>> +
>> +     for (skipped = written = 0; !err && len > 0;
>> +                     buf += todo, offset += todo, len -= todo) {
>> +             todo = min(len, flash->sector_size);
>> +             debug("offset=%#x, sector_size=%#x, todo=%#x\n",
>> +                   offset, flash->sector_size, todo);
>> +             if (!err) {
>
> why do you need to check err here ?  you already have "!err" in the for loop
> constraint, err starts off at 0, and there is no setting of err between the
> top of the for loop and this point.

It's a hangover from previous code - will remove it.

>
>> +             if (!err) {
>> +                     oper = "read";
>> +                     err = spi_flash_read(flash, offset, todo, cmp_buf);
>> +             }
>
> all this error checking makes my head spin.  i wonder if it'd look any better
> using the (more normal?) style:
> err = ...;
> if (err)
>        break;
>
> err = ...;
> if (err)
>        break;
>
> you could then also drop "!err" from the for loop constraint, and the initial
> "err = 0".  not that big of a deal if the code size is the same ...

Ick. I will send a new patch with the code in a function and see what
you think about that. It is 0x88 bytes smaller on ARM than this patch.

>
>> +             if (!err) {
>> +                     if (0 == memcmp(cmp_buf, buf, todo)) {
>
> please reverse the sides of this compare.  this is unusual style for the
> linux/u-boot projects.

ok

>
>> +     if (cmp_buf)
>> +             free(cmp_buf);
>
> the if() isn't necessary ... free(NULL) works fine and would add extra
> overhead only in the (uncommon) failure case.  same goes for the malloc()
> logic earlier in this func.

OK.

>
>> +     "sf update addr offset len      - erase and write `len' bytes from "
>> +             "memory\n"
>
> please don't split the string constant on non-\n boundaries

ok, will go over 80cols.

> -mike
>
Mike Frysinger Aug. 19, 2011, 6:19 p.m. UTC | #3
On Friday, August 19, 2011 14:07:16 Simon Glass wrote:
> On Fri, Aug 19, 2011 at 9:14 AM, Mike Frysinger wrote:
> > On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
> >> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
> >> +             size_t len, const char *buf)
> >> +{
> >> +     const char *oper;
> > 
> > i wonder if there'd be any code size difference if we had:
> >        static const char * const opers[] = {
> >                "malloc",
> >                "read",
> >                ...
> >        };
> >        const char *oper;
> >        ...
> >                oper = 0;
> >        ...
> >                ++oper;
> >        ...
> >                ++oper;
> >        ...
> >        printf("...%s...", ..., opers[oper]);
> > 
> > i hope it'd be slightly smaller as you would be loading up a small int in
> > the for loop and not an entire pointer ...
> 
> It's exactly the same code/data size on ARM, but perhaps smaller on a
> different architecture. I like that approach anyway so will change it.

i know on Blackfin at least, doing something like:
	i = 0;
will expand into a 16bit insn:
	R0 = 0;
and the follow up:
	++i;
will be another 16bit insn:
	R0 += 1;

while something like:
	void *foo = "foo";
will expand into two 32bit insns:
	P0.L = _some_label_for_string;
	P0.H = _some_label_for_string;
and since there will be no known relationship between the diff strings at the 
C -> asm stage (since string placement is done by the linker), each string 
load will be a sep pointer load.

i imagine that the normal ARM insn set sucks at this kind of code density, but 
i'd think their reworks (like thumb and friends) would be much better.

do we leverage thumb at all in ARM u-boot ?  or am i crazy/stupid for thinking 
this could possibly be a good thing ?
-mike
Simon Glass Aug. 19, 2011, 6:25 p.m. UTC | #4
Hi Mike,

On Fri, Aug 19, 2011 at 12:19 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Friday, August 19, 2011 14:07:16 Simon Glass wrote:
>> On Fri, Aug 19, 2011 at 9:14 AM, Mike Frysinger wrote:
>> > On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
>> >> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
>> >> +             size_t len, const char *buf)
>> >> +{
>> >> +     const char *oper;
>> >
>> > i wonder if there'd be any code size difference if we had:
>> >        static const char * const opers[] = {
>> >                "malloc",
>> >                "read",
>> >                ...
>> >        };
>> >        const char *oper;
>> >        ...
>> >                oper = 0;
>> >        ...
>> >                ++oper;
>> >        ...
>> >                ++oper;
>> >        ...
>> >        printf("...%s...", ..., opers[oper]);
>> >
>> > i hope it'd be slightly smaller as you would be loading up a small int in
>> > the for loop and not an entire pointer ...
>>
>> It's exactly the same code/data size on ARM, but perhaps smaller on a
>> different architecture. I like that approach anyway so will change it.
>
> i know on Blackfin at least, doing something like:
>        i = 0;
> will expand into a 16bit insn:
>        R0 = 0;
> and the follow up:
>        ++i;
> will be another 16bit insn:
>        R0 += 1;
>
> while something like:
>        void *foo = "foo";
> will expand into two 32bit insns:
>        P0.L = _some_label_for_string;
>        P0.H = _some_label_for_string;
> and since there will be no known relationship between the diff strings at the
> C -> asm stage (since string placement is done by the linker), each string
> load will be a sep pointer load.
>
> i imagine that the normal ARM insn set sucks at this kind of code density, but
> i'd think their reworks (like thumb and friends) would be much better.

Well it's just a literal pool access on ARM, so a single instruction
on both ARM and Thumb. Not sure how else the compiler would do it...

>
> do we leverage thumb at all in ARM u-boot ?  or am i crazy/stupid for thinking
> this could possibly be a good thing ?
> -mike
>

No we don't. IMO if available we should use Thumb 2 instead of ARM
most of the time - it is a lot smaller and runs about the same speed.

Regards,
Simon
Mike Frysinger Aug. 19, 2011, 6:57 p.m. UTC | #5
On Friday, August 19, 2011 14:25:46 Simon Glass wrote:
> On Fri, Aug 19, 2011 at 12:19 PM, Mike Frysinger wrote:
> > On Friday, August 19, 2011 14:07:16 Simon Glass wrote:
> >> On Fri, Aug 19, 2011 at 9:14 AM, Mike Frysinger wrote:
> >> > On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
> >> >> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
> >> >> +             size_t len, const char *buf)
> >> >> +{
> >> >> +     const char *oper;
> >> > 
> >> > i wonder if there'd be any code size difference if we had:
> >> >        static const char * const opers[] = {
> >> >                "malloc",
> >> >                "read",
> >> >                ...
> >> >        };
> >> >        const char *oper;
> >> >        ...
> >> >                oper = 0;
> >> >        ...
> >> >                ++oper;
> >> >        ...
> >> >                ++oper;
> >> >        ...
> >> >        printf("...%s...", ..., opers[oper]);
> >> > 
> >> > i hope it'd be slightly smaller as you would be loading up a small int
> >> > in the for loop and not an entire pointer ...
> >> 
> >> It's exactly the same code/data size on ARM, but perhaps smaller on a
> >> different architecture. I like that approach anyway so will change it.
> > 
> > i know on Blackfin at least, doing something like:
> >        i = 0;
> > will expand into a 16bit insn:
> >        R0 = 0;
> > and the follow up:
> >        ++i;
> > will be another 16bit insn:
> >        R0 += 1;
> > 
> > while something like:
> >        void *foo = "foo";
> > will expand into two 32bit insns:
> >        P0.L = _some_label_for_string;
> >        P0.H = _some_label_for_string;
> > and since there will be no known relationship between the diff strings at
> > the C -> asm stage (since string placement is done by the linker), each
> > string load will be a sep pointer load.
> > 
> > i imagine that the normal ARM insn set sucks at this kind of code
> > density, but i'd think their reworks (like thumb and friends) would be
> > much better.
> 
> Well it's just a literal pool access on ARM, so a single instruction
> on both ARM and Thumb. Not sure how else the compiler would do it...

i'm not as well versed in ARM lingo as you to understand what "literal pool 
access" means ...
-mike
Simon Glass Aug. 19, 2011, 7:12 p.m. UTC | #6
On Fri, Aug 19, 2011 at 12:57 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Friday, August 19, 2011 14:25:46 Simon Glass wrote:
>> On Fri, Aug 19, 2011 at 12:19 PM, Mike Frysinger wrote:
>> > On Friday, August 19, 2011 14:07:16 Simon Glass wrote:
>> >> On Fri, Aug 19, 2011 at 9:14 AM, Mike Frysinger wrote:
>> >> > On Friday, August 19, 2011 09:47:04 Simon Glass wrote:
>> >> >> +static int spi_flash_update(struct spi_flash *flash, u32 offset,
>> >> >> +             size_t len, const char *buf)
>> >> >> +{
>> >> >> +     const char *oper;
>> >> >
>> >> > i wonder if there'd be any code size difference if we had:
>> >> >        static const char * const opers[] = {
>> >> >                "malloc",
>> >> >                "read",
>> >> >                ...
>> >> >        };
>> >> >        const char *oper;
>> >> >        ...
>> >> >                oper = 0;
>> >> >        ...
>> >> >                ++oper;
>> >> >        ...
>> >> >                ++oper;
>> >> >        ...
>> >> >        printf("...%s...", ..., opers[oper]);
>> >> >
>> >> > i hope it'd be slightly smaller as you would be loading up a small int
>> >> > in the for loop and not an entire pointer ...
>> >>
>> >> It's exactly the same code/data size on ARM, but perhaps smaller on a
>> >> different architecture. I like that approach anyway so will change it.
>> >
>> > i know on Blackfin at least, doing something like:
>> >        i = 0;
>> > will expand into a 16bit insn:
>> >        R0 = 0;
>> > and the follow up:
>> >        ++i;
>> > will be another 16bit insn:
>> >        R0 += 1;
>> >
>> > while something like:
>> >        void *foo = "foo";
>> > will expand into two 32bit insns:
>> >        P0.L = _some_label_for_string;
>> >        P0.H = _some_label_for_string;
>> > and since there will be no known relationship between the diff strings at
>> > the C -> asm stage (since string placement is done by the linker), each
>> > string load will be a sep pointer load.
>> >
>> > i imagine that the normal ARM insn set sucks at this kind of code
>> > density, but i'd think their reworks (like thumb and friends) would be
>> > much better.
>>
>> Well it's just a literal pool access on ARM, so a single instruction
>> on both ARM and Thumb. Not sure how else the compiler would do it...
>
> i'm not as well versed in ARM lingo as you to understand what "literal pool
> access" means ...
> -mike
>
Hi Mike,

:-) It's just a pool of literals, basically in this case a list of
string pointers, placed close to the PC address so that you can easily
access them with a PC-relative load. It's not an ARM think though -
lots of CPUs use them. The alternative is to encode the full address
in the instruction, so I suppose it's this:

ldr r0, fred    (which turns into ldr r0, [pc, #offset of fred])

fred:
   DCD 0x12345678


instead of:

mov r0, #0x12345678

Regards,
Simon
diff mbox

Patch

diff --git a/common/cmd_sf.c b/common/cmd_sf.c
index 11a491d..4a807eb 100644
--- a/common/cmd_sf.c
+++ b/common/cmd_sf.c
@@ -6,6 +6,7 @@ 
  */
 
 #include <common.h>
+#include <malloc.h>
 #include <spi_flash.h>
 
 #include <asm/io.h>
@@ -109,6 +110,70 @@  static int do_spi_flash_probe(int argc, char * const argv[])
 	return 0;
 }
 
+/**
+ * Update an area of SPI flash by erasing and writing any blocks which need
+ * to change. Existing blocks with the correct data are left unchanged.
+ *
+ * @param flash		flash context pointer
+ * @param offset	flash offset to write
+ * @param len		number of bytes to write
+ * @param buf		buffer to write from
+ */
+static int spi_flash_update(struct spi_flash *flash, u32 offset,
+		size_t len, const char *buf)
+{
+	const char *oper;
+	int err = 0;
+	char *cmp_buf = NULL;
+	size_t cmp_size = 0;
+	size_t todo;	/* number of bytes to do in this pass */
+	size_t skipped, written;		/* statistics */
+
+	for (skipped = written = 0; !err && len > 0;
+			buf += todo, offset += todo, len -= todo) {
+		todo = min(len, flash->sector_size);
+		debug("offset=%#x, sector_size=%#x, todo=%#x\n",
+		      offset, flash->sector_size, todo);
+		if (!err) {
+			oper = "malloc";
+			if (todo > cmp_size) {
+				if (cmp_buf)
+					free(cmp_buf);
+				cmp_buf = malloc(todo);
+				cmp_size = todo;
+			}
+			err = cmp_buf == NULL;
+		}
+		if (!err) {
+			oper = "read";
+			err = spi_flash_read(flash, offset, todo, cmp_buf);
+		}
+		if (!err) {
+			if (0 == memcmp(cmp_buf, buf, todo)) {
+				debug("Skip region %x size %x: no change\n",
+				      offset, todo);
+				skipped += todo;
+				continue;
+			}
+		}
+		if (!err) {
+			oper = "erase";
+			err = spi_flash_erase(flash, offset, todo);
+		}
+		if (!err) {
+			oper = "write";
+			err = spi_flash_write(flash, offset, todo, buf);
+			written += todo;
+		}
+	}
+	printf("%d bytes written, %d bytes skipped\n", written, skipped);
+	if (err)
+		printf("SPI flash %s failed\n", oper);
+	if (cmp_buf)
+		free(cmp_buf);
+	return err;
+}
+
 static int do_spi_flash_read_write(int argc, char * const argv[])
 {
 	unsigned long addr;
@@ -137,7 +202,9 @@  static int do_spi_flash_read_write(int argc, char * const argv[])
 		return 1;
 	}
 
-	if (strcmp(argv[0], "read") == 0)
+	if (strcmp(argv[0], "update") == 0)
+		ret = spi_flash_update(flash, offset, len, buf);
+	else if (strcmp(argv[0], "read") == 0)
 		ret = spi_flash_read(flash, offset, len, buf);
 	else
 		ret = spi_flash_write(flash, offset, len, buf);
@@ -203,7 +270,8 @@  static int do_spi_flash(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[
 		return 1;
 	}
 
-	if (strcmp(cmd, "read") == 0 || strcmp(cmd, "write") == 0)
+	if (strcmp(cmd, "read") == 0 || strcmp(cmd, "write") == 0 ||
+			strcmp(cmd, "update") == 0)
 		ret = do_spi_flash_read_write(argc, argv);
 	else if (strcmp(cmd, "erase") == 0)
 		ret = do_spi_flash_erase(argc, argv);
@@ -229,4 +297,7 @@  U_BOOT_CMD(
 	"				  at `addr' to flash at `offset'\n"
 	"sf erase offset [+]len		- erase `len' bytes from `offset'\n"
 	"				  `+len' round up `len' to block size"
+	"sf update addr offset len	- erase and write `len' bytes from "
+		"memory\n"
+	"				  at `addr' to flash at `offset'\n"
 );