Message ID | 20170402155032.27473-2-sjg@chromium.org |
---|---|
State | Accepted |
Delegated to: | Simon Glass |
On Sunday, 2 April 2017, 09:50:28 CEST, Simon Glass wrote:
> Most of the time the optimised memset() is what we want. For extreme
> situations such as TPL it may be too large. For example on the 'rock'
> board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
> the rodata bug, this patch is enough to reduce the TPL image below the
> limit.
>
> Signed-off-by: Simon Glass <sjg@chromium.org>
> ---
>
> Changes in v2:
> - Adjust the option to be SPL-only
> - Change the option to default to off (name it CONFIG_SPL_TINY_MEMSET)
>
>  lib/Kconfig  | 8 ++++++++
>  lib/string.c | 6 ++++--
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 65c01573e1..58b5717dcd 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -52,6 +52,14 @@ config LIB_RAND
>  	help
>  	  This library provides pseudo-random number generator functions.
>
> +config SPL_TINY_MEMSET
> +	bool "Use a very small memset() in SPL"
> +	help
> +	  The faster memset() is the arch-specific one (if available) enabled
> +	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
> +	  better performance by write a word at a time. Enable this option
> +	  to reduce code size slightly at the cost of some speed.

The wording sounds off. I guess we could do something like:

	[...better performance by] writing a word at a time. In very
	size-constrained environments even this may be too big though.
	[Enable this option...]

Otherwise
Reviewed-by: Heiko Stuebner <heiko@sntech.de>

> +
>  source lib/dhry/Kconfig
>
>  source lib/rsa/Kconfig
> diff --git a/lib/string.c b/lib/string.c
> index 67d5f6a421..c1a28c14ce 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -437,8 +437,10 @@ char *strswab(const char *s)
>  void * memset(void * s,int c,size_t count)
>  {
>  	unsigned long *sl = (unsigned long *) s;
> -	unsigned long cl = 0;
>  	char *s8;
> +
> +#if !CONFIG_IS_ENABLED(TINY_MEMSET)
> +	unsigned long cl = 0;
>  	int i;
>
>  	/* do it one word at a time (32 bits or 64 bits) while possible */
> @@ -452,7 +454,7 @@ void * memset(void * s,int c,size_t count)
>  			count -= sizeof(*sl);
>  		}
>  	}
> -	/* fill 8 bits at a time */
> +#endif	/* fill 8 bits at a time */
>  	s8 = (char *)sl;
>  	while (count--)
>  		*s8++ = c;
On 4 April 2017 at 03:38, Heiko Stübner <heiko@sntech.de> wrote:
>
> On Sunday, 2 April 2017, 09:50:28 CEST, Simon Glass wrote:
> > Most of the time the optimised memset() is what we want. For extreme
> > situations such as TPL it may be too large. For example on the 'rock'
> > board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
> > the rodata bug, this patch is enough to reduce the TPL image below the
> > limit.
> >
> > Signed-off-by: Simon Glass <sjg@chromium.org>
> > ---
> >
> > Changes in v2:
> > - Adjust the option to be SPL-only
> > - Change the option to default to off (name it CONFIG_SPL_TINY_MEMSET)
> >
> >  lib/Kconfig  | 8 ++++++++
> >  lib/string.c | 6 ++++--
> >  2 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index 65c01573e1..58b5717dcd 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -52,6 +52,14 @@ config LIB_RAND
> >  	help
> >  	  This library provides pseudo-random number generator functions.
> >
> > +config SPL_TINY_MEMSET
> > +	bool "Use a very small memset() in SPL"
> > +	help
> > +	  The faster memset() is the arch-specific one (if available) enabled
> > +	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
> > +	  better performance by write a word at a time. Enable this option
> > +	  to reduce code size slightly at the cost of some speed.
>
> The wording sounds off. I guess we could do something like:
>
> 	[...better performance by] writing a word at a time. In very
> 	size-constrained environments even this may be too big though.
> 	[Enable this option...]
>
> Otherwise
> Reviewed-by: Heiko Stuebner <heiko@sntech.de>

I am going to apply this one now and leave the rest of the series
until it has had a bit more review. But this one is needed for me to
enable the rock board.

Fixed this and:

Applied to u-boot-rockchip
diff --git a/lib/Kconfig b/lib/Kconfig
index 65c01573e1..58b5717dcd 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -52,6 +52,14 @@ config LIB_RAND
 	help
 	  This library provides pseudo-random number generator functions.
 
+config SPL_TINY_MEMSET
+	bool "Use a very small memset() in SPL"
+	help
+	  The faster memset() is the arch-specific one (if available) enabled
+	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+	  better performance by write a word at a time. Enable this option
+	  to reduce code size slightly at the cost of some speed.
+
 source lib/dhry/Kconfig
 
 source lib/rsa/Kconfig
diff --git a/lib/string.c b/lib/string.c
index 67d5f6a421..c1a28c14ce 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -437,8 +437,10 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
 	unsigned long *sl = (unsigned long *) s;
-	unsigned long cl = 0;
 	char *s8;
+
+#if !CONFIG_IS_ENABLED(TINY_MEMSET)
+	unsigned long cl = 0;
 	int i;
 
 	/* do it one word at a time (32 bits or 64 bits) while possible */
@@ -452,7 +454,7 @@ void * memset(void * s,int c,size_t count)
 			count -= sizeof(*sl);
 		}
 	}
-	/* fill 8 bits at a time */
+#endif	/* fill 8 bits at a time */
 	s8 = (char *)sl;
 	while (count--)
 		*s8++ = c;
Most of the time the optimised memset() is what we want. For extreme
situations such as TPL it may be too large. For example on the 'rock'
board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
the rodata bug, this patch is enough to reduce the TPL image below the
limit.

Signed-off-by: Simon Glass <sjg@chromium.org>
---

Changes in v2:
- Adjust the option to be SPL-only
- Change the option to default to off (name it CONFIG_SPL_TINY_MEMSET)

 lib/Kconfig  | 8 ++++++++
 lib/string.c | 6 ++++--
 2 files changed, 12 insertions(+), 2 deletions(-)
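
For anyone who wants to compare the two fill paths outside of U-Boot, here is a minimal standalone sketch. The names `tiny_memset` and `word_memset` are hypothetical and exist only for this illustration; in the patch both paths live in the single memset() in lib/string.c and the word path is compiled out by `#if !CONFIG_IS_ENABLED(TINY_MEMSET)`. The sketch only shows that both paths write the same bytes. It says nothing about the 48-byte saving, which depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical name: the byte-at-a-time loop, i.e. roughly what is left
 * of memset() when the tiny option is enabled.
 */
void *tiny_memset(void *s, int c, size_t count)
{
	unsigned char *s8 = s;

	/* Fill 8 bits at a time. */
	while (count--)
		*s8++ = (unsigned char)c;

	return s;
}

/*
 * Hypothetical name: the word-at-a-time path followed by the byte tail,
 * modelled on the generic C memset() that the patch touches.
 */
void *word_memset(void *s, int c, size_t count)
{
	unsigned long *sl = s;
	unsigned long cl = 0;
	unsigned char *s8;
	size_t i;

	/* Take the word path only when the start address is word-aligned. */
	if (((uintptr_t)s & (sizeof(*sl) - 1)) == 0) {
		/* Replicate the fill byte into every byte of one word. */
		for (i = 0; i < sizeof(*sl); i++) {
			cl <<= 8;
			cl |= (unsigned long)(c & 0xff);
		}
		/* Store whole words (32 or 64 bits) while enough bytes remain. */
		while (count >= sizeof(*sl)) {
			*sl++ = cl;
			count -= sizeof(*sl);
		}
	}

	/* Fill whatever remains 8 bits at a time. */
	s8 = (unsigned char *)sl;
	while (count--)
		*s8++ = (unsigned char)c;

	return s;
}
```

Both functions take the same arguments as memset(), so calling either one on the same buffer with the same fill value and length should leave identical contents. The difference is only in how many stores are issued and how much code the compiler has to emit.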