diff mbox series

string: Use memcpy() within memmove() when we can

Message ID 20201125122610.1.I56a5b1310adc1bce11401f8f2e1577be96dee65a@changeid
State Changes Requested
Delegated to: Tom Rini
Headers show
Series string: Use memcpy() within memmove() when we can | expand

Commit Message

Patrick DELAUNAY Nov. 25, 2020, 11:26 a.m. UTC
A common use of memmove() can be handled by memcpy(). Also memcpy()
includes an optimization for large sizes: it copies a word at a time. So
we can get a speed-up by calling memcpy() to handle our move in this case.

Update memmove() to call also memcpy() if the source don't overlap
the destination (src + count <= dest).

Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
---
This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes)
from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1,
and with kernel destination = Load Address: 0xc4000000,
located after the FIT without overlap, compared with
destination = Load Address: 0xc0008000.

-> 14,332 us vs 52,239 in bootstage report

In this case the memmove funtion is called in common/image.c::memmove_wd()
to handle overlap.


 lib/string.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Rasmus Villemoes Nov. 25, 2020, 12:07 p.m. UTC | #1
On 25/11/2020 12.26, Patrick Delaunay wrote:
> A common use of memmove() can be handled by memcpy(). Also memcpy()
> includes an optimization for large sizes: it copies a word at a time. So
> we can get a speed-up by calling memcpy() to handle our move in this case.
> 
> Update memmove() to call also memcpy() if the source don't overlap
> the destination (src + count <= dest).
> 
> Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
> ---
> This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes)
> from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1,
> and with kernel destination = Load Address: 0xc4000000,
> located after the FIT without overlap, compared with
> destination = Load Address: 0xc0008000.
> 
> -> 14,332 us vs 52,239 in bootstage report
> 
> In this case the memmove funtion is called in common/image.c::memmove_wd()
> to handle overlap.
> 
> 
>  lib/string.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/string.c b/lib/string.c
> index ae7835f600..ef8ead976c 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -567,7 +567,7 @@ void * memmove(void * dest,const void *src,size_t count)
>  {
>  	char *tmp, *s;
>  
> -	if (dest <= src) {
> +	if (dest <= src || (src + count) <= dest) {
>  		memcpy(dest, src, count);

Hm. So the condition you add (src + count <= dest) implies no overlap at
all, so that's ok. So this doesn't really have anything to do with your
patch per se.

The existing condition relies on memcpy doing forward-copying. That's
the case if the implementation from lib/string.c is in use, i.e. if
__HAVE_ARCH_MEMCPY is not defined. And if an arch defines
__HAVE_ARCH_MEMMOVE, this memmove() is not used.

But AFAICT, there's a potential problem for the case where
__HAVE_ARCH_MEMCPY is defined but __HAVE_ARCH_MEMMOVE is not, and e.g.
arch/arm/include/asm/string.h does

#if CONFIG_IS_ENABLED(USE_ARCH_MEMCPY)
#define __HAVE_ARCH_MEMCPY
#endif
#undef __HAVE_ARCH_MEMMOVE

Of course, the arch-specific implementation _may_ also do
forward-copying (I haven't tried to decipher it), but that seems to be a
rather fragile assumption. At the very least, some comments would be in
order.

Rasmus
Tom Rini Dec. 8, 2020, 4:10 p.m. UTC | #2
On Wed, Nov 25, 2020 at 01:07:43PM +0100, Rasmus Villemoes wrote:
> On 25/11/2020 12.26, Patrick Delaunay wrote:
> > A common use of memmove() can be handled by memcpy(). Also memcpy()
> > includes an optimization for large sizes: it copies a word at a time. So
> > we can get a speed-up by calling memcpy() to handle our move in this case.
> > 
> > Update memmove() to call also memcpy() if the source don't overlap
> > the destination (src + count <= dest).
> > 
> > Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
> > ---
> > This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes)
> > from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1,
> > and with kernel destination = Load Address: 0xc4000000,
> > located after the FIT without overlap, compared with
> > destination = Load Address: 0xc0008000.
> > 
> > -> 14,332 us vs 52,239 in bootstage report
> > 
> > In this case the memmove funtion is called in common/image.c::memmove_wd()
> > to handle overlap.
> > 
> > 
> >  lib/string.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/string.c b/lib/string.c
> > index ae7835f600..ef8ead976c 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -567,7 +567,7 @@ void * memmove(void * dest,const void *src,size_t count)
> >  {
> >  	char *tmp, *s;
> >  
> > -	if (dest <= src) {
> > +	if (dest <= src || (src + count) <= dest) {
> >  		memcpy(dest, src, count);
> 
> Hm. So the condition you add (src + count <= dest) implies no overlap at
> all, so that's ok. So this doesn't really have anything to do with your
> patch per se.
> 
> The existing condition relies on memcpy doing forward-copying. That's
> the case if the implementation from lib/string.c is in use, i.e. if
> __HAVE_ARCH_MEMCPY is not defined. And if an arch defines
> __HAVE_ARCH_MEMMOVE, this memmove() is not used.
> 
> But AFAICT, there's a potential problem for the case where
> __HAVE_ARCH_MEMCPY is defined but __HAVE_ARCH_MEMMOVE is not, and e.g.
> arch/arm/include/asm/string.h does
> 
> #if CONFIG_IS_ENABLED(USE_ARCH_MEMCPY)
> #define __HAVE_ARCH_MEMCPY
> #endif
> #undef __HAVE_ARCH_MEMMOVE
> 
> Of course, the arch-specific implementation _may_ also do
> forward-copying (I haven't tried to decipher it), but that seems to be a
> rather fragile assumption. At the very least, some comments would be in
> order.

Looking at this deeper, today, ARM (non-64bit) can and usually but not
always does use the asm optimized memcpy / memset.  No other optimized
functions were copied from Linux, and no other arches seem to use them
today either.  I think in sum then, Patrick can you please do a v2 that
adds a comment here, in case we get more optimizations in the future?
And thanks for the review here Rasmus!
diff mbox series

Patch

diff --git a/lib/string.c b/lib/string.c
index ae7835f600..ef8ead976c 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -567,7 +567,7 @@  void * memmove(void * dest,const void *src,size_t count)
 {
 	char *tmp, *s;
 
-	if (dest <= src) {
+	if (dest <= src || (src + count) <= dest) {
 		memcpy(dest, src, count);
 	} else {
 		tmp = (char *) dest + count;