Message ID | 20201125122610.1.I56a5b1310adc1bce11401f8f2e1577be96dee65a@changeid |
---|---|
State | Changes Requested |
Delegated to: | Tom Rini |
Headers | show |
Series | string: Use memcpy() within memmove() when we can | expand |
On 25/11/2020 12.26, Patrick Delaunay wrote: > A common use of memmove() can be handled by memcpy(). Also memcpy() > includes an optimization for large sizes: it copies a word at a time. So > we can get a speed-up by calling memcpy() to handle our move in this case. > > Update memmove() to call also memcpy() if the source don't overlap > the destination (src + count <= dest). > > Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com> > --- > This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes) > from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1, > and with kernel destination = Load Address: 0xc4000000, > located after the FIT without overlap, compared with > destination = Load Address: 0xc0008000. > > -> 14,332 us vs 52,239 in bootstage report > > In this case the memmove funtion is called in common/image.c::memmove_wd() > to handle overlap. > > > lib/string.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/string.c b/lib/string.c > index ae7835f600..ef8ead976c 100644 > --- a/lib/string.c > +++ b/lib/string.c > @@ -567,7 +567,7 @@ void * memmove(void * dest,const void *src,size_t count) > { > char *tmp, *s; > > - if (dest <= src) { > + if (dest <= src || (src + count) <= dest) { > memcpy(dest, src, count); Hm. So the condition you add (src + count <= dest) implies no overlap at all, so that's ok. So this doesn't really have anything to do with your patch per se. The existing condition relies on memcpy doing forward-copying. That's the case if the implementation from lib/string.c is in use, i.e. if __HAVE_ARCH_MEMCPY is not defined. And if an arch defines __HAVE_ARCH_MEMMOVE, this memmove() is not used. But AFAICT, there's a potential problem for the case where __HAVE_ARCH_MEMCPY is defined but __HAVE_ARCH_MEMMOVE is not, and e.g. arch/arm/include/asm/string.h does #if CONFIG_IS_ENABLED(USE_ARCH_MEMCPY) #define __HAVE_ARCH_MEMCPY #endif #undef __HAVE_ARCH_MEMMOVE Of course, the arch-specific implementation _may_ also do forward-copying (I haven't tried to decipher it), but that seems to be a rather fragile assumption. At the very least, some comments would be in order. Rasmus
On Wed, Nov 25, 2020 at 01:07:43PM +0100, Rasmus Villemoes wrote: > On 25/11/2020 12.26, Patrick Delaunay wrote: > > A common use of memmove() can be handled by memcpy(). Also memcpy() > > includes an optimization for large sizes: it copies a word at a time. So > > we can get a speed-up by calling memcpy() to handle our move in this case. > > > > Update memmove() to call also memcpy() if the source don't overlap > > the destination (src + count <= dest). > > > > Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com> > > --- > > This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes) > > from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1, > > and with kernel destination = Load Address: 0xc4000000, > > located after the FIT without overlap, compared with > > destination = Load Address: 0xc0008000. > > > > -> 14,332 us vs 52,239 in bootstage report > > > > In this case the memmove funtion is called in common/image.c::memmove_wd() > > to handle overlap. > > > > > > lib/string.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/string.c b/lib/string.c > > index ae7835f600..ef8ead976c 100644 > > --- a/lib/string.c > > +++ b/lib/string.c > > @@ -567,7 +567,7 @@ void * memmove(void * dest,const void *src,size_t count) > > { > > char *tmp, *s; > > > > - if (dest <= src) { > > + if (dest <= src || (src + count) <= dest) { > > memcpy(dest, src, count); > > Hm. So the condition you add (src + count <= dest) implies no overlap at > all, so that's ok. So this doesn't really have anything to do with your > patch per se. > > The existing condition relies on memcpy doing forward-copying. That's > the case if the implementation from lib/string.c is in use, i.e. if > __HAVE_ARCH_MEMCPY is not defined. And if an arch defines > __HAVE_ARCH_MEMMOVE, this memmove() is not used. > > But AFAICT, there's a potential problem for the case where > __HAVE_ARCH_MEMCPY is defined but __HAVE_ARCH_MEMMOVE is not, and e.g. > arch/arm/include/asm/string.h does > > #if CONFIG_IS_ENABLED(USE_ARCH_MEMCPY) > #define __HAVE_ARCH_MEMCPY > #endif > #undef __HAVE_ARCH_MEMMOVE > > Of course, the arch-specific implementation _may_ also do > forward-copying (I haven't tried to decipher it), but that seems to be a > rather fragile assumption. At the very least, some comments would be in > order. Looking at this deeper, today, ARM (non-64bit) can and usually but not always does use the asm optimized memcpy / memset. No other optimized functions were copied from Linux, and no other arches seem to use them today either. I think in sum then, Patrick can you please do a v2 that adds a comment here, in case we get more optimizations in the future? And thanks for the review here Rasmus!
diff --git a/lib/string.c b/lib/string.c index ae7835f600..ef8ead976c 100644 --- a/lib/string.c +++ b/lib/string.c @@ -567,7 +567,7 @@ void * memmove(void * dest,const void *src,size_t count) { char *tmp, *s; - if (dest <= src) { + if (dest <= src || (src + count) <= dest) { memcpy(dest, src, count); } else { tmp = (char *) dest + count;
A common use of memmove() can be handled by memcpy(). Also memcpy() includes an optimization for large sizes: it copies a word at a time. So we can get a speed-up by calling memcpy() to handle our move in this case. Update memmove() to call also memcpy() if the source don't overlap the destination (src + count <= dest). Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com> --- This patch allows to save 38ms for Kernel Image extraction (7327624 Bytes) from FIT loaded at 0xC2000000 for ARMV7 board STM32MP157C-EV1, and with kernel destination = Load Address: 0xc4000000, located after the FIT without overlap, compared with destination = Load Address: 0xc0008000. -> 14,332 us vs 52,239 in bootstage report In this case the memmove funtion is called in common/image.c::memmove_wd() to handle overlap. lib/string.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)