diff mbox

[U-Boot] lib/string: memmove: use memcpy if it is safe to do so

Message ID 1319789046-17715-1-git-send-email-jacmet@sunsite.dk
State Rejected
Delegated to: Wolfgang Denk
Headers show

Commit Message

Peter Korsgaard Oct. 28, 2011, 8:04 a.m. UTC
memmove is used in a number of performance critical places, like copying the
linux kernel from nor flash to ram, so optimizing it can be interesting.

Unfortunately, an arch specific version of memmove often isn't used, and
not supported at all on a number of archs (arm/mips/nds32/nios2/x86) -
But memcpy typically is.

Often memmove is called for copies where src/dest don't overlap, so we
could use the more efficient memcpy instead. Detect these situations and
forward those request to memcpy instead.

Adds 40 bytes to memmove and speeds up boot with ~300ms on a 400MHz arm9.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
---
 lib/string.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Comments

Wolfgang Denk Oct. 28, 2011, 8:25 p.m. UTC | #1
Dear Peter Korsgaard,

In message <1319789046-17715-1-git-send-email-jacmet@sunsite.dk> you wrote:
> memmove is used in a number of performance critical places, like copying the
> linux kernel from nor flash to ram, so optimizing it can be interesting.
> 
> Unfortunately, an arch specific version of memmove often isn't used, and
> not supported at all on a number of archs (arm/mips/nds32/nios2/x86) -
> But memcpy typically is.

So you are adding code, making the system even less efficient?  This
sounds to be acounter-productive approach.

Can you not instead arrange for arch specific, optimized versions
memmove() to be used?

Best regards,

Wolfgang Denk
Peter Korsgaard Oct. 28, 2011, 11:28 p.m. UTC | #2
>>>>> "Wolfgang" == Wolfgang Denk <wd@denx.de> writes:

Hi,

 >> memmove is used in a number of performance critical places, like copying the
 >> linux kernel from nor flash to ram, so optimizing it can be interesting.
 >> 
 >> Unfortunately, an arch specific version of memmove often isn't used, and
 >> not supported at all on a number of archs (arm/mips/nds32/nios2/x86) -
 >> But memcpy typically is.

 Wolfgang> So you are adding code, making the system even less efficient?  This
 Wolfgang> sounds to be acounter-productive approach.

Well, slightly less efficient for the (uncommon) situation where
src/dest overlaps in return for significantly more efficiency for the
situation where they don't.

 Wolfgang> Can you not instead arrange for arch specific, optimized versions
 Wolfgang> memmove() to be used?

That's another option, but more work for implementers - So I would
prefer this trivial patch instead.

But I'll drop it if you disagree.
Mike Frysinger Oct. 29, 2011, 12:35 a.m. UTC | #3
On Fri, Oct 28, 2011 at 22:25, Wolfgang Denk wrote:
> Peter Korsgaard wrote:
>> memmove is used in a number of performance critical places, like copying the
>> linux kernel from nor flash to ram, so optimizing it can be interesting.
>>
>> Unfortunately, an arch specific version of memmove often isn't used, and
>> not supported at all on a number of archs (arm/mips/nds32/nios2/x86) -
>> But memcpy typically is.
>
> So you are adding code, making the system even less efficient?  This
> sounds to be acounter-productive approach.
>
> Can you not instead arrange for arch specific, optimized versions
> memmove() to be used?

i think this is kind of a loose-loose situation.  in common code, we
have to use memmove() if the two regions could *possibly* overlap.  so
in practice, we end up calling memmove() in more cases than strictly
necessary.  but the only real choice is to update all the call sites
with region overlap checks.

at least with Peter's patch, we centralize all these checks in one
place producing smaller code.  in some cases, we probably add a little
bit of runtime overhead when the new if() statement returns false, but
it seems to be that one successful call quickly outweighs multiple
unsuccessful if() checks.

i also think this falls into the same category as the previous commit
b038db852b5b7120f1ff825d8e2a5c2cd14c2f0f:
memcpy/memmove: Do not copy to same address

so the ideal probably would be to have an optimized memmove in place,
but for the (many) arches which don't have this, we seem to get a
"win" overall ?
-mike
Wolfgang Denk Oct. 29, 2011, 1:29 p.m. UTC | #4
Dear Peter Korsgaard,

In message <87vcr8ofw9.fsf@macbook.be.48ers.dk> you wrote:
>
>  Wolfgang> So you are adding code, making the system even less efficient?  This
>  Wolfgang> sounds to be acounter-productive approach.
> 
> Well, slightly less efficient for the (uncommon) situation where
> src/dest overlaps in return for significantly more efficiency for the
> situation where they don't.

The test which situation we have needs to be always performed, and
adds overhead, especially for short copies.

>  Wolfgang> Can you not instead arrange for arch specific, optimized versions
>  Wolfgang> memmove() to be used?
> 
> That's another option, but more work for implementers - So I would
> prefer this trivial patch instead.
> 
> But I'll drop it if you disagree.

I prefer to keep this code as is, and add optimized versions of
memmove() where needed / wanted.

Best regards,

Wolfgang Denk
diff mbox

Patch

diff --git a/lib/string.c b/lib/string.c
index 2c4f0ec..239cc11 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -504,12 +504,16 @@  void * memmove(void * dest,const void *src,size_t count)
 		return dest;
 
 	if (dest <= src) {
+		if (dest + count <= src)
+			return memcpy(dest, src, count);
 		tmp = (char *) dest;
 		s = (char *) src;
 		while (count--)
 			*tmp++ = *s++;
 		}
 	else {
+		if (src + count <= dest)
+			return memcpy(dest, src, count);
 		tmp = (char *) dest + count;
 		s = (char *) src + count;
 		while (count--)