Message ID | 1319789046-17715-1-git-send-email-jacmet@sunsite.dk |
---|---|
State | Rejected |
Delegated to: | Wolfgang Denk |
Headers | show |
Dear Peter Korsgaard, In message <1319789046-17715-1-git-send-email-jacmet@sunsite.dk> you wrote: > memmove is used in a number of performance critical places, like copying the > linux kernel from nor flash to ram, so optimizing it can be interesting. > > Unfortunately, an arch specific version of memmove often isn't used, and > not supported at all on a number of archs (arm/mips/nds32/nios2/x86) - > But memcpy typically is. So you are adding code, making the system even less efficient? This sounds to be acounter-productive approach. Can you not instead arrange for arch specific, optimized versions memmove() to be used? Best regards, Wolfgang Denk
>>>>> "Wolfgang" == Wolfgang Denk <wd@denx.de> writes: Hi, >> memmove is used in a number of performance critical places, like copying the >> linux kernel from nor flash to ram, so optimizing it can be interesting. >> >> Unfortunately, an arch specific version of memmove often isn't used, and >> not supported at all on a number of archs (arm/mips/nds32/nios2/x86) - >> But memcpy typically is. Wolfgang> So you are adding code, making the system even less efficient? This Wolfgang> sounds to be acounter-productive approach. Well, slightly less efficient for the (uncommon) situation where src/dest overlaps in return for significantly more efficiency for the situation where they don't. Wolfgang> Can you not instead arrange for arch specific, optimized versions Wolfgang> memmove() to be used? That's another option, but more work for implementers - So I would prefer this trivial patch instead. But I'll drop it if you disagree.
On Fri, Oct 28, 2011 at 22:25, Wolfgang Denk wrote: > Peter Korsgaard wrote: >> memmove is used in a number of performance critical places, like copying the >> linux kernel from nor flash to ram, so optimizing it can be interesting. >> >> Unfortunately, an arch specific version of memmove often isn't used, and >> not supported at all on a number of archs (arm/mips/nds32/nios2/x86) - >> But memcpy typically is. > > So you are adding code, making the system even less efficient? This > sounds to be acounter-productive approach. > > Can you not instead arrange for arch specific, optimized versions > memmove() to be used? i think this is kind of a loose-loose situation. in common code, we have to use memmove() if the two regions could *possibly* overlap. so in practice, we end up calling memmove() in more cases than strictly necessary. but the only real choice is to update all the call sites with region overlap checks. at least with Peter's patch, we centralize all these checks in one place producing smaller code. in some cases, we probably add a little bit of runtime overhead when the new if() statement returns false, but it seems to be that one successful call quickly outweighs multiple unsuccessful if() checks. i also think this falls into the same category as the previous commit b038db852b5b7120f1ff825d8e2a5c2cd14c2f0f: memcpy/memmove: Do not copy to same address so the ideal probably would be to have an optimized memmove in place, but for the (many) arches which don't have this, we seem to get a "win" overall ? -mike
Dear Peter Korsgaard, In message <87vcr8ofw9.fsf@macbook.be.48ers.dk> you wrote: > > Wolfgang> So you are adding code, making the system even less efficient? This > Wolfgang> sounds to be acounter-productive approach. > > Well, slightly less efficient for the (uncommon) situation where > src/dest overlaps in return for significantly more efficiency for the > situation where they don't. The test which situation we have needs to be always performed, and adds overhead, especially for short copies. > Wolfgang> Can you not instead arrange for arch specific, optimized versions > Wolfgang> memmove() to be used? > > That's another option, but more work for implementers - So I would > prefer this trivial patch instead. > > But I'll drop it if you disagree. I prefer to keep this code as is, and add optimized versions of memmove() where needed / wanted. Best regards, Wolfgang Denk
diff --git a/lib/string.c b/lib/string.c index 2c4f0ec..239cc11 100644 --- a/lib/string.c +++ b/lib/string.c @@ -504,12 +504,16 @@ void * memmove(void * dest,const void *src,size_t count) return dest; if (dest <= src) { + if (dest + count <= src) + return memcpy(dest, src, count); tmp = (char *) dest; s = (char *) src; while (count--) *tmp++ = *s++; } else { + if (src + count <= dest) + return memcpy(dest, src, count); tmp = (char *) dest + count; s = (char *) src + count; while (count--)
memmove is used in a number of performance critical places, like copying the linux kernel from nor flash to ram, so optimizing it can be interesting. Unfortunately, an arch specific version of memmove often isn't used, and not supported at all on a number of archs (arm/mips/nds32/nios2/x86) - But memcpy typically is. Often memmove is called for copies where src/dest don't overlap, so we could use the more efficient memcpy instead. Detect these situations and forward those request to memcpy instead. Adds 40 bytes to memmove and speeds up boot with ~300ms on a 400MHz arm9. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> --- lib/string.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)