Message ID | 9010ef9da0b2730af564a138b8d316d48eaf6d43.1431436210.git.christophe.leroy@c-s.fr (mailing list archive) |
---|---|
State | Superseded |
On Tue, 2015-05-12 at 15:32 +0200, Christophe Leroy wrote:
> cacheable_memzero uses dcbz instruction and is more efficient than
> memset(0) when the destination is in RAM
>
> This patch renames memset as generic_memset, and defines memset
> as a prolog to cacheable_memzero. This prolog checks if the byte
> to set is 0 and if the buffer is in RAM. If not, it falls back to
> generic_memcpy()
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> index cbca76c..d8a9a86 100644
> --- a/arch/powerpc/lib/copy_32.S
> +++ b/arch/powerpc/lib/copy_32.S
> @@ -12,6 +12,7 @@
>  #include <asm/cache.h>
>  #include <asm/errno.h>
>  #include <asm/ppc_asm.h>
> +#include <asm/page.h>
>
>  #define COPY_16_BYTES \
>  	lwz	r7,4(r4); \
> @@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
>   * to set them to zero. This requires that the destination
>   * area is cacheable. -- paulus
>   */
> +_GLOBAL(memset)
> +	cmplwi	r4,0
> +	bne-	generic_memset
> +	cmplwi	r5,L1_CACHE_BYTES
> +	blt-	generic_memset
> +	lis	r8,max_pfn@ha
> +	lwz	r8,max_pfn@l(r8)
> +	tophys	(r9,r3)
> +	srwi	r9,r9,PAGE_SHIFT
> +	cmplw	r9,r8
> +	bge-	generic_memset
> +	mr	r4,r5

max_pfn includes highmem, and tophys only works on normal kernel
addresses.

If we were to point memset_io, memcpy_toio, etc. at noncacheable
versions, are there any other callers left that can reasonably point at
uncacheable memory?

-Scott
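The decision the assembly prolog makes can be modeled in C. This is a minimal sketch, not the kernel code: L1_CACHE_BYTES, PAGE_SHIFT, PAGE_OFFSET and the max_pfn value are hypothetical stand-ins (the real ones come from asm/cache.h, asm/page.h and mm), and tophys() is modeled as subtracting PAGE_OFFSET, which is only valid for normal lowmem kernel addresses — exactly the objection raised above.

```c
#include <stdint.h>

/* Hypothetical stand-in values for illustration only. */
#define L1_CACHE_BYTES 32u
#define PAGE_SHIFT     12u
#define PAGE_OFFSET    0xc0000000u   /* classic ppc32 lowmem base */
static uint32_t max_pfn = 0x10000;   /* 256 MiB of RAM, highmem included */

/* Use the dcbz-based fast path only for memset(p, 0, n) on a
 * large-enough buffer whose pfn falls below max_pfn. */
static int use_cacheable_zero(uint32_t vaddr, int c, uint32_t n)
{
    if (c != 0)                        /* cmplwi r4,0  ; bne-  */
        return 0;
    if (n < L1_CACHE_BYTES)            /* cmplwi r5,.. ; blt-  */
        return 0;
    uint32_t pfn = (vaddr - PAGE_OFFSET) >> PAGE_SHIFT; /* tophys + srwi */
    return pfn < max_pfn;              /* cmplw r9,r8  ; bge-  */
}
```

Note that a vmalloc or ioremap address above the modeled RAM range fails the pfn compare, but the subtraction itself is only meaningful inside the linear mapping.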
On 14/05/2015 02:55, Scott Wood wrote:
> On Tue, 2015-05-12 at 15:32 +0200, Christophe Leroy wrote:
>> cacheable_memzero uses dcbz instruction and is more efficient than
>> memset(0) when the destination is in RAM
>>
>> This patch renames memset as generic_memset, and defines memset
>> as a prolog to cacheable_memzero. This prolog checks if the byte
>> to set is 0 and if the buffer is in RAM. If not, it falls back to
>> generic_memcpy()
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>  arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
>> index cbca76c..d8a9a86 100644
>> --- a/arch/powerpc/lib/copy_32.S
>> +++ b/arch/powerpc/lib/copy_32.S
>> @@ -12,6 +12,7 @@
>>  #include <asm/cache.h>
>>  #include <asm/errno.h>
>>  #include <asm/ppc_asm.h>
>> +#include <asm/page.h>
>>
>>  #define COPY_16_BYTES \
>>  	lwz	r7,4(r4); \
>> @@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
>>   * to set them to zero. This requires that the destination
>>   * area is cacheable. -- paulus
>>   */
>> +_GLOBAL(memset)
>> +	cmplwi	r4,0
>> +	bne-	generic_memset
>> +	cmplwi	r5,L1_CACHE_BYTES
>> +	blt-	generic_memset
>> +	lis	r8,max_pfn@ha
>> +	lwz	r8,max_pfn@l(r8)
>> +	tophys	(r9,r3)
>> +	srwi	r9,r9,PAGE_SHIFT
>> +	cmplw	r9,r8
>> +	bge-	generic_memset
>> +	mr	r4,r5
> max_pfn includes highmem, and tophys only works on normal kernel
> addresses.

Is there any other simple way to determine whether an address is in RAM
or not?

I did that because of the below function from mm/mem.c:

int page_is_ram(unsigned long pfn)
{
#ifndef CONFIG_PPC64	/* XXX for now */
	return pfn < max_pfn;
#else
	unsigned long paddr = (pfn << PAGE_SHIFT);
	struct memblock_region *reg;

	for_each_memblock(memory, reg)
		if (paddr >= reg->base && paddr < (reg->base + reg->size))
			return 1;
	return 0;
#endif
}

> If we were to point memset_io, memcpy_toio, etc. at noncacheable
> versions, are there any other callers left that can reasonably point at
> uncacheable memory?

Do you mean we could just consider that memcpy() and memset() are called
only with destination in RAM and thus we could avoid the check?

copy_tofrom_user() already makes this assumption (although a user app
could possibly provide a buffer located in an ALSA mapped IO area)

Christophe
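The two page_is_ram() strategies quoted above can be compared with a small runnable model. This is an illustrative sketch only: the memory regions are hypothetical, and the ppc64-style region walk is open-coded here instead of using the kernel's for_each_memblock() iterator.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12

/* Hypothetical physical memory map: two RAM banks with a hole between. */
struct region { uint64_t base, size; };
static const struct region memory[] = {
    { 0x00000000, 0x10000000 },   /* 256 MiB at 0 */
    { 0x20000000, 0x10000000 },   /* 256 MiB after a hole */
};

/* Region walk in the style of the ppc64 branch: unlike the bare
 * pfn < max_pfn compare, it also rejects pfns that land in holes. */
static int page_is_ram(uint64_t pfn)
{
    uint64_t paddr = pfn << PAGE_SHIFT;
    for (size_t i = 0; i < sizeof(memory) / sizeof(memory[0]); i++)
        if (paddr >= memory[i].base &&
            paddr < memory[i].base + memory[i].size)
            return 1;
    return 0;
}
```

A pfn-limit check would accept every pfn below the top of the second bank; the region walk correctly rejects the hole.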
On Thu, 2015-05-14 at 10:50 +0200, christophe leroy wrote:
> Le 14/05/2015 02:55, Scott Wood a écrit :
> > On Tue, 2015-05-12 at 15:32 +0200, Christophe Leroy wrote:
> >> cacheable_memzero uses dcbz instruction and is more efficient than
> >> memset(0) when the destination is in RAM
> >>
> >> This patch renames memset as generic_memset, and defines memset
> >> as a prolog to cacheable_memzero. This prolog checks if the byte
> >> to set is 0 and if the buffer is in RAM. If not, it falls back to
> >> generic_memcpy()
> >>
> >> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> ---
> >>  arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> >> index cbca76c..d8a9a86 100644
> >> --- a/arch/powerpc/lib/copy_32.S
> >> +++ b/arch/powerpc/lib/copy_32.S
> >> @@ -12,6 +12,7 @@
> >>  #include <asm/cache.h>
> >>  #include <asm/errno.h>
> >>  #include <asm/ppc_asm.h>
> >> +#include <asm/page.h>
> >>
> >>  #define COPY_16_BYTES \
> >>  	lwz	r7,4(r4); \
> >> @@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
> >>   * to set them to zero. This requires that the destination
> >>   * area is cacheable. -- paulus
> >>   */
> >> +_GLOBAL(memset)
> >> +	cmplwi	r4,0
> >> +	bne-	generic_memset
> >> +	cmplwi	r5,L1_CACHE_BYTES
> >> +	blt-	generic_memset
> >> +	lis	r8,max_pfn@ha
> >> +	lwz	r8,max_pfn@l(r8)
> >> +	tophys	(r9,r3)
> >> +	srwi	r9,r9,PAGE_SHIFT
> >> +	cmplw	r9,r8
> >> +	bge-	generic_memset
> >> +	mr	r4,r5
> > max_pfn includes highmem, and tophys only works on normal kernel
> > addresses.
> Is there any other simple way to determine whether an address is in RAM
> or not ?

If you want to do it based on the virtual address, rather than doing a
tablewalk or TLB search, you need to limit it to lowmem.

> I did that because of the below function from mm/mem.c
>
> int page_is_ram(unsigned long pfn)
> {
> #ifndef CONFIG_PPC64	/* XXX for now */
> 	return pfn < max_pfn;
> #else
> 	unsigned long paddr = (pfn << PAGE_SHIFT);
> 	struct memblock_region *reg;
>
> 	for_each_memblock(memory, reg)
> 		if (paddr >= reg->base && paddr < (reg->base + reg->size))
> 			return 1;
> 	return 0;
> #endif
> }

Right, the problem is figuring out the pfn in the first place.

> > If we were to point memset_io, memcpy_toio, etc. at noncacheable
> > versions, are there any other callers left that can reasonably point at
> > uncacheable memory?
> Do you mean we could just consider that memcpy() and memset() are called
> only with destination on RAM and thus we could avoid the check ?

Maybe.  If that's not a safe assumption I hope someone will point it out.

> copy_tofrom_user() already does this assumption (allthought a user app
> could possibly provide a buffer located in an ALSA mapped IO area)

The user could also pass in NULL.  That's what the fixups are for. :-)

-Scott
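The lowmem-only test suggested above — deciding from the virtual address alone, without a tablewalk — can be sketched as a range check against the linear mapping. The PAGE_OFFSET and high_memory values here are hypothetical stand-ins; in the kernel they come from asm/page.h and mm.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define PAGE_OFFSET  0xc0000000u
static uint32_t high_memory = 0xf0000000u;  /* top of lowmem linear map */

/* A buffer qualifies for the cacheable fast path only if it lies
 * entirely inside lowmem, where the linear mapping guarantees
 * cacheable RAM (no highmem, no ioremap space, no user addresses). */
static int in_lowmem(uint32_t start, uint32_t len)
{
    if (start < PAGE_OFFSET)
        return 0;
    if (start + len < start)           /* 32-bit wrap-around */
        return 0;
    return start + len <= high_memory;
}
```

This trades some coverage (highmem RAM is rejected even though it may be cacheable) for a check that needs no pfn lookup at all.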
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index cbca76c..d8a9a86 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -12,6 +12,7 @@
 #include <asm/cache.h>
 #include <asm/errno.h>
 #include <asm/ppc_asm.h>
+#include <asm/page.h>
 
 #define COPY_16_BYTES \
 	lwz	r7,4(r4); \
@@ -74,6 +75,18 @@ CACHELINE_MASK = (L1_CACHE_BYTES-1)
  * to set them to zero. This requires that the destination
  * area is cacheable. -- paulus
  */
+_GLOBAL(memset)
+	cmplwi	r4,0
+	bne-	generic_memset
+	cmplwi	r5,L1_CACHE_BYTES
+	blt-	generic_memset
+	lis	r8,max_pfn@ha
+	lwz	r8,max_pfn@l(r8)
+	tophys	(r9,r3)
+	srwi	r9,r9,PAGE_SHIFT
+	cmplw	r9,r8
+	bge-	generic_memset
+	mr	r4,r5
 _GLOBAL(cacheable_memzero)
 	li	r5,0
 	addi	r6,r3,-4
@@ -116,7 +129,7 @@ _GLOBAL(cacheable_memzero)
 	bdnz	8b
 	blr
 
-_GLOBAL(memset)
+_GLOBAL(generic_memset)
 	rlwimi	r4,r4,8,16,23
 	rlwimi	r4,r4,16,0,15
 	addi	r6,r3,-4
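The two rlwimi instructions at the start of generic_memset replicate the fill byte into all four bytes of a word so the inner loop can store 32 bits at a time. A C sketch of the same transformation:

```c
#include <stdint.h>

/* Replicate a byte across a 32-bit word, mirroring the rlwimi pair. */
static uint32_t replicate_byte(uint8_t c)
{
    uint32_t w = c;
    w |= w << 8;    /* rlwimi r4,r4,8,16,23  : byte -> halfword */
    w |= w << 16;   /* rlwimi r4,r4,16,0,15  : halfword -> word */
    return w;
}
```

The equivalent multiply form, w = c * 0x01010101u, produces the same result.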
cacheable_memzero uses dcbz instruction and is more efficient than
memset(0) when the destination is in RAM

This patch renames memset as generic_memset, and defines memset
as a prolog to cacheable_memzero. This prolog checks if the byte
to set is 0 and if the buffer is in RAM. If not, it falls back to
generic_memset()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/copy_32.S | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)