Message ID: 2923c970-026c-4e00-be7a-0650e82421b5@BAMAIL02.ba.imgtec.org
State: New
> +#if __mips_isa_rev > 5
I've just happened to be starting some mips2 tests with glibc
and found that __mips_isa_rev is not defined below mips32.
This is then an error, given that -Werror is in force.
i.e. I think we will need a defined(__mips_isa_rev) in any of the
conditions which use it. This is likely to affect all of the R6
patches.
Thanks,
Matthew
On Fri, 19 Dec 2014, Matthew Fortune wrote:

> > +#if __mips_isa_rev > 5
>
> I've just happened to be starting some mips2 tests with glibc
> and found that __mips_isa_rev is not defined below mips32.
> This is then an error given -Werror is in force.

We still have -Wno-error=undef, so it *shouldn't* be an error (but once
Siddhesh's remaining -Wundef fixes are in, and any further fixes needed to
get clean for -Wundef in at least one configuration, we should remove
-Wno-error=undef, so such issues certainly need fixing soon).

> i.e. I think we will need a defined(__mips_isa_rev) in any of the
> conditions which use it.  This is likely to affect all of the R6
> patches.

Please put this in a central place (a single header that defines
__mips_isa_rev to 0 if not already defined, for example) rather than
duplicating "defined" tests in different places ("defined" tests are
typo-prone and so discouraged in glibc).
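The central definition Joseph suggests amounts to a small preprocessor shim. A minimal sketch (my illustration, not glibc's actual header; the helper function is hypothetical, added only so the effect is observable at run time):

```c
#include <assert.h>

/* Sketch of a central definition: give __mips_isa_rev the value 0 when
   the compiler does not predefine it (the case below mips32), so that
   plain "#if __mips_isa_rev > 5" tests are clean under -Wundef.  */
#ifndef __mips_isa_rev
# define __mips_isa_rev 0
#endif

/* Hypothetical helper so the macro's value can be checked at run time.  */
static int
mips_isa_rev (void)
{
  return __mips_isa_rev;
}
```

With something like this included first, every `#if __mips_isa_rev > 5` test simply evaluates to false on pre-r6 (and non-MIPS) compilers instead of triggering a -Wundef warning.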
Joseph Myers <joseph@codesourcery.com> writes:

> On Fri, 19 Dec 2014, Matthew Fortune wrote:
>
> > > +#if __mips_isa_rev > 5
> >
> > I've just happened to be starting some mips2 tests with glibc and
> > found that __mips_isa_rev is not defined below mips32.
> > This is then an error given -Werror is in force.
>
> We still have -Wno-error=undef, so it *shouldn't* be an error (but once
> Siddhesh's remaining -Wundef fixes are in, and any further fixes needed
> to get clean for -Wundef in at least one configuration, we should remove
> -Wno-error=undef, so such issues certainly need fixing soon).

Ah, OK.  I think my build must have failed for a different reason and
there was a __mips_isa_rev warning close to the end that I misread as an
error.

I guess we can leave it to a cleanup commit for all the instances then.

Matthew
On Fri, Dec 19, 2014 at 03:26:44PM -0800, Steve Ellcey wrote:
> Here is the last of my patches for mips32r6/mips64r6 support.  It updates
> memset to use byte copies instead of stl or str to align the destination
> because those instructions are not supported in mips32r6 or mips64r6.
> It also avoids using the 'prepare for store' prefetch hint because that
> is not supported on mips32r6 or mips64r6 either.
>
> Tested with the mips32r6/mips64r6 GCC, binutils and qemu simulator.
>
> OK to checkin?
>
> Steve Ellcey
> sellcey@imgtec.com
>
>  	PTR_ADDU a0,a0,t2
> +#else /* R6_CODE */
> +	andi	t2,a0,7
> +	lapc	t9,L(atable)
> +	PTR_LSA	t9,t2,t9,2
> +	jrc	t9
> +L(atable):
> +	bc	L(aligned)
> +	bc	L(lb7)

That could be a performance regression; test whether it is faster than the
existing loop on unpredictable branches [B].  Also try whether plain
branches are better, as in the following C code [A].  A table lookup could
be even slower in real workloads, since it adds latency when the table is
not in cache.

From a practical standpoint the realigning code looks like dead code: on
x64, 83 percent of calls are 16-byte aligned, and I cannot find an
application that makes calls unaligned to 8 bytes.  You would get a better
speedup by adding a check for whether the destination is already aligned
and moving the realignment code to the bottom of the file to improve
instruction cache usage.

[A]

  if (((int) x) & 1)
    *x = mask;
  x &= ~1;
  if (((int) x) & 2)
    *((uint16_t *) x) = mask;
  x &= ~2;
  if (((int) x) & 4)
    *((uint32_t *) x) = mask;
  x &= ~4;

[B]

  #include <string.h>
  int
  main (int x)
  {
    char foo[100];
    int i;
    for (i = 0; i < 100000000; i++)
      memset (foo + (i % 16), 1, 32 - (i % 16));
    return foo[17];
  }
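Sketch [A] above only masks the pointer down rather than advancing it past the stored bytes. A runnable reconstruction of the same idea (my version, not the reviewer's exact code; it assumes the caller has already ruled out sizes too small to reach alignment, as the slti check in memset does):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reconstruction of sketch [A]: align a memset destination to 8 bytes
   with at most three conditional store steps (1, 2, then 4 bytes)
   instead of a jump table.  Unlike [A] as posted, the pointer is
   advanced past the stored bytes rather than masked down.
   Assumes *len >= 7 so the stores cannot overrun.  */
static unsigned char *
align8_store (unsigned char *x, unsigned char c, size_t *len)
{
  if (((uintptr_t) x) & 1)
    { x[0] = c; x += 1; *len -= 1; }
  if (((uintptr_t) x) & 2)
    { x[0] = c; x[1] = c; x += 2; *len -= 2; }
  if (((uintptr_t) x) & 4)
    { x[0] = c; x[1] = c; x[2] = c; x[3] = c; x += 4; *len -= 4; }
  return x;  /* now 8-byte aligned */
}
```

Each branch here depends only on one address bit, which is what the reviewer suggests benchmarking against the table dispatch with loop [B].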
On Sat, 20 Dec 2014, Matthew Fortune wrote:

> Joseph Myers <joseph@codesourcery.com> writes:
> > On Fri, 19 Dec 2014, Matthew Fortune wrote:
> >
> > > > +#if __mips_isa_rev > 5
> > >
> > > I've just happened to be starting some mips2 tests with glibc and
> > > found that __mips_isa_rev is not defined below mips32.
> > > This is then an error given -Werror is in force.
> >
> > We still have -Wno-error=undef, so it *shouldn't* be an error (but once
> > Siddhesh's remaining -Wundef fixes are in, and any further fixes needed
> > to get clean for -Wundef in at least one configuration, we should
> > remove -Wno-error=undef, so such issues certainly need fixing soon).
>
> Ah, OK.  I think my build must have failed for a different reason
> and there was a __mips_isa_rev warning close to the end that I
> misread as an error.
>
> I guess we can leave it to a cleanup commit for all the instances
> then.

I'll wait for the cleanup and revised versions of the memset / memcpy
patches (on the basis that the identified -Wundef issue means that the
memset / memcpy patches can't go in as-is).
On Fri, 19 Dec 2014, Steve Ellcey wrote:

> @@ -188,9 +205,9 @@ LEAF(MEMSET_NAME)
>
>  	.set	nomips16
>  	.set	noreorder
> -/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb).  Regardless of
> +/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb).  Regardless of
>     size, copy dst pointer to v0 for the return value.  */
> -	slti	t2,a2,(2 * NSIZE)
> +	slti	t2,a2,(4 * NSIZE)
> 	bne	t2,zero,L(lastb)
> 	move	v0,a0

This doesn't appear to have anything obvious to do with r6.  Please submit
it separately from the r6 changes, with its own rationale (or if it is in
fact needed as part of the r6 changes, please explain further).

In general, r6 changes should not result in any changes to the code in
glibc built for non-r6 - if the best way to support r6 also involves
changes to code for non-r6 (e.g. if some cleanup or optimization naturally
results in r6 support because the non-r6-compatible code was bad in some
way and r6-compatible code is better for all versions, or if code
compatible with both r6 and pre-r6 can be just as good on all versions as
the existing non-r6-compatible code) then it's best to submit those
changes separately.
On Mon, 2014-12-22 at 18:02 +0000, Joseph Myers wrote:
> On Fri, 19 Dec 2014, Steve Ellcey wrote:
>
> > @@ -188,9 +205,9 @@ LEAF(MEMSET_NAME)
> >
> >  	.set	nomips16
> >  	.set	noreorder
> > -/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb).  Regardless of
> > +/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb).  Regardless of
> >     size, copy dst pointer to v0 for the return value.  */
> > -	slti	t2,a2,(2 * NSIZE)
> > +	slti	t2,a2,(4 * NSIZE)
> > 	bne	t2,zero,L(lastb)
> > 	move	v0,a0
>
> This doesn't appear to have anything obvious to do with r6.  Please
> submit it separately from the r6 changes, with its own rationale (or if
> it is in fact needed as part of the r6 changes, please explain further).
>
> In general, r6 changes should not result in any changes to the code in
> glibc built for non-r6 [...] then it's best to submit those changes
> separately.

The change is needed for r6 because I align the buffer to 8 bytes, instead
of to 4 bytes in 32 bit mode and 8 bytes in 64 bit mode.  This should help
with cache alignment and, depending on the CPU, with memory bonding, where
two sequential 4 byte loads can be handled as if they were a single 8 byte
load.

Without the slti change (assuming O32 mode), if I have a non-aligned
9 byte buffer I could set up to 7 bytes to get it 8 byte aligned, and that
leaves less than a full 4 byte word to be set.  The existing code after
this check assumes there is at least one full word that needs to be set.
I considered changing the alignment code to only align on a 4 byte
boundary in O32 mode, or ifdef'ing this test, but it seemed cleaner to
increase the minimum size of buffers that get handled via the simple byte
copy loop for both r6 and earlier CPUs.

If you don't like this part of the patch I could go back to aligning on a
4 byte boundary in 32 bit mode for now and get some performance data on
the different alignment options later.

Steve Ellcey
sellcey@imgtec.com
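Steve's 9-byte example can be checked with a little arithmetic. This sketch (mine, with hypothetical helper names) models the O32 case, where NSIZE is 4 and aligning the destination to 8 bytes can consume up to 7 bytes:

```c
#include <assert.h>
#include <stddef.h>

#define NSIZE 4  /* O32: one register store covers 4 bytes */

/* Models the new "slti t2,a2,(4 * NSIZE)" check: sizes below 16 are
   sent to the byte-copy loop L(lastb).  */
static int
goes_to_byte_loop (size_t size)
{
  return size < 4 * NSIZE;
}

/* In the worst case, aligning to 8 bytes consumes 7 bytes; the word
   loop is only safe if at least one full NSIZE word remains.  */
static int
word_loop_is_safe (size_t size)
{
  return size - 7 >= NSIZE;
}
```

With the old threshold of 2 * NSIZE (8), a 9-byte buffer would have reached the word-store path even though only 2 bytes can remain after alignment; with the new threshold of 4 * NSIZE (16), every size that reaches the word loop leaves at least 16 - 7 = 9 bytes, comfortably more than one word.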
On Mon, 22 Dec 2014, Steve Ellcey wrote:

> I considered changing the alignment code to only align on a 4 byte
> boundary in O32 mode, or ifdef'ing this test, but it seemed cleaner to
> increase the minimum size of buffers that get handled via the simple
> byte copy loop for both r6 and earlier CPUs.

In that case, submit the change as a preparatory patch, with its own
justification for being OK for existing CPUs (that it doesn't affect
performance, or whatever), so that the r6 patch can avoid changing pre-r6
code.
diff --git a/sysdeps/mips/memset.S b/sysdeps/mips/memset.S
index 9991480..4cc6dd7 100644
--- a/sysdeps/mips/memset.S
+++ b/sysdeps/mips/memset.S
@@ -54,6 +54,14 @@
 # endif
 #endif

+#if __mips_isa_rev > 5
+# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
+#  undef PREFETCH_STORE_HINT
+#  define PREFETCH_STORE_HINT PREFETCH_HINT_STORE_STREAMED
+# endif
+# define R6_CODE
+#endif
+
 /* Some asm.h files do not have the L macro definition.  */
 #ifndef L
 # if _MIPS_SIM == _ABIO32
@@ -72,6 +80,15 @@
 # endif
 #endif

+/* New R6 instructions that may not be in asm.h.  */
+#ifndef PTR_LSA
+# if _MIPS_SIM == _ABI64
+#  define PTR_LSA	dlsa
+# else
+#  define PTR_LSA	lsa
+# endif
+#endif
+
 /* Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE
    or PREFETCH_STORE_STREAMED offers a large performance advantage
    but PREPAREFORSTORE has some special restrictions to consider.
@@ -188,9 +205,9 @@ LEAF(MEMSET_NAME)

 	.set	nomips16
 	.set	noreorder
-/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb).  Regardless of
+/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb).  Regardless of
    size, copy dst pointer to v0 for the return value.  */
-	slti	t2,a2,(2 * NSIZE)
+	slti	t2,a2,(4 * NSIZE)
 	bne	t2,zero,L(lastb)
 	move	v0,a0

@@ -231,11 +248,46 @@ LEAF(MEMSET_NAME)
 /* If the destination address is not aligned do a partial store to get it
    aligned.  If it is already aligned just jump to L(aligned).  */
 L(set0):
+#if !defined (R6_CODE)
 	andi	t2,a3,(NSIZE-1)		/* word-unaligned address?  */
 	beq	t2,zero,L(aligned)	/* t2 is the unalignment count */
 	PTR_SUBU a2,a2,t2
 	C_STHI	a1,0(a0)
 	PTR_ADDU a0,a0,t2
+#else /* R6_CODE */
+	andi	t2,a0,7
+	lapc	t9,L(atable)
+	PTR_LSA	t9,t2,t9,2
+	jrc	t9
+L(atable):
+	bc	L(aligned)
+	bc	L(lb7)
+	bc	L(lb6)
+	bc	L(lb5)
+	bc	L(lb4)
+	bc	L(lb3)
+	bc	L(lb2)
+	bc	L(lb1)
+L(lb7):
+	sb	a1,6(a0)
+L(lb6):
+	sb	a1,5(a0)
+L(lb5):
+	sb	a1,4(a0)
+L(lb4):
+	sb	a1,3(a0)
+L(lb3):
+	sb	a1,2(a0)
+L(lb2):
+	sb	a1,1(a0)
+L(lb1):
+	sb	a1,0(a0)
+
+	li	t9,8
+	subu	t2,t9,t2
+	PTR_SUBU a2,a2,t2
+	PTR_ADDU a0,a0,t2
+#endif /* R6_CODE */

 L(aligned):

/* If USE_DOUBLE is not set we may still want to align the data on a 16
@@ -286,8 +338,12 @@ L(loop16w):
 	bgtz	v1,L(skip_pref)
 	nop
 #endif
+#if defined(R6_CODE)
+	PREFETCH_FOR_STORE (2, a0)
+#else
 	PREFETCH_FOR_STORE (4, a0)
 	PREFETCH_FOR_STORE (5, a0)
+#endif
 L(skip_pref):
 	C_ST	a1,UNIT(0)(a0)
 	C_ST	a1,UNIT(1)(a0)
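As a cross-check of the R6_CODE path in the patch above, here is a C model (my sketch, not part of the patch): t2 = a0 & 7 indexes L(atable), entry t2 branches into the sb chain so that exactly 8 - t2 bytes are stored, after which a0 is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the R6 alignment code: the switch mimics the jump through
   L(atable), and the deliberate case fall-through mimics the sb chain
   from L(lb7) down to L(lb1).  Entry t2 stores bytes at offsets
   0 .. 7-t2, i.e. 8 - t2 bytes, exactly what is needed to reach
   8-byte alignment.  Assumes *a2 is large enough, as the slti check
   at the top of memset guarantees.  */
static unsigned char *
r6_align (unsigned char *a0, unsigned char a1, size_t *a2)
{
  unsigned int t2 = (uintptr_t) a0 & 7;
  size_t n;

  if (t2 == 0)
    return a0;             /* bc L(aligned) */

  switch (t2)
    {
    case 1: a0[6] = a1;    /* L(lb7) */ /* FALLTHROUGH */
    case 2: a0[5] = a1;    /* L(lb6) */ /* FALLTHROUGH */
    case 3: a0[4] = a1;    /* L(lb5) */ /* FALLTHROUGH */
    case 4: a0[3] = a1;    /* L(lb4) */ /* FALLTHROUGH */
    case 5: a0[2] = a1;    /* L(lb3) */ /* FALLTHROUGH */
    case 6: a0[1] = a1;    /* L(lb2) */ /* FALLTHROUGH */
    case 7: a0[0] = a1;    /* L(lb1) */
    }

  n = 8 - t2;              /* li t9,8; subu t2,t9,t2 */
  *a2 -= n;                /* PTR_SUBU a2,a2,t2 */
  return a0 + n;           /* PTR_ADDU a0,a0,t2 */
}
```

For example, a destination at offset 3 within an 8-byte-aligned block dispatches to L(lb5), stores the 5 bytes at offsets 0 through 4, and leaves the pointer on the next 8-byte boundary.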