Message ID: CAMe9rOqpbUF2m40pxjyr+O8pSrA9EmUgNsOcBPuP-wDaMqn+RQ@mail.gmail.com
State: New
On 18-03-2016 10:51, H.J. Lu wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>>> No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable
>>> __memcpy_avx_unaligned and nothing more.  Also you need to fix ALL selections.
>>
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..a5afaf4 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -39,6 +39,8 @@ ENTRY(__new_memcpy)
>>  	ret
>>  #endif
>> 1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
>> +	HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +	jnz	3f
>>  	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>>  	jnz	2f
>>  	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
>> @@ -52,6 +54,8 @@ ENTRY(__new_memcpy)
>>  	jnz	2f
>>  	lea	__memcpy_ssse3(%rip), %RAX_LP
>> 2:	ret
>> +3:	lea	__memcpy_ssse3(%rip), %RAX_LP
>> +	ret
>>  END(__new_memcpy)
>>
>>  # undef ENTRY
>>
>> Will update all IFUNC's if this is ok, else please suggest.
>>
>
> Better, but not OK.  Try something like
>
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S
> b/sysdeps/x86_64/multiarch/memcpy.S
> index ab5998c..2abe2fd 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
>  	ret
>  #endif
> 1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
> +	HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +	jnz	3f
>  	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>  	jnz	2f
> -	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
> +3:	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
>  	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>  	jnz	2f
>  	lea	__memcpy_sse2(%rip), %RAX_LP
>

I know this is not related to this patch, but any reason not to code the resolver using the libc_ifunc macros?
On Fri, Mar 18, 2016 at 6:55 AM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
> On 18-03-2016 10:51, H.J. Lu wrote:
>> [...]
>
> I know this is not related to this patch, but any reason not to code the
> resolver using the libc_ifunc macros?

Did you mean writing them in C?  It can be done.  Someone needs to write patches.
On Fri, Mar 18, 2016 at 6:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 6:22 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>> [...]
>>
>> Will update all IFUNC's if this is ok, else please suggest.
>>
>
> Better, but not OK.  Try something like
>
> [...]
>

One question.  If you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
> One question.  If you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?

Any idea whether any other string and memory functions are currently implemented based on this feature? If not, then let me just verify it.

Also, this feature is enabled in the generic code; to disable it, a change will be needed there as well.

--Amit Pawar
On Fri, Mar 18, 2016 at 8:19 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>> One question.  If you don't want __memcpy_avx_unaligned, why do you set AVX_Fast_Unaligned_Load?
>
> Any idea whether any other string and memory functions are currently implemented based on this feature? If not, then let me just verify it.
>
> Also, this feature is enabled in the generic code; to disable it, a change will be needed there as well.
>

It was done based on the assumption that an AVX-enabled machine has fast AVX unaligned loads.  If that isn't true for AMD CPUs, we can enable it for all Intel AVX CPUs and you can set it for AMD CPUs properly.
--- a/sysdeps/x86_64/multiarch/memcpy.S
+++ b/sysdeps/x86_64/multiarch/memcpy.S
@@ -42,9 +42,11 @@ ENTRY(__new_memcpy)
 	ret
 #endif
 1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
+	HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
+	jnz	3f
 	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
 	jnz	2f
-	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
+3:	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
 	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
 	jnz	2f
 	lea	__memcpy_sse2(%rip), %RAX_LP