Message ID: CAMe9rOqGKGsWHsM1NO7L46QdtMALoG_Wq3mahg=beWSesAg0jg@mail.gmail.com
State: New
> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
> index 8882590..3c67da8 100644
> --- a/sysdeps/x86_64/multiarch/memcpy.S
> +++ b/sysdeps/x86_64/multiarch/memcpy.S
> @@ -40,7 +40,7 @@ ENTRY(__new_memcpy)
>  #endif
>  1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
>  	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
> -	jnz	2f
> +	jnz	3f
>  	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
>  	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>  	jnz	2f
> @@ -52,6 +52,10 @@ ENTRY(__new_memcpy)
>  	jnz	2f
>  	lea	__memcpy_ssse3(%rip), %RAX_LP
>  2:	ret
> +3:	HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
> +	jz	2b
> +	lea	__memcpy_ssse3_back(%rip), %RAX_LP
> +	ret
>  END(__new_memcpy)
>
> This is wrong.  You should check Avoid_AVX_Fast_Unaligned_Load
> to disable __memcpy_avx_unaligned, not select __memcpy_ssse3_back.
> Each selection should be loaded only once.

Now OK?

--Amit Pawar
On Fri, Mar 18, 2016 at 5:25 AM, Pawar, Amit <Amit.Pawar@amd.com> wrote:
>> diff --git a/sysdeps/x86_64/multiarch/memcpy.S b/sysdeps/x86_64/multiarch/memcpy.S
>> index 8882590..3c67da8 100644
>> --- a/sysdeps/x86_64/multiarch/memcpy.S
>> +++ b/sysdeps/x86_64/multiarch/memcpy.S
>> @@ -40,7 +40,7 @@ ENTRY(__new_memcpy)
>>  #endif
>>  1:	lea	__memcpy_avx_unaligned(%rip), %RAX_LP
>>  	HAS_ARCH_FEATURE (AVX_Fast_Unaligned_Load)
>> -	jnz	2f
>> +	jnz	3f
>>  	lea	__memcpy_sse2_unaligned(%rip), %RAX_LP
>>  	HAS_ARCH_FEATURE (Fast_Unaligned_Load)
>>  	jnz	2f
>> @@ -52,6 +52,10 @@ ENTRY(__new_memcpy)
>>  	jnz	2f
>>  	lea	__memcpy_ssse3(%rip), %RAX_LP
>>  2:	ret
>> +3:	HAS_ARCH_FEATURE (Avoid_AVX_Fast_Unaligned_Load)
>> +	jz	2b
>> +	lea	__memcpy_ssse3_back(%rip), %RAX_LP
>> +	ret
>>  END(__new_memcpy)
>>
>> This is wrong.  You should check Avoid_AVX_Fast_Unaligned_Load
>> to disable __memcpy_avx_unaligned, not select __memcpy_ssse3_back.
>> Each selection should be loaded only once.
>
> Now OK?

No, it isn't fixed.  Avoid_AVX_Fast_Unaligned_Load should disable
__memcpy_avx_unaligned and nothing more.  Also you need to fix ALL
selections.