| Message ID | CAMe9rOrsZmd2m6sCNR7H532eaU_ySwscuC-bv3bWNMHoAV5Bgg@mail.gmail.com |
| --- | --- |
| State | New |
On Friday 27 January 2017 11:27 PM, H.J. Lu wrote:
> I am testing this patch for
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=21081
>
> I'd like to check it in before code freeze.
>
> 0001-Add-VZEROUPPER-to-memset-vec-unaligned-erms.S-BZ-210.patch
>
> From 9097edb85e04c137f226f3d371afff34a4ab17b7 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Tue, 24 Jan 2017 15:58:49 -0800
> Subject: [PATCH] Add VZEROUPPER to memset-vec-unaligned-erms.S [BZ #21081]
>
> Since memset-vec-unaligned-erms.S has VDUP_TO_VEC0_AND_SET_RETURN at
> function entry, memset optimized for AVX2 and AVX512 will always use
> ymm/zmm registers.  VZEROUPPER should be placed before ret in
>
> L(stosb):
> 	movq	%rdx, %rcx
> 	movzbl	%sil, %eax
> 	movq	%rdi, %rdx
> 	rep stosb
> 	movq	%rdx, %rax
> 	ret
>
> since it can be reached from
>
> L(stosb_more_2x_vec):
> 	cmpq	$REP_STOSB_THRESHOLD, %rdx
> 	ja	L(stosb)
>
> 	[BZ #21081]
> 	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> 	(L(stosb)): Add VZEROUPPER before ret.
> ---
>  sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> index ff214f0..704eed9 100644
> --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> @@ -110,6 +110,8 @@ ENTRY (__memset_erms)
>  ENTRY (MEMSET_SYMBOL (__memset, erms))
>  # endif
>  L(stosb):
> +	/* Issue vzeroupper before rep stosb.  */
> +	VZEROUPPER
>  	movq	%rdx, %rcx
>  	movzbl	%sil, %eax
>  	movq	%rdi, %rdx

Looks good to me.

Siddhesh
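For context, here is a minimal sketch of how an ordinary C caller reaches the affected path. It assumes the default `REP_STOSB_THRESHOLD` of 2048 bytes from the glibc sources of that era; the exact threshold is CPU- and version-dependent, and the program itself is purely illustrative, not part of the patch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void)
{
  /* 4096 bytes is above the assumed 2048-byte REP_STOSB_THRESHOLD, so on
     CPUs where the AVX2/AVX512 erms memset is selected, this call goes
     through L(stosb_more_2x_vec) and lands on L(stosb).  */
  size_t size = 4096;
  unsigned char *buf = malloc (size);
  if (buf == NULL)
    return 1;

  /* VDUP_TO_VEC0_AND_SET_RETURN at the memset entry broadcasts the fill
     byte into a ymm/zmm register before any size check, so the upper
     vector state is dirty by the time rep stosb runs.  Without the
     VZEROUPPER added by the patch, it stays dirty after memset returns,
     and subsequent SSE code pays the AVX-SSE transition penalty.  */
  memset (buf, 0xa5, size);

  /* Use the buffer so the call cannot be optimized away.  */
  printf ("%u\n", buf[size - 1]);
  free (buf);
  return 0;
}
```

Note that the fix places VZEROUPPER at the top of L(stosb) rather than just before the ret described in the commit message; since rep stosb itself does not touch the vector registers, the two placements are equivalent for clearing the dirty upper state.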