
[v3,3/3] x86: Expand the comment on when REP STOSB is used on memset

Message ID 20240208130840.533348-4-adhemerval.zanella@linaro.org
State New
Series x86: Improve ERMS usage on Zen3+

Commit Message

Adhemerval Zanella Netto Feb. 8, 2024, 1:08 p.m. UTC
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

H.J. Lu Feb. 12, 2024, 3:56 p.m. UTC | #1
On Thu, Feb 8, 2024 at 5:08 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> ---
>  sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> index 9984c3ca0f..97839a2248 100644
> --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> @@ -21,7 +21,9 @@
>     2. If size is less than VEC, use integer register stores.
>     3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
>     4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
> -   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
> +   5. On machines with the ERMS feature, if size is greater than or equal to
> +      __x86_rep_stosb_threshold, REP STOSB is used.
> +   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
>        4 VEC stores and store 4 * VEC at a time until done.  */
>
>  #include <sysdep.h>
> --
> 2.34.1
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.

Patch

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9984c3ca0f..97839a2248 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@ 
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines with the ERMS feature, if size is greater than or equal to
+      __x86_rep_stosb_threshold, REP STOSB is used.
+   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */
 
 #include <sysdep.h>