From patchwork Mon Jun 27 21:18:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 1649115 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=fIxlnk01; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LX0xD2F7vz9sGJ for ; Tue, 28 Jun 2022 07:20:00 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D1A063857B99 for ; Mon, 27 Jun 2022 21:19:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D1A063857B99 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1656364797; bh=d+Bg1k/z+vRqCHfCKb3M807wVBgXfJsKReRtXnVUe0I=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=fIxlnk010QXk6luaINCm80t77rl9v4bQDPcrv+JS7DhUF2CZ8WlAK5klJJJf+hlWo Il9/z6iV80uMMuisOuW10TEyescLkpDSm94pk+xwMwYYaM7fOB1l8XbiUGZ7MCtrUw WNzekqgLQjdzn76nhl0eqHXQGEoWUOwGuu4MsSS8= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by sourceware.org (Postfix) with ESMTPS id C114A3858283 for ; Mon, 27 Jun 2022 21:19:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C114A3858283 Received: by mail-pf1-x42b.google.com with SMTP id d17so10126588pfq.9 for ; Mon, 27 Jun 2022 14:19:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=d+Bg1k/z+vRqCHfCKb3M807wVBgXfJsKReRtXnVUe0I=; b=ZVi9AUtXXrgjXBL8w21ZPkWt+UsaPt9BLjkQnh1sgN0UTzptlq0IoesMZRSY0nxX/w kChoX5zmT6nPe+v4s8MjNFx2/v/NL4d7B56ZXqcpLCCN/SK9XLmj5mclnGdiCNlRjIkd sF65lQpeex4/9SfhLu9QnIVTvI/kIgA71jvZyDutX7E4PnTSenhRpHvzpkjZk8yRfvVp x2njtPoKAvvo2i/5Dy2shkqYK20yQfugwGFSR/n/Nkm83kZByICAUnwi/x2Zqn9Qwlz1 zu+lB/LxTNR6EWIkQmFuBHMO54HoDR+d+GesFfzMGnYav57hi0p0fCNVFVfJ1RLBI2B6 9qTw== X-Gm-Message-State: AJIora+Fww1lrJAgP89g4fzXCJAeGuyOI1xma60qqaFrGmeCt89T85ct I2Up9/InxOZDx9JwHB7xcMtHzVGHARc= X-Google-Smtp-Source: AGRyM1t+SfjnVrXUQ4m1cMbc7IdWS65YWFad2OtN040HrlWJoWaIB9cvRG4VSSZOJvvyX/qPR3r00Q== X-Received: by 2002:a63:6a85:0:b0:3fa:722a:fbdc with SMTP id f127-20020a636a85000000b003fa722afbdcmr14667452pgc.174.1656364739782; Mon, 27 Jun 2022 14:18:59 -0700 (PDT) Received: from gnu-tgl-3.localdomain ([172.58.37.230]) by smtp.gmail.com with ESMTPSA id bi1-20020a056a02024100b00410702f9bbasm1775300pgb.23.2022.06.27.14.18.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Jun 2022 14:18:59 -0700 (PDT) Received: from gnu-tgl-3.. (localhost [IPv6:::1]) by gnu-tgl-3.localdomain (Postfix) with ESMTP id 4646AC0375; Mon, 27 Jun 2022 14:18:58 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v2 2/2] x86-64: Only define used SSE/AVX/AVX512 run-time resolvers Date: Mon, 27 Jun 2022 14:18:58 -0700 Message-Id: <20220627211858.2807553-2-hjl.tools@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220627211858.2807553-1-hjl.tools@gmail.com> References: <20220627211858.2807553-1-hjl.tools@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-3027.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "H.J. Lu via Libc-alpha" From: "H.J. Lu" Reply-To: "H.J. Lu" Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" When glibc is built with x86-64 ISA level v3, SSE run-time resolvers aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers are unused. Check the minimum x86-64 ISA level to exclude the unused run-time resolvers. --- sysdeps/x86/isa-level.h | 2 ++ sysdeps/x86_64/dl-machine.h | 12 ++++--- sysdeps/x86_64/dl-trampoline.S | 59 ++++++++++++++++++---------------- 3 files changed, 42 insertions(+), 31 deletions(-) diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h index c6156e7f7a..f293aea906 100644 --- a/sysdeps/x86/isa-level.h +++ b/sysdeps/x86/isa-level.h @@ -68,10 +68,12 @@ compile-time constant.. */ /* ISA level >= 4 guaranteed includes. */ +#define AVX512F_X86_ISA_LEVEL 4 #define AVX512VL_X86_ISA_LEVEL 4 #define AVX512BW_X86_ISA_LEVEL 4 /* ISA level >= 3 guaranteed includes. */ +#define AVX_X86_ISA_LEVEL 3 #define AVX2_X86_ISA_LEVEL 3 #define BMI2_X86_ISA_LEVEL 3 diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h index 34766325ae..005d089501 100644 --- a/sysdeps/x86_64/dl-machine.h +++ b/sysdeps/x86_64/dl-machine.h @@ -28,6 +28,7 @@ #include #include #include +#include /* Return nonzero iff ELF header is compatible with the running host. */ static inline int __attribute__ ((unused)) @@ -86,6 +87,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], /* Identify this shared object. */ *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l; + const struct cpu_features* cpu_features = __get_cpu_features (); + /* The got[2] entry contains the address of a function which gets called to get the address of a so far unresolved function and jump to it. The profiling extension of the dynamic linker allows @@ -94,9 +97,9 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], end in this function. */ if (__glibc_unlikely (profile)) { - if (CPU_FEATURE_USABLE (AVX512F)) + if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F)) *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512; - else if (CPU_FEATURE_USABLE (AVX)) + else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX)) *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx; else *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_sse; @@ -112,9 +115,10 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[], /* This function will get called to fix up the GOT entry indicated by the offset on the stack, and then jump to the resolved address. */ - if (GLRO(dl_x86_cpu_features).xsave_state_size != 0) + if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL + || GLRO(dl_x86_cpu_features).xsave_state_size != 0) *(ElfW(Addr) *) (got + 2) - = (CPU_FEATURE_USABLE (XSAVEC) + = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC) ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec : (ElfW(Addr)) &_dl_runtime_resolve_xsave); else diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S index 831a654713..f669805ac5 100644 --- a/sysdeps/x86_64/dl-trampoline.S +++ b/sysdeps/x86_64/dl-trampoline.S @@ -20,6 +20,7 @@ #include #include #include +#include #ifndef DL_STACK_ALIGNMENT /* Due to GCC bug: @@ -62,35 +63,39 @@ #undef VMOVA #undef VEC_SIZE -#define VEC_SIZE 32 -#define VMOVA vmovdqa -#define VEC(i) ymm##i -#define _dl_runtime_profile _dl_runtime_profile_avx -#include "dl-trampoline.h" -#undef _dl_runtime_profile -#undef VEC -#undef VMOVA -#undef VEC_SIZE +#if MINIMUM_X86_ISA_LEVEL <= AVX_X86_ISA_LEVEL +# define VEC_SIZE 32 +# define VMOVA vmovdqa +# define VEC(i) ymm##i +# define _dl_runtime_profile _dl_runtime_profile_avx +# include "dl-trampoline.h" +# undef _dl_runtime_profile +# undef VEC +# undef VMOVA +# undef VEC_SIZE +#endif +#if MINIMUM_X86_ISA_LEVEL < AVX_X86_ISA_LEVEL /* movaps/movups is 1-byte shorter. */ -#define VEC_SIZE 16 -#define VMOVA movaps -#define VEC(i) xmm##i -#define _dl_runtime_profile _dl_runtime_profile_sse -#undef RESTORE_AVX -#include "dl-trampoline.h" -#undef _dl_runtime_profile -#undef VEC -#undef VMOVA -#undef VEC_SIZE - -#define USE_FXSAVE -#define STATE_SAVE_ALIGNMENT 16 -#define _dl_runtime_resolve _dl_runtime_resolve_fxsave -#include "dl-trampoline.h" -#undef _dl_runtime_resolve -#undef USE_FXSAVE -#undef STATE_SAVE_ALIGNMENT +# define VEC_SIZE 16 +# define VMOVA movaps +# define VEC(i) xmm##i +# define _dl_runtime_profile _dl_runtime_profile_sse +# undef RESTORE_AVX +# include "dl-trampoline.h" +# undef _dl_runtime_profile +# undef VEC +# undef VMOVA +# undef VEC_SIZE + +# define USE_FXSAVE +# define STATE_SAVE_ALIGNMENT 16 +# define _dl_runtime_resolve _dl_runtime_resolve_fxsave +# include "dl-trampoline.h" +# undef _dl_runtime_resolve +# undef USE_FXSAVE +# undef STATE_SAVE_ALIGNMENT +#endif #define USE_XSAVE #define STATE_SAVE_ALIGNMENT 64