From patchwork Sat Jan 20 14:27:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1888765
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, hongtao.liu@intel.com, jh@suse.cz
Subject: [PATCH 0/2] x86: Don't save callee-saved registers if not needed
Date: Sat, 20 Jan 2024 06:27:50 -0800
Message-ID: <20240120142752.1387725-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.43.0

In some cases, there is no need to save callee-saved registers:

1.
If a noreturn function neither throws nor supports exceptions, it can
   skip saving callee-saved registers.

2. When an interrupt handler is implemented by an assembly stub which
   does:

     1. Save all registers.
     2. Call a C function.
     3. Restore all registers.
     4. Return from interrupt.

   it is completely unnecessary to save and restore any registers in
   the C function called by the assembly stub, even if they would
   normally be callee-saved.

This patch set adds the no_callee_saved_registers function attribute,
which is complementary to the no_caller_saved_registers function
attribute, to classify the x86 backend's call-saved register handling
into three types:

1. Default call-saved registers.
2. No caller-saved registers with the no_caller_saved_registers
   attribute.
3. No callee-saved registers with the no_callee_saved_registers
   attribute.

Functions of the no callee-saved registers type don't save callee-saved
registers.  If a noreturn function neither throws nor supports
exceptions, it is classified as the no callee-saved registers type.

With these changes, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
	endbr64
	push   %r15
	push   %r14
	mov    %rcx,%r14
	push   %r13
	push   %r12
	push   %rbp
	mov    %esi,%ebp
	push   %rbx
	mov    %rdx,%rbx
	sub    $0x28,%rsp
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

to

__libc_start_main:
	endbr64
	sub    $0x28,%rsp
	mov    %esi,%ebp
	mov    %rdx,%rbx
	mov    %rcx,%r14
	mov    %rdi,(%rsp)
	mov    %fs:0x28,%rax
	mov    %rax,0x18(%rsp)
	xor    %eax,%eax
	test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
	endbr64
	call
	push   %r15
	push   %r14
	push   %r13
	push   %r12
	mov    %rdi,%r12
	push   %rbp
	push   %rbx
	mov    %gs:0x0,%rbx
	sub    $0x28,%rsp
	mov    %gs:0x28,%rax
	mov    %rax,0x20(%rsp)
	xor    %eax,%eax
	call   *0x0(%rip)        #
	test   $0x2,%ah
	je

to

do_exit:
	endbr64
	call
	sub    $0x28,%rsp
	mov    %rdi,%r12
	mov    %gs:0x28,%rax
	mov    %rax,0x20(%rsp)
	xor    %eax,%eax
	mov    %gs:0x0,%rbx
	call   *0x0(%rip)        #
	test   $0x2,%ah
	je

I compared GCC master branch bootstrap and test times on a slow machine with
6.6 Linux kernels compiled with the original GCC 13 and the GCC 13 with
the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources
under different 6.6 kernel versions.

GCC master branch build time in seconds:

before                after                 improvement
30043.75user          30013.16user          0%
1274.85system         1243.72system         2.4%

GCC master branch test time in seconds (new tests added):

before                after                 improvement
216035.90user         216547.51user         0%
27365.51system        26658.54system        2.6%

Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39.
Systems perform normally.

H.J. Lu (2):
  x86: Add no_callee_saved_registers function attribute
  x86: Don't save callee-saved registers in noreturn functions

 gcc/config/i386/i386-expand.cc                | 72 ++++++++++++++++---
 gcc/config/i386/i386-options.cc               | 61 ++++++++++++----
 gcc/config/i386/i386.cc                       | 70 ++++++++++++++----
 gcc/config/i386/i386.h                        | 20 +++++-
 gcc/doc/extend.texi                           |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 ++++++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 +++++++++++++++
 .../gcc.target/i386/no-callee-saved-1.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-10.c      | 46 ++++++++++++
 .../gcc.target/i386/no-callee-saved-11.c      | 11 +++
 .../gcc.target/i386/no-callee-saved-12.c      | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-14.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-15.c      | 17 +++++
 .../gcc.target/i386/no-callee-saved-16.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-17.c      | 16 +++++
 .../gcc.target/i386/no-callee-saved-18.c      | 51 +++++++++++++
 .../gcc.target/i386/no-callee-saved-2.c       | 30 ++++++++
 .../gcc.target/i386/no-callee-saved-3.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-4.c       |  8 +++
 .../gcc.target/i386/no-callee-saved-5.c       | 11 +++
 .../gcc.target/i386/no-callee-saved-6.c       | 12 ++++
 .../gcc.target/i386/no-callee-saved-7.c       | 49 +++++++++++++
 .../gcc.target/i386/no-callee-saved-8.c       | 50 +++++++++++++
 .../gcc.target/i386/no-callee-saved-9.c       | 49 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr38534-1.c     | 26 +++++++
 gcc/testsuite/gcc.target/i386/pr38534-2.c     | 18 +++++
 gcc/testsuite/gcc.target/i386/pr38534-3.c     | 19 +++++
 gcc/testsuite/gcc.target/i386/pr38534-4.c     | 18 +++++
 .../gcc.target/i386/stack-check-17.c          | 19 ++---
 30 files changed, 809 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/no-callee-saved-run-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c
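
For illustration only, here is a minimal sketch of how the two cases
from this cover letter might look in C source.  It is not taken from the
patches or their tests.  The names NO_CALLEE_SAVED, timer_tick_body,
fatal and ticks_seen are hypothetical, and the __has_attribute guard is
an assumption added so that the sketch still builds with compilers or
targets that lack the attribute (where it then has no effect).

```c
#include <stdio.h>
#include <stdlib.h>

/* Use the attribute only where the compiler reports support for it;
   otherwise expand to nothing.  NO_CALLEE_SAVED is a hypothetical
   helper macro, not something the patch set defines.  */
#if defined(__has_attribute)
# if __has_attribute(no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED
#endif

/* Hypothetical state updated by the handler body.  */
static volatile unsigned long tick_count;

/* Case 2: the C body of an interrupt handler.  The assumption is that
   an assembly stub saves all registers before calling this function
   and restores them afterwards, so the function itself does not need
   to preserve any callee-saved register.  */
NO_CALLEE_SAVED void timer_tick_body(unsigned long ticks)
{
  tick_count += ticks;
}

/* Case 1: a noreturn function that does not throw.  No attribute
   besides noreturn is written here; whether the compiler drops the
   register saves depends on the second patch and the options used.  */
__attribute__((noreturn)) void fatal(const char *msg)
{
  fputs(msg, stderr);
  exit(EXIT_FAILURE);
}

/* Plain accessor so ordinary code can read the counter.  */
unsigned long ticks_seen(void)
{
  return tick_count;
}
```

The attribute is placed on the function definition so that it is part of
the function's type.  A caller that sees this declaration is expected to
assume that the call may clobber every register, which is why the
declaration should be visible wherever the function is called from C.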