From patchwork Tue Jun 23 14:57:20 2020
From: Matthew Malcomson <Matthew.Malcomson@arm.com>
Date: Tue, 23 Jun 2020 15:57:20 +0100
To: gcc-patches@gcc.gnu.org
Cc: Kristof.Beyls@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com
Subject: [Patch v2 3/3] aarch64: Mitigate SLS for BLR instruction

This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.

This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value.  These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.

When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.

When optimising for size we use one set of stubs for all functions.
This set of stubs can have human-readable names, and we are using
`__call_indirect_x` for register x.

When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe, we move the value
of the original register into x16 and use x16 for the BR.

As an example when optimising for size: a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV X16, X0
    BR X16
(a fuller sketch of the emitted stub is shown below).

The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.  This was mistaken since
functions coming from the same compilation unit can still be in
different sections, and the assembler will insert relocations at jumps
between sections.

On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart.  The registers x16 and
x17 may be clobbered by these veneers.  Hence the function stubs cannot
rely on the values of x16 and x17 being the same as just before the
function stub is called.

The same applies to the hot/cold partitioning of single functions, so
function-local stubs have the same restriction.
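For concreteness, here is a rough sketch of the shared stub for x0 as it
could appear in the generated assembly when optimising for size.  The
section name and exact directives are illustrative (the patch derives
them through GCC's decl and section machinery rather than printing them
directly), but the instruction sequence matches what the stub-emission
code below produces:

	.section	.text.__call_indirect_x0,"axG",%progbits,__call_indirect_x0,comdat
	.align	2
	.global	__call_indirect_x0
	.hidden	__call_indirect_x0
	.type	__call_indirect_x0, %function
__call_indirect_x0:
	mov	x16, x0		// Move into x16 so the BR remains compatible
	br	x16		// with `BTI c` landing pads.
	dsb	sy		// Conservative speculation barrier, usable on
	isb			// all architecture revisions (unlike SB).
	.size	__call_indirect_x0, .-__call_indirect_x0

A call site `BLR x0` then becomes `BL __call_indirect_x0`, so the
barrier stays off the architecturally executed path.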
This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.

Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies).  This patch
now uses global symbols with hidden visibility, each stored in their own
COMDAT section.  This means stubs can be shared between compilation
units while still avoiding the PLT indirection.

This patch also removes the `__call_indirect_x30` stub (and the
function-local equivalent), which would simply jump back to the original
location.

The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.  This is because we know for certain that the
instructions directly after the BR in all but the last function stub
will be from another one of these stubs and hence will not contain a
speculation gadget.  Instead we add a speculation barrier at the end of
the sequence of stubs.

The global stubs are emitted in COMDAT/.linkonce sections by themselves
so that the linker can remove duplicates from multiple object files.
This means they are not emitted in one chunk, and each one must include
the speculation barrier.

Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.  Rather than provide
multiple stubs for each architecture, we provide a stub that will work
for all architectures -- using the DSB+ISB barrier.

This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface
  between compiler and linker.  If this BLR instruction needs to be
  mitigated, it'd probably be best to do so in the linker.  It seems
  that the code sequence for thread-local variable access is unlikely to
  lead to a Spectre revelation gadget.
- PLT stubs are produced by the linker and each contains a BLR
  instruction.  It seems that a Spectre revelation gadget could appear,
  at most, only after the last PLT stub.

Testing:
  Bootstrap and regtest on AArch64
  (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every
  test in the testsuite, a slight modification to emit the speculation
  barrier after every function stub, and a script to check that the
  output never emitted a BLR, or unmitigated BR or RET instruction.
  The same was done on an aarch64-none-elf cross-compiler.

1) The temporary hack emitted a speculation barrier at the end of every
   stub function, and a script was used to ensure that:
   a) Every RET or BR is immediately followed by a speculation barrier.
   b) No BLR instruction is emitted by the compiler.

gcc/ChangeLog:

2020-06-23  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
	New declaration.
	* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
	stub registers class.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_register_move_cost): Likewise.
	(aarch64_sls_shared_thunks): Global array to store stub labels.
	(aarch64_sls_emit_function_stub): New.
	(aarch64_sls_create_blr_label): New.
	(aarch64_sls_emit_blr_function_thunks): New.
	(aarch64_sls_emit_shared_blr_thunks): New.
	(aarch64_asm_file_end): New.
	(aarch64_indirect_call_asm): New.
	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
	(TARGET_ASM_FUNCTION_EPILOGUE): Use
	aarch64_sls_emit_blr_function_thunks.
	* config/aarch64/aarch64.h (STUB_REGNUM_P): New.
	(enum reg_class): Add STUB_REGS class.
	(machine_function): Introduce `call_via` array for
	function-local stub labels.
	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
	aarch64_indirect_call_asm to emit code when hardening BLR
	instructions.
	* config/aarch64/constraints.md (Ucr): New constraint
	representing registers for indirect calls.  Usually
	GENERAL_REGS, but STUB_REGS when hardening the BLR instruction
	against SLS.
	* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS
	class is also a general register.

gcc/testsuite/ChangeLog:

2020-06-23  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.

############### Attachment also inlined for ease of reply ###############

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d2eb739bc89ecd9d0212416b8dc3ee4ba236a271..e79f9cbc783e75132e999395ff975f9768436419 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -781,6 +781,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
 const char * aarch64_sls_barrier (int);
+const char * aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f996472d6990b7709602ae93f7a2cb7daa0e84b0..9795c929b8733f89722d3660456f5e7d6405d902 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
 #define GP_REGNUM_P(REGNO)						\
   (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))
 
+/* Registers known to be preserved over a BL instruction.  This consists of the
+   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the
+   BL instruction itself, while the x16 and x17 registers may be used by
+   veneers which can be inserted by the linker.  */
+#define STUB_REGNUM_P(REGNO) \
+  (GP_REGNUM_P (REGNO) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R16_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R17_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R30_REGNUM - R0_REGNUM)) \
+
 #define FP_REGNUM_P(REGNO)			\
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -667,6 +677,7 @@ enum reg_class
 {
   NO_REGS,
   TAILCALL_ADDR_REGS,
+  STUB_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -689,6 +700,7 @@ enum reg_class
 {						\
   "NO_REGS",					\
   "TAILCALL_ADDR_REGS",				\
+  "STUB_REGS",					\
   "GENERAL_REGS",				\
   "STACK_REG",					\
   "POINTER_REGS",				\
@@ -708,6 +720,7 @@ enum reg_class
 {									\
   { 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
   { 0x00030000, 0x00000000, 0x00000000 },	/* TAILCALL_ADDR_REGS */\
+  { 0x3ffcffff, 0x00000000, 0x00000000 },	/* STUB_REGS */		\
   { 0x7fffffff, 0x00000000, 0x00000003 },	/* GENERAL_REGS */	\
   { 0x80000000, 0x00000000, 0x00000000 },	/* STACK_REG */		\
   { 0xffffffff, 0x00000000, 0x00000003 },	/* POINTER_REGS */	\
@@ -879,6 +892,8 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  /* One entry for each general purpose register.  */
+  rtx call_via[SP_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27a6b78ec6925106f7b745d949b510b6f273c651..17b040e2d09a8a4960fd6b02d53f4ccee78f9e93 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10607,6 +10607,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (STUB_REGNUM_P (regno))
+    return STUB_REGS;
+
   if (GP_REGNUM_P (regno))
     return GENERAL_REGS;
 
@@ -10869,7 +10872,7 @@ aarch64_asm_trampoline_template (FILE *f)
      specific attributes to choose between hardening against straight line
      speculation or not, but such function specific attributes are likely to
      happen in the future.  */
-  output_asm_insn ("dsb\tsy\n\tisb", NULL);
+  asm_fprintf (f, "\tdsb\tsy\n\tisb\n");
 
   /* The trampoline needs an extra padding instruction.  In case if BTI is
      enabled the padding instruction is replaced by the BTI instruction at
@@ -10919,6 +10922,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
     {
+    case STUB_REGS:
     case TAILCALL_ADDR_REGS:
     case POINTER_REGS:
     case GENERAL_REGS:
@@ -13157,10 +13161,12 @@ aarch64_register_move_cost (machine_mode mode,
     = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
+      || to == STUB_REGS)
     to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
+      || from == STUB_REGS)
     from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
@@ -22964,6 +22970,215 @@ aarch64_sls_barrier (int mitigation_required)
 	 : "";
 }
 
+static GTY (()) tree aarch64_sls_shared_thunks[30];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[30] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "", /* "__call_indirect_x16", */
+    "", /* "__call_indirect_x17", */
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   we emit a BL to this thunk, and this thunk contains a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimising for size we use stubs shared by the linked object.
+   When optimising for performance we emit stubs for each function in the hope
+   that the branch predictor can better train on jumps specific for a given
+   function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (regnum < 30 && regnum != 16 && regnum != 17);
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this compilation
+	 unit we use a named symbol -- this is just for users to more easily
+	 understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      const char *thunk_name = indirect_symbol_names[regnum];
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+	{
+	  /* Build a decl representing this function stub and record it for
+	     later.  We build a decl here so we can use the GCC machinery for
+	     handling sections automatically (through `get_named_section` and
+	     `make_decl_one_only`).  That saves us a lot of trouble handling
+	     the specifics of different output file formats.  */
+	  tree decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+				  get_identifier (thunk_name),
+				  build_function_type_list (void_type_node,
+							    NULL_TREE));
+	  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
+					   NULL_TREE, void_type_node);
+	  TREE_PUBLIC (decl) = 1;
+	  TREE_STATIC (decl) = 1;
+	  DECL_IGNORED_P (decl) = 1;
+	  DECL_ARTIFICIAL (decl) = 1;
+	  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+	  resolve_unique_section (decl, 0, false);
+	  aarch64_sls_shared_thunks[regnum] = decl;
+	}
+
+      return gen_rtx_SYMBOL_REF (Pmode, thunk_name);
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Helper function for aarch64_sls_emit_blr_function_thunks and
+   aarch64_sls_emit_shared_blr_thunks below.  */
+static void
+aarch64_sls_emit_function_stub (FILE *out_file, int regnum)
+{
+  /* Save in x16 and branch to that function so this transformation does
+     not prevent jumping to `BTI c` instructions.  */
+  asm_fprintf (out_file, "\tmov\tx16, x%d\n", regnum);
+  asm_fprintf (out_file, "\tbr\tx16\n");
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no speculation
+   gadgets between each stub, and hence we only emit a speculation barrier at
+   the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this assembly
+     is emitted at the end of the function.  This means it can be emitted *just
+     after* the cold section of a function.  That cold part would be emitted in
+     a different section.  That switch would trigger a `.cfi_endproc` directive
+     to be emitted in the original section and a `.cfi_startproc` directive to
+     be emitted in the new section.  Switching to the original section without
+     restoring would mean that the `.cfi_endproc` emitted as a function ends
+     would happen in a different section -- leaving an unmatched
+     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
+     in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+	continue;
+
+      targetm.asm_out.print_operand (out_file, specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+       by the current function, and hence for the current target.  */
+    asm_fprintf (out_file, "\t%s\n", aarch64_sls_barrier (true));
+  switch_to_section (save_text_section);
+}
+
+/* Emit shared BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit those
+   stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimising for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      tree decl = aarch64_sls_shared_thunks[regnum];
+      if (!decl)
+	continue;
+
+      const char *name = indirect_symbol_names[regnum];
+      switch_to_section (get_named_section (decl, NULL, 0));
+      ASM_OUTPUT_ALIGN (out_file, 2);
+      targetm.asm_out.globalize_label (out_file, name);
+      /* Only emits if the compiler is configured for an assembler that can
+	 handle visibility directives.  */
+      targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
+      ASM_OUTPUT_TYPE_DIRECTIVE (out_file, name, "function");
+      ASM_OUTPUT_LABEL (out_file, name);
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      /* Use the most conservative target to ensure it can always be used by
+	 any function in the translation unit.  */
+      asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+      ASM_DECLARE_FUNCTION_SIZE (out_file, name, decl);
+    }
+}
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  gcc_assert (REG_P (addr));
+  if (aarch64_harden_sls_blr_p ())
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+    output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -23514,6 +23729,12 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_asm_file_end
+
+#undef TARGET_ASM_FUNCTION_EPILOGUE
+#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2a29d650a24cb5e576620f81b7f6541b0c08d044..660eb207fc87477b9cadbe74b102fca53d64400d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1019,16 +1019,15 @@
 )
 
 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "Ucr, Usf"))
 	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
-)
+  [(set_attr "type" "call, call")])
 
 (define_expand "call_value"
   [(parallel
@@ -1047,13 +1046,13 @@
 
 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "Ucr, Usf"))
 	      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
   [(set_attr "type" "call, call")]
 )
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index d993268a187fad9c80c32b16d8e95b26783bde24..8cc6f50888122b707a087984afc6d5ec354e1e2c 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -24,6 +24,15 @@
 (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
   "@internal Registers suitable for an indirect tail call")
 
+(define_register_constraint "Ucr"
+  "aarch64_harden_sls_blr_p () ? STUB_REGS : GENERAL_REGS"
+  "@internal Registers to be used for an indirect call.
+   This is usually the general registers, but when we are hardening against
+   Straight Line Speculation we disallow x16, x17, and x30 so we can use
+   indirection stubs.  These indirection stubs cannot use the above registers
+   since they will be reached by a BL that may have to go through a linker
+   veneer.")
+
 (define_register_constraint "w" "FP_REGS"
   "Floating point and SIMD vector registers.")
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955340288572e816216274faf84ce7b0..1754b1eff9f9bfa1117e03acaf226fde36d53375 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,8 @@
 
 (define_predicate "aarch64_general_reg"
   (and (match_operand 0 "register_operand")
-       (match_test "REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
+       (match_test "REGNO_REG_CLASS (REGNO (op)) == STUB_REGS
+		    || REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
 
 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
index 0000000000000000000000000000000000000000..8adf753b4c5b4802bc80c725c9b36a5e9997b52f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except an
+   x16 or x17 register.  This is because a `BTI c` instruction can be branched
+   to using a BLR instruction using any register, but can only be branched to
+   with a BR using an x16 or x17 register.
+ */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+  foo *x;
+  bar *y;
+  int left;
+  int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler-not "\tbr\tx(?!16|17)" } } */
+/* { dg-final { scan-assembler "\tbr\tx(16|17)" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
index 0000000000000000000000000000000000000000..e8d22f438b22e763e1ee3171efc1b8c464b17185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense.
+ */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+  foo *x;
+  bar *y;
+  int left;
+  int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */