From patchwork Fri Nov 10 17:33:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1862476
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH 1/3] AArch64: Cleanup emag memset
Date: Fri, 10 Nov 2023 17:33:14 +0000
X-BeenThere: libc-alpha@sourceware.org
List-Id: Libc-alpha mailing list

Cleanup emag memset - merge the memset_base64.S file, remove the ZVA code.

OK for commit?

Reviewed-by: Adhemerval Zanella

diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index 836e8317a5d3b652134d199cf685499983b1a3fc..3596d3c8d3403b4ea07d80d9a8877e2908a9883e 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -57,7 +57,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      /* Enable this on non-falkor processors too so that other cores
 		 can do a comparative analysis with __memset_generic. */
 	      IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_falkor)
-	      IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_emag)
+	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_emag)
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_kunpeng)
 #if HAVE_AARCH64_SVE_ASM
 	      IFUNC_IMPL_ADD (array, i, memset, sve && zva_size == 256, __memset_a64fx)
diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c
index 23fc66e15879847557b0e4f6941f03bc7ac5cab9..9193b197ddc3a647768184a6a639d6635cfea96e 100644
--- a/sysdeps/aarch64/multiarch/memset.c
+++ b/sysdeps/aarch64/multiarch/memset.c
@@ -56,7 +56,7 @@ select_memset_ifunc (void)
   if ((IS_FALKOR (midr) || IS_PHECDA (midr)) && zva_size == 64)
     return __memset_falkor;
 
-  if (IS_EMAG (midr) && zva_size == 64)
+  if (IS_EMAG (midr))
     return __memset_emag;
 
   return __memset_generic;
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
deleted file mode 100644
index 0e8f709fa58478d6e9d62020c576bb9be108866c..0000000000000000000000000000000000000000
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ /dev/null
@@ -1,185 +0,0 @@
-/* Copyright (C) 2018-2023 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library. If not, see
-   . */
-
-#include
-#include "memset-reg.h"
-
-#ifndef MEMSET
-# define MEMSET __memset_base64
-#endif
-
-/* To disable DC ZVA, set this threshold to 0. */
-#ifndef DC_ZVA_THRESHOLD
-# define DC_ZVA_THRESHOLD 512
-#endif
-
-/* Assumptions:
- *
- * ARMv8-a, AArch64, unaligned accesses
- *
- */
-
-ENTRY (MEMSET)
-
-	PTR_ARG (0)
-	SIZE_ARG (2)
-
-	bfi	valw, valw, 8, 8
-	bfi	valw, valw, 16, 16
-	bfi	val, val, 32, 32
-
-	add	dstend, dstin, count
-
-	cmp	count, 96
-	b.hi	L(set_long)
-	cmp	count, 16
-	b.hs	L(set_medium)
-
-	/* Set 0..15 bytes. */
-	tbz	count, 3, 1f
-	str	val, [dstin]
-	str	val, [dstend, -8]
-	ret
-
-	.p2align 3
-1:	tbz	count, 2, 2f
-	str	valw, [dstin]
-	str	valw, [dstend, -4]
-	ret
-2:	cbz	count, 3f
-	strb	valw, [dstin]
-	tbz	count, 1, 3f
-	strh	valw, [dstend, -2]
-3:	ret
-
-	.p2align 3
-	/* Set 16..96 bytes. */
-L(set_medium):
-	stp	val, val, [dstin]
-	tbnz	count, 6, L(set96)
-	stp	val, val, [dstend, -16]
-	tbz	count, 5, 1f
-	stp	val, val, [dstin, 16]
-	stp	val, val, [dstend, -32]
-1:	ret
-
-	.p2align 4
-	/* Set 64..96 bytes. Write 64 bytes from the start and
-	   32 bytes from the end. */
-L(set96):
-	stp	val, val, [dstin, 16]
-	stp	val, val, [dstin, 32]
-	stp	val, val, [dstin, 48]
-	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-
-	.p2align 4
-L(set_long):
-	stp	val, val, [dstin]
-	bic	dst, dstin, 15
-#if DC_ZVA_THRESHOLD
-	cmp	count, DC_ZVA_THRESHOLD
-	ccmp	val, 0, 0, cs
-	b.eq	L(zva_64)
-#endif
-	/* Small-size or non-zero memset does not use DC ZVA. */
-	sub	count, dstend, dst
-
-	/*
-	 * Adjust count and bias for loop. By subtracting extra 1 from count,
-	 * it is easy to use tbz instruction to check whether loop tailing
-	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
-	 */
-	sub	count, count, 64+16+1
-
-#if DC_ZVA_THRESHOLD
-	/* Align loop on 16-byte boundary, this might be friendly to i-cache. */
-	nop
-#endif
-
-1:	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-	stp	val, val, [dst, 64]!
-	subs	count, count, 64
-	b.hs	1b
-
-	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-1:	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-
-#if DC_ZVA_THRESHOLD
-	.p2align 3
-L(zva_64):
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-	bic	dst, dst, 63
-
-	/*
-	 * Previous memory writes might cross cache line boundary, and cause
-	 * cache line partially dirty. Zeroing this kind of cache line using
-	 * DC ZVA will incur extra cost, for it requires loading untouched
-	 * part of the line from memory before zeoring.
-	 *
-	 * So, write the first 64 byte aligned block using stp to force
-	 * fully dirty cache line.
-	 */
-	stp	val, val, [dst, 64]
-	stp	val, val, [dst, 80]
-	stp	val, val, [dst, 96]
-	stp	val, val, [dst, 112]
-
-	sub	count, dstend, dst
-	/*
-	 * Adjust count and bias for loop. By subtracting extra 1 from count,
-	 * it is easy to use tbz instruction to check whether loop tailing
-	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
-	 */
-	sub	count, count, 128+64+64+1
-	add	dst, dst, 128
-	nop
-
-	/* DC ZVA sets 64 bytes each time. */
-1:	dc	zva, dst
-	add	dst, dst, 64
-	subs	count, count, 64
-	b.hs	1b
-
-	/*
-	 * Write the last 64 byte aligned block using stp to force fully
-	 * dirty cache line.
-	 */
-	stp	val, val, [dst, 0]
-	stp	val, val, [dst, 16]
-	stp	val, val, [dst, 32]
-	stp	val, val, [dst, 48]
-
-	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
-	stp	val, val, [dst, 64]
-	stp	val, val, [dst, 80]
-1:	stp	val, val, [dstend, -32]
-	stp	val, val, [dstend, -16]
-	ret
-#endif
-
-END (MEMSET)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 6fecad4fae699f9967da94ddc88867afd5c59414..bbfa815925899149e2313a9317380fa9fd089abd 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -18,17 +18,95 @@
    . */
 
 #include
+#include "memset-reg.h"
 
-#define MEMSET __memset_emag
-
-/*
- * Using DC ZVA to zero memory does not produce better performance if
- * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we set threshold to
- * zero to disable using DC ZVA, which is good for multi-process/thread
- * workloads.
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses
+ *
  */
 
-#define DC_ZVA_THRESHOLD 0
+ENTRY (__memset_emag)
+
+	PTR_ARG (0)
+	SIZE_ARG (2)
+
+	bfi	valw, valw, 8, 8
+	bfi	valw, valw, 16, 16
+	bfi	val, val, 32, 32
+
+	add	dstend, dstin, count
+
+	cmp	count, 96
+	b.hi	L(set_long)
+	cmp	count, 16
+	b.hs	L(set_medium)
+
+	/* Set 0..15 bytes. */
+	tbz	count, 3, 1f
+	str	val, [dstin]
+	str	val, [dstend, -8]
+	ret
+
+	.p2align 3
+1:	tbz	count, 2, 2f
+	str	valw, [dstin]
+	str	valw, [dstend, -4]
+	ret
+2:	cbz	count, 3f
+	strb	valw, [dstin]
+	tbz	count, 1, 3f
+	strh	valw, [dstend, -2]
+3:	ret
+
+	.p2align 3
+	/* Set 16..96 bytes. */
+L(set_medium):
+	stp	val, val, [dstin]
+	tbnz	count, 6, L(set96)
+	stp	val, val, [dstend, -16]
+	tbz	count, 5, 1f
+	stp	val, val, [dstin, 16]
+	stp	val, val, [dstend, -32]
+1:	ret
+
+	.p2align 4
+	/* Set 64..96 bytes. Write 64 bytes from the start and
+	   32 bytes from the end. */
+L(set96):
+	stp	val, val, [dstin, 16]
+	stp	val, val, [dstin, 32]
+	stp	val, val, [dstin, 48]
+	stp	val, val, [dstend, -32]
+	stp	val, val, [dstend, -16]
+	ret
+
+	.p2align 4
+L(set_long):
+	stp	val, val, [dstin]
+	bic	dst, dstin, 15
+	/* Small-size or non-zero memset does not use DC ZVA. */
+	sub	count, dstend, dst
+
+	/*
+	 * Adjust count and bias for loop. By subtracting extra 1 from count,
+	 * it is easy to use tbz instruction to check whether loop tailing
+	 * count is less than 33 bytes, so as to bypass 2 unnecessary stps.
+	 */
+	sub	count, count, 64+16+1
+
+1:	stp	val, val, [dst, 16]
+	stp	val, val, [dst, 32]
+	stp	val, val, [dst, 48]
+	stp	val, val, [dst, 64]!
+	subs	count, count, 64
+	b.hs	1b
+
+	tbz	count, 5, 1f	/* Remaining count is less than 33 bytes? */
+	stp	val, val, [dst, 16]
+	stp	val, val, [dst, 32]
+1:	stp	val, val, [dstend, -32]
+	stp	val, val, [dstend, -16]
+	ret
 
-#include "./memset_base64.S"
+END (__memset_emag)
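
Note on the count bias in L(set_long): the comment about "subtracting extra 1 from count" is terse, so here is a rough C model of that path, meant only as a reading aid. It is not part of the patch or of glibc. The names store16, set_long_model and model_matches_memset are hypothetical, each stp is modelled as a plain 16-byte store, and the model assumes count > 96, which is the only case that reaches L(set_long).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models one `stp val, val, [addr]` (a 16-byte store).  */
static void
store16 (uintptr_t addr, unsigned char c)
{
  unsigned char *p = (unsigned char *) addr;
  for (int i = 0; i < 16; i++)
    p[i] = c;
}

/* Illustrative model of L(set_long); assumes count > 96.  */
static void
set_long_model (unsigned char *dstin, unsigned char c, size_t count)
{
  uintptr_t start = (uintptr_t) dstin;
  uintptr_t dstend = start + count;

  store16 (start, c);                       /* stp val, val, [dstin] */
  uintptr_t dst = start & ~(uintptr_t) 15;  /* bic dst, dstin, 15 */
  uint64_t n = dstend - dst;                /* sub count, dstend, dst */
  n -= 64 + 16 + 1;                         /* sub count, count, 64+16+1 */

  uint64_t before;
  do
    {
      store16 (dst + 16, c);
      store16 (dst + 32, c);
      store16 (dst + 48, c);
      store16 (dst + 64, c);
      dst += 64;                            /* writeback of [dst, 64]! */
      before = n;
      n -= 64;                              /* subs count, count, 64 */
    }
  while (before >= 64);                     /* b.hs: no borrow occurred */

  /* Here n has wrapped into [-64, -1] (mod 2^64) and n + 65 bytes are
     still unwritten.  Bit 5 of n is set exactly when that is 33 or more.  */
  if (n & 32)                               /* tbz count, 5, 1f */
    {
      store16 (dst + 16, c);
      store16 (dst + 32, c);
    }
  store16 (dstend - 32, c);                 /* last 32 bytes, from the end */
  store16 (dstend - 16, c);
}

/* Hypothetical self-check: returns 1 if the model writes exactly the
   bytes memset would, with guard bytes on both sides left untouched.  */
static int
model_matches_memset (size_t offset, size_t count)
{
  static _Alignas (16) unsigned char got[1024], want[1024];

  if (count <= 96 || offset + count > sizeof got - 64)
    return 0;
  memset (got, 0xAA, sizeof got);
  memset (want, 0xAA, sizeof want);
  set_long_model (got + 32 + offset, 0x5C, count);
  memset (want + 32 + offset, 0x5C, count);
  return memcmp (got, want, sizeof got) == 0;
}
```

In this model the extra 1 in the bias is what lets a single bit test separate "at most 32 bytes left" from "33 to 64 bytes left" once the counter has wrapped. Without it the boundary would fall at 32 rather than 33 and the bit test would not line up with the final 32-byte store from the end.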